How IAS solved Web Service Timeouts caused by Java Garbage Collection Pauses

IAS Tech Blog
IAS Tech Blog
Published in
14 min readDec 10, 2020

--

by Bruce Rudolph

At IAS, we have Java-based web services that require a response as low as 10 to 20 milliseconds for several billion requests per hour. Therefore, having a deterministic time for Garbage Collection (GC) is essential to avoid service request/response timeouts.

Repurposing a Java-based Web Service

One such web service had an average response time of 50 ms with periodic spikes greater than 200 ms. The consumers of this application tolerated these response times. However, for a new set of web service consumers, the response time requirement is 20 ms and we needed to find the solution without a major redesign and implementation effort.

The application was built using JDK 8 and was launched using the Shenandoah GC, a low pause time garbage collector. For the repurposed web service the Shenandoah GC had an unacceptable number of stop-the-world (STW) pauses greater than 20 ms. There was no other option available in Java 8 to reduce these STW pauses further.

Research brought our attention to the ZGC, a new GC available in JDK 11. ZGC, with its lofty goals related to STW pauses among others piqued our interest. The goals of ZGC were:

  • GC pause times should not exceed 10 ms
  • Handle heaps ranging from a few hundreds of megabytes to many terabytes
  • No more than 15% application throughput reduction compared to using G1
  • Lay a foundation for future GC features and optimizations leveraging colored pointers and load barriers
  • Initially supported platform: Linux/x64

The goal of “GC pause times not exceeding 10 ms” was most important for our repurposed application. The application build was changed to use JDK 11 and the launch script was updated to specify ZGC as the garbage collector. Testing the new application build we noted that there were no STW pauses greater than 2 ms!

See the following section which compares the results of GC STW pauses between Java 8 and 11 under the same request load.

Java GC Tests Setup and Results

For one instance of our application, we used the following subset of Java 8 command-line arguments

java -server -Xms32g -Xmx32g -XX:+UseShenandoahGC

and in parallel in a separate instance, we used the following subset of Java 11 command-line arguments

java -server -Xms32g -Xmx32g -XX:+UnlockExperimentalVMOptions -XX:+UseZGC

The following chart depicts a comparison of the GC STW events between the two versions of Java running different garbage collectors. The detailed data (GC log events) for this comparison are provided in the subsections that follow.

During the monitored period, 1.69% of GC STW events in the application using Java 8 and Shenandoah GC would cause an application timeout, while 0% did so using Java 11 and ZGC.

Test with Java 8 and Shenandoah GC

Using the Java 8 GC logging arguments the following depicts the format that was analyzed.

OpenJDK 64-Bit Server VM (25.242-b08) for linux-amd64 JRE (1.8.0_242-b08), built on Jan 28 2020 14:28:22 by “mockbuild” with gcc 4.8.5 20150623 (Red Hat 4.8.5–39)Memory: 4k page, physical 64299320k(62995280k free), swap 0k(0k free)CommandLine flags: -XX:InitialHeapSize=34359738368 -XX:MaxHeapSize=34359738368 -XX:OnOutOfMemoryError=kill -9 %p -XX:+PrintGC -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseShenandoahGCHeuristics ergonomically sets -XX:+ExplicitGCInvokesConcurrentHeuristics ergonomically sets -XX:+ShenandoahImplicitGCInvokesConcurrent2020–10–07T17:41:39.231+0000: 0.087: Application time: 0.0425134 seconds2020–10–07T17:41:39.231+0000: 0.087: Total time for which application threads were stopped: 0.0001056 seconds, Stopping threads took: 0.0000373 seconds2020–10–07T17:41:39.234+0000: 0.090: Application time: 0.0029894 seconds2020–10–07T17:41:39.234+0000: 0.090: Total time for which application threads were stopped: 0.0000446 seconds, Stopping threads took: 0.0000290 seconds2020–10–07T17:41:39.399+0000: 0.256: Application time: 0.1653125 seconds2020–10–07T17:41:39.399+0000: 0.256: Total time for which application threads were stopped: 0.0002464 seconds, Stopping threads took: 0.0000552 seconds2020–10–07T17:41:39.522+0000: 0.379: Application time: 0.1230124 seconds

The logged events with “Total time for which application threads were stopped: “ contains the time the application is paused during a GC STW phase. Application running time events are marked with “Application time:” and GC concurrent operations are marked with the word “Concurrent.” Also, note the “stop” and “Application” times are shown in seconds.

After this test, the search of the GC log file shows how many STW events there were within a specific time range. For example, the first command searched for any pauses that were at least 1 second or greater. There were none as expected.

The second search shows there were 7,617 application pause events less than 1 second.

The subsequent searches looked for decreasing ranges of pause events starting with any > 900 ms, then > 800 ms, and so on down to > 40 ms which had 8 occurrences which are shown. These 8 events are more than double the response threshold of 20 ms and thus would generate a timeout in the consumer applications.

[Java8 logs]# grep “Total time for which application threads were stopped: 1” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0” gc.log | wc -l7617[Java8 logs]# grep “Total time for which application threads were stopped: 0.9” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.8” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.7” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.6” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.5” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.4” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.3” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.2” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.1” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.09” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.08” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.07” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.06” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.05” gc.log | wc -l0[Java8 logs]# grep “Total time for which application threads were stopped: 0.04” gc.log | wc -l8[Java8 logs]# grep “Total time for which application threads were stopped: 0.04” gc.log2020–10–07T17:44:36.035+0000: 176.891: Total time for which application threads were stopped: 0.0469193 seconds, Stopping threads took: 0.0000589 seconds2020–10–07T17:46:01.318+0000: 262.174: Total time for which application threads were stopped: 0.0436874 seconds, Stopping threads took: 0.0001132 seconds2020–10–07T17:47:17.174+0000: 338.030: Total time for which application threads were stopped: 0.0424398 seconds, Stopping threads took: 0.0000824 seconds2020–10–07T17:48:42.601+0000: 423.457: Total time for which application threads were stopped: 0.0424536 seconds, Stopping threads took: 0.0000932 seconds2020–10–07T17:50:10.757+0000: 511.614: Total time for which application threads were stopped: 0.0411949 seconds, Stopping threads took: 0.0001364 seconds2020–10–07T17:51:37.043+0000: 597.899: Total time for which application threads were stopped: 0.0422972 seconds, Stopping threads took: 0.0001120 seconds2020–10–07T17:53:01.443+0000: 682.300: Total time for which application threads were stopped: 0.0400846 seconds, Stopping threads took: 0.0001085 seconds2020–10–07T18:42:29.596+0000: 3650.452: Total time for which application threads were stopped: 0.0449612 seconds, Stopping threads took: 0.0000892 seconds

There were 99 GC STW pauses lasting 30–40 ms.

[Java8 logs]# grep “Total time for which application threads were stopped: 0.03” gc.log | wc -l99[Java8 logs]# grep “Total time for which application threads were stopped: 0.03” gc.log2020–10–07T17:54:23.682+0000: 764.538: Total time for which application threads were stopped: 0.0391524 seconds, Stopping threads took: 0.0001125 seconds2020–10–07T17:55:36.246+0000: 837.103: Total time for which application threads were stopped: 0.0385141 seconds, Stopping threads took: 0.0001667 seconds2020–10–07T17:56:44.562+0000: 905.419: Total time for which application threads were stopped: 0.0374047 seconds, Stopping threads took: 0.0001078 seconds2020–10–07T17:57:54.683+0000: 975.540: Total time for which application threads were stopped: 0.0352844 seconds, Stopping threads took: 0.0001039 seconds2020–10–07T17:58:54.540+0000: 1035.397: Total time for which application threads were stopped: 0.0337969 seconds, Stopping threads took: 0.0001439 seconds2020–10–07T17:59:58.729+0000: 1099.585: Total time for which application threads were stopped: 0.0339610 seconds, Stopping threads took: 0.0001214 seconds2020–10–07T18:01:05.175+0000: 1166.031: Total time for which application threads were stopped: 0.0309555 seconds, Stopping threads took: 0.0000959 seconds2020–10–07T18:02:05.209+0000: 1226.066: Total time for which application threads were stopped: 0.0341115 seconds, Stopping threads took: 0.0001090 seconds2020–10–07T18:03:21.695+0000: 1302.552: Total time for which application threads were stopped: 0.0351645 seconds, Stopping threads took: 0.0000975 seconds2020–10–07T18:04:29.755+0000: 1370.612: Total time for which application threads were stopped: 0.0336928 seconds, Stopping threads took: 0.0001269 seconds2020–10–07T18:05:38.805+0000: 1439.662: Total time for which application threads were stopped: 0.0326981 seconds, Stopping threads took: 0.0001142 seconds2020–10–07T18:06:39.818+0000: 1500.674: Total time for which application threads were stopped: 0.0322423 seconds, Stopping threads took: 0.0001355 seconds

There were 22 GC STW pauses lasting 20–30 ms.

[Java8 logs]# grep “Total time for which application threads were stopped: 0.02” gc.log | wc -l22[Java8 logs]# grep “Total time for which application threads were stopped: 0.02” gc.log2020–10–07T17:42:58.238+0000: 79.095: Total time for which application threads were stopped: 0.0273037 seconds, Stopping threads took: 0.0001391 seconds2020–10–07T18:23:13.965+0000: 2494.821: Total time for which application threads were stopped: 0.0298962 seconds, Stopping threads took: 0.0005097 seconds2020–10–07T18:25:10.100+0000: 2610.957: Total time for which application threads were stopped: 0.0294772 seconds, Stopping threads took: 0.0000556 seconds2020–10–07T18:29:05.227+0000: 2846.083: Total time for which application threads were stopped: 0.0297417 seconds, Stopping threads took: 0.0001013 seconds2020–10–07T18:33:47.875+0000: 3128.731: Total time for which application threads were stopped: 0.0295710 seconds, Stopping threads took: 0.0001040 seconds2020–10–07T18:34:42.392+0000: 3183.249: Total time for which application threads were stopped: 0.0298937 seconds, Stopping threads took: 0.0001087 seconds2020–10–07T18:35:33.082+0000: 3233.938: Total time for which application threads were stopped: 0.0296991 seconds, Stopping threads took: 0.0001975 seconds2020–10–07T18:36:23.589+0000: 3284.446: Total time for which application threads were stopped: 0.0291975 seconds, Stopping threads took: 0.0001752 seconds2020–10–07T18:45:48.375+0000: 3849.231: Total time for which application threads were stopped: 0.0297771 seconds, Stopping threads took: 0.0001637 seconds2020–10–07T18:47:24.960+0000: 3945.817: Total time for which application threads were stopped: 0.0297673 seconds, Stopping threads took: 0.0000786 seconds2020–10–07T18:51:07.718+0000: 4168.574: Total time for which application threads were stopped: 0.0283744 seconds, Stopping threads took: 0.0001245 seconds2020–10–07T18:52:03.994+0000: 4224.850: Total time for which application threads were stopped: 0.0295066 seconds, Stopping threads took: 0.0001062 seconds2020–10–07T18:56:58.708+0000: 4519.564: Total time for which application threads were stopped: 0.0278329 seconds, Stopping threads took: 0.0001267 seconds2020–10–07T18:59:12.556+0000: 4653.413: Total time for which application threads were stopped: 0.0289844 seconds, Stopping threads took: 0.0001000 seconds2020–10–07T19:00:04.441+0000: 4705.297: Total time for which application threads were stopped: 0.0298543 seconds, Stopping threads took: 0.0001521 seconds2020–10–07T19:08:31.053+0000: 5211.910: Total time for which application threads were stopped: 0.0291812 seconds, Stopping threads took: 0.0001109 seconds2020–10–07T19:09:50.356+0000: 5291.213: Total time for which application threads were stopped: 0.0292514 seconds, Stopping threads took: 0.0001377 seconds2020–10–07T19:11:07.263+0000: 5368.120: Total time for which application threads were stopped: 0.0297125 seconds, Stopping threads took: 0.0001001 seconds2020–10–07T19:16:25.461+0000: 5686.317: Total time for which application threads were stopped: 0.0298023 seconds, Stopping threads took: 0.0001191 seconds2020–10–07T19:18:26.876+0000: 5807.732: Total time for which application threads were stopped: 0.0296275 seconds, Stopping threads took: 0.0000896 seconds2020–10–07T19:23:01.676+0000: 6082.533: Total time for which application threads were stopped: 0.0287367 seconds, Stopping threads took: 0.0000951 seconds2020–10–07T19:32:34.377+0000: 6655.233: Total time for which application threads were stopped: 0.0292421 seconds, Stopping threads took: 0.0000719 seconds

There were 3 GC STW pauses lasting 10–20 ms.

[Java8 logs]# grep “Total time for which application threads were stopped: 0.01” gc.log | wc -l3[Java8 logs]# grep “Total time for which application threads were stopped: 0.01” gc.log2020–10–07T17:42:25.430+0000: 46.286: Total time for which application threads were stopped: 0.0127827 seconds, Stopping threads took: 0.0000985 seconds2020–10–07T19:34:33.271+0000: 6774.127: Total time for which application threads were stopped: 0.0165908 seconds, Stopping threads took: 0.0001143 seconds2020–10–07T19:51:06.216+0000: 7767.073: Total time for which application threads were stopped: 0.0176901 seconds, Stopping threads took: 0.0001101 seconds

There were 7,485 GC STW pauses less than 10 ms.

[Java8 logs]# grep “Total time for which application threads were stopped: 0.00” gc.log | wc -l7485

Test with Java 11 and ZGC

Using the Java 11 GC logging arguments the following depicts the format that was analyzed.

[4.117s][info][gc] Using The Z Garbage Collector[5.013s][info][gc,phases] GC(0) Pause Mark Start 0.302ms[5.027s][info][gc,phases] GC(0) Concurrent Mark 14.261ms[5.027s][info][gc,phases] GC(0) Pause Mark End 0.474ms[5.028s][info][gc,phases] GC(0) Concurrent Process Non-Strong References 0.301ms[5.028s][info][gc,phases] GC(0) Concurrent Reset Relocation Set 0.000ms[5.028s][info][gc,phases] GC(0) Concurrent Destroy Detached Pages 0.000ms[5.033s][info][gc,phases] GC(0) Concurrent Select Relocation Set 4.840ms[5.034s][info][gc,phases] GC(0) Concurrent Prepare Relocation Set 1.295ms[5.035s][info][gc,phases] GC(0) Pause Relocate Start 0.702ms[5.043s][info][gc,phases] GC(0) Concurrent Relocate 8.303ms[5.043s][info][gc ] GC(0) Garbage Collection (Metadata GC Threshold) 96M(0%)->66M(0%)[7.662s][info][gc,phases] GC(1) Pause Mark Start 0.315ms[7.832s][info][gc,phases] GC(1) Concurrent Mark 169.920ms[7.833s][info][gc,phases] GC(1) Pause Mark End 0.317ms

This GC log contains events that identify the time of a STW pause event with the word “Pause.” Note that each GC phase is identified numerically beginning with “0” and the events in each phase are logged. Also note the events marked with “Concurrent” are events that occur in parallel to the running application. Also note the times are shown in milliseconds (ms).

After the test, searches on the GC log file showed how many STW events there were within a specific time range. For example, the first command searched for how many pause events occurred in the 2+ hours of the run. There were 1,287 STW pauses during the run.

The second command searched for any pauses that were between 9ms and 10ms. There were none. The subsequent searches looked for decreasing ranges of pause events starting with any > 8ms, then > 7ms, and so on down to > 1ms which had 3 occurrences which are shown.

The last two searches in the block confirm of the 1,287 STW pause events, 1,284 were < 1ms. The blocks of commands following this show ranges of STW pauses in the hundreds of microseconds.

[Java11 logs]# grep Pause zgc.log | wc -l1287[Java11 logs]# grep Pause zgc.log | grep “ 9” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 8” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 7” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 6” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 6” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 5” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 4” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 3” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 2” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 1” | wc -l3[Java11 logs]# grep Pause zgc.log | grep “ 1”[7.853s][info][gc,phases] GC(1) Pause Relocate Start 1.043ms[25.235s][info][gc,phases] GC(2) Pause Relocate Start 1.155ms[6514.784s][info][gc,phases] GC(366) Pause Relocate Start 1.030ms[Java11 logs]# grep Pause zgc.log | grep “ 0” | wc -l1284[Java11 logs]# grep Pause zgc.log | grep “ 0.” | wc -l1284

There were 3 GC STW pauses lasting 900µs to 1ms.

[Java11 logs]# grep Pause zgc.log | grep “ 0.9” | wc -l3[Java11 logs]# grep Pause zgc.log | grep “ 0.9”[1046.244s][info][gc,phases] GC(58) Pause Relocate Start 0.903ms[3621.265s][info][gc,phases] GC(207) Pause Relocate Start 0.966ms[6697.507s][info][gc,phases] GC(376) Pause Relocate Start 0.908msThere were 5 GC STW pauses lasting 800–900µs.[Java11 logs]# grep Pause zgc.log | grep “ 0.8” | wc -l5[Java11 logs]# grep Pause zgc.log | grep “ 0.8”[1097.881s][info][gc,phases] GC(61) Pause Relocate Start 0.837ms[1115.064s][info][gc,phases] GC(62) Pause Relocate Start 0.822ms[1237.376s][info][gc,phases] GC(69) Pause Relocate Start 0.809ms[2369.643s][info][gc,phases] GC(132) Pause Relocate Start 0.813ms[6801.682s][info][gc,phases] GC(382) Pause Relocate Start 0.829ms

There were 16 GC STW pauses lasting 700–800µs.

[Java11 logs]# grep Pause zgc.log | grep “ 0.7” | wc -l16[Java11 logs]# grep Pause zgc.log | grep “ 0.7”[5.035s][info][gc,phases] GC(0) Pause Relocate Start 0.702ms[46.638s][info][gc,phases] GC(3) Pause Relocate Start 0.799ms[1256.371s][info][gc,phases] GC(70) Pause Relocate Start 0.775ms[2387.045s][info][gc,phases] GC(133) Pause Relocate Start 0.751ms[2402.823s][info][gc,phases] GC(134) Pause Relocate Start 0.767ms[2445.715s][info][gc,phases] GC(136) Pause Relocate Start 0.748ms[3405.328s][info][gc,phases] GC(194) Pause Relocate Start 0.784ms[5518.563s][info][gc,phases] GC(309) Pause Relocate Start 0.773ms[5713.759s][info][gc,phases] GC(320) Pause Relocate Start 0.717ms[6533.887s][info][gc,phases] GC(367) Pause Relocate Start 0.710ms[6583.568s][info][gc,phases] GC(370) Pause Relocate Start 0.700ms[6601.162s][info][gc,phases] GC(371) Pause Relocate Start 0.783ms[6620.805s][info][gc,phases] GC(372) Pause Relocate Start 0.748ms[6714.114s][info][gc,phases] GC(377) Pause Relocate Start 0.774ms[6730.710s][info][gc,phases] GC(378) Pause Relocate Start 0.763ms[6783.680s][info][gc,phases] GC(381) Pause Relocate Start 0.772ms

There were 22 GC STW pauses lasting 600–700µs.

[Java11 logs]# grep Pause zgc.log | grep “ 0.6” | wc -l22[Java11 logs]# grep Pause zgc.log | grep “ 0.6”[67.125s][info][gc,phases] GC(4) Pause Relocate Start 0.617ms[1062.958s][info][gc,phases] GC(59) Pause Relocate Start 0.621ms[1080.441s][info][gc,phases] GC(60) Pause Relocate Start 0.632ms[1133.456s][info][gc,phases] GC(63) Pause Relocate Start 0.650ms[1149.796s][info][gc,phases] GC(64) Pause Relocate Start 0.666ms[1169.972s][info][gc,phases] GC(65) Pause Relocate Start 0.653ms[1187.300s][info][gc,phases] GC(66) Pause Relocate Start 0.660ms[1204.432s][info][gc,phases] GC(67) Pause Relocate Start 0.614ms[1222.323s][info][gc,phases] GC(68) Pause Relocate Start 0.641ms[2348.497s][info][gc,phases] GC(131) Pause Relocate Start 0.663ms[2423.037s][info][gc,phases] GC(135) Pause Relocate Start 0.648ms[4433.091s][info][gc,phases] GC(251) Pause Relocate Start 0.601ms[4719.269s][info][gc,phases] GC(263) Pause Relocate Start 0.652ms[4739.598s][info][gc,phases] GC(264) Pause Relocate Start 0.691ms[5503.683s][info][gc,phases] GC(308) Pause Relocate Start 0.659ms[6500.203s][info][gc,phases] GC(365) Pause Relocate Start 0.604ms[6549.093s][info][gc,phases] GC(368) Pause Relocate Start 0.646ms[6566.495s][info][gc,phases] GC(369) Pause Relocate Start 0.637ms[6638.025s][info][gc,phases] GC(373) Pause Relocate Start 0.652ms[6656.676s][info][gc,phases] GC(374) Pause Relocate Start 0.668ms[6748.935s][info][gc,phases] GC(379) Pause Relocate Start 0.630ms[6764.792s][info][gc,phases] GC(380) Pause Relocate Start 0.666ms

There were 30 GC STW pauses lasting 500–600µs.

[Java11 logs]# grep Pause zgc.log | grep “ 0.5” | wc -l30[Java11 logs]# grep Pause zgc.log | grep “ 0.5”[24.945s][info][gc,phases] GC(2) Pause Mark Start 0.518ms[85.044s][info][gc,phases] GC(5) Pause Mark Start 0.507ms[85.334s][info][gc,phases] GC(5) Pause Relocate Start 0.519ms[184.145s][info][gc,phases] GC(11) Pause Mark Start 0.521ms[202.144s][info][gc,phases] GC(12) Pause Mark Start 0.511ms[222.945s][info][gc,phases] GC(13) Pause Mark Start 0.500ms[305.944s][info][gc,phases] GC(17) Pause Mark Start 0.508ms[325.144s][info][gc,phases] GC(18) Pause Mark Start 0.513ms[4761.185s][info][gc,phases] GC(265) Pause Relocate Start 0.527ms[4777.873s][info][gc,phases] GC(266) Pause Relocate Start 0.519ms[4812.673s][info][gc,phases] GC(268) Pause Relocate Start 0.507ms[5245.744s][info][gc,phases] GC(292) Pause Mark Start 0.502ms[5878.044s][info][gc,phases] GC(329) Pause Relocate Start 0.522ms[5895.708s][info][gc,phases] GC(330) Pause Relocate Start 0.502ms[5930.864s][info][gc,phases] GC(332) Pause Relocate Start 0.540ms[5948.654s][info][gc,phases] GC(333) Pause Relocate Start 0.531ms[5982.279s][info][gc,phases] GC(335) Pause Relocate Start 0.540ms[5999.387s][info][gc,phases] GC(336) Pause Relocate Start 0.512ms[6016.142s][info][gc,phases] GC(337) Pause Relocate Start 0.533ms[6031.486s][info][gc,phases] GC(338) Pause Relocate Start 0.524ms[6126.815s][info][gc,phases] GC(344) Pause Relocate Start 0.521ms[6157.730s][info][gc,phases] GC(346) Pause Relocate Start 0.525ms[6227.411s][info][gc,phases] GC(350) Pause Relocate Start 0.514ms[6245.185s][info][gc,phases] GC(351) Pause Relocate Start 0.515ms[6263.586s][info][gc,phases] GC(352) Pause Relocate Start 0.543ms[6338.094s][info][gc,phases] GC(356) Pause Relocate Start 0.529ms[6414.293s][info][gc,phases] GC(360) Pause Relocate Start 0.549ms[6464.887s][info][gc,phases] GC(363) Pause Relocate Start 0.509ms[6548.344s][info][gc,phases] GC(368) Pause Mark Start 0.509ms[6678.676s][info][gc,phases] GC(375) Pause Relocate Start 0.588ms

The following search shows there were 501 STW pauses between 400µs and 500µs, 280 STW pauses between 300µs and 400µs, 0 STW pauses between 200µs and 300µs, 419 STW pauses between 100µs and 200µs and 8 STW pauses < 100µs.

There were 501 GC STW pauses lasting 400–500µs, 280 GC STW pauses lasting 300–400µs, 0 GC STW pauses lasting 300–400µs, 419 GC STW pauses lasting 100–200µs, 8 GC STW pauses less than 100µs.

[Java11 logs]# grep Pause zgc.log | grep “ 0.4” | wc -l501[Java11 logs]# grep Pause zgc.log | grep “ 0.3” | wc -l280[Java11 logs]# grep Pause zgc.log | grep “ 0.2” | wc -l0[Java11 logs]# grep Pause zgc.log | grep “ 0.1” | wc -l419[Java11 logs]# grep Pause zgc.log | grep “ 0.0” | wc -l8[Java11 logs]# grep Pause zgc.log | grep “ 0.0”[3965.658s][info][gc,phases] GC(225) Pause Mark End 0.099ms[4882.586s][info][gc,phases] GC(272) Pause Mark End 0.098ms[4935.494s][info][gc,phases] GC(275) Pause Mark End 0.099ms[5136.309s][info][gc,phases] GC(286) Pause Mark End 0.098ms[5552.639s][info][gc,phases] GC(311) Pause Mark End 0.097ms[5930.853s][info][gc,phases] GC(332) Pause Mark End 0.099ms[6801.674s][info][gc,phases] GC(382) Pause Mark End 0.099ms[6874.610s][info][gc,phases] GC(386) Pause Mark End 0.099ms

Conclusion

For Java to compete in the real-time application space, significant advancements were needed in the evolution of the language and one of these features is to offer a better low-latency Garbage Collector.

New response time requirements for an existing Java-based web service pressured the IAS project’s engineering team to reduce GC STW pauses with minimal rework. The solution was to upgrade the application to Java 11 and use ZGC as the garbage collector.

The results of the A/B test of using Java 8/Shenandoah GC versus Java 11/ZGC GC confirmed the latter solved the goal of reducing web service timeouts related to garbage collection STW pauses. Java 11 and the ZGC GC was the clear winner.

--

--