May the force be with your Java applications - they can start more rapidly and run faster! [Devoxx Morocco 2023]

This slide deck was used at Devoxx Morocco 2023.

Akihiro Nishikawa

October 12, 2023

Transcript

  1. May the force be with your Java applications - they can start more rapidly and run faster!

    NISHIKAWA, Akihiro (@logico_jp), Cloud Solution Architect, Microsoft
  2. Who am I?

    { "name": "Akihiro Nishikawa", "country": "Japan", "favourites": [ "JVM", "GraalVM", "Azure" ], "expertise": [ "Application integration", "Container and Serverless" ] }
  3. Agenda

    • Why startup is becoming important
    • Startup procedure of Java applications
    • Options to improve startup time
    • Future
  4. Situation changed

    I understand all aspects are important, of course.

    | | Startup | Latency | Throughput | Footprint |
    |---|---|---|---|---|
    | Java applications (run on app servers) | △ | ◎ | ◎ | ◦ |
    | Serverless applications and autoscaling containers | ◎ | ◎ | ◎ | ◎ |
  5. Serverless adoption

    1. Node.js (62.9%)
    2. Python (20.8%)
    3. Go (6.4%)
    4. Java (6.1%)
    5. C# (3.8%)

    Source: The Future of Java by Mark Little – YouTube [Devoxx UK 2022]
  6. Startup: What happens when a Java application starts?

    • JVM Startup (fast)
        • JVM: load and initialize; generate bytecode templates
    • Application Startup (quick)
        • JVM: load application classes; initialize application classes
        • Application-specific initialization
    • Application Warmup (long time)
        • JVM: compile/deoptimize/recompile
        • Application: process specific workloads
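One rough way to observe the first two phases from inside an application is to ask the JVM for its uptime when main() starts and again once initialization has finished. The sketch below assumes the standard java.lang.management API. The class name StartupProbe and the placeholder initialization step are made up for illustration, and JVM uptime only approximates the phase boundaries listed on the slide.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

final class StartupProbe {
    /** Milliseconds since the JVM started, as reported by the runtime MXBean. */
    static long millisSinceJvmStart() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return runtime.getUptime();
    }

    public static void main(String[] args) {
        // Roughly covers JVM startup plus loading of the main class.
        long atMain = millisSinceJvmStart();

        // Hypothetical placeholder: application-specific initialization would run here.

        long ready = millisSinceJvmStart();
        System.out.printf("JVM start -> main(): %d ms, main() -> ready: %d ms%n",
                atMain, ready - atMain);
    }
}
```

Warmup is not visible this way; it shows up as request latency that keeps improving while the JIT compiles hot methods.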
  7. Tiered compilation

    C1 (a.k.a. client compiler)
    • Shorter time to compile
    • Less aggressively optimized
    • Lower throughput

    C2 (a.k.a. server compiler)
    • Longer time to compile
    • Highly optimized
    • Better throughput
  8. Compilation Level

    • Level 0: Interpreter
    • Level 1: C1 full optimization (no profiling)
    • Level 2: C1 with invocation and back-edge counters
    • Level 3: C1 full profiling (level 2 + MDO: MethodDataOop)
    • Level 4: C2
  9. Compilation Level

    [Diagram: transitions between levels 0 to 4 (Interpreter, C1, C2), showing the normal path, the path taken when compilation is delayed due to C2 capacity, and deoptimization.]
  10. Method compilation life cycle

    [Diagram: a method runs in the interpreter and is profiled, then is compiled by C1 and C2. C1 and C2 compiled code is saved in the code cache. On deoptimization, the compiled code is deoptimized and the method is interpreted and profiled.]
  11. Thresholds

        static bool apply_scaled(const methodHandle& method, CompLevel cur_level, int i, int b, double scale) {
          double threshold_scaling;
          if (CompilerOracle::has_option_value(method, CompileCommand::CompileThresholdScaling, threshold_scaling)) {
            scale *= threshold_scaling;
          }
          switch(cur_level) {
          case CompLevel_none:
          case CompLevel_limited_profile:
            return (i >= Tier3InvocationThreshold * scale) ||
                   (i >= Tier3MinInvocationThreshold * scale && i + b >= Tier3CompileThreshold * scale);
          case CompLevel_full_profile:
            return (i >= Tier4InvocationThreshold * scale) ||
                   (i >= Tier4MinInvocationThreshold * scale && i + b >= Tier4CompileThreshold * scale);
          default:
            return true;
          }
        }

    Source: jdk/src/hotspot/share/compiler/compilationPolicy.cpp at jdk-21+35 · openjdk/jdk (github.com)
  12. Thresholds

    Same apply_scaled() code as on slide 11, with the CompLevel_limited_profile and CompLevel_full_profile cases highlighted.
  13. Thresholds

    A method moves to level X when either condition holds:

    • # of executions ≥ TierXInvocationThreshold * Scale
    • # of executions ≥ TierXMinInvocationThreshold * Scale AND # of executions + # of iterations (loop back-edges) ≥ TierXCompileThreshold * Scale

    | | Level 3 | Level 4 |
    |---|---|---|
    | TierXInvocationThreshold | 200 | 5_000 |
    | TierXMinInvocationThreshold | 100 | 600 |
    | TierXCompileThreshold | 2_000 | 15_000 |
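The rule above can be restated as a small Java predicate. This is only an illustrative sketch that mirrors the comparison in apply_scaled() with the default values from the table; the class and method names (TierThresholds, reachesLevel3, reachesLevel4) are hypothetical, and HotSpot's real policy also considers factors such as compiler queue load that are not modeled here.

```java
final class TierThresholds {
    // Default values from the slide's table (JDK 21 defaults).
    static final int TIER3_INVOCATION = 200;
    static final int TIER3_MIN_INVOCATION = 100;
    static final int TIER3_COMPILE = 2_000;
    static final int TIER4_INVOCATION = 5_000;
    static final int TIER4_MIN_INVOCATION = 600;
    static final int TIER4_COMPILE = 15_000;

    /** True when the counters satisfy the level 3 condition. */
    static boolean reachesLevel3(int invocations, int backEdges, double scale) {
        return meets(invocations, backEdges, scale,
                TIER3_INVOCATION, TIER3_MIN_INVOCATION, TIER3_COMPILE);
    }

    /** True when the counters satisfy the level 4 condition. */
    static boolean reachesLevel4(int invocations, int backEdges, double scale) {
        return meets(invocations, backEdges, scale,
                TIER4_INVOCATION, TIER4_MIN_INVOCATION, TIER4_COMPILE);
    }

    // i = invocation count, b = loop back-edge count, as in apply_scaled().
    private static boolean meets(int i, int b, double scale,
                                 int invocation, int minInvocation, int compile) {
        return (i >= invocation * scale)
                || (i >= minInvocation * scale && i + b >= compile * scale);
    }
}
```

For example, a method invoked only 100 times still qualifies for level 3 once its loops have accumulated 1,900 back-edges, because the combined count reaches 2,000.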
  14. Threshold flags and their defaults

        $ java -XX:+PrintFlagsFinal -version | grep Threshold
        intx CompileThreshold = 10000 {pd product} {default}
        double CompileThresholdScaling = 1.000000 {product} {default}
        double G1PeriodicGCSystemLoadThreshold = 0.000000 {manageable} {default}
        uintx G1SATBBufferEnqueueingThresholdPercent = 60 {product} {default}
        uintx IncreaseFirstTierCompileThresholdAt = 50 {product} {default}
        uintx InitialTenuringThreshold = 7 {product} {default}
        size_t LargePageHeapSizeThreshold = 134217728 {product} {default}
        uintx MaxTenuringThreshold = 15 {product} {default}
        size_t PretenureSizeThreshold = 0 {product} {default}
        uint StringDeduplicationAgeThreshold = 3 {product} {default}
        double SweeperThreshold = 15.000000 {product} {default}
        uintx ThresholdTolerance = 10 {product} {default}
        intx Tier2BackEdgeThreshold = 0 {product} {default}
        intx Tier2CompileThreshold = 0 {product} {default}
        intx Tier3BackEdgeThreshold = 60000 {product} {default}
        intx Tier3CompileThreshold = 2000 {product} {default}
        intx Tier3InvocationThreshold = 200 {product} {default}
        intx Tier3MinInvocationThreshold = 100 {product} {default}
        intx Tier4BackEdgeThreshold = 40000 {product} {default}
        intx Tier4CompileThreshold = 15000 {product} {default}
        intx Tier4InvocationThreshold = 5000 {product} {default}
        intx Tier4MinInvocationThreshold = 600 {product} {default}
        openjdk version "21" 2023-09-19
        OpenJDK Runtime Environment (build 21+35-2513)
        OpenJDK 64-Bit Server VM (build 21+35-2513, mixed mode, sharing)
  15. Startup time and performance - Fibonacci numbers

        // java Fib.java 45 --> 45th number is 1_134_903_170
        // -XX:+UseG1GC -Xmx2g -Xms2g -XX:+UseStringDeduplication
        public class Fib {
            public static void main(String... args) {
                long num = 0;
                if(args.length != 1) return;
                num = Long.valueOf(args[0]);
                System.out.printf("%d(st/nd/rd/th) >> %d\n", num, fib(num));
            }

            static long fib(long n) {
                if(n < 2) return n;
                return fib(n - 2) + fib(n - 1);
            }
        }
  16. Results (seconds)

    3rd Gen AMD EPYC™ 7763v (8 vCPUs, 32 GiB memory)

    | | Java 17.0.8 Compile | Java 17.0.8 Run | Java 21 Compile | Java 21 Run | GraalVM 23.1 (Java 21) Compile | GraalVM 23.1 (Java 21) Run |
    |---|---|---|---|---|---|---|
    | Interpreter only | N/A | 250.315 | N/A | 143.185 | N/A | 143.750 |
    | C1 only (no profiling in Interpreter) | 0.390 | 4.597 | 0.381 | 5.183 | 0.163 | 4.945 |
    | C2 only (no profiling in C1) | 3.961 | 7.803 | 5.585 | 10.117 | 3.655 | 6.941 |
    | Tiered compilation (Interpreter → C1) | 0.009 | 4.141 | 0.012 | 4.717 | 0.022 | 4.758 |
    | Tiered compilation (Interpreter → C1 → C2) | C1: 0.009, C2: 0.002 | 3.542 | C1: 0.011, C2: 0.005 | 4.094 | C1: 0.052, C2*: 0.053 | 3.254 |

    (*) With GraalVM, the JVMCI-native compiler is used instead of C2.
  17. Options covered in this deck

    • Custom JRE
    • C1 only
    • AOT compilation
    • Warm up in advance
    • Code cache
    • Checkpoint
    • CDS Archive
    • JIT Centralization
  18. Startup time ratio (Base=100, smaller is better)

    [Chart: startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native, and CRaC. Series: 99P, 95P, 90P, 50P, Average. Axis: 0% to 110%.]
  19. Benchmark environment

    • Allocation per container: vCore: 2, RAM: 4 GiB
    • JDK 17 (17.0.8.1): GC: G1, max heap: 75% of allocation
    • Application framework: Micronaut 4.1.3
    • Option: +UseStringDeduplication (other options might be specified in each case)
    • Measurement: run 1000 times; average and percentiles (50, 90, 95, 99)
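The measurement step boils down to summarizing many startup samples and expressing them relative to the baseline. The sketch below shows one way to do that in Java using the nearest-rank percentile definition. The deck does not say which percentile method or tooling was actually used, so the class name StartupStats, the method names, and the choice of nearest-rank are assumptions.

```java
import java.util.Arrays;

final class StartupStats {
    /** Arithmetic mean of the samples (for example, startup times in milliseconds). */
    static double average(long[] samples) {
        return Arrays.stream(samples).average().orElseThrow();
    }

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, int p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Expresses a value relative to the baseline, where the baseline is 100. */
    static double ratioToBase(double value, double base) {
        return value / base * 100.0;
    }
}
```

With 1000 runs, percentile(samples, 99) returns the 990th smallest sample, so a handful of slow outliers affects 99P far more than 50P or the average.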
  20. Custom JRE

    Reduce the number of classes to be loaded.

    jdeps

        jdeps -R \
          -cp "target/lib/*" \
          --print-module-deps \
          --ignore-missing-deps \
          --multi-release 17 \
          target/App.jar
        # java.base,java.compiler,
        # java.desktop,java.management,
        # java.naming,java.sql,java.xml,
        # jdk.unsupported

    jlink

        jlink --compress=2 \
          --module-path $JAVA_HOME/jmods \
          --add-modules \
        java.base,java.compiler,\
        java.desktop,java.management,\
        java.naming,java.sql,\
        java.xml,jdk.unsupported \
          --no-header-files \
          --no-man-pages \
          --output linked
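After building a custom runtime, it can be useful to confirm from inside the application which modules were actually resolved. The sketch below uses the standard ModuleLayer API. The class name RuntimeModules is hypothetical, and the boot layer lists the modules resolved for this run, which may be a subset of what the image contains (java --list-modules shows the full image contents).

```java
import java.util.List;
import java.util.stream.Collectors;

final class RuntimeModules {
    /** Names of the modules resolved in the boot layer, sorted alphabetically. */
    static List<String> bootModuleNames() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // On a jlink-built runtime this list should be much shorter than on a full JDK.
        bootModuleNames().forEach(System.out::println);
    }
}
```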
  21. Custom JRE: Benefits and cautions

    Benefits
    • Startup time and memory footprint improve because fewer classes have to be loaded.

    Cautions
    • Some effort is required to create a custom JRE (e.g., a multi-stage build to create the container image).
    • Note that jdeps sometimes fails to detect dependency modules such as jdk.crypto.ec.
  22. CDS Archive

    Change the way to load classes.

    • AppCDS (JEP 310 / JDK 10)
        • Application Class Data Sharing (AppCDS) stores classes used by your applications in an archive file. (The java Command (oracle.com))
    • Default CDS (JEP 341 / JDK 12)
        • Created at JDK build time by running -Xshare:dump, using G1 GC and a 128M Java heap. (Oracle JDK / Class Data Sharing (oracle.com))
    • Dynamic CDS (JEP 350 / JDK 13)
        • Dynamic CDS archive extends application class-data sharing (AppCDS) to allow dynamic archiving of classes when a Java application exits. (Class Data Sharing (oracle.com))
  23. CDS Archive: Commands

        # Create a static CDS archive
        $ java -Xshare:off \
            -XX:DumpLoadedClassList=<ClassFileList> -jar app.jar
        $ java -Xshare:dump -XX:SharedArchiveFile=<Archive> \
            -XX:SharedClassListFile=<ClassFileList>

        # Create a dynamic CDS archive when the application exits
        $ java -XX:ArchiveClassesAtExit=<Archive> -jar app.jar

        # Use the CDS archive with the application
        $ java -XX:SharedArchiveFile=<Archive> -jar app.jar

        # Create the CDS archive automatically (since JDK 19)
        $ java -XX:+AutoCreateSharedArchive \
            -XX:SharedArchiveFile=<Archive> -jar app.jar

    Other options are listed in The java Command (oracle.com).
  24. CDS Archive: Result (Static CDS only)

    [Chart: startup time ratio for Base, jlink, and CDS. Series: 99P, 95P, 90P, 50P, Average. Axis: 75% to 105%.]
  25. CDS Archive: Result (Static & Dynamic CDS w/ training)

    [Chart: startup time ratio for Base, jlink, CDS, and CDS Combined. Series: 99P, 95P, 90P, 50P, Average. Axis: 30% to 110%.]
  26. CDS Archive: Benefits and cautions

    Benefits
    • Reduces the time to load classes.
    • Available on any platform.
    • Dynamic CDS and static CDS can coexist.
    • CDS archives can be used with a custom JRE.

    Cautions
    • Whenever the application is updated, the CDS archive has to be recreated.
  27. Note

    • Neither -Xverify:none nor -noverify is used.
        • Both have been deprecated since JDK 13 and will be removed in a future release. [JDK-8218003] Release Note: Deprecated Java Options -Xverify:none and -noverify - Java Bug System (openjdk.org)
    • For users who need to run without startup verification:
        • AppCDS lets them archive their classes. The classes are verified during archiving, which avoids verification at runtime.
  28. C1 only: Use only C1 without profiling

    -XX:TieredStopAtLevel=1

    • By default, the JVM selects C2 when the platform has a multi-core processor or a 64-bit VM is used.
    • When only C1 is used:
        • There is no profiling overhead.
        • You get better performance than when profiling is enabled.
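To check which JIT configuration a running JVM actually has, the standard management beans can be queried. The sketch below is an assumption-laden illustration: the class name JitInfo and both method names are hypothetical, the compiler name string differs between JVM builds, and the flag check only sees options passed on the command line.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitInfo {
    /** Human-readable summary of the JIT compiler and the time it has spent compiling. */
    static String describeJit() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The JVM has no compilation system, for example when running with -Xint.
            return "no JIT compiler (interpreter only)";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compilation time: " + time;
    }

    /** True when -XX:TieredStopAtLevel=1 was passed as a JVM argument. */
    static boolean c1OnlyRequested() {
        return ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .contains("-XX:TieredStopAtLevel=1");
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
        System.out.println("C1 only requested: " + c1OnlyRequested());
    }
}
```

Comparing the reported compilation time with and without -XX:TieredStopAtLevel=1 gives a rough feel for how much JIT work the setting avoids during startup.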
  29. C1 only: Result

    [Chart: startup time ratio for Base, jlink, CDS, CDS Combined, and C1. Series: 99P, 95P, 90P, 50P, Average. Axis: 30% to 110%.]
  30. C1 only: Benefits and cautions

    Benefits
    • Short-lived applications benefit.
    • Because no profiling occurs, startup time is reduced.
    • Custom JRE, CDS archive, and this setting can be used together.

    Cautions
    • This setting is not useful for long-running applications, since they should leverage the highly optimized code generated by C2.
  31. AOT (ahead-of-time) compilation

    • Resolve dependencies and compile code at build time.
        • JDK 9-17: experimental (deprecated and removed)
    • JDK support
        • GraalVM (Native Image)
        • Azul Zulu
        • OpenJ9, etc.
    • Development framework support
        • Micronaut
        • Spring Boot
  32. AOT compilation: Build commands

    Generic

        $ native-image App
        $ native-image -jar App.jar

    Micronaut

        $ mvn package -Dpackaging=native-image
        $ gradle nativeCompile

    Spring Boot

        $ mvn -Pnative spring-boot:build-image
        $ gradle bootBuildImage
        # Using Native Build Tools
        $ mvn -Pnative native:compile
        $ gradle nativeCompile
  33. AOT compilation: Result (GraalVM Native Image)

    [Chart: startup time ratio for Base, jlink, CDS, CDS Combined, C1, and Native. Series: 99P, 95P, 90P, 50P, Average. Axis: 0% to 110%.]
  34. AOT compilation: Result (GraalVM Native Image)

    [Chart: same as slide 33, with an inset zooming in on Native. Inset axis: 1.50% to 1.65%.]
  35. AOT compilation: Result (AOT-enabled framework)

    [Chart: startup time ratio for Base, Base + framework AOT support, jlink, CDS, CDS Combined, C1, Native, and Native + framework AOT support. Series: 99P, 95P, 90P, 50P, Average. Axis: 0% to 110%.]
  36. AOT compilation: Result (AOT-enabled framework, Base)

    [Chart: same as slide 35, with an inset comparing Base and Base + framework AOT support. Inset axis: 99.0% to 100.2%.]
  37. AOT compilation: Result (AOT-enabled framework, Native)

    [Chart: same as slide 35, with an inset comparing Native and Native + framework AOT support. Inset axis: 1.35% to 1.70%.]
  38. AOT compilation: Benefits and cautions

    Benefits
    • Applications can start rapidly.
    • Lower memory footprint and other advantages.

    Cautions
    • AOT compilation support
        • Not all application development frameworks and not all distributions support AOT.
    • GraalVM Native Image in particular:
        • Executables are hardware/platform (CPU/OS) specific.
        • Build time is long.
        • As of now, generated executables are not suitable for long-running applications.
        • Some effort is required for reflection support.
  39. JIT Centralization: Centralized JIT

    • JIT compilation is offloaded to another environment, which returns compiled code to the runtime environment (e.g., containers), to improve application startup time.
    • OpenJ9 JITServer (Eclipse OpenJ9)
        • JITServer technology - (eclipse.dev)
    • Azul Cloud Native Compiler
        • Java Compilation in the Cloud | Cloud Native Compiler (azul.com)
  40. JIT Centralization: Concept

    • Ordinarily, JIT compilation runs in each JVM.

    [Diagram: several VMs or containers, each running a Java application on a JVM that performs its own JIT compilation.]
  41. JIT Centralization: Concept

    • JIT compilation runs in a dedicated JVM instance.
    • Each JVM instance communicates with the JIT JVM instance.

    [Diagram: several VMs or containers, each running a Java application on a JVM. Each JVM requests compilation from dedicated JVM instance(s) for JIT compilation, which return the generated code.]
  42. JIT Centralization: Benefits and cautions

    Benefits
    • Java applications could run on fewer resources.
    • Especially useful for apps running in containers.
        • Fewer CPU cores and less memory might be allocated to each container.
    • With compiled code cached in the dedicated JIT server instance, JIT compilation might be faster.

    Cautions
    • Network latency (using it along with Kubernetes is recommended)
    • Might not be suitable for very short-lived applications
    • Not all distributions support it.
  43. Train applications: Use profiled data to warm up applications

    • JWarmup (Alibaba Dragonwell)
        • JEP draft: JWarmup precompile java hot methods at application startup (openjdk.org)
    • Azul ReadyNow! (Azul)
        • ReadyNow!® - Azul | Better Java Performance, Superior Java Support
  44. Train applications: Benefits and cautions

    Benefits
    • No code change is required, since the characteristics of the Java JIT compiler are leveraged to reduce startup time.

    Cautions
    • Not all distributions support it.
    • How profile logs/data are provided and collected differs between distributions.
        • [ReadyNow!] -XX:ProfileLogIn=<file>, -XX:ProfileLogOut=<file>
        • [JWarmup] -XX:CompilationWarmUpLogFile=<file>
  45. Use code cache

    • Compile Stashing (Azul)
        • Using Compile Stashing (azul.com)
    • Dynamic AOT and Shared Class Cache (OpenJ9)
        • AOT Compiler - (eclipse.dev)
        • Introduction - (eclipse.dev)
  46. Use code cache: Benefits and cautions

    Benefits
    • Reduces startup time, especially compilation time.
    • A code cache combined with a warm-up feature might let applications run faster and obtain optimized code sooner.

    Cautions
    • Not all distributions support it.
  47. Use checkpoints

    • CRIU (Checkpoint/Restore in Userspace)
        • CRIU support - (eclipse.dev)
    • CRaC (Coordinated Restore at Checkpoint)
        • Java on CRaC - Optimize JVM Start-Up | Azul
  48. Use checkpoint: Please note that...

    "CRaC implementation creates the checkpoint only if the whole Java instance state can be stored in the image. Resources like open files or sockets are cannot, so it is required to release them when checkpoint is made. CRaC emits notifications for an application to prepare for the checkpoint and return to operating state after restore."

    https://github.com/CRaC/docs
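The release-and-reacquire pattern that the quote describes can be sketched without any CRaC dependency. In the example below, CheckpointAware is a hypothetical stand-in interface invented for this illustration, not the real CRaC API. To the best of my knowledge, the org.crac library exposes comparable callbacks on its Resource interface (beforeCheckpoint and afterRestore, registered through Core.getGlobalContext()), but that mapping should be checked against the CRaC documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for checkpoint/restore notifications; not the real CRaC API. */
interface CheckpointAware {
    void beforeCheckpoint();
    void afterRestore();
}

/** A log file that closes its file handle before a checkpoint and reopens it after restore. */
final class LogFile implements CheckpointAware {
    private final Path path;
    private FileChannel channel;

    LogFile(Path path) {
        this.path = path;
        open();
    }

    /** Convenience factory for trying the sketch out against a temporary file. */
    static LogFile inTempFile() {
        try {
            return new LogFile(Files.createTempFile("checkpoint-sketch", ".log"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOpen() {
        return channel != null && channel.isOpen();
    }

    @Override
    public void beforeCheckpoint() {
        // Open files cannot be stored in the image, so release the handle first.
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void afterRestore() {
        // Return to the operating state by reopening the same file.
        open();
    }

    private void open() {
        try {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sockets, database connections, and thread pools with open handles would follow the same shape: close in the pre-checkpoint callback, re-establish in the post-restore callback.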
  49. Use checkpoint: CRaC

        # 1. Start an application in the checkpoint mode.
        $JAVA_HOME/bin/java \
          -XX:CRaCCheckpointTo=<CheckPointFileDir> -jar App.jar

        # 2. After warm up, request a checkpoint.
        jcmd App.jar JDK.checkpoint

        # 3. Restore the snapshot.
        $JAVA_HOME/bin/java -XX:CRaCRestoreFrom=<CheckPointFileDir>
  50. Use checkpoint: Result

    [Chart: startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native, and CRaC. Series: 99P, 95P, 90P, 50P, Average. Axis: 0% to 110%.]
  51. Use checkpoint: Result

    [Chart: startup time ratio for Base, jlink, CDS, C1, CRaC, and Native, with an inset comparing Native and CRaC. Series: 99P, 95P, 90P, 50P, Average. Inset axis: 0.00% to 8.00%.]
  52. Use checkpoint: Benefits and cautions

    Benefits
    • Works well for containers.
    • Startup time is quite short.

    Cautions
    • Dependencies and environment must be strictly the same between executions.
    • The project is still in progress.
    • Privileged operations are required.
    • Capturing a checkpoint takes some effort (automation is key...).
  53. Startup time ratio (Base=100, smaller is better)

    [Chart: startup time ratio for Base, jlink, CDS, CDS Combined, C1, Native, and CRaC. Series: 99P, 95P, 90P, 50P, Average. Axis: 0% to 110%.]
  54. Project Leyden

    openjdk.org/projects/leyden

    • Goal
        • Improve the startup time, time to peak performance, and footprint of Java programs.
    • Focus
        • Standardize AOT for the HotSpot JVM
        • Start native, but support and optimize dynamic stuff later
    • Resources
        • Project Leyden - Capturing Lightning in a Bottle - YouTube
        • 202308-Leyden-JVMLS.pdf (openjdk.org)
        • leyden-premain-petclinic-2023-09-12.pdf (openjdk.org)
        • Project Leyden By Brian Goetz - YouTube
  55. Concept: Shifting computation

    Using both the existing features and newly added ones.

    [Diagram: Static CDS Archive, Dynamic CDS Archive, and Cached Code Archive, together with their contents: classes and heap snapshot, training data, and pre-compiled machine code.]

    We can use these techniques now!
  56. Takeaways

    • We have several options to improve startup time.
    • Updating the Java version is another option.
    • Several projects to improve startup time are ongoing.
    • Most important is choosing the technique that best fits the characteristics and requirements of your application.