
2023_OracleJavaDeveloperSummit.pdf

This deck was created for Java Developers Summit Online 2023, hosted by Oracle Japan.
All content in this deck is in English, but the presentation itself was given in Japanese.

Akihiro Nishikawa

February 28, 2023

Transcript

  1. Who am I?
    { "name": "NISHIKAWA, Akihiro", "job": "Cloud Solution Architect@Microsoft", "love": [ "JVM", "GraalVM", "Joining technical conferences (as a speaker as well as an attendee)" ], "expertise": [ "Application integration", "Container and serverless solutions" ] }
  2. Agenda
    • Importance of startup time
    • Startup procedure
    • Options to improve startup time
    • Future
  3. Situation has been changing
    All aspects are expected, of course :)
    Characteristics of typical Java applications
      Startup: △ | Latency: ◦→◎ | Throughput: ◦→◎ | Footprint: ◦
      Remarks: No need to worry about startup when the application resides on an application server.
    Requirements for serverless applications and autoscaling containers
      Startup: !!! | Latency: !! | Throughput: !! | Footprint: !!
      Remarks: For short-lived applications, startup time is much more important.
  4. Serverless adoption
    1. Node.js (62.9%)
    2. Python (20.8%)
    3. Go (6.4%)
    4. Java (6.1%)
    5. C# (3.8%)
    Source: The Future of Java by Mark Little (YouTube), Devoxx UK 2022
  5. Startup
    What happens when a Java application starts?
    JVM Startup (fast)
      JVM
      • Load and initialize
      • Generate bytecode templates
    Application Startup (quick)
      JVM
      • Load application classes
      • Initialize application classes
      • Application-specific initialization
    Application Warmup (long time)
      JVM
      • Compile/deoptimize/recompile
      Application
      • Process specific workloads
    Timeline markers: 1st operation, 2nd operation or later
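The three phases above can be observed from inside a program. The following is a minimal sketch, not part of the deck: the class name `WarmupProbe` and its workload are hypothetical, and the timings it prints depend entirely on the machine and JVM. It prints the JVM uptime at the start of `main` (roughly JVM startup plus application startup) and then times the same operation several times, so the first operation can be compared with later ones.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe class, for illustration only.
final class WarmupProbe {
    // Written to a volatile field so the JIT cannot discard the loop as dead code.
    static volatile long sink;

    /** Illustrative workload: sums the integers below n. */
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Runs the workload once and returns the elapsed time in nanoseconds. */
    static long timeOnceNanos(int n) {
        long begin = System.nanoTime();
        sink = workload(n);
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        // Uptime here covers roughly JVM startup plus application startup.
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("JVM start to main: " + uptimeMs + " ms");

        // Repeating the same operation shows the warmup effect, if any.
        for (int run = 1; run <= 5; run++) {
            System.out.println("operation " + run + ": " + timeOnceNanos(5_000_000) + " ns");
        }
    }
}
```

Running it with `java WarmupProbe.java` and again with `-Xint` gives a rough feel for how much of the later speed comes from JIT compilation.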
  6. Tiered compilation
    Introduced in Java 7 and enabled by default in Java 8.
    C1 (a.k.a. client compiler)
    • Shorter compile time
    • Less aggressively optimized
    • Lower throughput
    C2 (a.k.a. server compiler)
    • Longer compile time
    • Highly optimized
    • Better throughput
  7. Compilation Level
    Level 0: Interpreter
    Level 1: C1 full optimization (no profiling)
    Level 2: C1 with invocation and back-edge counters
    Level 3: C1 full profiling (level 2 + MDO: MethodDataOop)
    Level 4: C2
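  8. Compilation Level
    [Diagram: transition paths between levels 0 to 4 (Interpreter, C1, C2) for three cases: the normal path, the path when compilation is delayed due to C2 capacity, and the path when optimization is not valuable.]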
  9. Default thresholds / JDK 17
    $ java -XX:+PrintFlagsFinal -version | grep CompileThreshold
         intx CompileThreshold                    = 10000      {pd product} {default}
       double CompileThresholdScaling             = 1.000000   {product} {default}
        uintx IncreaseFirstTierCompileThresholdAt = 50         {product} {default}
         intx Tier2CompileThreshold               = 0          {product} {default}
         intx Tier3CompileThreshold               = 2000       {product} {default}
         intx Tier4CompileThreshold               = 15000      {product} {default}
    java version "17.0.6" 2023-01-17 LTS
    Java(TM) SE Runtime Environment (build 17.0.6+9-LTS-190)
    Java HotSpot(TM) 64-Bit Server VM (build 17.0.6+9-LTS-190, mixed mode, sharing)
    Note: Tiered compilation has been enabled by default since Java 8 (you can disable it, of course). When tiered compilation is enabled, the JVM does not use the CompileThreshold parameter.
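The same flags can also be read from inside a running application. This is a minimal sketch, assuming a HotSpot-based JDK where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `TierFlags` and the "n/a" fallback are illustrative choices, not something the deck prescribes.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
final class TierFlags {
    /** Returns the current value of a HotSpot flag, or "n/a" if it cannot be read. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "n/a" : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the flag does not exist in this VM.
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "TieredCompilation", "TieredStopAtLevel",
            "Tier3CompileThreshold", "Tier4CompileThreshold", "CompileThreshold"
        };
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

On a non-HotSpot VM the flags may simply be reported as "n/a".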
  10. Method compilation life cycle
    • Run in the interpreter: interpret and profile.
    • Profiling >= Tier3CompileThreshold: compile with C1 and save the C1 compiled code in the code cache.
    • Profiling >= Tier4CompileThreshold: compile with C2 and save the C2 compiled code in the code cache.
    • Deoptimization: deoptimize the compiled code, then interpret and profile again.
  11. Startup time and performance - Fibonacci numbers
    // java Fib.java 40 --> 40th number is 102,334,155
    // -XX:+UseG1GC -Xmx2g -Xms2g -XX:+UseStringDeduplication
    public class Fib {
        public static void main(String... args) {
            if (args.length != 1) return;
            // parseLong avoids the boxing that Long.valueOf would introduce
            long num = Long.parseLong(args[0]);
            // %n is the platform line separator (the slide's "¥n" is a mis-encoded "\n")
            System.out.printf("%d(st/nd/rd/th) >> %d%n", num, fib(num));
        }
        // Naive recursive Fibonacci, used as a CPU-bound workload
        static long fib(long n) {
            if (n < 2) return n;
            return fib(n - 2) + fib(n - 1);
        }
    }
  12. Results
    Run on Intel Core i7 2.80 GHz (4 cores, Hyper-Threading enabled) / 16GB RAM
    Mode | Compilation time (sec) | Execution time (sec)
    Interpreter only | N/A | 22.890
    C1 only (no profiling in interpreter) | C1: 1.912 | 2.073
    C2 only (no profiling in C1) | C2: 16.745 | 2.511
    Tiered compilation (Interpreter > C1) | C1: 0.248 | 0.756
    Tiered compilation (Interpreter > C1 > C2) | C1: 0.261, C2: 0.526 | 0.127
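One way to collect a compilation-time figure like the one in the table is the standard `CompilationMXBean`. The sketch below is an assumption about how such a number could be gathered, not the method used for the deck; the class name `JitTime` is hypothetical, and the bean reports approximate accumulated time for all JIT compilers together rather than C1 and C2 separately.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
final class JitTime {
    /** Accumulated JIT compilation time in ms, or -1 if the VM does not report it. */
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1; // for example, when no JIT compiler is present
        }
        return jit.getTotalCompilationTime();
    }

    /** Same naive recursive Fibonacci as the workload on the slide. */
    static long fib(long n) {
        return n < 2 ? n : fib(n - 2) + fib(n - 1);
    }

    public static void main(String[] args) {
        long begin = System.nanoTime();
        long result = fib(35);
        long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
        System.out.println("fib(35) = " + result);
        System.out.println("execution time: " + elapsedMs + " ms");
        System.out.println("JIT compilation time: " + totalCompilationMillis() + " ms");
    }
}
```

Running it with different flags such as `-Xint` or `-XX:TieredStopAtLevel=1` reproduces the kind of comparison shown above, though absolute numbers will differ.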
  13. Spec and configuration
    Hardware: Intel Core i7 2.80 GHz (4 cores, Hyper-Threading enabled), 16GB RAM
    OS: Ubuntu 22.04
    JDK: 17 (17.0.6)
    GC: G1
    Heap: Max/Min Heap: 2g
    Application framework: Micronaut 3.8.5
    Option: +UseStringDeduplication (other options might be specified in each case)
    Measurement: Run 100 times; Average / Percentile (50, 90, 95, 99)
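The average and percentile figures used throughout the charts can be computed from the raw startup times in a few lines. This is a minimal sketch assuming the nearest-rank percentile definition; the deck does not say which definition was used, and the class name `StartupStats` and the sample values are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class StartupStats {
    /** Arithmetic mean of the samples (NaN for an empty array). */
    static double average(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] samples, int p) {
        if (samples.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up startup times in seconds; real runs would supply 100 measurements.
        double[] seconds = {1.02, 0.98, 1.10, 0.95, 1.30, 1.01, 0.99, 1.05, 0.97, 1.00};
        System.out.println("average: " + average(seconds));
        for (int p : new int[] {50, 90, 95, 99}) {
            System.out.println(p + "th percentile: " + percentile(seconds, p));
        }
    }
}
```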
  14. Note
    -Xverify:none and -noverify
    • Deprecated since JDK 13 and will be removed in a future release.
    • [JDK-8218003] Release Note: Deprecated Java Options -Xverify:none and -noverify - Java Bug System (openjdk.org)
    For users who need to run without startup verification
    • AppCDS lets you archive the application's classes. The classes are verified during archiving, which avoids verification at runtime.
  15. 1) Custom JRE
    Reduce the number of classes to be loaded.
    jdeps
      jdeps -R \
        -cp "target/dependency/*" \
        --print-module-deps \
        --ignore-missing-deps \
        --multi-release 17 \
        target/App.jar
      # java.base,java.compiler,
      # java.desktop,java.management,
      # java.naming,java.sql,java.xml,
      # jdk.unsupported
    jlink
      jlink --compress=2 \
        --module-path $JAVA_HOME/jmods \
        --add-modules java.base,java.compiler,java.desktop,java.management,java.naming,java.sql,java.xml,jdk.unsupported \
        --no-header-files \
        --no-man-pages \
        --output linked
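After linking, it can be useful to confirm which modules actually ended up in the runtime image. The following sketch uses the standard `java.lang.module.ModuleFinder` API and should list much the same information as `java --list-modules`; the class name `ImageModules` is hypothetical.

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class ImageModules {
    /** Names of all modules present in the current runtime image, sorted. */
    static List<String> systemModuleNames() {
        return ModuleFinder.ofSystem().findAll().stream()
                .map((ModuleReference ref) -> ref.descriptor().name())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = systemModuleNames();
        System.out.println(names.size() + " modules in this runtime image:");
        names.forEach(name -> System.out.println("  " + name));
    }
}
```

Run with the `java` launcher from the `linked` output directory, the list should be much shorter than on a full JDK.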
  16. 1) Custom JRE
    Slightly improved, but not by much.
    [Chart: average and 50th/90th/95th/99th percentiles; series: Baseline, Custom JRE]
  17. 1) Custom JRE
    Benefits and drawbacks
    Benefits
    • Startup time and memory footprint improve because fewer classes are loaded.
    Drawbacks
    • Some effort is required to create a custom JRE.
    • Multi-stage build to create container images
    • Note that jdeps sometimes fails to find dependency modules such as jdk.crypto.ec.
  18. 2) CDS Archive
    Change the way classes are loaded
    • CDS was introduced in JDK 5; AppCDS first appeared in Oracle JDK 8u40 as a commercial feature.
    • Default CDS (JEP 341 / JDK 12)
    • App CDS (JEP 310 / JDK 10)
    • Dynamic CDS (JEP 350 / JDK 13)
    • Some distributions don't include a default CDS archive.
      • e.g. Microsoft Build of OpenJDK: Release Notes for the Microsoft Build of OpenJDK | Microsoft Learn
  19. 2) CDS Archive
    Create and use an archive to run an application
    # Create a dynamic CDS archive when the application exits
    $ java -XX:ArchiveClassesAtExit=<Archive> -jar App.jar
    # Use the CDS archive with the application
    $ java -XX:SharedArchiveFile=<Archive> -jar App.jar
    CDS archive and custom JRE can be used together.
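A quick way to check from inside the application whether class data sharing is in effect is to look at the `java.vm.info` system property, which on HotSpot carries the same "mixed mode, sharing" text seen in `java -version` output. This is a sketch under that assumption; other VMs may format the property differently, and the class name `CdsCheck` is hypothetical.

```java
// Hypothetical helper class, for illustration only.
final class CdsCheck {
    /** True if the VM info string mentions sharing (HotSpot's wording is assumed). */
    static boolean reportsSharing(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("CDS in use: " + reportsSharing(vmInfo));
    }
}
```

Starting the JVM with `-Xshare:off` should make the check report false.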
  20. 2) CDS Archive
    Better than the custom JRE case
    [Chart: average and 50th/90th/95th/99th percentiles; series: Baseline, Custom JRE, CDS, CDS+custom JRE]
  21. 2) CDS Archive
    Benefits and drawbacks
    Benefits
    • Class loading time improves.
    • Available on any platform
    • Dynamic CDS and custom JRE can be used together.
    Drawbacks
    • Whenever the application is updated, the CDS archive must be recreated.
  22. 🤔 If throughput and latency matter as well as application startup time, which option can we take?
  23. Use only C1
    -XX:TieredStopAtLevel=1
    The JVM selects C2 by default on multi-core processors or 64-bit VMs.
    If you choose to use only C1,
    • There is no profiling overhead.
    • You will get better performance than if profiling is enabled.
  24. Use only C1
    Startup time
    [Chart: average and 50th/90th/95th/99th percentiles; series: Baseline, Custom JRE, CDS, CDS+custom JRE, C1, C1 + CDS + custom JRE]
  25. Use only C1
    Benefits and drawbacks
    Benefits
    • Short-lived applications benefit.
    • As no profiling occurs, startup time is reduced.
    • Custom JRE, CDS archive, and this setting can be used together.
    Drawbacks
    • This setting is not useful for long-running applications, since they should leverage the highly optimized code generated by C2.
  26. 🤔 If resources such as CPU and memory are quite restricted, which option can we take?
  27. 1) AOT compilation
    Resolve dependencies in advance and package into standalone executables.
    AOT Compilation (Ahead of time)
    • JDK 9-17: experimental
    • GraalVM Native Image
    • OpenJ9 AOT, etc.
  28. 1) AOT compilation
    GraalVM Native Image
    Generic
      $ native-image App.class
      $ native-image -jar App.jar
    Micronaut
      $ mvn package -Dpackaging=native-image
      $ gradle nativeCompile
    Spring
      $ mvn -Pnative spring-boot:build-image
      $ gradle bootBuildImage
      # Using Native Build Tools
      $ mvn -Pnative native:compile
      $ gradle nativeCompile
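When the same code base is run both on the JVM and as a native executable, it can help to log which mode is active. The sketch below relies on the `org.graalvm.nativeimage.imagecode` system property, which GraalVM Native Image sets during image build and at image run time and which is absent on a regular JVM. The class name `ImageMode` and the label strings are hypothetical.

```java
// Hypothetical helper class, for illustration only.
final class ImageMode {
    // Property set by GraalVM Native Image; expected to be absent on a regular JVM.
    static final String IMAGE_CODE_KEY = "org.graalvm.nativeimage.imagecode";

    /** Turns the property value into a human-readable label. */
    static String describe(String imageCode) {
        if (imageCode == null) {
            return "JVM (JIT)";
        }
        return "native image (" + imageCode + ")";
    }

    public static void main(String[] args) {
        System.out.println("Running as: " + describe(System.getProperty(IMAGE_CODE_KEY)));
    }
}
```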
  29. 1) AOT compilation
    Startup time (GraalVM Enterprise 22.3.1 JDK 17 / PGO is not used)
    [Chart: average and 50th/90th/95th/99th percentiles; series: Baseline, Custom JRE, CDS, CDS+custom JRE, C1, C1 + CDS + custom JRE, AOT]
  30. 1) AOT compilation
    Benefits and drawbacks
    Benefits
    • Starts rapidly
    • Lower memory footprint
    Drawbacks
    • Hardware/platform (CPU/OS) specific
    • Long compilation time
    • The generated executable is
      • bigger than the original jar file.
      • not suitable for long-running applications (as of now).
    • Some effort is required for reflection support.
  31. 2) Distributed JIT
    If JIT compilation is offloaded to another environment and the code it generates is returned to and used in the runtime environment, would performance improve?
    Distributed JIT
    • OpenJ9 JITServer (Eclipse OpenJ9)
      • IBM Semeru Runtime - Resources and Tools - IBM Developer
    • Azul Cloud Native Compiler
      • Java Compilation in the Cloud | Cloud Native Compiler (azul.com)
  32. 2) Distributed JIT
    Concept
    JIT compilation runs in each JVM.
    [Diagram: six VMs or containers, each running a Java application on a JVM that performs its own JIT compilation]
  33. 2) Distributed JIT
    Concept
    JIT compilation runs in a dedicated JVM instance.
    • Each JVM instance communicates with the JIT JVM instance.
    [Diagram: six VMs or containers, each running a Java application on a JVM, connected to a JIT Server or Cloud Native Compiler Service that performs JIT compilation. Request compilation → / ← Return generated code]
  34. 2) Distributed JIT
    Benefits and drawbacks
    Benefits
    • Java applications could run on fewer resources.
      • Especially useful for apps running in containers.
    • JIT compilation might be faster (depending on circumstances).
    Drawbacks
    • Network latency
      • May not be suitable for very short-lived applications
    • Not standardized yet
      • OpenJ9
      • Azul Cloud Native Compiler
  35. 🤔 If high throughput is needed from the very beginning, at any cost, which action can we take?
  36. JIT Caching
    Ordinarily, JIT compilation runs with profiling data collected in the interpreter (and/or C1) phase.
    If we can take snapshots, persist them in storage, and restore them, C2-generated hot code might provide high performance from the start.
  37. JIT Caching
    JWarmup (Alibaba Dragonwell)
    • JEP draft: JWarmup precompile java hot methods at application startup (openjdk.org)
    Azul ReadyNow! / Compile Stashing (Azul)
    • ReadyNow!® - Azul | Better Java Performance, Superior Java Support
    • Using Compile Stashing (azul.com)
    CRaC (Coordinated Restore at Checkpoint)
    • Based on CRIU (Checkpoint/Restore In Userspace).
    • Azul Provides the CRaC in AWS SnapStart Builds | Foojay.io (Java 11 based)
    Dynamic AOT / CRIU support (OpenJ9)
    • Fast JVM startup with OpenJ9 CRIU Support - Eclipse OpenJ9 Blog
  38. JIT Caching
    How to take a snapshot (checkpoint)
    “CRaC implementation creates the checkpoint only if the whole Java instance state can be stored in the image. Resources like open files or sockets are cannot, so it is required to release them when checkpoint is made. CRaC emits notifications for an application to prepare for the checkpoint and return to operating state after restore.”
    https://github.com/CRaC/docs
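The release-then-reacquire pattern described in the quote can be sketched without any CRaC dependency. Everything below is a hypothetical stand-in: `CheckpointAware`, `FakeConnection`, and `CheckpointCoordinator` are invented for illustration and are not the CRaC API (the real one lives in the `org.crac` / `jdk.crac` packages and has its own signatures and notification ordering). The sketch only shows the idea that a resource closes itself before the checkpoint and reopens after restore.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point for the simulation; all types in this file are hypothetical.
final class CracSketch {
    public static void main(String[] args) {
        FakeConnection connection = new FakeConnection();
        CheckpointCoordinator coordinator = new CheckpointCoordinator();
        coordinator.register(connection);

        coordinator.simulateCheckpoint();
        System.out.println("open at checkpoint: " + connection.isOpen());
        coordinator.simulateRestore();
        System.out.println("open after restore: " + connection.isOpen());
    }
}

/** Hypothetical callback pair standing in for checkpoint/restore notifications. */
interface CheckpointAware {
    void beforeCheckpoint();
    void afterRestore();
}

/** Pretend socket or file handle that cannot be stored in a checkpoint image. */
final class FakeConnection implements CheckpointAware {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void beforeCheckpoint() {
        open = false; // release the resource so the state can be captured
    }

    @Override
    public void afterRestore() {
        open = true; // reacquire the resource and return to operating state
    }
}

/** Simulates the notifications; the ordering here is a simplification. */
final class CheckpointCoordinator {
    private final List<CheckpointAware> resources = new ArrayList<>();

    void register(CheckpointAware resource) {
        resources.add(resource);
    }

    void simulateCheckpoint() {
        resources.forEach(CheckpointAware::beforeCheckpoint);
    }

    void simulateRestore() {
        resources.forEach(CheckpointAware::afterRestore);
    }
}
```

In a real CRaC-enabled application, the framework or the application registers such resources with the CRaC context, and the JDK drives the notifications around the actual checkpoint and restore.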
  39. JIT Caching
    How to take a snapshot
    Examples: CRaC/docs (github.com)
    • Tomcat / Spring Boot
    • Quarkus
    • Micronaut
    # 1. Start the sample application in checkpoint mode.
    $JAVA_HOME/bin/java -XX:CRaCCheckpointTo=<CheckPointFileDir> -jar App.jar
    # 2. After warm-up, request a checkpoint (take a snapshot).
    jcmd App.jar JDK.checkpoint
    # 3. Restore the snapshot.
    $JAVA_HOME/bin/java -XX:CRaCRestoreFrom=<CheckPointFileDir>
  40. JIT Caching
    Performance
    [Chart: average and 50th/90th/95th/99th percentiles; series: Baseline, Custom JRE, CDS, CDS+custom JRE, C1 + CDS + custom JRE, AOT, CRaC]
  41. JIT Caching
    Benefits and drawbacks
    Benefits
    • Well-warmed-up code is available whenever the application starts.
    • Startup time is almost the same as in the AOT case.
    Drawbacks
    • Platform dependencies
    • Not standardized yet
    • Requires persistent storage
    • Requires the same dependencies and environment between runs.
    • Some effort is needed to capture a checkpoint (development frameworks may cover this in the future…)
  42. Project Leyden
    openjdk.org/projects/leyden
    Goal
    • Improve the startup time, time to peak performance, and footprint of Java programs.
    Focus
    • Standardize AOT for the HotSpot JVM
    • Start native, but support and optimize dynamic stuff later
  43. Project Galahad
    Proposed by Douglas Simon (Oracle Labs)
    Goal
    • Contribute Java-related GraalVM technology and help prepare the JDK community for its potential incubation into the main release in the future.
    Focus
    • Contributing the latest version of the GraalVM just-in-time (JIT) compiler and integrating it as an alternative to the existing JIT compiler of the HotSpot VM.
    • Bringing in the necessary ahead-of-time (AOT) compilation technology to make this new JIT compiler, written in Java, available instantly on JVM start.
    • Galahad will pay close attention to Leyden and track the Leyden specification as it evolves.
  44. Key takeaways
    You have several options to run faster and improve performance!
    Several improvements help Java applications start faster.
    Several projects are ongoing or being proposed.