Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Profiling a Java Application @DevDays 2023

Profiling a Java Application @DevDays 2023

The challenge in performance tuning is tracing the root cause of the bottleneck. Optimizing by ‘guessing’ the cause (aka Shotgun Tuning) leads to messy code, wasted effort, and lots of frustration. Black- and Gray-box monitoring tools like Zipkin and Micrometer allow to easily zoom in to the problem, but point out where your time is lost in the actual code. This talk explores the capabilities of the Java Flight Recorder (JFR), a profiler built-in into your JVM that can help you locate issues to a finer level of detail than those detected by any other tools. Based on JFR recordings, we'll trace issues like connection/thread pool starvation, hot methods, repeated queries, TLAB issues, lock contention, and more, in a live-coding session.

Victor Rentea

May 24, 2023
Tweet

More Decks by Victor Rentea

Other Decks in Technology

Transcript

  1. Profiling your Java Applica2on [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro

    Code: https://github.com/victorrentea/performance-profiling.git Branch: DevDays_2023_05
  2. 👉 victorrentea.ro/training-offer 👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS) Java

    Champion, 17 years of code, code, code, code, code.... Consultant & Trainer: 5000 developers of 100+ EU companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Java Frameworks: Spring, Hibernate, WebFlux ⚡ Java Performance, Secure Coding 🔐 Conference Speaker – many educaGve talks on YouTube Founder of European SoMware CraMers (5800+ members) 🔥 Free monthly webinars, 1 hour a@er work 👉 victorrentea.ro/community Past events on my channel: youtube.com/vrentea Father of 👧👦, host of a 🐈, weekend gardener 🌼 @victorrentea h,ps://VictorRentea.ro
  3. 3 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    The response time of your endpoint is too slow 😱 🎉 Congratulations! 🎉 You made it to production, and you're taking serious load! Where to start?
  4. 4 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Log timestamps log.info("Before call"); !// 2023-05-23 00:01:50.000 !!... !// suspect code log.info("After call"); !// 2023-05-23 00:01:53.100 !!... Measuring Method Runtime - Traditional Style t1-t0 long t0 = currentTimeMillis(); List<CommentDto> comments = client.fetchComments(loanId); // suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));
  5. 5 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Micrometer @Timed !// an aspect wraps this method monitoring spent time List<CommentDto> fetchComments(Long id) { !!==> see code !// suspect code } then, GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400.0 è avg = 250 ms/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 method_timed_seconds_count{method="fetchComments",} 500.0 è avg = 50s / 100 = 500 ms/call (spike) time average[ms] 250 500
  6. 6 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Other ways to report metrics https://www.baeldung.com/micrometer // programatic alternative to aspects meterRegistry.timer("name").record(() -> { /*suspect code*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); // accumulator meterRegistry.counter("earnings").increment(purchaseDollars); // instant value meterRegistry.gauge("player-count", currentPlayers.size()); // eg. "95% took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)
  7. 7 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    §Micrometer automatically introspects framework internals - Exposing metrics for typical issues - You can enable more: Tomcat metrics, decorate executors, ... §This data is regularly polled/pushed to other tools - eg every 5s, Prometheus retrieves and stores this data §Using that data, (eg) Grafana can: - Display impressive charts - Apply custom aggregation formulas - Raise alerts to Ops - ... Some examples è Observing a System with Micrometer
  8. 8 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    § http_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 http_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connections_current_connections{...} 298 //<-98 requests waiting tomcat_threads_busy_threads{...} 200 // = default max size => worker thread pool starvation? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_after_gc_percent{...} 45.15 // >30% => increase max heap? § hikaricp_connections_acquire_seconds_sum{pool="HikariPool-1",} 80.37 // conn pool starvation? hikaricp_connections_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 // 99% missè useless cache or config issue cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 // + 1000 than 5 seconds ago => 😱 § my_own_metric xyz Example Micrometer Metrics available at /actuator/prometheus
  9. 10 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Monitoring Microservices is Harder! Performance problems are usually Distributed J 1st challenge: trace requests A B C D E
  10. 11 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    §TraceID is read from the incoming HTTP request/MQ message, or generated §TraceID is propagated to: - Method calls (via ThreadLocal) - Upstream operator in a reacDve chain (via ReactorContext in WebFlux) - Async methods submiLed to a Spring-managed *TaskExecutor §TraceID is sent as a header in any outgoing HTTP request / MQ message - ... sent via a Spring-managed bean: @Autowired RestTemplate, RabbitTemplate, ... §Every log line prints the TraceID (from MDC on thread) 👏 Distributed Request Tracing (Sleuth) https://www.baeldung.com/spring-cloud-sleuth-single-application A B <header> TraceId
  11. 12 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community

    §The TraceId is available for any processing we do §Every system reports the time spent working on a TraceId (= span) - (1% of spans are reported, for performance) §... to a central system (eg Zipkin) that builds a time chart of all systems example è Distributed Request Sampling
  12. 15 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    §The TraceId is available for any processing we do §Every system reports the Gme spent working on a TraceId (= span) - (1% of spans are reported, for performance) §... to a central system (eg Zipkin) that builds a Gme chart of all systems ⚠ Look out for anA-paCerns like: §Repeated API calls => avoid repeated calls / fetch in batches §Long chains of REST calls => avoid system-in-the-middle §Independent calls => parallelize §Long spans not waiGng for other systems => zoom in 🔎 Distributed Request Sampling
  13. 16 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community

    Profiling Blackbox: inter-system communication (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè)
  14. 17 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    How to find methods that take a long Dme? 💡 Idea: Instrument your code with a -javaagent and measure the execuDon of every executed Java method Naive idea: the added overhead makes this approach impracDcal
  15. 19 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Java Flight Recorder (JFR) Main features: §Regularly capture stack traces of all running threads every second §Record Internal JVM events about locks, files, sockets, GC, ... or your own §Heavily op9mized to extra-low overhead (<2%), usable in prod, for free since Java 11 Usage: §Local JVM => Start/stop from Java Mission Control (JMC) or IntelliJ ⬆ §Remote JVM => generate a .jfr file => download => analyze with JMC or IntelliJ First, enable JFR via -XX:+FlightRecorder, then start the recording: 1) From startup: -XX:StartFlightRecording=.... 2) Via command line: jcmd <PID> JFR.start 3) Via JMX 4) Via tools, eg Glowroot
  16. 27 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR records every second the stack traces of all running threads
  17. 28 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Flame Graphs - Explained
  18. 29 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Data is aggregated by line number (merged) Flame Graphs - Explained 2 samples (33%) 4 samples (66%)
  19. 30 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community

    Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Data is aggregated by line number (merged) Flame Graphs - Explained 2 samples (33%) 4 samples (66%) IN PRACTICE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ...
  20. 31 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community

    ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker http://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy installed locally localhost:55432 +3ms network latency JFR
  21. 33 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Glowroot.org §Extremely simple applica<on profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open hep://localhost:4000 §Monitors - Methods => Flame Graphs - SQLs - API calls - Typical boelenecks - Data per endpoint §Free - Not sure if produckon-ready, more for learning/debugging
  22. 37 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiDng for SQL server Connection Pool Starvation
  23. 38 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    False Split due to Just-In-Time Compiler (JIT) Op>miza>on Same method is called via 2 paths !! Because JIT optimized (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor for calling a method using reflection Fix: Warmup JVM (eg. re-run load test)
  24. 39 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Time-consuming Aspects By default, TransacDonInterceptor (@TransacDonal) acquires a connecDon before entering the annotated method :382 :388
  25. 40 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Lock ContenSon AutomaDc analysis by Java Mission Control of the recorded .jfr file: hAps://wiki.openjdk.org/display/jmc/Releases ß WaiDng to enter a synchronized method à (naEve C code is not recorded by JFR)
  26. 41 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    HashSet<Long> hashSet = ...; // .size() = N hashSet.remove(13); // = O(1) time = constant time List<Long> list = ...; // .size() = M hashSet.removeAll(list); // expected = O(M) x O(hashSet.remove()) = O(M) x O(1) I expected < 100 ms è if M > N => N * list.contains() = N * M: Unknown Library Details Actual = 30k x 29k iterations!! in Production 1800 ms!! 😱 😱 😱 😱
  27. 43 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20
  28. 44 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    JFR Profiling – Lessons Learned ✅ Can "see" inside unknown code / library ❌ Can't see into na>ve C code ⚠ 1 sample/second è requires load from tests or from prod ✅ Can be used in produc/on (low overhead) ⚠ May report wrong line numbers in JIT-op>mized code (CPU+++) ⚠ Tuning requires pa>ence, judgment, and library awareness ❌ Cannot follow async calls/reac>ve chains (swapping threads) è Structured Concurrency[JEP-428 in Java19+] makes forks explicit: hierarchical threads ✅ Can record Virtual Threads[JEP-444 in Java 19+]
  29. 46 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Conclusions §Zoom in on the issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If boWleneck is s>ll unclear, or you cannot use above tools ⬆, use the JFR profiler in prod or under load tests - Also: analyze the .jfr recording using JMC §Op>mize code & configura>on, measuring the benefit vs. risk/pain
  30. Profiling your Java Application [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro

    Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community