Upgrade to Pro — share decks privately, control downloads, hide ads and more …

1BRC--Nerd Sniping the Java Community

1BRC--Nerd Sniping the Java Community

Your mission, should you decide to accept it, is the following: aggregate temperature values from a CSV file and group them by weather station name. There’s only one caveat: the file has one 1,000,000,000 rows!

This is the task of the “One Billion Row Challenge” which went viral within the Java community earlier this year. Come and join me for this talk where I’ll dive into some of the tricks employed by the fastest solutions for processing the challenge’s 13 GB input file within less than two seconds. Parallelization and efficient memory access, optimized parsing routines using SIMD and SWAR, as well as custom map implementations are just some of the topics which we are going to discuss.

I will also share some of the personal experiences and learnings which I made while running this challenge for and with the community.

Gunnar Morling

April 09, 2024
Tweet

More Decks by Gunnar Morling

Other Decks in Programming

Transcript

  1. #1BRC | @gunnarmorling • Software engineer at Decodable • Former

    project lead of Debezium • kcctl 🧸, JfrUnit, ModiTect, MapStruct • Java Champion • 1⃣ 🐝 🏎 Gunnar Morling
  2. #1BRC | @gunnarmorling The Goals Learn something new Have some

    fun along the way Inspire others to do the same
  3. #1BRC | @gunnarmorling The Rules • Java-only; Version of your

    choice • No dependencies • No caching • 10K stations, -99.9°C - +99.9°C • Copying allowed
  4. #1BRC | @gunnarmorling Evaluation Environment • 32 core AMD EPYC™

    7502P (Zen2), 8 Cores used • 128 GB RAM • File on RAM disk • Five runs, slowest and fastest discarded Image © Joe Haupt https://flic.kr/p/2mG3vWA (CC BY-SA 2.0)
  5. #1BRC | @gunnarmorling JEP 454 Foreign Function & Memory API

    https://openjdk.org/jeps/454 Introduce an API by which Java programs can interoperate with code and data outside of the Java runtime. By efficiently invoking foreign functions (i.e., code outside the JVM), and by safely accessing foreign memory (i.e., memory not managed by the JVM), the API enables Java programs to call native libraries and process native data without the brittleness and danger of JNI.
  6. #1BRC | @gunnarmorling Parsing – SWAR SIMD Within a Register

    https://richardstartin.github.io/posts/finding-bytes
  7. #1BRC | @gunnarmorling Parsing – SIMD Single Instruction, Multiple Data

    https://speakerdeck.com/gunnarmorling/to-the-moon-and-beyond-with-java-17-apis
  8. #1BRC | @gunnarmorling Further Tricks & Techniques • GraalVM as

    JIT Compiler • GraalVM native binary • The “spawn trick” • Unsafe • EpsilonGC • Super-scalar execution
  9. #1BRC | @gunnarmorling Parsing Perfect Hashing • Can’t use perfect

    hashing for keys… but for values! • Only 1,999 distinct values: -99.9, -99.8, ..., 0.0, ..., 99.8, 99.9
  10. #1BRC | @gunnarmorling Lessons Learned • Precise rules, covering corner

    cases • Automate, automate, automate • Build a community • Use dedicated hardware
  11. #1BRC | @gunnarmorling • Repository https://github.com/gunnarmorling/1brc/ • Show & Tell

    https://github.com/gunnarmorling/1brc/ discussions/categories/show-and-tell • The Billion Row Challenge–Step-by-step from 71s to 1.7s https://questdb.io/blog/billion-row-challenge-step-by-step/ Learn More
  12. #1BRC | @gunnarmorling Resources Relevant JEPs and more • Vector

    API (Incubator): https://openjdk.org/jeps/469 • Foreign Function and Memory API: https://openjdk.org/jeps/454 • CRaC: https://openjdk.org/projects/crac/