Profile and benchmark every change - RubyKaigi 2025

osyoyu

April 17, 2025

Transcript

  1. Also a profiler author (osyoyu/pf2)

    Recent Pf2 updates: changed the sample format and reduced memory use by 90%; expanded features for comparing profiles; rewrote the core in C (from Rust). I also realized that profilers don't make programs fast, just like debuggers don't find bugs.
  2. Outline

    "Benchmark-Driven Development (BenchDD)". A walkthrough: building Xinatra, a faster Sinatra. The toolset for doing BenchDD. Building a 100x faster Sinatra. JA subtitle: "These are Japanese subtitles; they carry no more information than the English." (JA mini-translation. You won't miss a thing by skipping this.)
  3. Xinatra: A faster Sinatra

    Pronounced "Zee-na-tra". A drop-in replacement for Sinatra apps, 100x faster. Why not "Cinatra" (C = 100 in Roman numerals)? It was hard to distinguish in verbal comms, so "Xinatra".

        class MyApp < Xinatra::Base
          before do
            ...
          end

          get '/' do
            return "Hello, world!"
          end
        end

    JA subtitle: "I thought I'd try building a 100x faster Sinatra."
  4. Is it really 100x fast?

    It depends on the definition of "fast". The routing/handling logic is 100x faster, so "Hello world" apps are 100x faster. In real-world-ish benchmarks, it's about 1.02x faster. JA subtitle: "I'm calling it 100x because routing and handling are 100x faster."
  5. How to make it 100x faster?

    That's the main part! I'm going to introduce a technique called "Benchmark-Driven Development" and a toolset to practice BenchDD. JA subtitle: "Benchmark-Driven Development."
  6. Benchmarking

    Run code and measure its time.

        require 'benchmark/ips'

        Benchmark.ips do |x|
          x.report("sumup") { (1..100).inject(&:+) }
        end

        Warming up --------------------------------------
                       sumup    39.181k i/100ms
        Calculating -------------------------------------
                       sumup    392.296k (± 0.2%) i/s (2.55 μs/i) 1.998M in 5.093700s
  7. TDD and BenchDD

    TDD: write failing tests → make it pass → refactor. BenchDD: set a measurable perf goal → write code → measure & improve.
  8. It's hard to make slow code fast.

    Instead, write fast code from the beginning. (Diagram labels: Working / Broken, Slow / Fast.)
  9. Performance 101: Focus on bottlenecks! (?)

    If there's something significantly slow, you should work on that! Work on the algorithm/architecture! ... very correct. (Diagram label: the clear bottleneck.) JA subtitle: "People say to crush the bottleneck. I think that's right."
  10. Reality: There's not always something significant

    There's not always a "bottleneck". And still, your program needs to be faster. (Diagram label: no significant bottleneck.) JA subtitle: "There isn't always a bottleneck."
  11. It's hard to do performance afterwards

    Lots of slight slowdowns will impact performance as a whole. However, they are hard to find precisely because they are slight, even though they may be easy to fix. "Not slow" != "Fast". JA subtitle: "Programs get slow as tiny slowdowns pile up."
  12. Know your performance

        numbers = (1..100).to_a

        # slow?
        numbers.select { it.even? }.first

        # 10x faster! Take this impl
        numbers.find { it.even? }
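To check a claim like the one on this slide on your own machine, a minimal sketch using only the standard library's `benchmark` could look like the following. The helper names (`first_even_with_select`, `first_even_with_find`, `ns_per_call`) and the iteration count are assumptions made for illustration, and explicit block parameters are used instead of `it` so the sketch also runs on Rubies older than 3.4. The measured ratio will vary by machine and Ruby version.

```ruby
require 'benchmark'

NUMBERS = (1..100).to_a
ITERATIONS = 50_000

# Builds a full array of every match, then takes the first element.
def first_even_with_select(numbers)
  numbers.select { |n| n.even? }.first
end

# Stops at the first match without building an intermediate array.
def first_even_with_find(numbers)
  numbers.find { |n| n.even? }
end

# Average nanoseconds per call of the given block.
def ns_per_call(iterations)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  elapsed * 1_000_000_000 / iterations
end

select_ns = ns_per_call(ITERATIONS) { first_even_with_select(NUMBERS) }
find_ns   = ns_per_call(ITERATIONS) { first_even_with_find(NUMBERS) }

puts format('select.first: %.1f ns/call', select_ns)
puts format('find:         %.1f ns/call', find_ns)
```

Both helpers return the same value, so the faster one can be swapped in without changing behavior.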
  13. Benchmarking "every single" change

    Run benchmarks as often as possible to catch slow code. Maybe on every pull request? Or on every commit? Even more, every time you type? JA subtitle: "You could even benchmark on every keystroke."
  14. benchmarkkit

    I have created tools and frameworks to keep benchmarking "in the loop". Combined, they form BenchDD. JA subtitle: "I built tools that make benchmarking easier."
  15. Building Xinatra with BenchDD

    Set a measurable perf goal → write code → measure & improve. Everything starts from setting a measurable performance goal = define what needs to be 100x faster. Write a benchmark first!
  16. How many nanoseconds do we have?

    ns/req (lower is better): Empty Rack App 215 ns/req; 100x Sinatra 350 ns/req; Roda 912 ns/req; Sinatra 35,000 ns/req.

        class EmptyRackApp
          def call(env)
            # Rack expects [status, headers, body]; the body must respond to #each
            return [200, {}, ["Hello world!"]]
          end
        end
  17. How many nanoseconds do we have?

    ns/req (lower is better): Empty Rack App 215 ns/req; 100x Sinatra 350 ns/req; Roda 912 ns/req; Sinatra 35,000 ns/req. We need to fit Sinatra's features in 135 ns/req: feature headroom = 350 - 215 = 135 ns. JA subtitle: "An empty Rack app already takes 215 ns, so the Sinatra-equivalent features have to run in the remaining 135 ns."
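The budget arithmetic above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the measurement setup from the talk: `ns_per_request` and `feature_headroom` are hypothetical helpers, the app is called directly rather than through a server, and the number printed for the empty app will differ from the slide's 215 ns on other machines.

```ruby
# Empty Rack-style app: the floor that anything built on top of Rack pays.
class EmptyRackApp
  BODY = ['Hello world!'].freeze

  def call(env)
    [200, {}, BODY]
  end
end

# Average nanoseconds per app.call(env), measured with the monotonic clock.
def ns_per_request(app, env, iterations: 200_000)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  iterations.times { app.call(env) }
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  (finish - start).to_f / iterations
end

# Budget from the slide: target = baseline / speedup, headroom = target - floor.
def feature_headroom(baseline_ns:, speedup:, floor_ns:)
  baseline_ns / speedup - floor_ns
end

env = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' }
puts format('empty app here: %.0f ns/req', ns_per_request(EmptyRackApp.new, env))

headroom = feature_headroom(baseline_ns: 35_000, speedup: 100, floor_ns: 215)
puts "headroom with the talk's numbers: #{headroom} ns"
```

Measuring the floor first makes the goal concrete: every feature added later has to be paid for out of the headroom.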
  18. Our starting point

        class Xinatra::Base
          def call(env)
            handler = do_routing(env)
            response = handler.call(env)
            return response
          end
        end

    How much time can we spend here?
  19. Know and benchmark your target

    ns/req (lower is better): Hono (Bun) 51 ns/req; Empty Rack app 215 ns/req. I initially wanted to overtake Hono, a JavaScript web framework. Full-featured Hono was faster than an empty Rack app... JA subtitle: "Hono was faster than an empty Rack app..."
  20. BenchDD main loop

    Anyway, we now know our goal. Now it's time to write real code! vim → bench → vim → bench → ... (Set a measurable perf goal → write code → measure & improve.)
  21. Starting implementation

        class Xinatra::Base
          def call(env)
            handler = do_routing(env)
            response = handler.call(env)
            return response
          end

          private

          def do_routing(env)
            if env['RACK_...']
              return handler
            end
          end
        end
  22. Problem 1: not fun.

    (Diagram: "Benchmark" and "Edit code" repeated over and over.)
  23. Benchmarking framework

        describe "routing" do
          setup do
            @router = TrieRouter.new
            @router.define("GET", "/foo", -> () { 'hello' })
          end

          dataset "small" do
            ["/hello", ...]
          end

          dataset "large" do
            ...
          end

          scenario "trie" do
            data.each do |d|
              @router.match("GET", d)
            end
          end
        end
  24. Benchmark suite DSL + Editor integration

    I have created a benchmark suite framework somewhat like RSpec, with editor integration for easy running. JA subtitle: "I built an RSpec-like DSL for benchmarks, plus editor integration."
  25. Benchmarking framework

        describe "routing" do
          setup do
            @router = TrieRouter.new
            @router.define("GET", "/foo", -> () { 'hello' })
          end

          dataset "small" do
            ["/hello", ...]
          end

          dataset "large" do
            ...
          end

          scenario "trie" do
            data.each do |d|
              @router.match("GET", d)
            end
          end
        end
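How a DSL of this shape might work internally can be sketched as follows. This is not the author's framework: `BenchSuite`, its `run` method, the `repeat:` parameter, and the `Context` class are all hypothetical names invented for illustration. Here `BenchSuite.new` plays the role of `describe`, and every scenario is run against every dataset.

```ruby
# Hypothetical mini-framework; NOT the implementation shown in the talk.
class BenchSuite
  attr_reader :name, :results

  def initialize(name, &definition)
    @name = name
    @setup = nil
    @datasets = {}
    @scenarios = {}
    @results = {}
    instance_eval(&definition) # lets the block call setup/dataset/scenario
  end

  def setup(&block)
    @setup = block
  end

  def dataset(label, &block)
    @datasets[label] = block
  end

  def scenario(label, &block)
    @scenarios[label] = block
  end

  # Runs every scenario against every dataset; records average ns per run.
  def run(repeat: 100)
    @scenarios.each do |scenario_label, body|
      @datasets.each do |dataset_label, loader|
        context = Context.new(loader.call)
        context.instance_exec(&@setup) if @setup
        start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
        repeat.times { context.instance_exec(&body) }
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - start
        @results[[scenario_label, dataset_label]] = elapsed.to_f / repeat
      end
    end
    @results
  end

  # Holds the dataset plus any instance variables created in setup.
  class Context
    attr_reader :data

    def initialize(data)
      @data = data
    end
  end
end

suite = BenchSuite.new('routing') do
  setup { @routes = { '/foo' => -> { 'hello' } } }
  dataset('small') { ['/foo', '/missing'] }
  scenario('hash lookup') { data.each { |path| @routes[path] } }
end

suite.run.each do |(scenario_label, dataset_label), ns|
  puts format('%s / %s: %.0f ns/run', scenario_label, dataset_label, ns)
end
```

Running setup and scenarios on a fresh `Context` per dataset keeps state from one measurement out of the next.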
  26. Designing the workload

    Workloads should be (1) realistic/representative and (2) compact. For Xinatra, I prepared multiple tiers of workloads. Small: a generated set of requests (10k reqs). Large: logs collected from real Sinatra apps (100k reqs). JA subtitle: "It's fine to prepare multiple benchmark datasets."
  27. Benchmarking tells us too little.

    What benchmarking tells us: time per iteration of the current code. What we need: How did the performance change? Why did the performance change?
  28. Benchmarking ❤ Profiling

    Explaining performance is exactly what profilers do. I've added a new view to show the performance diff between two revisions.
  29. Differential flamegraphs

    Legend: improved / degraded from last run. Benchmark 1 (before), Benchmark 2 (after). Image from https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
  30. Diffing flamegraphs

    Profiling is automatically initiated for each benchmark. Results are recorded in tmp/. The diff engine in Pf2 (profiler) generates differential flamegraphs.
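The idea of diffing two profiles can be illustrated with a deliberately simplified sketch. It assumes each profile has already been reduced to a hash of frame name to sample count; that shape, the `diff_profiles` helper, and the sample numbers are all made up for illustration and say nothing about how Pf2's diff engine works or stores data.

```ruby
# Hypothetical profile shape: { "frame name" => number of samples }.
# Real profilers keep much richer data (full stacks, timestamps, threads).
def diff_profiles(before, after)
  frames = before.keys | after.keys
  rows = frames.map do |frame|
    b = before.fetch(frame, 0)
    a = after.fetch(frame, 0)
    { frame: frame, before: b, after: a, delta: a - b }
  end
  # Drop unchanged frames and put the largest absolute change first.
  rows.reject { |row| row[:delta].zero? }.sort_by { |row| -row[:delta].abs }
end

# Made-up sample counts for two benchmark runs.
before = { 'Router#match' => 620, 'Request#params' => 210, 'Array#each' => 170 }
after  = { 'Router#match' => 340, 'Request#params' => 215, 'Hash#[]' => 90, 'Array#each' => 170 }

diff_profiles(before, after).each do |row|
  label = row[:delta].negative? ? 'improved' : 'degraded'
  puts format('%-16s %4d -> %4d (%+d, %s)', row[:frame], row[:before], row[:after], row[:delta], label)
end
```

Raw sample counts are only comparable when both runs executed the same workload for a similar duration; otherwise the counts would need to be normalized first.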
  31. Nurturing your bench suite

    Write a benchmark for each feature: routing, handling, before/after actions, E2E. (Set a measurable perf goal → write code → measure & improve.)
  32. Reminder: We need to fit Sinatra features in 135 ns/req

    ns/req (lower is better): Empty Rack App 215 ns/req; 100x Sinatra 350 ns/req; Sinatra 35,000 ns/req. Feature headroom = 135 ns. JA subtitle: "We need to implement many features within 135 ns."
  33. Optimizing the significant: Routing

    Routing is the largest part of Xinatra (= what routes.rb does in Rails). Some algorithms come to mind: trie-based routing, linear routing. JA subtitle: "Routing is the heaviest part, so getting its algorithm right is a given."
  34. GET /admin/users/1 → handler.call()

    Trie routing: handler = routes[:get].dig(*parts(req)). O(logN), and fast when there are more routes.
    Linear routing: handler = routes.find {|rt| rt.is_for?(req) }. O(N), but faster when there are fewer routes.
  35. Choosing the routing strategy

    It's important to know the line where a simple O(N) loses to a complex O(logN). For 10-20 routes, linear routing was faster (found by benchmarking!). 270 ns/req (144x Sinatra). JA subtitle: "With few routes, O(N) can be faster than O(logN)."
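To find that crossover line yourself, the two strategies can be sketched and measured side by side. The classes below are simplified stand-ins, not Xinatra's routers: they only match exact static paths (no `:id` style parameters), and `ns_per_match`, the route counts, and the generated paths are assumptions chosen for illustration. Where the crossover falls depends on the machine, the Ruby version, and the route shapes.

```ruby
require 'benchmark'

# Linear routing: scan the route list until one matches. O(N) in route count.
class LinearRouter
  def initialize
    @routes = []
  end

  def define(verb, path, handler)
    @routes << [verb, path, handler]
  end

  def match(verb, path)
    route = @routes.find { |route_verb, route_path, _| route_verb == verb && route_path == path }
    route && route[2]
  end
end

# Trie routing: one nested Hash lookup per path segment, so the cost follows
# the depth of the path rather than the number of routes.
class TrieRouter
  HANDLER = :__handler__

  def initialize
    @root = {}
  end

  def define(verb, path, handler)
    node = segments(verb, path).reduce(@root) { |current, segment| current[segment] ||= {} }
    node[HANDLER] = handler
  end

  def match(verb, path)
    node = @root
    segments(verb, path).each do |segment|
      node = node[segment]
      return nil unless node
    end
    node[HANDLER]
  end

  private

  def segments(verb, path)
    [verb, *path.split('/').reject(&:empty?)]
  end
end

# Average nanoseconds per match over the given request paths.
def ns_per_match(router, paths, iterations: 500)
  elapsed = Benchmark.realtime do
    iterations.times { paths.each { |path| router.match('GET', path) } }
  end
  elapsed * 1_000_000_000 / (iterations * paths.size)
end

[10, 1_000].each do |route_count|
  paths = Array.new(route_count) { |i| "/section#{i}/items" }
  routers = { 'linear' => LinearRouter.new, 'trie' => TrieRouter.new }
  routers.each_value { |router| paths.each { |path| router.define('GET', path, -> { 'ok' }) } }
  requests = paths.sample(10, random: Random.new(1))
  routers.each do |name, router|
    puts format('%5d routes, %-6s: %.0f ns/match', route_count, name, ns_per_match(router, requests))
  end
end
```

Because both routers expose the same `define`/`match` interface, the strategy can be chosen per app based on what the numbers say.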
  36. Implementing other features & nitpicking

    We have implemented routing: 215 → 266 ns/req (84 ns headroom to go). Many features remain to implement: params access, before/after actions, cookies, ... None of them are bottlenecks, yet fitting them all in 84 ns is challenging. JA subtitle: "All the other features have to fit in 84 ns/req."
  37. Feature 1: params

        get '/search' do
          params   #=> { "q" => "ruby" }
        end

        get '/search' do
          @params  #=> { "q" => "ruby" }
        end

    Problem: method calls are expensive. @ivar access is much faster! method: 20-50 ns / call; ivar: 10 ns / access (* incl. benchmarking overhead).
  38. params() → @params

    @params is faster ... but it is mutable, and can do less work. Xinatra supports both, so users can gradually switch to @params and gain perf. Lesson learned: performance can influence API design. params() can't be faster than @params. JA subtitle: "You can migrate using params, then switch to @params to get faster."
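The method-call versus instance-variable gap can be measured with a small standard-library sketch. `ParamsHolder` and its two reader methods are hypothetical names for illustration; the accessor here is trivial, whereas a real framework's `params` typically does more work, so the gap you see will differ from the slide's numbers and may shrink further with YJIT enabled.

```ruby
require 'benchmark'

class ParamsHolder
  def initialize
    @params = { 'q' => 'ruby' }
  end

  # Sinatra-style accessor: every read goes through a method call.
  def params
    @params
  end

  def read_via_method(iterations)
    iterations.times { params['q'] }
  end

  def read_via_ivar(iterations)
    iterations.times { @params['q'] }
  end
end

holder = ParamsHolder.new
iterations = 1_000_000

Benchmark.bm(8) do |x|
  x.report('params')  { holder.read_via_method(iterations) }
  x.report('@params') { holder.read_via_ivar(iterations) }
end
```

Both loops run inside the object so that the only difference between them is the accessor call itself.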
  39. Feature 2: The request object

        get '/search' do
          request      #=> #<Sinatra::Request>
          request.env  #=> {"RACK_*" => ... }
        end

    Can request be changed to gain performance?
  40. To implement Request#params ...

    Option 1: class

        class Request
          def env; ...; end
          def params; ...; end
        end

    Option 2: Data

        Request = Data.define(:env, :params, ...)

    Option 3: Struct

        Request = Struct.new(:env, :params, ...)

    Data#params 87 ns / access; Struct#path 89 ns / access; Class#path 94 ns / access (* incl. benchmarking overhead, w/ YJIT).
  41. Feature 3: before/after actions

        before do
          do_good_auth(params)
        end

        get '/' do
          # `before` implicitly called
          ...
        end

    The block is saved on app startup and gets "called" on every request. It can be used for authentication and other checks. JA subtitle: "before actions, usable for things like authentication."
  42. Calling Procs (blocks) in Ruby

    Block#call: 310 → 319 ns. instance_exec: 310 → 404 ns. instance_eval: 310 → 412 ns. Fastest, but unusable since context changes. Feature rich and faster version of _exec (...?). 412 ns (95x Sinatra)
  43. Or just make it an actual method

        class Xinatra::Base
          @@befores_count = 0

          def self.before(&block)
            @@befores_count += 1
            define_method("before_#{@@befores_count}", &block)
          end

          def call(env)
            ...
            # hooks are numbered from 1, so iterate 1..count
            1.upto(@@befores_count) do |i|
              self.send(:"before_#{i}")
            end
          end
        end

    Calling methods is faster than calling blocks, and allows YJIT! 342 ns (118x Sinatra)
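The technique of turning hook blocks into real methods can be shown in a self-contained sketch. `TinyBase`, `MyApp`, the `__before_N` naming, and the `HTTP_X_USER` header are all hypothetical and chosen for illustration; this is not Xinatra's source, and for brevity hooks are stored per class without being inherited from superclasses.

```ruby
# Hypothetical stand-in for a framework base class; not Xinatra's source.
class TinyBase
  # Per-class list of generated hook method names.
  def self.before_hook_names
    @before_hook_names ||= []
  end

  # Turn the block into a real instance method at class-definition time.
  def self.before(&block)
    name = :"__before_#{before_hook_names.size}"
    define_method(name, &block)
    before_hook_names << name
  end

  def call(env)
    @env = env
    # Hook bodies run with self == this app instance, as with instance_exec,
    # but through ordinary method dispatch.
    self.class.before_hook_names.each { |name| send(name) }
    [200, {}, ["user=#{@user}"]]
  end
end

class MyApp < TinyBase
  before { @user = @env['HTTP_X_USER'] || 'guest' }
  before { @user = @user.upcase }
end

p MyApp.new.call('HTTP_X_USER' => 'alice')
# prints [200, {}, ["user=ALICE"]]
```

Because the hooks are plain methods, they see the instance's variables directly and run in definition order.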
  44. Or make it static

        class Xinatra::Base
          def self.before(&block)
            define_method("__before", &block)
          end

          def call(env)
            # eliminated #send
            __before
          end

          def __before; end # no-op stub
        end

    ⚠ Fast, but multiple befores cannot be defined in this version (breaking!). 270 ns/req (144x Sinatra)
  45. Feature 4: Rack::Session

    Session handling wasn't in the original benchmark set, so no numbers here, but Rack::Session usually consumes quite a lot of CPU. Implement an equivalent in Rust. JA subtitle: "Rack::Session tends to use quite a lot of CPU."
  46. And more, and more...

    Reducing Hash access. Reducing object allocation. Mutability is god! Reducing more and more.

        # slow
        hash = {}
        hash[key] ||= [] # init
        hash[key] << something

        # faster
        hash = Hash.new { |h, k| h[k] = [] } # store the default so << is kept
        hash[key] << something

    JA subtitle: "Shaving off other small bits, one by one."
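One detail of the default-block idiom is worth spelling out: the block has to store the new array in the hash, otherwise every appended value is lost. A short sketch of the difference follows; the helper names `group_without_store` and `group_with_store` and the sample data are made up for illustration.

```ruby
# The default block returns a fresh array but never stores it,
# so each << goes to a throwaway object.
def group_without_store(pairs)
  groups = Hash.new { [] }
  pairs.each { |key, value| groups[key] << value }
  groups
end

# The default block assigns the new array to the key on first access,
# so later appends accumulate.
def group_with_store(pairs)
  groups = Hash.new { |hash, key| hash[key] = [] }
  pairs.each { |key, value| groups[key] << value }
  groups
end

pairs = [[:get, '/'], [:get, '/search'], [:post, '/login']]

p group_without_store(pairs) # prints {}
p group_with_store(pairs)
```

Whether the storing form beats `||=` in your own code is something to measure rather than assume.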
  47. Wrapping up

    Building Xinatra was about removing a ton of small debris. Doing high performance was not "making it fast", but "not making it slow". Ten 10% slowdowns compound to roughly 160% slower code (1.1^10 ≈ 2.59x). JA subtitle: "The job was sweeping up dust."
  48. Does this matter to me?

    Yes! In Ruby/Rails, CPU time is very precious. ☹ "Databases are the bottleneck, Ruby code won't matter!" Rails isn't IO-bound. That 1 ms could go far, especially when you scale.
  49. Don't gacha!

    It's tempting to repeat benchmark commands until you get good results. I did that a lot. Instead of wasting time, do a statistical hypothesis test (p=0.05). JA subtitle: "Run a hypothesis test instead of pulling the benchmark gacha."
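The talk does not say which hypothesis test it uses. As one possible choice, a permutation test on the difference of means needs no statistics library and no distribution tables; the sketch below assumes that choice. The `mean` and `permutation_p_value` helpers, the round count, the fixed seed, and the ns/req samples are all made up for illustration.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Two-sided permutation test: how often does a random relabeling of the pooled
# samples produce a mean difference at least as large as the observed one?
def permutation_p_value(before, after, rounds: 10_000, random: Random.new(42))
  observed = (mean(after) - mean(before)).abs
  pooled = before + after
  extreme = rounds.times.count do
    shuffled = pooled.shuffle(random: random)
    diff = mean(shuffled.first(before.size)) - mean(shuffled.drop(before.size))
    diff.abs >= observed
  end
  # Add-one correction keeps the estimate away from an impossible p = 0.
  (extreme + 1).to_f / (rounds + 1)
end

# Made-up ns/req samples from repeated benchmark runs.
before = [352, 349, 355, 351, 348, 353, 350, 354]
after  = [341, 344, 339, 343, 340, 342, 345, 338]

p_value = permutation_p_value(before, after)
puts format('p = %.4f', p_value)
puts(p_value < 0.05 ? 'difference is unlikely to be noise' : 'cannot tell this apart from noise')
```

Collecting a fixed number of runs up front and testing once avoids the temptation to rerun until a good number appears.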
  50. YJIT

    Enabling YJIT during benchmarking is important. Keep the environment close to prod! YJIT starts compiling a method after it has been called 30 times, so a short warmup period should suffice.
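A benchmark script can check for YJIT and warm up before measuring, along the lines of this sketch. The helper names (`yjit_enabled?`, `sum_up`, `measure_ns`) and the warmup count of 100 are assumptions for illustration; the only YJIT API used is `RubyVM::YJIT.enabled?`, guarded with `defined?` so the script also runs on builds without YJIT.

```ruby
# True only when YJIT exists in this build and is switched on for this process.
def yjit_enabled?
  defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled? ? true : false
end

# Comfortably above the 30-call threshold mentioned on the slide.
WARMUP_CALLS = 100

# The workload being measured.
def sum_up
  (1..100).inject(:+)
end

# Average nanoseconds per call of the given block.
def measure_ns(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  iterations.times { yield }
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  (finish - start).to_f / iterations
end

warn 'YJIT is off; run with `ruby --yjit` if production uses it' unless yjit_enabled?

WARMUP_CALLS.times { sum_up } # warm-up runs, not measured
puts format('sumup: %.0f ns/i', measure_ns(20_000) { sum_up })
```

Printing a warning instead of aborting keeps the script usable on machines where YJIT is unavailable, while making the mismatch with production visible.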
  51. Won't profiling affect benchmarking?

    Yes. You will see lower scores with the profiler enabled. That's okay as long as the overhead is consistent. JA subtitle: "Even if profiling slows things down, it's fine as long as the slowdown is consistent."
  52. Benchmarking in CI?

    Benchmarking locally is tedious! Why not run them in CI? Because CI envs are very unstable: Hyper-Threading, neighbors, library updates, unstable base CPU. JA subtitle: "CI machine performance is really unstable, so no."
  53. Do you feel like benchmarking now?

    Some optimizations I covered today won't be easy to do after writing the code. Always run benchmarks while writing code and find slowdowns before git commit! JA subtitle: "Benchmark before you git commit!"