Tokyo • Look back on RubyKaigi, hands-on! • Let's make the next RubyKaigi together • Come to the Cookpad booth for details: https://cookpad.connpass.com/event/282436/
and a super-slow webapp • Contestants may request a 60-second benchmark during the contest • Scores and standings are decided based on benchmark results • The goal is to get the best score
[Diagram: Benchmarker (scorer) → Contestant VM (running a Ruby webapp); benchmark requests are sent at an intensive rate for 60 seconds]
• Sinatra • Puma • Nginx • MySQL • Implementations in other languages • Python, Go, Node.js, ... • Initial performance is designed to be roughly the same across languages
Wills: to compete in ISUCON • How to track down and profile slow code on the CRuby level • What future Rubies need to shine in ISUCON
Won'ts: Configuring Nginx, Linux, etc. • Monitoring on the system level • Itamae recipes we made for ISUCON
Kill N+1 SQL Queries • → user_ids.each {|id| query("select * from users where id = ?", id) } • → query("select * from users where id in (?)", user_ids) • Replace suboptimal algorithms • Utilize Server Resources (cpu, memory) to the last drop • Adding Puma threads/processes • Caching • Upgrading to ruby/ruby master • (Adding VMs and scaling VMs up are prohibited)
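To make the N+1 point concrete, here is a minimal sketch that only counts round trips. `FakeDB`, `find_user`, `find_users`, and `in_clause_sql` are hypothetical names invented for illustration; they stand in for a real MySQL client. It also shows one common way to build the `in (...)` list, on the assumption that the driver wants one placeholder per value.

```ruby
# Hypothetical in-memory stand-in for a MySQL client; it only counts round trips.
class FakeDB
  attr_reader :query_count

  def initialize(users)
    @users = users # { id => row }
    @query_count = 0
  end

  # Stand-in for: select * from users where id = ?
  def find_user(id)
    @query_count += 1
    @users[id]
  end

  # Stand-in for: select * from users where id in (?, ?, ...)
  def find_users(ids)
    @query_count += 1
    @users.values_at(*ids).compact
  end
end

# Assumption: the driver needs one "?" per value, so build the list explicitly.
def in_clause_sql(ids)
  placeholders = (["?"] * ids.size).join(", ")
  "select * from users where id in (#{placeholders})"
end

users = {
  1 => { id: 1, name: "alice" },
  2 => { id: 2, name: "bob" },
  3 => { id: 3, name: "carol" }
}
user_ids = [1, 2, 3]

# N+1 style: one query per id.
n_plus_one = FakeDB.new(users)
rows_slow = user_ids.map { |id| n_plus_one.find_user(id) }

# Batched style: one query for all ids.
batched = FakeDB.new(users)
rows_fast = batched.find_users(user_ids)

puts "N+1 queries:     #{n_plus_one.query_count}"
puts "Batched queries: #{batched.query_count}"
puts in_clause_sql(user_ids)
```

Both styles return the same rows; only the number of round trips differs, which is what matters under a 60-second benchmark.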
But where should we start from?
it on site! binding.irb/pry in production • Debugging workflow: • Stop real benchmark requests • Write code in binding.irb/pry using real requests • Confirm it works • Copy to editor and save (Note: I don't do this at work - just a competition technique 😁)
Tracks everything, but huge performance impact • ruby-prof based on TracePoint • Sampling profilers • Collect samples every 10-100 ms, small performance impact • Stackprof (cpu, wall, memory) • based on the rb_profile_frames() API • rbspy (wall) • runs as a separate process and reads Ruby memory (process_vm_readv(2))
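As a rough illustration of the "tracks everything" approach, the sketch below uses Ruby's built-in TracePoint to count every call to a Ruby-level method. It is not ruby-prof itself, and `fib` is just a hypothetical workload; the point is that the hook fires on every single call, which is where the overhead comes from.

```ruby
# Hypothetical workload with many small method calls.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

call_counts = Hash.new(0)

# The :call event fires each time a Ruby-defined method is entered.
tracer = TracePoint.new(:call) { |tp| call_counts[tp.method_id] += 1 }

# With a block, tracing is enabled only for the duration of the block,
# and enable returns the block's value.
result = tracer.enable { fib(15) }

puts "fib(15) = #{result}"
puts "calls to fib: #{call_counts[:fib]}"
```

Every one of those calls ran the tracer block, so the cost of tracing grows with the call count rather than with elapsed time.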
file_frames() API • Returns the call stack that was running when rb_profile_frames() was called • Stackprof calls rb_profile_frames() on SIGPROF (cpu mode) or SIGALRM (wall mode) timers
[Diagram: time axis with stacked frames a(), b(), c(); 📸 rb_profile_frames() records the call stack [a(), b()]]
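The idea of sampling a call stack at intervals can be sketched in pure Ruby. This toy is not how Stackprof works internally: it uses a sampler thread and `Thread#backtrace_locations` instead of signals and the C API, and `sample_thread` and `busy_loop` are hypothetical names. Because the sampler thread also has to acquire the GVL, it may collect fewer samples than the interval suggests.

```ruby
# Toy sampling profiler: record the target thread's top stack frame
# at a fixed interval until the target thread finishes.
def sample_thread(target, interval: 0.01)
  counts = Hash.new(0)
  sampler = Thread.new do
    while target.alive?
      # backtrace_locations returns nil once the thread is dead.
      frame = target.backtrace_locations&.first
      counts[frame.label] += 1 if frame
      sleep interval
    end
  end
  target.join
  sampler.join
  counts
end

# Hypothetical CPU-bound workload that spins for the given number of seconds.
def busy_loop(seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  x = 0
  x += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  x
end

worker = Thread.new { busy_loop(0.5) }
counts = sample_thread(worker)

# Print frames ordered by how often they were on top of the stack.
counts.sort_by { |_, n| -n }.each { |label, n| puts "#{n}\t#{label}" }
```

A real profiler records the whole stack per sample and aggregates it into a flamegraph; counting only the top frame keeps the sketch short.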
use the CPU at the same time • due to the Global VM Lock (GVL) • I/O (file reads/writes, network access, ...) can be performed in the background
[Diagram: Threads 1-3 take turns: acquire the GVL and use the CPU, release the GVL (do I/O), or wait]
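A small experiment can show the difference between CPU work and I/O under the GVL. This is a sketch under assumptions: `sleep` stands in for blocking I/O (it releases the GVL), the workload sizes are arbitrary, and `elapsed` is a hypothetical helper. Exact timings vary by machine.

```ruby
# Measure wall-clock time of a block in seconds.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

cpu_work = -> { 200_000.times { |i| i * i } } # holds the GVL while running
io_work  = -> { sleep 0.1 }                   # releases the GVL while waiting

cpu_serial   = elapsed { 4.times { cpu_work.call } }
cpu_threaded = elapsed { 4.times.map { Thread.new(&cpu_work) }.each(&:join) }

io_serial    = elapsed { 4.times { io_work.call } }
io_threaded  = elapsed { 4.times.map { Thread.new(&io_work) }.each(&:join) }

puts format("CPU: serial %.3fs, 4 threads %.3fs", cpu_serial, cpu_threaded)
puts format("I/O: serial %.3fs, 4 threads %.3fs", io_serial, io_threaded)
```

On CRuby the threaded I/O case should finish in roughly the time of one wait, while the threaded CPU case is not expected to be much faster than the serial one.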
rb_profile_frames() returns information about the last active Thread (the one that held the GVL) • Threads doing I/O have a low chance of being sampled • Statistics (flamegraphs) built from repeated rb_profile_frames() calls may not be accurate, especially when many Threads are doing I/O • (even in wall mode!)
[Diagram: Threads 1-3 acquiring the GVL and releasing it to do I/O; 💥 = stack frame peeked by rb_profile_frames()]
file_frames() API, a per-thread version of rb_profile_frames() • Accepts VALUE thread as an argument • https://github.com/ruby/ruby/pull/7784
[Diagram: Threads 1-3 acquiring the GVL and releasing it to do I/O; 💥 = stack frame peeked by rb_profile_frames()]
be active due to the GVL • CPU tasks and I/O can run simultaneously, but conditions are not always ideal • Resources are very limited in ISUCON; we want to squeeze everything out!
[Figure: 4 Ruby processes, 32 threads isn't enough to burn all CPU]
[Figure: we want to see this (1 process, 8 threads in Go)]
the GVL • OK, let's reduce Threads! • In the ISUCON webapp case, Puma is creating Threads • We can create more processes in place of Threads to keep the total number of workers the same
is effective, but consumes more memory • Memory is precious in ISUCON as MySQL lives in the same VM • More processes: better CPU utilization but more memory • More threads: lower memory consumption but suboptimal CPU utilization
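The processes-versus-threads trade-off can be explored by listing every split that keeps total concurrency constant. This is only a sketch: `worker_splits` is a hypothetical helper and the total of 32 is an illustrative assumption, not a recommendation. The printed lines mirror Puma's `workers` and `threads` config directives.

```ruby
# Enumerate (processes, threads-per-process) splits whose product is `total`.
def worker_splits(total)
  (1..total)
    .select { |procs| (total % procs).zero? }
    .map { |procs| { processes: procs, threads: total / procs } }
end

worker_splits(32).each do |split|
  # In a Puma config file these would map to `workers` and `threads min, max`.
  puts "workers #{split[:processes]}; threads #{split[:threads]}, #{split[:threads]}"
end
```

Moving down the list trades memory for CPU utilization: more processes means less GVL contention per process, but each process carries its own copy of the app.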
event hooks were added in Ruby 3.2 • RUBY_INTERNAL_THREAD_EVENT_* • Shopify/gvltools, ivoanjo/gvl-tracing • I integrated this into profiler results • Usable for tuning the number of Puma threads?
[Flowchart: Try adding Puma threads → Check GVL wait → Low enough: Perfect! / Somewhat high: Reduce threads, add processes]
I/O for Ruby! • Sadly, we couldn't rewrite everything in 8 hours • TruffleRuby • ISUCON VMs didn't have enough memory for TruffleRuby 😔 • Arming Ractors for fewer GVL waits • Maybe my next challenge!
lightweight threads - this system makes high concurrency very easy • Go: Concurrency deeply embedded in the ecosystem • Contexts: Ability to cancel no longer needed MySQL queries • Timeouts: Same