Guerrilla Techniques for Robust Performance Engineering

The Guerrilla approach involves a set of techniques intended to overcome the lack of rigor in performance engineering by providing both students and professionals with a lingua franca that forces rigorous requirements to the surface. The base language comes from queueing theory because there is a 1-to-1 correspondence between the performance metrics that characterize queues and the performance metrics that characterize computer systems. Indeed, all computer systems, from your smartphone to Facebook.com, can be represented as a directed graph of queues.

Dr. Neil Gunther

May 02, 2025

Transcript

  1. Guerrilla Techniques for Robust Performance Engineering

    Dr. Neil J. Gunther, Performance Dynamics Research
    5th Workshop on Education and Practice of Performance Engineering, Toronto, Canada, May 5, 2025
    © 2025 Performance Dynamics Research. Guerrilla Techniques for Robust Performance Engineering, May 3, 2025
  2. Going Guerrilla

    1. Stanford Summer class attended c. 1990 (Lazowska, Sevcik, Zahorjan: UW)
    2. Stanford Summer class lecturer 1995 – 2001
    3. Stanford cancelled Summer classes 2001
    4. First private class in Crowne Plaza, Pleasanton, CA 2002
    5. “Guerrilla Capacity Planning” from my offhand quip 2003 – 2020
    6. Gaphorisms online: Guerrilla Aphorisms [5]
    7. Guerrilla Capacity Planning (GCAP) book 2007 [6]
    8. GCAP Online classes 2020 pandemic
    9. GCAP Online classes 2020 – present

    Definition 1 (What is Guerrilla Capacity Planning?): “The planning horizon is now about 3 months (1 financial quarter), thanks to the gnomes on Wall Street. Only Guerrilla-style tactical planning is crazy enough to be compatible with that kind of insanity.” (NJG, 2003)
  3. Data Comes from The Devil

    Why are we still doing TTY in the 21st century FFS? (Linus is the Devil’s handmaiden)
    Makes you think these are THE numbers, but all measurements are wrong.
    Should be able to click on a value to drill down and see the standard error.
    CPU usage: 19.43% ± 0.97% user
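
The "value ± standard error" style of report on slide 3 can be produced from repeated samples of the same metric. The following is a minimal sketch, assuming independent samples; the function name `mean_with_standard_error` and the sample values are hypothetical and do not come from the talk.

```python
import math
import statistics


def mean_with_standard_error(samples):
    """Return (mean, standard error of the mean) for repeated measurements."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the error")
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(samples)
    # Standard error of the mean shrinks as 1 / sqrt(n).
    return mean, stdev / math.sqrt(n)


# Hypothetical CPU user-time samples in percent (illustrative values only).
cpu_user = [18.2, 20.1, 19.7, 21.3, 17.9, 19.4]
mean, sem = mean_with_standard_error(cpu_user)
print(f"CPU usage: {mean:.2f}% ± {sem:.2f}% user")
```

Reporting both numbers makes it harder to mistake a single displayed value for THE number.
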
  4. Models Come from God

    Figure 1: Charlton Heston laying down the laws of queueing theory
    Queueing theory books [1, 2, 10, 11, 14] are written by mathematicians for mathematicians.
    Led to the development of PDQ (Pretty Damn Quick) queueing analyzer software.
    The Universal Scalability Law [3, 9, 7, 12] is also based on queueing theory, but you don’t need to understand that to use it.
  5. Measurement Meets Model

    X(N) measurements → C(N) system performance ← USL model:

    $$ C(N) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)} \tag{1} $$

    Throughput data X(N) provides one definition of system capacity C(N)
    USL model provides another definition of throughput capacity C(N) [3, 9, 7, 12]
    Theorem (2008): C(N) must conform to the universal USL model

    The three Cs:
    1. Concurrency (γ): ideal parallelism, linear scaling
    2. Contention (α): queueing, buffering, Amdahl’s scaling
    3. Coherency (β): data/state exchange, messaging, memory paging
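
The USL model on slide 5 is simple enough to evaluate directly. The sketch below codes the formula as written there and adds the load at which it peaks, which follows from setting the derivative with respect to N to zero. The function names and the coefficient values are illustrative assumptions, not fitted results from the talk.

```python
import math


def usl_throughput(n, gamma, alpha, beta):
    """USL model: gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load N* where the USL curve is maximal: sqrt((1 - alpha) / beta)."""
    if beta <= 0:
        raise ValueError("no finite peak when beta <= 0")
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients (illustrative values only).
gamma, alpha, beta = 100.0, 0.05, 0.0001
for n in (1, 10, 50, 100, 200):
    print(n, round(usl_throughput(n, gamma, alpha, beta), 1))
print("peak near N =", round(usl_peak_load(alpha, beta), 1))
```

With β = 0 the model reduces to Amdahl-style scaling, and with α = β = 0 it is the linear ideal γN.
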
  6. Data Disasters

    [Figure: two panels, throughput (CPS) vs. Vuser load and latency (ms) vs. Vuser load]
    Load-test data comparing the X and R performance of several HTTP servers
  7. Java Geniuses

    From a Java performance book (1):
    Fig. 1a “isn’t scaling well” because response time is increasing “exponentially” (2) with increasing user load.
    Fig. 1b “scales in a more desirable manner” because response time degradation is more gradual with increasing user load. (3)

    (1) S. Wilson and J. Kesselman, Java Platform Performance: Strategies and Tactics, Addison-Wesley (2000)
    (2) Wrong. It’s correctly scaling linearly. AKA the queueing theory hockey-stick.
    (3) If you can produce this kind of scaling in prod ... ship it!
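
The hockey-stick shape mentioned in the slide 7 footnote can be reproduced with a textbook closed queueing model. This is a minimal sketch, assuming a single queueing center with service time S plus a think-time delay Z, solved by exact mean value analysis (MVA). It is not the PDQ model from the talk, and all parameter values are hypothetical.

```python
def closed_queue_response_times(max_users, service_time, think_time):
    """Exact MVA for one queueing center plus a think-time delay.

    Returns the response time R(N) for N = 1 .. max_users.
    """
    queue_len = 0.0
    results = []
    for n in range(1, max_users + 1):
        # Arrival theorem: a new arrival sees the queue of the (n - 1)-user system.
        resp = service_time * (1.0 + queue_len)
        # Throughput of the closed loop with n users.
        thr = n / (resp + think_time)
        # Little's law gives the mean queue length for the next iteration.
        queue_len = thr * resp
        results.append(resp)
    return results


# Hypothetical parameters: 10 ms service time, 1 s think time.
S, Z = 0.010, 1.0
r = closed_queue_response_times(500, S, Z)
for n in (1, 50, 100, 200, 500):
    print(n, round(r[n - 1], 3))
```

Past the knee near N = (S + Z) / S, each extra user adds about S to the response time, so R(N) climbs along the straight line N·S − Z rather than exponentially.
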
  8. Queueing Cues

    [Figure: throughput profile vs. load N]
    Queueing theory dictates: Measured throughput profiles X(N) must be CONCAVE wrt the load axis (N). N is the stimulus variable and X(N) is the response variable.
    [Figure: latency profile R vs. load N]
    Queueing theory dictates: Measured latency profiles R(N) must be CONVEX wrt the load axis (N). N is the stimulus variable and R(N) is the response variable.
    Any perf measurements that do not conform to these queueing rules ... are wrong!
    The more complex the test rig, the more likely the measurements will be wrong.
    Remember Chuck ... and Einstein: “If the data don’t fit the model, change the data.”
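
The concave/convex rules on slide 8 can be turned into a quick sanity check on load-test data. The sketch below compares successive slopes between measured points; the function names, the tolerance parameter, and the data are hypothetical, and real measurements would need a noise tolerance chosen from their standard errors.

```python
def _slopes(loads, values):
    """Slopes of the line segments between consecutive (load, value) points."""
    pairs = list(zip(loads, values))
    return [(v1 - v0) / (n1 - n0) for (n0, v0), (n1, v1) in zip(pairs, pairs[1:])]


def is_concave(loads, values, tol=0.0):
    """True if the slopes never increase (within tol) as the load grows."""
    s = _slopes(loads, values)
    return all(b <= a + tol for a, b in zip(s, s[1:]))


def is_convex(loads, values, tol=0.0):
    """True if the slopes never decrease (within tol) as the load grows."""
    s = _slopes(loads, values)
    return all(b >= a - tol for a, b in zip(s, s[1:]))


# Hypothetical load-test results; loads must be strictly increasing.
loads = [1, 2, 4, 8, 16]
throughput = [10.0, 19.0, 34.0, 50.0, 55.0]
latency_ms = [100.0, 105.0, 118.0, 160.0, 290.0]
print("X(N) concave:", is_concave(loads, throughput))
print("R(N) convex:", is_convex(loads, latency_ms))
```

A failed check does not say what went wrong, only that the test rig deserves a closer look before the numbers are trusted.
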
  9. Example 1: Production Data

    Sample from 24 hr dailies
    Ad nauseam time-series plot
    Plot of X(t) vs. t hurts the brain
    Transform to steady-state coords
    Sample from 24 hr dailies
    Steady-state plot X(N) vs. N
    Time t is now implicit
    But what is the message?
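
The transformation on slide 9 amounts to pairing each throughput sample with the load observed at the same instant and dropping the timestamp. The slides do not say how the samples were aggregated, so the binning below is an assumption, as are the function name, the tuple layout, and the sample values.

```python
from collections import defaultdict


def to_steady_state(samples, bin_width=10):
    """Convert (timestamp, n_users, throughput) samples to (N, mean X) points.

    Samples are grouped into load bins of bin_width users; the timestamp is
    discarded, so time becomes implicit in the result.
    """
    bins = defaultdict(list)
    for _t, n, x in samples:
        bins[(n // bin_width) * bin_width].append(x)
    # Report each bin at its midpoint with the mean throughput seen there.
    return sorted((lo + bin_width / 2, sum(v) / len(v)) for lo, v in bins.items())


# Hypothetical monitoring samples: (seconds, concurrent users, requests/s).
samples = [(0, 12, 100.0), (60, 15, 110.0), (120, 31, 200.0), (180, 38, 230.0)]
for n, x in to_steady_state(samples):
    print(n, x)
```

The resulting (N, X) points are what a concavity check or a model fit would consume.
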
  10. Example 1: Performance model

    Throughput data must be concave wrt concurrent users (N)
    LOESS fit (green line) confirms that for 100 < N < 500
    Only the PDQ model shows the complete curve (blue dotted line)
    Throughput starts to saturate at N ∼ 175 user processes
    Throughput maxes out at X ∼ 800 requests per second
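
The two numbers read off the curve on slide 10 can be related through standard asymptotic bounds for a closed system: the throughput ceiling is the reciprocal of the bottleneck service demand, and the knee sits where the light-load line N / (R0 + Z) meets that ceiling. The sketch below is back-of-the-envelope arithmetic under that single-bottleneck assumption; the function names are hypothetical and the results are not values reported in the talk.

```python
def bottleneck_demand_seconds(x_max_per_sec):
    """Effective bottleneck service demand implied by the throughput ceiling."""
    return 1.0 / x_max_per_sec


def unloaded_round_trip_seconds(n_knee, x_max_per_sec):
    """Unloaded response time plus think time (R0 + Z) implied by the knee."""
    return n_knee / x_max_per_sec


# Approximate values read from slide 10: knee at N ~ 175, ceiling at X ~ 800/s.
d_max = bottleneck_demand_seconds(800.0)
rtt = unloaded_round_trip_seconds(175, 800.0)
print(f"bottleneck demand ~ {d_max * 1e3:.2f} ms")
print(f"unloaded round trip (service + think) ~ {rtt * 1e3:.1f} ms")
```

These are order-of-magnitude estimates only, since both inputs are eyeballed from a plot.
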
  11. Example 2: Statistical Forecasting (Trending)

    Figure 2: Procurement forecast for spinning up more JVM servers
    Holt-Winters model [13] is the blue curve inside the circles.
    Blue line segment (inside the red funnel) is forecast average.
    Very different from a queueing model or the USL.
    Red funnel is 90% CI, yellow funnel is 95% CI.
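
Slide 11 uses the Holt-Winters model from the R forecast package [13]. To show the flavor of that kind of trending, the sketch below implements Holt's linear trend method (double exponential smoothing), a simpler non-seasonal relative of Holt-Winters. It is a swapped-in simplification: it has no seasonal term and produces no confidence funnels, and the function name, smoothing constants, and data are hypothetical.

```python
def holt_linear_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend (both in (0, 1]).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical monthly server counts (illustrative values only).
servers = [10, 12, 13, 15, 18, 19, 22]
print([round(f, 1) for f in holt_linear_forecast(servers, horizon=3)])
```

As the slide notes, this kind of model only extrapolates what has already been measured, unlike a queueing model or the USL.
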
  12. Example 3: AWS Application Performance

    Figure 3: PDQ model of Tomcat application on AWS [8]
    App is scaling as well as it possibly can (modulo statistical noise)
    Just from following AWS autoscaling guidelines
    No dramatic performance improvements
    But capacity planning cost reduction on $10 MM/yr AWS chargeback
  13. Example 4: GenAI LLM Efficient Compute Frontier

    Figure 4: GPT-3 pre-training ECF across successively larger LLMs
    OpenAI GPT-3 is a multi-layer neural network with 175 BILLION connections (“parameters”)
    Guerrilla analysis of ECF [4] used a combination of queueing theory [2] and the USL [7]
  14. Example 4: ECF Error Loss in 3D

    [Figure panels: Bistable queue; Error loss transition]
    All LLM training computations start in the upper metastable valley.
    Queue transitions from short length in stable upper valley to long length in stable lower valley, like a piece of string.
    Explains the common sigmoidal shape.
    Larger LLMs have deeper valleys that lie on the ECF bound.
  15. What is Guerrilla Perf Eng?

    Measurements: All data are wrong (by definition)
    Measurement is a process that inherently produces errors
    Standard error is the conventional quantification of errors
    How much error is acceptable?

    Models: All models are wrong (approximations)
    Data transformer from time-series to steady-state plots
    Statistical models CAN only do trending on already measured data
    Queueing models can predict what CANNOT be measured

    Business:
    Need performance models to quantify ROI (e.g., AWS chargeback)
    Need performance models to predict procurement cycles
    Just ask the Finance Department

    Guerrilla Approach: Measurements + Models = Information (need both Ms)
  16. References I

    [1] Arnold Allen. Probability, Statistics, and Queueing Theory with Computer Science Applications. 2nd ed. San Diego, CA: Academic Press, 1990.
    [2] Neil Gunther. Analyzing Computer System Performance Using Perl::PDQ. 2nd ed. Heidelberg, DE: Springer, 2011.
    [3] Neil Gunther. “Applying The Universal Scalability Law to Distributed Systems”. In: Distributed Systems Conference. Pune, India: Distributed Systems Meetup, 2019. URL: https://speakerdeck.com/drqz/applying-the-universal-scalability-law-to-distributed-systems.
    [4] Neil Gunther. “Does the Efficiency Compute Frontier Represent New Physics?” In: APS Global Physics Summit. Anaheim, CA: American Physical Society, 2025. URL: https://summit.aps.org.
    [5] Neil Gunther. Gaphorisms: Guerrilla Aphorisms. Performance Dynamics. Mar. 2021. URL: http://www.perfdynamics.com/Manifesto/gcaprules.html.
    [6] Neil Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Heidelberg, DE: Springer, 2007.
    [7] Neil Gunther. How to Quantify Scalability. Performance Dynamics. Feb. 2020. URL: http://www.perfdynamics.com/Manifesto/USLscalability.html.
  17. References II

    [8] Neil Gunther and Mohit Chawla. “Tomcat-Applikationsperformance in der Amazon-Cloud unter Linux modelliert”. In: Linux Magazin 08 (2019), pp. 38–49. English version: https://arxiv.org/pdf/1811.12341.
    [9] Neil Gunther, Paul Puglia, and Kristofer Tomasette. “Hadoop Superlinear Scalability: The perpetual motion of parallel performance”. In: Comm. ACM 58.4 (2015), pp. 46–55. DOI: 10.1145/2719919.
    [10] Mor Harchol-Balter. Performance Modelling and Design of Computer Systems: Queueing Theory in Action. Cambridge, UK: Cambridge University Press, 2013.
    [11] Peter Harrison and Naresh Patel. Performance Modelling of Communication Networks and Computer Architectures. Wokingham, UK: Addison-Wesley, 1993.
    [12] James Holtman and Neil Gunther. “Getting in the Zone for Successful Scalability”. In: International Conference of the Computer Measurement Group. December 7-12, Las Vegas, Nevada, USA: CMG Inc., 2008. URL: https://arxiv.org/abs/0809.2541.
    [13] Rob Hyndman and George Athanasopoulos et al. forecast: Forecasting Functions for Time Series and Linear Models. Comprehensive R Archive Network (CRAN). June 2024. URL: https://cran.r-project.org/web/packages/forecast/index.html.
    [14] Leonard Kleinrock. Queueing Systems. Vol. I: Theory. New York, NY: John Wiley, 1976.
  18. Questions?

    Thank you for attending
    www.perfdynamics.com
    Castro Valley, California
    Twitter: twitter.com/DrQz
    LinkedIn: Performance Dynamics
    Facebook: Performance Dynamics
    Blog: The Pith of Performance
    Training: PerfDynamics.com/Classes
    Email: [email protected]