Guerrilla Techniques for Robust Performance Engineering

Guerrilla Techniques for Robust Performance Engineering Dr. Neil J. Gunther
Performance Dynamics Company 5th Workshop on Education and Practice of Performance Engineering Toronto, Canada May 5, 2025 © 2025 Performance Dynamics Company Guerrilla Techniques for Robust Performance Engineering May 4, 2025 1 / 19

The Guerrilla Genesis
1. Stanford Summer school on queueing models, c.1990
2. Stanford Summer school: became the lecturer, 1995–2001
3. “Guerrilla Capacity Planning”: offhand quip by me, 2000
4. Stanford cancelled all Summer schools, 2001
5. First private GCAP class in hotel, 2002
6. Gaphorisms online: Guerrilla Aphorisms [5]
7. Guerrilla Capacity Planning book, 2007 [6]
8. C19 pandemic: first GCAP online classes, 2020
9. All Guerrilla classes online, 2020–present
Definition 1 (What is Guerrilla Capacity Planning?)
“The planning horizon is now about 3 months (1 financial quarter), thanks to the gnomes on Wall Street. Only Guerrilla-style tactical planning is crazy enough to be compatible with that kind of insanity.” (NJG, 2003)

Data Comes from The Devil
Why are we still doing TTY in the 21st century FFS? (Linus is the Devil’s handmaiden)
Makes you think these are THE numbers, but all measurements are wrong.
Should be able to click on a value to drill down and see the standard error:
CPU usage: 19.43% ± 0.97% user
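The "value ± standard error" style of reporting shown on this slide can be sketched as follows. This is only an illustration, assuming a handful of repeated CPU %user readings (the sample values are invented here) and the usual definition of standard error as the sample standard deviation divided by the square root of the number of samples. The function name is hypothetical.

```python
import math
import statistics

def mean_with_standard_error(samples):
    """Return (mean, standard error) for a list of repeated measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the error")
    mean = statistics.mean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(samples)
    return mean, stdev / math.sqrt(len(samples))

# Hypothetical CPU %user readings, made up for illustration only.
cpu_user = [18.2, 20.1, 19.7, 21.0, 18.9, 19.5, 20.3, 17.8]
mean, sem = mean_with_standard_error(cpu_user)
print(f"CPU usage: {mean:.2f}% ± {sem:.2f}% user")
```

Reporting the error next to the value makes it obvious when two readings that look different are really the same within measurement noise.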

Models Come from God
Figure 1: Charlton Heston laying down the laws of queueing theory
Queueing theory books [1, 2, 10, 11, 14] are written by mathematicians for mathematicians.
That led to the development of the PDQ (Pretty Damn Quick) queueing analyzer software.
The Universal Scalability Law [3, 9, 7, 12] is also based on queueing theory, but you don’t need to understand that to use it.

Performance Metrics
Definition 2
There are only three performance metrics:
1. Time (the “zeroth” metric): T
2. Number (count, size, but no time): N
3. Rate (counts per unit time): N/T
Everything else is a derived metric.
Example: IOPS = (Number of) IOs per second = N/T = Rate metric
Question: Which metric is CPU % user? (see Slide 3)
All performance metrics must boil down to Definition 2.
Performance is primarily about Time (how fast).
Capacity is primarily about Number (how big).

Measurement Meets Model
$$\left(\frac{C}{T}\right)_N \;\longrightarrow\; X(N) \;\longleftarrow\; \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
(Measurements on the left, the system metric X(N) in the middle, the USL model on the right.)
Completion rate C/T data is one definition of system throughput X(N).
The USL model is another definition of throughput X(N) [3, 9, 7, 12].
Theorem (2008): X(N) must conform to the universal USL model.
The three Cs:
1. Concurrency (γ): ideal parallelism, linear scaling
2. Contention (α): queueing, buffering, Amdahl’s scaling
3. Coherency (β): data/state exchange, messaging, memory paging
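The USL formula on this slide is simple enough to evaluate directly. The sketch below is an illustration only: the function names and the coefficient values are assumptions chosen for the example, not fitted values from the talk. The peak-load helper is not on the slide; it follows from setting dX/dN = 0 in the formula, which gives N* = sqrt((1 − α)/β) when β > 0.

```python
import math

def usl_throughput(n, gamma, alpha, beta):
    """USL throughput X(N) = gamma*N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    if n < 1:
        raise ValueError("load N must be at least 1")
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

def usl_peak_load(alpha, beta):
    """Load N* where X(N) is maximal, from dX/dN = 0 (requires beta > 0)."""
    if beta <= 0:
        raise ValueError("no finite peak unless beta > 0")
    return math.sqrt((1.0 - alpha) / beta)

# Hypothetical coefficients, chosen only to show the shape of the curve.
gamma, alpha, beta = 100.0, 0.05, 0.0002
for n in (1, 8, 32, 64, 128):
    print(n, round(usl_throughput(n, gamma, alpha, beta), 1))
print("peak near N =", round(usl_peak_load(alpha, beta), 1))
```

With α = β = 0 the function reduces to the linear case X(N) = γN, and with β = 0 alone it reduces to Amdahl-style scaling, matching the three Cs listed above.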

Data Disasters
[Figure: two panels, Throughput (CPS) vs. Vuser load and Latency (ms) vs. Vuser load]
Load-test data comparing the X and R performance of several HTTP servers

Java Geniuses
From a Java performance book¹:
Fig. 1a “isn’t scaling well” because response time is increasing “exponentially”² with increasing user load.
Fig. 1b “scales in a more desirable manner” because response time degradation is more gradual with increasing user load.³
¹ S. Wilson and J. Kesselman, Java Platform Performance: Strategies and Tactics, Addison-Wesley (2000)
² Wrong. It’s correctly scaling linearly. AKA the queueing theory hockey-stick.
³ If you can produce this kind of scaling in prod ... ship it!

Queueing Cues
[Figure: X(N) vs. N] Queueing theory dictates: Measured throughput profiles X(N) must be CONCAVE wrt the load axis (N). N is the stimulus variable and X(N) is the response variable.
[Figure: R(N) vs. N] Queueing theory dictates: Measured latency profiles R(N) must be CONVEX wrt the load axis (N). N is the stimulus variable and R(N) is the response variable.
Any perf measurements that do not conform to these queueing rules ... are wrong!
The more complex the test rig, the more likely the measurements will be wrong.
Remember Chuck ... and Einstein: “If the data don’t fit the model, change the data.”
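The concave/convex rules above can be turned into a quick sanity check on load-test data. The following is a minimal sketch, not a method taken from the talk: it assumes measurements arrive as (N, value) pairs at distinct load levels, and it tests whether the slopes between successive points are non-increasing (concave) or non-decreasing (convex). The function names, the tolerance parameter, and the sample numbers are all hypothetical.

```python
def segment_slopes(points):
    """Slopes of the line segments joining successive (N, value) points."""
    pts = sorted(points)  # order by load N; loads are assumed distinct
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

def is_concave(points, tol=0.0):
    """True if successive slopes never increase by more than tol."""
    s = segment_slopes(points)
    return all(b <= a + tol for a, b in zip(s, s[1:]))

def is_convex(points, tol=0.0):
    """True if successive slopes never decrease by more than tol."""
    s = segment_slopes(points)
    return all(b >= a - tol for a, b in zip(s, s[1:]))

# Invented sample data: throughput X(N) and latency R(N) at increasing load.
throughput = [(1, 95.0), (2, 180.0), (4, 320.0), (8, 500.0), (16, 640.0)]
latency = [(1, 10.5), (2, 11.1), (4, 12.5), (8, 16.0), (16, 25.0)]
print("X(N) concave:", is_concave(throughput))
print("R(N) convex:", is_convex(latency))
```

Real measurements are noisy, so a nonzero `tol` (or smoothing the data first) is usually needed before a failed check is taken as evidence of a broken test rig.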

Example 1: Production Data
Sample from 24 hr dailies: ad nauseam time-series plot. Plotting X(t) vs. t hurts the brain.
Transform to steady-state coords
Sample from 24 hr dailies: steady-state plot of X(N) vs. N. Time t is now implicit. But what is the message?
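One plausible reading of the transform to steady-state coordinates can be sketched as follows. This is an assumption about the mechanics, not the procedure used in the talk: each time-series sample is taken to carry a timestamp t, a concurrent-user count N, and a throughput X. The timestamp is dropped, samples are grouped into load bins, and each bin is averaged to give one (N, X) point. The function name, the bin width, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def to_steady_state(samples, bin_width=10):
    """Collapse (t, N, X) time-series samples into averaged (N, X) points."""
    bins = defaultdict(list)
    for _t, n, x in samples:
        # Time is discarded; samples are grouped by load level instead.
        bins[(n // bin_width) * bin_width].append((n, x))
    return [
        (mean(n for n, _ in group), mean(x for _, x in group))
        for _, group in sorted(bins.items())
    ]

# Invented samples: (seconds since midnight, concurrent users, requests/sec).
samples = [
    (0, 100, 500.0),
    (60, 104, 520.0),
    (120, 210, 700.0),
    (180, 215, 710.0),
]
for n, x in to_steady_state(samples):
    print(n, x)
```

The resulting points are ordered by load rather than by time, which is the form needed for comparison against a throughput model X(N).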

Example 1: Performance model
Throughput data must be concave wrt concurrent users (N)
LOESS fit (green line) confirms this for 100 < N < 500
Only the PDQ model shows the complete curve (blue dotted line)
Throughput starts to saturate at N ∼ 175 user processes
Throughput maxes out at X ∼ 800 requests per second

Example 2: Statistical Forecasting (Trending)
Figure 2: Procurement forecast for spinning up more JVM servers
Holt-Winters model [13] is the blue curve inside the circles.
Blue line segment (inside the red funnel) is the forecast average.
Very different from a queueing model or the USL.
Red funnel is 90% CI, yellow funnel is 95% CI.

Example 3: AWS Application Performance
Figure 3: PDQ model of Tomcat application on AWS [8]
App is scaling as well as it possibly can (modulo statistical noise)
Just from following AWS autoscaling guidelines
No dramatic performance improvements
But capacity planning cost reduction on $10 MM/yr AWS chargeback

Example 4: GenAI LLM Efficient Compute Frontier
Figure 4: GPT-3 pre-training ECF across successively larger LLMs
OpenAI GPT-3 is a multi-layer neural network with 175 BILLION connections (“parameters”)
Guerrilla analysis of ECF [4] used a combination of queueing theory [2] and the USL [7]

Example 4: ECF Error Loss in 3D
[Figure labels: Bistable queue; Error loss transition]
All LLM training computations start in the upper metastable valley.
Queue transitions from short length in stable upper valley to long length in stable lower valley, like a piece of string.
Explains the common sigmoidal shape.
Larger LLMs have deeper valleys that lie on the ECF bound.

What is Guerrilla Perf Eng?
Measurements: All data are wrong (by definition)
  Measurement is a process that inherently produces errors
  Standard error is the conventional quantification of errors
  How much error is acceptable?
Models: All models are wrong (approximations)
  Data transformer from time-series to steady-state plots
  Statistical models CAN only do trending on already measured data
  Queueing models can predict what CANNOT be measured
Business:
  Need performance models to quantify ROI (e.g., AWS chargeback)
  Need performance models to predict procurement cycles
  Just ask the Finance Department
Guerrilla Approach
  Measurements + Models = Information (need both Ms)

References I
[1] Arnold Allen. Probability, Statistics, and Queueing Theory with Computer Science Applications. 2nd ed. San Diego, CA: Academic Press, 1990.
[2] Neil Gunther. Analyzing Computer System Performance Using Perl::PDQ. 2nd ed. Heidelberg, DE: Springer, 2011.
[3] Neil Gunther. “Applying The Universal Scalability Law to Distributed Systems”. In: Distributed Systems Conference. Pune, India: Distributed Systems Meetup, 2019. URL: https://speakerdeck.com/drqz/applying-the-universal-scalability-law-to-distributed-systems.
[4] Neil Gunther. “Does the Efficiency Compute Frontier Represent New Physics?” In: APS Global Physics Summit. Anaheim, CA: American Physical Society, 2025. URL: https://summit.aps.org.
[5] Neil Gunther. Gaphorisms: Guerrilla Aphorisms. Performance Dynamics. Mar. 2021. URL: http://www.perfdynamics.com/Manifesto/gcaprules.html.
[6] Neil Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Heidelberg, DE: Springer, 2007.
[7] Neil Gunther. How to Quantify Scalability. Performance Dynamics. Feb. 2020. URL: http://www.perfdynamics.com/Manifesto/USLscalability.html.

References II
[8] Neil Gunther and Mohit Chawla. “Tomcat-Applikationsperformance in der Amazon-Cloud unter Linux modelliert”. In: Linux Magazin 08 (2019), pp. 38–49. English version: https://arxiv.org/pdf/1811.12341.
[9] Neil Gunther, Paul Puglia, and Kristofer Tomasette. “Hadoop Superlinear Scalability: The perpetual motion of parallel performance”. In: Comm. ACM 58.4 (2015), pp. 46–55. DOI: 10.1145/2719919.
[10] Mor Harchol-Balter. Performance Modelling and Design of Computer Systems: Queueing Theory in Action. Cambridge, UK: Cambridge University Press, 2013.
[11] Peter Harrison and Naresh Patel. Performance Modelling of Communication Networks and Computer Architectures. Wokingham, UK: Addison-Wesley, 1993.
[12] James Holtman and Neil Gunther. “Getting in the Zone for Successful Scalability”. In: International Conference of the Computer Measurement Group. December 7-12, Las Vegas, Nevada, USA: CMG Inc., 2008. URL: https://arxiv.org/abs/0809.2541.
[13] Rob Hyndman and George Athanasopoulos et al. forecast: Forecasting Functions for Time Series and Linear Models. Comprehensive R Archive Network (CRAN). June 2024. URL: https://cran.r-project.org/web/packages/forecast/index.html.
[14] Leonard Kleinrock. Queueing Systems. Vol. I: Theory. New York, NY: John Wiley, 1976.

Questions?
Thank you for attending
www.perfdynamics.com
Castro Valley, California
Twitter: twitter.com/DrQz
LinkedIn: Performance Dynamics
Facebook: Performance Dynamics
Blog: The Pith of Performance
Training: PerfDynamics.com/Classes
Email: [email protected]
