Shows how well-known Brooks' law (viz., too many cooks) is contained in the Universal Scalability Law (USL) for response time latency rather than conventional throughput scalability.
Michelson Award 2008 Performance Dynamics Hotsos Symposium March 8, 2011 SM c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 1 / 51
of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 2 / 51
to quantify scalability: Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing” Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals” Hotsos 2010: “How to Quantify Oracle Database Scalability: Examples” c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 3 / 51
to quantify scalability: Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing” Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals” Hotsos 2010: “How to Quantify Oracle Database Scalability: Examples” Equal bang for the buck: linear concurrency c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 3 / 51
to quantify scalability: Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing” Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals” Hotsos 2010: “How to Quantify Oracle Database Scalability: Examples” Equal bang for the buck: linear concurrency Diminishing Returns: contention overhead c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 3 / 51
to quantify scalability: Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing” Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals” Hotsos 2010: “How to Quantify Oracle Database Scalability: Examples” Equal bang for the buck: linear concurrency Diminishing Returns: contention overhead Negative return on investment: coherency overhead c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 3 / 51
to quantify scalability: Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing” Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals” Hotsos 2010: “How to Quantify Oracle Database Scalability: Examples” Equal bang for the buck: linear concurrency Diminishing Returns: contention overhead Negative return on investment: coherency overhead Calculate scalability curve from performance measurements c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 3 / 51
C relative capacity function of N CN(α, β) = N 1 + α (N − 1) + β N(N − 1) (1) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 4 / 51
C relative capacity function of N CN(α, β) = N 1 + α (N − 1) + β N(N − 1) (1) Three Cs: c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 4 / 51
C relative capacity function of N CN(α, β) = N 1 + α (N − 1) + β N(N − 1) (1) Three Cs: 1 Concurrency c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 4 / 51
C relative capacity function of N CN(α, β) = N 1 + α (N − 1) + β N(N − 1) (1) Three Cs: 1 Concurrency 2 Contention (amount α < 1) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 4 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write Question: Why not? Hint: Multiple CPUs means multiple caches c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write Question: Why not? Hint: Multiple CPUs means multiple caches Answer: Your cache line is flagged “dirty” (out of date) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write Question: Why not? Hint: Multiple CPUs means multiple caches Answer: Your cache line is flagged “dirty” (out of date) Data in your copy of DB table in not the most recent c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write Question: Why not? Hint: Multiple CPUs means multiple caches Answer: Your cache line is flagged “dirty” (out of date) Data in your copy of DB table in not the most recent Must request latest copy before proceeding c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
running on a multiprocessor OLTP application: shared writable data An ORA process requests to update data in the row of a table Must wait for RDBMS lock ... Finally, process gets the ORA lock: permission to write ORA process starts executing on CPU but cannot write Question: Why not? Hint: Multiple CPUs means multiple caches Answer: Your cache line is flagged “dirty” (out of date) Data in your copy of DB table in not the most recent Must request latest copy before proceeding Wait longer for your data to become consistent, i.e., β > 0 c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 5 / 51
as throughput X ORA presents “waiting time” metrics, not throughputs Standard v$ tables AWR (Automatic Workload Repository) OWI (Oracle Wait Interface) Will see how to get there via round-trip times : RTT = N X (2) from queueing theory Eqn.(2) is equivalent to Little’s law: N = XR NOTE: R always inversely proportional to throughput X See my blog post Bandwidth vs Latency—The World is Curved c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 6 / 51
Throughput as a function of load X(N) is generally a concave function Response time as a function of load R(N) is generally a convex function c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 7 / 51
and Speedup Sources of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 8 / 51
1965 Original edition 1975 Theorem (Brooks) Adding more manpower to a late project just makes it later. c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 9 / 51
law Theorem (Amdahl) Parallel speedup is limited by the serial portion of the program. Example Pregnancy is a serial process. Only runs on a single (married?) processor. Minimum gestation is T1 9 mths No matter how many women (processors) are involved. More about Amdahl’s law shortly c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 11 / 51
Man Month Turns out Brook’s law contains components too. My November 4, 2007 blog post: Modeling the Mythical Man Month 3 components: c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 13 / 51
Man Month Turns out Brook’s law contains components too. My November 4, 2007 blog post: Modeling the Mythical Man Month 3 components: 1 Raw manpower to shorten the schedule c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 13 / 51
Man Month Turns out Brook’s law contains components too. My November 4, 2007 blog post: Modeling the Mythical Man Month 3 components: 1 Raw manpower to shorten the schedule 2 Round-table conferencing to share status c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 13 / 51
Man Month Turns out Brook’s law contains components too. My November 4, 2007 blog post: Modeling the Mythical Man Month 3 components: 1 Raw manpower to shorten the schedule 2 Round-table conferencing to share status 3 One-on-one conversations b/w experts and novices c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 13 / 51
T1 to execute on a uniprocessor Work T1 −→ → → → → → → Divide across N = 6 processors (people) as equal sub-tasks Conquer it by running all 6 sub-tasks simultaneously Parallel performance: T6 = T1 6 = 0.17T1 (i.e., 16.67% of N = 1 time) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 15 / 51
= T1 N 0 2 4 6 8 10 People 2 4 6 8 10 12 Months N = 1 person takes 10 months N = 10 people take 1 month No people take ∞t time c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 16 / 51
they’re doing to listen to 1 person’s instructions −→ → → → → → → → Suppose this happens just 25% of the time T1 (dark blue) Only remaining 75% of T1 can be executed in parallel by 6 people T6 = 3 4 T1 6 + 1 4 T1 = 3 8 T1 = 0.375T1 (i.e., only 38% of N1 ) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 18 / 51
= (1 − α) T1 N parallel portion + αT1 sequential portion Fixed delay due to meetings 0 2 4 6 8 10 People 2 4 6 8 10 Months For α = 0.25 and T1 = 10 months, best time will be αT1 = 2.5 months even with ∞t number of people! c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 19 / 51
pairwise combinations: N 2 ≡ N! (N − 2)! 2! = N(N − 1) 2 For N = 6 people 1 2 6(6 − 1) = 15 conversations Growing delay due to 1 on 1 mtgs 0 2 4 6 8 10 People 5 10 15 Months This is really Brooks’ law (grows linearly with N) TN = T1 N + βN(N − 1) T1 N c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 21 / 51
SN = T1 TN (3) Overhead for just round-table meetings: SN = T1 T1 N + α(N − 1) T1 N = 1 1 N + α N − 1 N which we can simplify further to c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 23 / 51
no 1-on-1 delays (i.e., β = 0) Rewrite (4) as T1 TN = 1 1 N + α N − 1 N Example In previous example α = 1/4 and N = 6 T6 = 1 6 T1 + 1 4 5 6 T1 = 3 8 T1 same as before Speedup is only S6 = 8 3 = 2.67 (cf. S6 = 6 linear) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 25 / 51
types of delay effects: SN = T1 T1 N + α(N − 1) T1 N + βN(N − 1) T1 N = 1 1 N + α N − 1 N + βN N − 1 N which we can simplify further as: SN = N 1 + α(N − 1) + βN(N − 1) (5) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 26 / 51
precisely the USL scalability law in eqn.(1) CN(α, β) = N 1 + α(N − 1) + βN(N − 1) 0 2 4 6 8 10 People 0.5 1.0 1.5 2.0 Output Handles worse scalability than Amdahl’s law, even if β < α ! c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 27 / 51
Speedup Sources of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 28 / 51
the Genius Bar Finite number of people in the store at any time Average rate at which products need fixing Z mths, yrs Average repair time at Genius Bar S mins, hrs Genius Bar is its own queue c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 29 / 51
Normalized synchronous throughput: XN = N RN + Z XN = N NS + Z XN X1 = N NS + Z 1 S + Z −1 c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 30 / 51
Repairman From roundtable meetings (Amdahl’s law): TN = αT1 + (1 − α) T1 N But T1 = S + Z = serial + parallel for N = 1 TN = α(S + Z) + (1 − α) S + Z N Magic identity: α = S S + Z produces this simplification TN = S + Z N Speedup SN = T1 TN = S + Z S + Z N = N(S + Z) NS + Z SN = N NS + Z (S + Z) Same as normalized synchronous throughput in Repairman c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 31 / 51
2002) Amdahl’s law is equivalent to the synchronous throughput bound in the machine repairman queueing model of a multiprocessor or multicore. Theorem (Gunther 2008) The USL is equivalent to the synchronous throughput bound in the machine repairman queueing model with linearly load-dependent service time. Corollary The USL subsumes Amdahl’s law when β = 0. Proof. See “A General Theory of Computational Scalability Based on Rational Functions” http://arxiv.org/abs/0808.1431 It’s all connected by queueing theory c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 32 / 51
and Speedup Sources of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 33 / 51
, . . . , XN ) on test rig: CN = XN X1 From USL regression in Excel we get: CN (α, β) = N 1 + α(N − 1) + βN(N − 1) Use this to calculate XN for any N value using: XN = CN (α, β) X1 Round-trip time eqn.(2) in repairman queue is given by: RTT = RN + Z = N XN Use this to determine RN and compare with response-time measurements c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 34 / 51
Sources of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 35 / 51
Controlled measurements using Benchmark Factory1 Use USLcalc.xls spreadsheet available from my web site (http://www.perfdynamics.com/) 1Peter Stalder, “How to Quantify Oracle Database Scalability: Part II”, Hotsos 2010 c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 36 / 51
USL throughput data and USL model (note concave shape) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 39 / 51
USL response-time data and USL model (note convex shape) c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 40 / 51
Sources of delay Productivity 3 Repairman Queueing Model 4 USL Response Time Scalability 5 Worked Examples Oracle Order-Entry Workload Oracle CRM Application 6 Wrap Up c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 47 / 51
RN from Little’s law: RTT = N/XN 3 Apply USL in Excel, R, Mathematica, etc. Cf. v$, AWR, OWI metrics at selected user load N USL contains Brooks’ law and Amdahl’s law (a good thing) USL is equivalent to synchronous repairman queueing model USL models worst case synchronous queueing Your app should scale better than USL If it doesn’t, change your data 2Gunther, Hotsos 2007 3Gunther and Stalder, Hotsos 2010 c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 48 / 51
VoltDB really as scalable as they claim?”, Feb 28, 2011 Applies USL model Quantifies scalabiity of no-SQL VoltDB Not response time analysis Not Oracle 2 D. Abercrombie: “Response time analysis based on AWR data”, Mar 5, 2011 Definitely Oracle Definitely response-time analysis Not USL ... (yet) 3 G. Hendriksen: “GAPP Improvements: a Method to Diagnose and Predict Performance in Complex Architectures”, Mar 7, 2011 Uses Oracle Data Mining (ODM) Definitely response-time analysis Not USL ... (yet) See my blog post: “Mine the GAPP” Details at 12 ... Hotsos 2012 c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 49 / 51
Chapters 4–6 Also check out: Special USL web page Guerrilla performance classes c 2017 Performance Dynamics Brooks, Cooks, and Response Time Scalability September 6, 2017 50 / 51