Superlinear Speedup: The Perpetual Motion of Parallel Performance

Dr. Neil Gunther, Performance Dynamics
Hotsos Symposium, March 5, 2013
© 2014 Performance Dynamics. Superlinear Speedup, October 15, 2014 (64 slides).

Outline
- Quick review
- 20 years of USL scalability analysis
- Appearance of “super linear” data starting c. 2010:
  - Some users complain USL doesn’t work for superlinearity!
  - But precious little correct data (e.g., none on Wikipedia)
  - Likely to see more superlinearity in distributed systems
  - Can’t just ignore it or people will abandon USL
- Super linear speedup described on Wikipedia (must be true)
- Add 3rd parameter to USL:
  - To fit superlinear data
  - Headache the size of an elephant
- April 2012 discovered stunningly simple result
  - No modification to USL equation (Huh?)
  - Ramifications for scalability analysis are quite profound
- Like perpetual motion, if it’s too good to be true...

Outline
1. Review of USL
2. Application of USL
   - Memcache
   - Varnish
   - Postgres
3. Superlinearity
   - Something for nothing
   - Mathematica modeling
   - Postgres 9.2FL superlinearity
4. Summary

How to Quantify Scalability
Previous USL presentations at Hotsos:
- Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing”
- Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals”
- Hotsos 2011: “Brooks, Cooks, and Response Time Scalability”
What they covered:
- Equal bang for the buck: linear concurrency
- Diminishing returns: contention overhead
- Negative return on investment: coherency overhead
- Calculate scalability curve from performance measurements

Also ended up in my books
- Chapters 6 and 14
- Chapters 4–6
Also check out:
- Special USL web page
- Guerrilla perf and CaP classes

Universal Scalability Law (USL)
- N virtual users or processes provide load
- C(N) relative capacity function of N
- But what function?
$$C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
Three Cs:
1. Concurrency
2. Contention (0 < α < 1)
3. Coherency (0 < β < 1)
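The capacity function above is easy to evaluate directly. The following is a minimal Python sketch (the slides themselves use R); the function name `usl_capacity` and the sample parameter values are illustrative assumptions, not taken from the deck.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) of the two-parameter USL.

    alpha is the contention parameter, beta the coherency parameter.
    """
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# Hypothetical parameter values, chosen only to show the three regimes.
linear = usl_capacity(8, 0.0, 0.0)        # concurrency only: C(8) = 8.0
contended = usl_capacity(8, 0.05, 0.0)    # contention: diminishing returns
coherent = usl_capacity(8, 0.05, 0.01)    # coherency: lower still
print(linear, contended, coherent)
```

With both parameters at zero the function reduces to C(N) = N, which is the linear "equal bang for the buck" case.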

Concave shape of USL function
$$\frac{X_{\text{data}}(N)}{X_{\text{data}}(1)} \rightarrow C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
[Figure: C(α, β) versus N for 0 ≤ N ≤ 10]
- Handles scalability degradation (universal)
- Goal is to get rid of scalability maximum

How do we determine α and β?
$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
- Gene Amdahl (1967): brute force measurement for α
- Clever way: apply statistical regression
I will use R:
- FOSS package with 40 yr history (since S at Bell Labs)
- Sophisticated/accurate statistical tools
- Interpreted programming language (cf. Mathematica)
Magic functions in R:
- nls(): nonlinear LSQ fit (α, β in one swell foop)
- optimize(): to estimate X(1) if missing
- predict(): for smooth interpolation/extrapolation from data
- plot(): with many variants
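For readers without R, the regression idea can be sketched in plain Python. This is not the `nls()` fit used in the talk: it is a substitute technique that rewrites the USL as N/C(N) − 1 = α(N − 1) + βN(N − 1), which is linear in α and β, and solves the 2×2 normal equations. Because it minimizes error in the transformed variable, it will generally give slightly different estimates from `nls()` on noisy data. The function name and the synthetic data are assumptions made for illustration.

```python
def fit_usl_linearized(loads, capacities):
    """Least-squares estimate of (alpha, beta) from normalized capacities.

    Uses the linearized form  N/C - 1 = alpha*(N-1) + beta*N*(N-1).
    """
    s11 = s12 = s22 = s1y = s2y = 0.0
    for n, c in zip(loads, capacities):
        x1 = n - 1            # regressor multiplying alpha
        x2 = n * (n - 1)      # regressor multiplying beta
        y = n / c - 1.0       # deviation from linear scaling
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y
    det = s11 * s22 - s12 * s12
    alpha = (s1y * s22 - s12 * s2y) / det
    beta = (s11 * s2y - s12 * s1y) / det
    return alpha, beta


# Synthetic, noise-free data generated from known (hypothetical) parameters.
true_alpha, true_beta = 0.05, 0.002
loads = [1, 2, 4, 8, 16, 32]
caps = [n / (1 + true_alpha * (n - 1) + true_beta * n * (n - 1)) for n in loads]
alpha_hat, beta_hat = fit_usl_linearized(loads, caps)
print(alpha_hat, beta_hat)
```

On noise-free data the known parameters are recovered to rounding error, which makes this a convenient sanity check before trusting any fit on real measurements.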

Section 2: Application of USL (Memcache, Varnish, Postgres)

Memcache
Joint work with S. Subramanyam (Sun, USA) and S. Parvu (Sun, FI)
Presented at Velocity 2010 conference

Memcache Scalability
[Figures: Scaleup; Scaleout]

Memcache: Scaleout strategy
- Distributed cache of key-value pairs
- Pre-loaded from RDBMS
- Tier of cheap, older CPUs (e.g., not multicore)
- Single threading ok, until next hardware roll

Memcache: measurements
Example (Read in raw data and plot it)
    input <- read.table(fname, header=TRUE, sep="\t")
    print(input)
    plot(input$N, input$X_N, type="b")
[Figure: “Raw data for memcached 132”, input$X_N versus input$N]
Typing input into R console:
    > input
       N X_N
    1  1  89
    2  2 160
    3  4 272
    4  8 333
    5 10 352
    6 12 339
    7 14 315

Memcache: nonlinear regression
Example (Normalize, check efficiencies, fit USL)
    > input
       N X_N     Norm     Effcy
    1  1  89 1.000000 1.0000000
    2  2 160 1.797753 0.8988764
    3  4 272 3.056180 0.7640449
    4  8 333 3.741573 0.4676966
    5 10 352 3.955056 0.3955056
    6 12 339 3.808989 0.3174157
    7 14 315 3.539326 0.2528090

    Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))
    Parameters:
          Estimate Std. Error t value Pr(>|t|)
    alpha 0.063520   0.011433   5.556 0.002597 **
    beta  0.011323   0.001063  10.649 0.000126 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.07824 on 5 degrees of freedom
    Algorithm "port", convergence message: relative convergence (4)
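The Norm and Effcy columns in the table follow directly from the raw throughput: Norm is X(N)/X(1) and Effcy is Norm/N. A small Python sketch of that step, using the memcached measurements from the slide (the helper name `normalize` is an assumption):

```python
# Raw memcached measurements from the slide: load N and throughput X(N).
loads = [1, 2, 4, 8, 10, 12, 14]
throughputs = [89, 160, 272, 333, 352, 339, 315]


def normalize(loads, throughputs):
    """Return rows of (N, X_N, Norm, Effcy), normalizing by the N = 1 value."""
    x1 = throughputs[loads.index(1)]
    rows = []
    for n, x in zip(loads, throughputs):
        norm = x / x1          # relative capacity C(N)
        rows.append((n, x, norm, norm / n))  # efficiency is C(N)/N
    return rows


rows = normalize(loads, throughputs)
for n, x, norm, effcy in rows:
    print(f"{n:3d} {x:4d} {norm:9.6f} {effcy:10.7f}")
```

Efficiencies that fall steadily below 1 as N grows are the ordinary sublinear case; values above 1 would signal superlinearity.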

Memcache: scalability analysis
[Figure: “USL Scalability Analysis of 'memcached 132' Data”, throughput X(N) in KOps/s versus threads (N)]
Fit legend: α = 0.0635, β = 0.011323, R² = 0.9961, Nmax = 9.09, Xmax = 344.76, Xroof = 1401.13, Z(sec) = 0, TS = 3001131110
Created by NJG on Wed Jan 30 11:10:32 2013
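The legend values Nmax, Xmax and Xroof can be reproduced from the fitted α and β. The sketch below assumes Nmax = sqrt((1 − α)/β), Xmax = X(1)·C(Nmax) and Xroof = X(1)/α; the slide does not state these formulas, but they agree with the printed legend to the digits shown. Function and variable names are illustrative.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) of the two-parameter USL."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load at which C(N) is maximal (assumes beta > 0 and alpha < 1)."""
    return math.sqrt((1.0 - alpha) / beta)


# Fitted memcached parameters and single-thread throughput from the slides.
alpha, beta, x1 = 0.063520, 0.011323, 89.0

n_max = usl_peak_load(alpha, beta)
x_max = x1 * usl_capacity(n_max, alpha, beta)   # peak throughput
x_roof = x1 / alpha                             # contention-only ceiling
print(round(n_max, 2), round(x_max, 2), round(x_roof, 2))
```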

Varnish
Data by D. Popa (DigitAir, RO) via S. Parvu (Nokia, FI)

Varnish: architecture
- HTTP accelerator
- Reverse web proxy caching system
- Sits in front of classic web server
- Caching handled by virtual memory
- Highly scalable (linear)

Varnish: measurements
Example (Read in raw data and plot it)
    input <- read.table(fname, header=TRUE, sep="\t")
    print(input)
    plot(input$N, input$X_N, type="b")
[Figure: “Raw data: Varnish”, input$X_N versus input$N]
By typing input into R console:
    > input
         N   X_N
    1    1   1.4
    2    2   2.7
    3    5   6.4
    4   10  12.8
    5   25  32.0
    6   50  64.0
    7   75  98.0
    8  100 131.0
    9  150 197.0
    10 250 320.0
    11 300 392.0
    12 400 518.0

Varnish: nonlinear regression
Example (Fit to USL model)
         N   X_N       Norm     Effcy
    1    1   1.4   1.000000 1.0000000
    2    2   2.7   1.928571 0.9642857
    3    5   6.4   4.571429 0.9142857
    4   10  12.8   9.142857 0.9142857
    5   25  32.0  22.857143 0.9142857
    6   50  64.0  45.714286 0.9142857
    7   75  98.0  70.000000 0.9333333
    8  100 131.0  93.571429 0.9357143
    9  150 197.0 140.714286 0.9380952
    10 250 320.0 228.571429 0.9142857
    11 300 392.0 280.000000 0.9333333
    12 400 518.0 370.000000 0.9250000

    Formula: Norm ~ N/(1 + alpha * (N - 1) + beta * N * (N - 1))
    Parameters:
            Estimate Std. Error t value Pr(>|t|)
    alpha  5.721e-04  7.220e-05   7.924 1.28e-05 ***
    beta  -9.414e-07  1.978e-07  -4.759 0.000769 ***   <-- beta < 0
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 2.078 on 10 degrees of freedom
    Number of iterations to convergence: 11
    Achieved convergence tolerance: 1.199e-07

Varnish: USL analysis with β < 0
[Figure: “USL Fit to Varnish”, throughput X(N) versus load (N)]
Fit legend: α = 6e-04, β = −1e-06, R² = 0.9997, Nmax = NaN, Xmax = NaN, Xroof = 2447.13, Z(sec) = 0, TS = 3001131837

Varnish: USL convex projection
[Figure: “USL bogus projection for Varnish”, throughput X(N) versus load (N) for 0 ≤ N ≤ 1000]
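One way to see why the β < 0 projection is bogus: the USL denominator 1 + α(N − 1) + βN(N − 1) is the quadratic βN² + (α − β)N + (1 − α), and with β < 0 it reaches zero at a finite load, so the projected capacity grows without bound there. A Python sketch using the fitted Varnish values from the regression slide (helper names are assumptions):

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) of the two-parameter USL."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_denominator_roots(alpha, beta):
    """Real roots of beta*N^2 + (alpha - beta)*N + (1 - alpha) = 0, sorted."""
    a, b, c = beta, alpha - beta, 1.0 - alpha
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])


# Fitted Varnish parameters from the slides (note beta < 0).
alpha, beta = 5.721e-04, -9.414e-07

# (1 - alpha)/beta is negative, so sqrt((1 - alpha)/beta) has no real value:
# that is why the plot legend reports Nmax = NaN.
has_real_peak = (1.0 - alpha) / beta > 0

pole = max(usl_denominator_roots(alpha, beta))  # load where C(N) diverges
print(has_real_peak, round(pole, 1))
print(usl_capacity(1000, alpha, beta))          # already above linear (1000)
```

The pole lies well beyond the measured range of N ≤ 400, so the fit looks excellent on the data while the extrapolation is unphysical.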

Varnish: nonlinear regression
Example (Fit to β = 0 model)
    > input
         N   X_N       Norm     Effcy
    1    1   1.4   1.000000 1.0000000
    2    2   2.7   1.928571 0.9642857
    3    5   6.4   4.571429 0.9142857
    4   10  12.8   9.142857 0.9142857
    5   25  32.0  22.857143 0.9142857
    6   50  64.0  45.714286 0.9142857
    7   75  98.0  70.000000 0.9333333
    8  100 131.0  93.571429 0.9357143
    9  150 197.0 140.714286 0.9380952
    10 250 320.0 228.571429 0.9142857
    11 300 392.0 280.000000 0.9333333
    12 400 518.0 370.000000 0.9250000

    Formula: Norm ~ N/(1 + alpha * (N - 1))   <-- beta = 0 model
    Parameters:
           Estimate Std. Error t value Pr(>|t|)
    alpha 0.0002361  0.0000218   10.84  3.3e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 3.617 on 11 degrees of freedom
    Number of iterations to convergence: 5
    Achieved convergence tolerance: 9.72e-08

Varnish: USL β = 0 analysis
[Figure: “USL Fit to Varnish”, throughput X(N) versus load (N)]
Fit legend: α = 2e-04, β = 0, R² = 0.9992, Nmax = NaN, Xmax = NaN, Xroof = 5928.53, Z(sec) = NaN, TS = 3001131618

Varnish: concave scalability projections
[Figure: “USL Projections for Varnish”, throughput X(N) versus load (N) for 0 ≤ N ≤ 5000]

Postgres
Data via R. Haas (EnterpriseDB, MA)

Postgres: PG 9.x measurements

Postgres: PG 9.x scalability analysis
[Figure: “USL Analysis of PG91X”, NOTx/Sec X(N) versus user threads (N)]
Fit legend: α = 0.0385534, β = 0.00107257, R² = 0.8687, Nmax = 29.94, Xmax = 42999.47, Xroof = 113434.9, Z(sec) = NaN, TS = 604121120

Postgres: β = 0 analysis for PG 9.1
[Figure: “USL Analysis of PG91X”, NOTx/Sec X(N) versus user threads (N)]
Fit legend: α = 0.0385534, β = 0, R² = 0.8687, Nmax = NaN, Xmax = NaN, Xroof = 113434.9, Z(sec) = NaN, TS = 604121128

Postgres: USL β = 0 projections for PG 9.1
[Figure: “USL Projections for PG91X”, TPS X(N) versus clients (N) for 0 ≤ N ≤ 400. Legend: PG 9.1 data; USL scalability profile; USL max prediction; PG 9.2FL avg saturation X]

Section 3: Superlinearity (Something for nothing, Mathematica modeling, Postgres 9.2FL superlinearity)

Super Efficiencies


Recent examples
Perpetual motion
Perpetual motion contraptions violate conservation of energy law. Super efficiency is tantamount to getting more than 100% of something. You know it’s wrong but proving it is usually the harder part.
a. Z-Torque bicycle crank
b. Negative Kelvin temperatures
c. Superluminal neutrinos
Performance super efficiency
Superlinear scalability (hardware or software) exhibits measured throughput performance that exceeds 100% of available capacity. Needs explaining (or debugging).

a. Z-Torque bicycle crank
Conjecture (Jan 12, 2013): Inventor tries to raise $1000s in start-up capital through crowd funding a super-efficient bicycle crank. [Source: Slashdot]
Bug: Bad physics. Somebody doesn’t understand vector moments.

b. Negative Kelvin temperatures
Conjecture (Jan 3, 2013): Ultracold potassium gas reaches T < 0 K. Impossible! Published in Nature.
[Figure: Normal ground state; Flipped ground state]
Bug: Maybe not. Depends how you define temperature. Shortly, we’ll see negative time.

c. Superluminal neutrinos
Conjecture (Sept 23, 2011): Italian OPERA experiment measured CERN neutrinos with $v_\nu > c$ at 6σ confidence. Einstein wrong! Published arXiv.org > hep-ex > arXiv:1109.4897
Bug (Dec 14, 2011): Screwed by a $0.50 fiber connector not being screwed tight.

Application superlinearity: this is what it looks like
[Figure: “Raw data for PG92flX”, TPS X(N) versus clients (N)]

Superlinear efficiencies
Example (PG 92FL data)
    > input
        N       X_N      Norm     Effcy
    1   1   4439.85  1.000000 1.0000000  << ok
    2   4  17111.29  3.854023 0.9635058
    3   8  33305.86  7.501573 0.9376966
    4  12  47466.03 10.690907 0.8909089
    5  16  61403.72 13.830132 0.8643832
    6  20  73229.07 16.493589 0.8246794
    7  24  97529.10 21.966754 0.9152814  <-- increasing !?
    8  28 143119.87 32.235290 1.1512604  <-- above 100% !?
    9  32 183640.43 41.361849 1.2925578  <-- ?
    10 36 186552.78 42.017808 1.1671613  <-- ?
    11 40 187370.09 42.201892 1.0550473  <-- ?
    12 44 188295.57 42.410340 0.9638714
    13 48 184799.33 41.622873 0.8671432
    14 52 182925.81 41.200895 0.7923249
    15 56 181790.11 40.945098 0.7311625
    16 60 176109.85 39.665717 0.6610953
    17 64 176334.82 39.716388 0.6205686
    18 68 171278.40 38.577516 0.5673164
    19 72 168922.21 38.046825 0.5284281
    20 76 165651.64 37.310185 0.4909235
    21 80 164238.55 36.991910 0.4623989
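The flagged rows can be found mechanically: compute the efficiency X(N)/(N·X(1)) for each measurement and report every load where it exceeds 1. A short Python sketch over the PG 9.2FL data from the table (variable names are illustrative):

```python
# PG 9.2FL measurements from the slide: (clients N, throughput X(N) in TPS).
pg92fl = [
    (1, 4439.85), (4, 17111.29), (8, 33305.86), (12, 47466.03),
    (16, 61403.72), (20, 73229.07), (24, 97529.10), (28, 143119.87),
    (32, 183640.43), (36, 186552.78), (40, 187370.09), (44, 188295.57),
    (48, 184799.33), (52, 182925.81), (56, 181790.11), (60, 176109.85),
    (64, 176334.82), (68, 171278.40), (72, 168922.21), (76, 165651.64),
    (80, 164238.55),
]

x1 = pg92fl[0][1]  # single-client throughput used for normalization

# Efficiency is relative capacity per client: C(N)/N = X(N) / (N * X(1)).
efficiency = {n: x / (n * x1) for n, x in pg92fl}

# Loads where efficiency exceeds 100%, i.e. the superlinear zone.
superlinear_loads = [n for n, e in efficiency.items() if e > 1.0]
print(superlinear_loads)
```

Note that the efficiency first declines (N = 4 to 20) and only then rises above 1, so a test that looks at small N alone would miss the effect.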

Another way to screw everything up
[Figure: “Median Throughput Comparison”, throughput (NOT/10sec) from 0 to 10000 versus threads on a base-2 log axis (1, 4, 16, 64, 256, 1024). Legend: Clustrix 3 Nodes; Clustrix 6 Nodes; Clustrix 9 Nodes; Intel SSD; HP/FusionIO]
See the problem? Don’t use log-linear axes. (And certainly not base-2 logs.) Without warning the reader ... BIG TIME!

Mathematica Modeling

Generic form of superlinear scaling
[Figure: capacity versus N for 0 ≤ N ≤ 20, annotated “Gradient inflection” and “Gradient maximum”]
General form appears to be:
- Ideal linear slope: C(N)/N = 100%
- Data above linear slope: C(N)/N > 100%
- Point of inflection
- Otherwise convex upward: C(N) → ∞
- Maximum in gradient
- Degradation beyond max
Is it always like this?

Plausible 3-parameter USL model
$$C_N(\alpha, \beta, \gamma) = \frac{N}{\exp(-\gamma (N - 1)) + \alpha (N - 1) + \beta N (N - 1)} \tag{1}$$
[Figure: “USL 3-Parameter Model”, C(N) versus N for 0 ≤ N ≤ 25. Neil J. Gunther, Tue 11 Oct 2011]
Properties of eqn. (1):
- $e^{-\gamma (N-1)} \rightarrow 1$ as γ → 0
- γ = 0 same as USL
NLS fit parameters: α = 0.001, β = 0.00425, γ = 0.1
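Equation (1) can be checked numerically. The Python sketch below evaluates the 3-parameter form, confirms that γ = 0 reproduces the ordinary USL, and shows that the fitted values from the slide give capacity above the linear bound at small N. The function names are assumptions.

```python
import math


def usl_capacity(n, alpha, beta):
    """Two-parameter USL."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl3_capacity(n, alpha, beta, gamma):
    """Three-parameter variant of eqn. (1): the leading 1 in the
    denominator is replaced by exp(-gamma * (N - 1))."""
    return n / (math.exp(-gamma * (n - 1)) + alpha * (n - 1) + beta * n * (n - 1))


# Fit parameters quoted on the slide.
alpha, beta, gamma = 0.001, 0.00425, 0.1

for n in (1, 5, 10, 15, 20, 25):
    print(n, round(usl3_capacity(n, alpha, beta, gamma), 3))
```

The exponential term shrinks the denominator below 1 at moderate N, which is what lets this form rise above C(N) = N before the β term pulls it back down.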

Parameterized Elephant
“With four parameters I can fit an elephant. With five I can make his trunk wiggle.” (John von Neumann)
[Figure: elephant outlines fitted with params = 1, 2, 3 and 4]
See my animated blog post: A Winking Pink Elephant

Magic Moment !!!
$$C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} \tag{2}$$
[Figure: “USL 2-Parameter Model”, C(N) versus N for 0 ≤ N ≤ 25. Neil J. Gunther, Thu 19 Apr 2012]
NLS fit parameters: α = −0.0859, β = 0.0064
Properties of eqn. (2):
- It’s our fave USL (Hello!)
- But α < 0 allowed
- Capacity credit
- Still have β > 0
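A quick numerical check of what α < 0 does to the unmodified USL, using the fit parameters quoted above: the efficiency C(N)/N rises above 1 for small N and falls back below 1 once the β term dominates. This is a Python sketch with assumed names, not code from the talk.

```python
def usl_efficiency(n, alpha, beta):
    """Efficiency C(N)/N for the two-parameter USL."""
    return 1.0 / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# NLS fit parameters quoted on the slide (alpha negative, beta positive).
alpha, beta = -0.0859, 0.0064

# Integer loads in 1..25 whose efficiency exceeds 100%.
superlinear = [n for n in range(1, 26) if usl_efficiency(n, alpha, beta) > 1.0]
print(superlinear)
print(round(usl_efficiency(5, alpha, beta), 4))
```

For these parameters the superlinear loads form a single contiguous band starting at N = 2, consistent with the "capacity credit" being paid back at larger N.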

The Meaning of Negative α
A Little Story:
- I was supposed to give a talk
- but the meeting got cancelled
- That means my talk took zero elapsed time ($\Delta t_{talk} = 0$)
- But that’s assuming I was already in the room
- It was cancelled before I made the trip to the meeting
- My talk took less than zero time or negative time ($\Delta t_{talk} < 0$)
- Think of the non-trip time as a time credit
Proposition (Faster than parallel)
Negative α induces a negative execution time (i.e., a time credit) due to latent additional resources (e.g., more memory or cache) and that translates into performance that is faster than parallel.

The Meaning of Negative α in USL
Initial unit of computing capacity
[Figure: capacity C(p) versus p]

Positive α
Some fraction of original capacity lost to overhead
[Figure: capacity C(p) versus p, with the fraction α marked]

Negative α
Some fraction of original capacity is added (opposite sign)
[Figure: capacity C(p) versus p, with the added fraction α marked]

Positive α Capacity Scaling
Growing capacity loss as system is scaled out
[Figure: capacity C(p) for Node 1 through Node 6]

Negative α Capacity Scaling
Growing capacity increase as system is scaled out
[Figure: capacity C(p) for Node 1 through Node 6]

Negative α in the Data
This is how it would appear in scalability measurements
[Figure: three panels of C(p) versus p for 0 ≤ p ≤ 6: Linear, Sublinear, Superlinear]
Can generalize this concept to nonlinear scalability

Postgres 9.2FL Analysis

Postgres: PG 9.2FL measurements
[Figure: “Raw data for PG92flX”, TPS X(N) versus clients (N)]

Postgres: PG 9.2FL scalability N ≤ 48
[Figure: “USL Analysis of PG92flX”, TPS X(N) versus clients (N)]
Fit legend: α = −0.0109191, β = 0.000257488, R² = 0.9521, Nmax = 62.66, Xmax = 210508.7, Xroof = NaN, Z(sec) = NaN, TS = 1604121213
NJG Mon Apr 16 12:13:47 2012

Postgres: PG 9.2FL scalability N ≤ 80
[Figure: “USL Analysis of PG92flX”, TPS X(N) versus clients (N)]
Fit legend: α = −0.0155072, β = 0.000386942, R² = 0.9579, Nmax = 51.23, Xmax = 186930.1, Xroof = NaN, Z(sec) = NaN, TS = 1604121214
NJG Mon Apr 16 12:14:49 2012

Superlinear scaling zones
$$C_N(\alpha, \beta) = \frac{N}{1 - \alpha (N - 1) + \beta N (N - 1)}$$
Here α denotes the magnitude of the negative contention parameter, hence the explicit minus sign.
[Figure: C(N) versus N for 0 ≤ N ≤ 20, with regions labeled “Superlinear” and “Payback”]
(a) Data in superlinear zone where C(N)/N > 100%: like perpetual motion
(b) Data in payback zone: paying the piper, sudden degradation where C(N)/N < 100%
(c) Is it always like this?

Superlinear Payback Theorem
Theorem (Gunther 2012)
Superlinear scaling in the USL model, with α < 0 and β > 0, always induces capacity degradation because the following properties hold:
1. Superlinear asymptote at: $N_\alpha = \dfrac{\alpha - 1}{\alpha}$
2. Inflection point $N_\pm$ is the smallest positive root of: $\partial_N^2 C_{sl}(N, -\alpha, \beta) = N^3 \beta^2 + (3N - 1)(\alpha - 1)\beta + (\alpha - 1)\alpha = 0$
3. Capacity maximum at: $N_{max} = \sqrt{\dfrac{1 - \alpha}{\beta}}$
4. $\forall N > N_\pm$, superlinear capacity $C_{sl}(N)$ crosses the linear bound at: $N_x = \left| \dfrac{\alpha}{\beta} \right|$
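Items 1 and 3 can be checked by hand. The lines below are a sketch of the algebra under the standard USL form with signed α; they are a reader's verification, not a derivation taken from the slides.

```latex
\begin{aligned}
C(N) &= \frac{N}{D(N)}, \qquad D(N) = 1 + \alpha (N-1) + \beta N (N-1) \\[4pt]
\frac{dC}{dN} &= \frac{D - N D'}{D^2}
              = \frac{1 - \alpha - \beta N^2}{D^2},
              \qquad D' = \alpha + \beta (2N - 1) \\[4pt]
\frac{dC}{dN} = 0 \;&\Rightarrow\; N_{max} = \sqrt{\frac{1-\alpha}{\beta}} \\[4pt]
\beta = 0:\quad D(N) = 0 \;&\Rightarrow\; 1 + \alpha (N-1) = 0
              \;\Rightarrow\; N_\alpha = \frac{\alpha - 1}{\alpha}
\end{aligned}
```

With α < 0 both expressions are positive, so the asymptote and the maximum lie at physical loads.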

Visual proof: Superlinear asymptote
[Figure: two panels of C(N) versus N]
Proof.
- Linear bound: C(N)/N = 1 (dashed line)
- Super efficient region: $C_{sl}(N)/N > 1$
- Superlinear segment curved upward by α < 0 (convex function)
- Asymptote at $N = N_\alpha$ (vertical line) where $C_{sl}(N, -\alpha) \rightarrow \infty$

Visual proof: Upper bound and Saturation
[Figure: two panels of C(N) versus N]
Proof.
- A physical capacity bound must exist (dashed horizontal line)
- $C_{sl}(N)$ scaling curve will saturate below that bound (2nd red segment)
- That saturation segment must cross linear bound at $N_x$
- Therefore, must be an inflection point in $C_{sl}(N)$ at $N_\pm < N_x$

Visual proof: Inflection, Crossing and Degradation
[Figure: two panels of C(N) versus N]
Proof.
- Inflection point $N_\pm$ joins superlinear and saturation segments
- $C_{sl}(N)$ crosses linear bound at $N_x = |\alpha/\beta|$
- Since α < 0, crossing can only arise from coherency term with β > 0
- Hence, superlinearity always induces coherency roll off (payback)
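The crossing point quoted in the proof follows in two lines from setting C(N) equal to the linear bound N. This is a sketch of the algebra with signed α, added as a check and not reproduced from the slides.

```latex
\begin{aligned}
C(N) = N \;&\Longleftrightarrow\; 1 + \alpha (N-1) + \beta N (N-1) = 1 \\
           &\Longleftrightarrow\; (N - 1)(\alpha + \beta N) = 0 \\
           &\Longrightarrow\; N = 1 \quad \text{or} \quad
             N_x = -\frac{\alpha}{\beta} = \left| \frac{\alpha}{\beta} \right|
             \qquad (\alpha < 0 < \beta)
\end{aligned}
```

If β were zero the second factor would have no root, which is the sense in which the crossing can only arise from the coherency term.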

Payback parameters for PG 9.2FL
[Figure: “USL Analysis of PG92flX”, TPS X(N) versus clients (N)]
Fit legend: α = −0.0155072, β = 0.000386942, R² = 0.9579, Nmax = 51.23, Xmax = 186930.1, Xroof = NaN, Z(sec) = NaN, TS = 1604121214
NJG Mon Apr 16 12:14:49 2012
With α = −0.0155, β = 0.000387:
- N± = 14.0351
- Nx = 40.0517
- Nmax = 51.2253
- Nα = 65.5161
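The four payback parameters can be reproduced from the rounded α and β. The Python sketch below uses the closed forms from the Superlinear Payback Theorem (with the capacity maximum taken as sqrt((1 − α)/β) and the crossing as |α/β|, which match the numbers on this slide) and finds the inflection point by bisection on the theorem's cubic. The function name and the bisection bracket [0, Nmax] are assumptions of this sketch.

```python
import math


def payback_parameters(alpha, beta):
    """Return (N_inflection, N_cross, N_max, N_asymptote) for alpha < 0 < beta."""
    if not (alpha < 0.0 < beta):
        raise ValueError("expects alpha < 0 and beta > 0")

    n_asymptote = (alpha - 1.0) / alpha        # pole of the beta = 0 curve
    n_max = math.sqrt((1.0 - alpha) / beta)    # capacity maximum
    n_cross = abs(alpha / beta)                # C(N) returns to the linear bound

    def cubic(n):
        # Numerator of the second derivative of C(N), from the theorem.
        return (n ** 3 * beta ** 2
                + (3.0 * n - 1.0) * (alpha - 1.0) * beta
                + (alpha - 1.0) * alpha)

    # cubic(0) > 0 and the cubic is decreasing up to n_max, so a sign change
    # on [0, n_max] brackets the smallest positive root.
    lo, hi = 0.0, n_max
    if cubic(hi) >= 0.0:
        return math.nan, n_cross, n_max, n_asymptote
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), n_cross, n_max, n_asymptote


# Rounded PG 9.2FL fit values quoted on the slide.
n_inflect, n_cross, n_max, n_asym = payback_parameters(-0.0155, 0.000387)
print(round(n_inflect, 4), round(n_cross, 4), round(n_max, 4), round(n_asym, 4))
```

The ordering N± < Nx < Nmax < Nα that comes out for these parameters is the superlinear zone followed by the payback zone described earlier.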

Section 4: Summary

Summary
- USL is 2-parameter scalability model C(N, α, β)
- Requires α, β > 0 for C(N) to be concave function
- Superlinear measurements C(N)/N > 1 do exist
- Extra fitting parameter C(N, α, β, γ) ⇒ JvN elephants
- Discovered superlinear USL with α < 0
- Super-efficiencies are not free
- Like perpetual motion:
  - no free lunch
  - pay the piper eventually
  - debugging it is the hard part
- Thm: Superlinearity always followed by capacity degradation
- More (Oracle ???) superlinear measurements would be good

Thank you for attending!
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
Twitter/DrQz
Facebook
+1-510-537-5758
