Quantifying Scalability FTW

Quantifying Scalability FTW How to do a scalability surge in
∆t < 1 hour Dr. Neil J. Gunther Performance Dynamics SURGE 2010 Sept 30 – Oct 1 SM c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 1 / 45

Scaling vs. Scalability Outline 1 Scaling vs. Scalability 2 Components
of Scalability 3 Problem: Bad scalability data 4 Problem: eBay 1.0 scalability 5 Problem: memcache scalability 6 Summary and Review 7 Resources and Coordinates c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 2 / 45

Scaling vs. Scalability Motivation for This Talk Practical methodology for
assessing the cost-beneﬁt of a given scalability strategy quantify system scalability scalability is not a single number (it’s a function) all measurements are wrong by deﬁnition need a framework to validate data measurement + model == information Scalability: sustainable performance under increasing load (size N) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 3 / 45

Scaling vs. Scalability Jack and the Beanstalk Jack climbs a
magic beanstalk up into the clouds (10,000 ft?) Guarded by a giant who is 10 times bigger than Jack “Fee-ﬁe-foe-fum!” and all that c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 4 / 45

Scaling vs. Scalability Where Are All the Giants? Can giants
exist? Can 10,000’ beanstalk exist? Guinness world record Robert P. Wadlow (USA) Height: 8’11” (2.72 m) Jack Height: 1.8 m tall (L) Weight: 90 kg Giant (10x bigger) Height: 18 m tall (10 × L) L3 × 90 kg = 103 × 90 kg Weight: 90,000 kg A bone-crushing 100 tons! c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 5 / 45

Scaling vs. Scalability Scaling vs. Scalability Natural scaling Inherent critical
limits to sustainable loads When the load (volume) exceeds the material strength (supporting area), things tend to snap Load ∼ L3 (volume), but strength ∼ L2 (cross-section area) Computer scalability No critical limit Point of diminishing returns Scalability is about sustainable size c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 6 / 45

Scaling vs. Scalability Natural System Scaling Weight Strength 0.0 0.5
1.0 1.5 2.0 Weight 0.2 0.4 0.6 0.8 1.0 1.2 1.4 Scalability Giant’s legs, beanstalks, bridges, collapse where the curves cross c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 7 / 45

Scaling vs. Scalability Computer System Scaling Scaling Degradation 0 200
400 600 800 1000 Users 100 200 300 400 500 600 Scalability Critical point is maximum in throughput curve Beyond max performance degradation or retrograde scalability c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 8 / 45

Scaling vs. Scalability Web 2.0 Scalability Fails Twitter.com Amazon EC2
Cuil.com Apple iStore Google Gmail WolframAlpha c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 9 / 45

Scaling vs. Scalability Scalability is Not a Number Google 2005
paper “Parallel Analysis with Sawzall” “If scaling were perfect, performance would be proportional to the number of machines... In our test, the effect is to contribute 0.98 machines.” Translation: Not 100% linear but 98% of linear or C++, while capable of handling such tasks, are more awkward to use and require more effort on the part of the programmer. Still, Awk and Python are not panaceas; for instance, they have no inherent facilities for processing data on multiple machines. Since the data records we wish to process do live on many machines, it would be fruitful to exploit the combined computing power to perform these analyses. In particular, if the individual steps can be expressed as query operations that can be evaluated one record at a time, we can distribute the calculation across all the machines and achieve very high throughput. The results of these operations will then require an aggregation phase. For example, if we are counting records, we need to gather the counts from the individual machines before we can report the total count. We therefore break our calculations into two phases. The first phase evaluates the analysis on each record individually, while the second phase aggregates the results (Figure 2). The system described in this paper goes even further, however. The analysis in the first phase is expressed in a new procedural programming language that executes one record at a time, in isolation, to calculate query results for each record. The second phase is restricted to a set of predefined aggregators that process the intermediate results generated by the first phase. By restricting the calculations to this model, we can achieve very high throughput. Although not all calculations fit this model well, the ability to harness a thousand or more machines with a few lines of code provides some compensation. !""#$"%&'#( !"#$%&'#%()*$(' +,-$$%&'&.$.' ! ! )*+&$#,( /.0'&.$.' Figure 2: The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous. Of course, there are still many subproblems that remain to be solved. The calculation must be divided into pieces and distributed across the machines holding the data, keeping the computation as near the data as possible to avoid network bottlenecks. And when there are many machines there is a high probability of some of them failing during the analysis, so the system must be 3 Theo Schlossnagle: “Linear scaling is simply a falsehood” p.71 Scalability is a function Not a number Always limits, e.g., throughput capacity Want to quantify such limits c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 10 / 45

Components of Scalability Outline 1 Scaling vs. Scalability 2 Components

Components of Scalability Math Phobes Can Relax Proper quantiﬁcation involves
math Quantifying scalability requires some math But nothing as complicated as this1 Pr{Murphy} = (U + C + I) × (10 − S) 20 × A 1 − sin(F/10) I have no idea what this equation is (ask Theo ) 1Source: Theo Schlossnagle, Scalable Intenet Architectures, p.12 c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 12 / 45

Components of Scalability Equal Bang for the Buck (Concurrency) c
2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 13 / 45

Components of Scalability Cost of Sharing Resources (Contention) c 2010
Performance Dynamics Quantifying Scalability FTW October 1, 2010 14 / 45

Components of Scalability Diminishing Returns (Saturation) c 2010 Performance Dynamics
Quantifying Scalability FTW October 1, 2010 15 / 45

Components of Scalability Negative ROI (Inconsistency Delays) c 2010 Performance
Dynamics Quantifying Scalability FTW October 1, 2010 16 / 45

Components of Scalability Universal Scalability Law (USL) Pulling it all
together: N users or processes c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) Three Cs: c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) Three Cs: 1 Concurrency c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) Three Cs: 1 Concurrency 2 Contention (amount α) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) Three Cs: 1 Concurrency 2 Contention (amount α) 3 Consistency as in ACID & CAP Thm (amount β) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

together: N users or processes C is the scalability function of N C(N, α, β) = N 1 + α (N − 1) + β N(N − 1) Three Cs: 1 Concurrency 2 Contention (amount α) 3 Consistency as in ACID & CAP Thm (amount β) Theorem (Universality) Only need α, β coefﬁcients to determine maximum in C(N) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

Problem: Bad scalability data Outline 1 Scaling vs. Scalability 2
Components of Scalability 3 Problem: Bad scalability data 4 Problem: eBay 1.0 scalability 5 Problem: memcache scalability 6 Summary and Review 7 Resources and Coordinates c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 18 / 45

Problem: Bad scalability data Data Are Not Divine Data come
from the Devil Models come from God Skepticism should rule! Theorem Data + Models ≡ Insight Data needs to be put in prison (a model) and made to confess the truth Corollary Waterboarding your data is ok c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 19 / 45

Problem: Bad scalability data Scalability Measurements J2EE web application Throughput
measurements using Apache Jmeter c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 20 / 45

Problem: Bad scalability data Bad Data The Problem Monotonically increasing,
looks ok visually but some data are > 100% efﬁcient Can’t haz Or your have some very serious explaining to do c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 21 / 45

Problem: Bad scalability data Put Your Data in Prison Excel
table of various USL quantities. Column F is scaling efﬁciency: C/N. Between N = 5 and 150 vusers, efﬁciencies > 1.0. Can’t have more than 100% of anything. Need to explain? Data + Model == Information Merely attempting to set up the USL model in Excel or R, shows measurement data (not the model) are wrong. c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 22 / 45

Problem: eBay 1.0 scalability Outline 1 Scaling vs. Scalability 2
Components of Scalability 3 Problem: Bad scalability data 4 Problem: eBay 1.0 scalability 5 Problem: memcache scalability 6 Summary and Review 7 Resources and Coordinates c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 23 / 45

Problem: eBay 1.0 scalability eBay 1.0 Capacity Upgrades The Problem
Want to compare capacity upgrades for Sun E10K backend Running ORA dbms for both OLTP and DSS eBay 1.0 had no performance measurements of their app eBay Inc. was just hiring into a QA/load-test group No scalability measurements Sun PS provided me with their M-values M-values ⇒ α 0.005 But that’s only α = 1 2 % contention ... WTF !? ORA dmbs is more typically α ≈ 3% But at least we have things quantiﬁed c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 24 / 45

Problem: eBay 1.0 scalability eBay 1.0 Optimistic Projections Python: CPU
Upgrade Scenarios - Optimistic 0.00 50.00 100.00 150.00 200.00 250.00 0 4 8 12 16 20 24 28 32 Weeks since 8/5/99 Total Utilization (%) 52way@333 52way@400 64way@333 64way@400 1 E10K 2 E10Ks c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 25 / 45

Problem: eBay 1.0 scalability How to keep the peace? The
Solution Apply the USL model to Sun’s M-values C(N) = N 1 + α(N − 1) + βN(N − 1) ORA backend ⇒ α 0.03 Simply re-run the USL curves with that value. Voila! Creates a scalability envelope eBay mileage will vary within this scalability envelope c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 26 / 45

Problem: eBay 1.0 scalability eBay 1.0 Realistic Projections CPU Upgrade
Scenarios - Realistic 0.00 50.00 100.00 150.00 200.00 250.00 0 4 8 12 16 20 24 28 32 Weeks since 8/5/99 Total Utilization (%) 52way@333 52way@400 64way@333 64way@400 1 E10K 2 E10Ks c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 27 / 45

Problem: memcache scalability Outline 1 Scaling vs. Scalability 2 Components

Problem: memcache scalability Velocity 2010, June 24 Velocity 2010, June
24 2 2 Scalability Scalability c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 29 / 45

Problem: memcache scalability Scale out with memcache Tiers of older
servers Servers often blades Mostly single processor Single threading ok c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 30 / 45

Problem: memcache scalability But ... Datacenter HW gets rolled Single-CPU
blades will be replaced with multicores Multicores will be the only game in town (HW vendor decision) The Problem memcached is thread limited on multicores c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 31 / 45

Problem: memcache scalability The evidence Velocity 2010, June 24 Velocity
2010, June 24 12 12 Memcached Memcached scaling is thread limited scaling is thread limited c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 32 / 45

Problem: memcache scalability Performance measurement rig SunFire X4170 w/ 64
GB RAM2 Intel Nehalem multicores 2 processor sockets ⇒ 2 quad-cores == 8 cores Intel SMT enabled ⇒ 2 threads/ core 16 virtual CPUs seen by Solaris Load generators ←→ 10Gbps link ←→ SUT 2Joint work with Sun, pre-Oracle acquistion c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 33 / 45

Problem: memcache scalability USL regression in Excel c 2010 Performance
Dynamics Quantifying Scalability FTW October 1, 2010 34 / 45

Problem: memcache scalability USL regression in R # Standard non-linear
least squares (NLS) fit using USL model usl <- nls(Norm ˜ N/(1 + alpha * (N-1) + beta * N * (N-1)), input, start=c(alpha=0.1, beta=0.01)) # Get alpha & beta parameters for use in plot legend x.coef <- coef(usl) # Determine sum-of-squares for R-squared coeff from NLS fit sse <- sum((input$Norm - predict(usl))ˆ2) sst <- sum((input$Norm - mean(input$Norm))ˆ2) # Calculate Nmax and X(Nmax) Nmax<-sqrt((1-x.coef[’alpha’])/x.coef[’beta’]) Xmax<-input$X_N[1]* Nmax/(1 + x.coef[’alpha’] * (Nmax-1) + x.coef[’beta’] * Nmax * (Nmax-1)) # Plot all the results plot(x<-c(0:max(input$N)), input$X_N[1] ...) title("USL Scalability") points(input$N, input$X_N) legend("bottom", legend=eval(parse(text=sprintf(...) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 35 / 45

Problem: memcache scalability Raw bench data p Xp 50 100
150 200 250 300 10 20 30 40 50 60 Data smoother p Xp 50 100 150 200 250 300 10 20 30 40 50 60 USL fit p Xp 50 100 150 200 250 300 10 20 30 40 50 60 USL fit + CI bands p Xp 50 100 150 200 250 300 10 20 30 40 50 60 c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 36 / 45

Problem: memcache scalability The envelope please! Table: Intel 2-socket duo
core + SMT Version α β Nmax 1.2.8 0.0255 0.0210 7 1.4.1 0.0821 0.0207 6 1.4.5 0.0988 0.0209 6 Little’s law 3: N = X(R + Z) threads Also know R is on the order of ms (10−3 s), so latency dominated by client-side “think time” Z = 5 s in tests Avg X ≈ 350 KOPS on Intel quad-core Therefore: N ≈ 350 × 103 × 5 = 1, 750, 000 threads Same as users, assuming 1 user process per thread 3See e.g., Scalable Intenet Architectures, p.127 c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 37 / 45

Problem: memcache scalability SPARC Solaris mods Table: SPARC T2 +
Solaris Version α β Nmax Vanilla 0.0041 0.0092 22 Modiﬁed 0.0000 0.0004 48 The Solution Partitioned mcd hash table Single hash table contention avoided by partitioning table Solaris patches improve scalability to ≈ 40 threads Throughput X increases from 200 → 400 KOPS on SPARC CMT Can’t assume same 2x win on x86 arch c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 38 / 45

Summary and Review Outline 1 Scaling vs. Scalability 2 Components

Summary and Review Why Should You Care? Werner Vogels, Amazon
CTO “Scalability is hard because it cannot be an after-thought. Good scalability is possible, but only if we architect and engineer our systems to take scalability into account.” Old reason: Concurrent programming was hard on SMPs New reason: Multicores are SMPs on a chip (HW vendor decision) More threads enable higher concurrency, shorter user latencies But it’s hard: beware the 3rd C in the USL (β coefﬁcient) Theo Schlossnagle, OmniTI CEO “Simply having a solution that scales horizontally doesn’t mean that you are safe.” c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 40 / 45

Summary and Review Where is Your Application? Class A Class
B Ideal concurrency (α, β = 0) Contention-only (α > 0, β = 0) Shared-nothing platform Message-based queueing (e.g., MQSeries) Google text search Message Passing Interface (MPI) applications Lexus–Nexus search Transaction monitors (e.g., Tuxedo) Read-only queries Polling service (e.g., VMWare) Peer-to-peer (e.g., Skype) Class C Class D Incoherent-only (α = 0, β > 0) Worst case (α, β > 0) Scientiﬁc HPC computations Anything with shared writes Online analytic processing (OLAP) Hotel reservation system Data mining Banking online transaction processing (OLTP) Decision support software (DSS), Java database connectivity (JDBC) c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 41 / 45

Summary and Review USL Scalability Zones Think scalability zones rather
than Scalability curves A B C 0 20 40 60 80 100 120 N 0 200 400 600 800 1000 X N Websphere measurements (dots) A Asynchronous messaging (average queue lengths) B Synchronous messaging (worst queue lengths) C Synchronous messaging + pairwise exchanges c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 42 / 45

Resources and Coordinates Outline 1 Scaling vs. Scalability 2 Components

Resources and Coordinates Resources and Coordinates Castro Valley, California, 94552
Resources: Books Training USL tools Coordinates: www.perfdynamics.com perfdynamics.blogspot.com twitter.com/DrQz [email protected] +1-510-537-5758 Chapters 4–6 c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 44 / 45

Resources and Coordinates c 2010 Performance Dynamics Quantifying Scalability FTW
October 1, 2010 45 / 45

Quantifying Scalability FTW

Quantifying Scalability FTW

More Decks by Dr. Neil Gunther

Other Decks in Technology

Featured

Transcript