Dr. Neil Gunther Performance Dynamics NoCOUG Winter Conference 2012 Thursday, February 23 @ 1 pm SM c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44
Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44
a Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 1 / 44
Laws Zipf’s law 5e+02 2e+03 5e+03 2e+04 5e+04 2e+05 Zipf's Law for UK English Words Ranked Words (W) Frequency (F) the it at would much us love lay eye dare Created by NJG Sat Oct 24 11:15:45 2009 F = W!D D = 1.1311 c = 981293.8 SQL accesses 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log SQL Times [1:100] Index etimesA Internet packets DRAFT log10(d) log10(r/s) 0 1 2 3 4 5 0 1 2 3 4 Website visitors !"#$%&'()*')+,'-./'0+-/1+/-' ' $ !"$ "$ !"$ #$ !"$ $$ !"$ "$ !"$ #$ !"$ $$ %&'()*+,-+&%./&)+0.1.2,*1$ *3%4+,-+1.2)$ 567+0.1.2,*1+2,+1.2)1$ -.2+8.29+ !$ +:+!$ c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 3 / 44
Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 3 / 44
regular polygons with successively shorter sides. Polygon represents measurement device or ruler Double-log plot of the estimated circumference (y-axis) vs. the length of the polygon side (x-axis). As the sides get shorter the perimeter of the polygon approaches the actual circumference. 1.00 0.50 0.20 0.30 0.15 1.50 0.70 Ln S 6.1 6.15 6.2 6.25 Ln P Estimated perimeter vs. length of polygon sides Euclidean geometry of the circle Greeks knew (irrational) ratio of diameter D to circumference C: π = C/D Successive measurements converge to fixed value: C Speed of convergence is clearly seen on logarithmic axes. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 4 / 44
shapes like coastline of Britain? Greeks would never consider such imperfect non-Euclidean geometry. Successively smaller ruler size (S) produces longer coastline estimate (L)! Why? Smaller ruler gets into more coastal nooks and crannies. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 5 / 44
(L) against ruler size (S) on log-log axes1. Why do border lengths fall on straight lines in log-log plot? Any crazy country shape is then characterized by a single number: its slope! Reason remained obscure until Mandelbrot resurrected it as geometry of fractals2 1 Lewis F. Richardson (1961) “The problem of contiguity: An appendix to Statistic of Deadly Quarrels.” 2 B. Mandelbrot, The Fractal Geometry of Nature, W. H. Freeman, New York (1983) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 6 / 44
lines on log-log plot have form: Y = mX + c But Y ≡ ln(L) and X ≡ ln(S) with negative slope: ln(L) = −α ln(S) + ln(k) = ln(k S−α) Taking antilogs of both sides reveals general power law form: L = k S−α L = k Sα (1) Reverse this logic If your data looks “linear” on a log-log plot Assume it signals presence of a power law like (1) Find the slope to characterize it Exponent α is the “power” in power law c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 7 / 44
1.0 Α 1.5 Α 2.0 Α 3.0 20 40 60 80 100 0.01 0.02 0.03 0.04 General shape of power law eqn.(1) is a hyperbola Blue curve is an exponential function Other curves are power laws with increasing α exponents (slopes) Big powers Large α power laws are indistinguishable from an exponential c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 8 / 44
k CDF 0 10 20 30 40 50 x 0.1 0.2 0.3 0.4 0.5 PDF Power laws differ from standard statistical distributions Power laws carry most of the information in their tail Fatter tail corresponds to stronger correlations than “normal” Mass of tail measured by cumulative distribution function (CDF) Log-log fitting On a log-log plot we are trying to fit right-hand side data, not left side c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 9 / 44
How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 9 / 44
c.1990 impact of ISDN broadband Packet tracing measurements at Bellcore Surprise: large packet trains Surprise: Service times file-size dependent Surprise: Packet arrivals not always Poisson Surprise: Queueing models break down How to do CaP for future Internet? Why is it happening? Part of the answer is power laws Netflix uses 33% of USA Internet BW c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 10 / 44
0 1 2 3 4 5 0 1 2 3 4 Figure 4.2.1 (b). Pox plot of R/S for sequence AUG89.MB. The plot tightly clusters around a straight line whose asymptotic slope clearly lies between the slopes 0.5 (lower dotted line) and 1.0 (upper dotted line) and is readily estimated (using the "brushed" points) to be about 0.79. 39 Y = (max − min)/std dev (“rescaled range”) X = sample size (in trace file c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 12 / 44
0.05 0.10 0.15 0.20 0.25 Bellcore packet data shows fractal behavior in time (5 decades) Diffusion model of Brownian motion Solution is a normal distribution which evolves in time (t) ∂t f(x, t) = σ2∂2 x f(x, t) f(x, t) = 1 √ 4πσ2t exp „ −(x − µ)2 4σ2t « L2 = 4σ2t Diffusion length L = √ 4σ2t L ∼ t 1 2 Generalization L ∼ t 1 2 → tH (Brownian → Levy) What happens if H = 1 2 becomes 1 2 < H < 1? c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 13 / 44
Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 14 / 44
How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 14 / 44
Data (already ranked) is 1000 most common wordforms in UK English based on 29 works of literature by 18 authors (4.6 × 106 words) Wordform: english word Abs: absolute frequency (total number of occurrences) r: range (number of texts in which the word occurs) mod: modified frequency as defined by Rosengren (1972) Read data file > dir<-setwd("˜/../GDAT Scripts/Power Laws/") > td<-read.table("zipf1000.txt",header=TRUE) > head(td) Rank Wordform Abs r mod 1 1 the 225300 29 223066.9 2 2 and 157486 29 156214.4 3 3 to 134478 29 134044.8 4 4 of 126523 29 125510.2 5 5 a 100200 29 99871.2 6 6 I 91584 29 86645.5 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 15 / 44
Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the their us love voice true state eye stand worth service neck land art Linear plot of data > plot(td$Rank, td$Abs, type="p", main="Ranked 1000 UK English Words", xlab="Ranked words (W)", ylab="Frequency of occurrence (F)", xaxt="n",log="xy",cex=0.5) > ticks.at<-seq(min(td$Rank), max(td$Rank),10) > ticks.lab<-as.character(td$Wordform[ticks.at]) > Axis(td$Rank, at=ticks.at, las=1, side=1,labels=ticks.lab, cex.axis=0.75) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 16 / 44
2e+05 Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the it at would much us love lay eye dare Data on log-log axes > plot(td$Rank, td$Abs, type="p", main="Ranked 1000 UK English Words", xlab="Ranked words (W)", ylab="Frequency of occurrence (F)", xaxt="n",log="xy",cex=0.5) > ticks.at<-seq(min(td$Rank), max(td$Rank),10) > ticks.lab<-as.character(td$Wordform[ticks.at]) > Axis(td$Rank, at=ticks.at, las=1, side=1,labels=ticks.lab, cex.axis=0.75) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 17 / 44
2e+05 Ranked 1000 UK English Words Ranked words (W) Frequency of occurrence (F) the it at would much us love lay eye dare Regression fit to logarithmic data # regression model of Y=log(y) and X=log(x) > z.fit <- lm(log(td$Abs) ˜ log(td$Rank)) # Must transform back to log scaled coords in plot > ly <- exp( (coef(z.fit)[2])*log(td$Rank) + coef(z.fit)[1] ) > lines(td$Rank, ly, col="blue",lty="solid",lwd=2) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 18 / 44
Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 19 / 44
time Frequency 0 100 200 300 400 0 50 100 150 Histogram of measured SQL query times x-axis is elapsed time in seconds y-axis is number of queries with that time What distribution profile is it? Exponential, log-normal,... Can’t tell by just staring at it c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 20 / 44
300 400 500 0 100 200 300 400 Index orad$Elapstime Linear plot of unranked data > plot(orad$Elapstime, type="h") Like Zipf’s law, data must be ranked by frequency of occurrence c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 22 / 44
20 50 100 200 500 0.1 0.5 1.0 5.0 50.0 500.0 Log-Log SQL Times Index otr Log-log plot of ranked data > plot(otr,log="xy",main="Log-Log SQL Times") Clearly this profile is not power law overall But the first 100 queries do appear to be power law c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 24 / 44
across 3 regions as follows: Windowed plots # Define data windows of ranked data etA<-otr[1:100] etB<-otr[100:270] # gap... etC<-otr[420:500] plot(etA,type="p",log="xy",main="Log-Log of SQL-A Times") plot(etB,type="p",log="y", main="Log-Lin of SQL-B Times") plot(etC,type="p",log="y", main="Log-Lin of SQL-C Times") 1 2 5 10 20 50 100 100 200 300 400 500 Log-Log of SQL-A Times Index etA 0 50 100 150 30 40 50 60 70 80 Log-Lin of SQL-B Times Index etB 0 20 40 60 80 0.090 0.095 0.100 0.105 0.110 Log-Lin of SQL-C Times Index etC c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 25 / 44
20 50 100 100 200 300 400 500 Log-Log SQL A-Times Index etA From coef(zA.fit) know log(xA) = -0.4632485 Empirical slope γ = 0.46 to two significant decimal digits About half Zipfian slope γ = 1.0 ± 0.5 Correlations are stronger than for Zipf Hypothesis Shorter query times (window A) may be associated with dictionary lookups or other structured data. That structure provides correlations. Longer queries in windows B and C are not structured (ad hoc?) and are therefore more randomized. The lack of strong correlations shows up as different exponential decay rates. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 30 / 44
Fractal? How It Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 30 / 44
required to register with the Australian Tax Office (ATO) for an Australian Business Number (ABN) to claim an income tax refund. The ABN was introduced in Y2K. Data from website hosting initial ABN registrations. Period covers March 27 to September 19, 2000 Post-advertising traffic 1 March to 30 May , 2000 Deadline spike on 31 May, 2000 Smaller traffic peaks from 1 June to 30 June, 2000 Post deadline period from 1 July to 19 Sept, 2000 Full details can be found in my CMG-A paper cmga-p10167.pdf included in your GDAT class materials. c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 32 / 44
4 2000 21 5 2000 1 104 2 104 5 104 1 105 2 105 5 105 1 106 2 106 ORA Connections y-axis is the number of Oracle RDBMS connections Here, the y-axis is log scaled Peak growth preceding spike looks linear on semi-log plot x-axis index (not shown) is “days from the start of data window” time series index range t = 0 to t = 38 days c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 34 / 44
2000 21 4 2000 1 104 2 104 5 104 1 105 2 105 5 105 1 106 ORA Connections Linear peak-growth on semi-log axes Curve must be an exponential function Use Exp as regression model ˆ y(t) = A exp(Bt) Model parameters Origin: A = 114128 Curvature: B= 0.0175 Doubling period: T2 = ln(2) B ∼ 6 months c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 35 / 44
4 2000 21 5 2000 15 6 2000 10 7 2000 0 200000 400000 600000 800000 1. 106 ORA Connections Revert to linear axes to review the trend Exponential forecast up to the crosshairs looks valid But significantly underestimates onset of the “11th hour” peak As well as rapid drop off on RHS of the peak c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 36 / 44
80 100 120 400000 600000 800000 1. 106 Power law has a critical point tc Equation: ˆ y(t) = k|t − tc|−γ See far LHS curve with tc = 40 (blue) Estimate ˆ y(t) → ∞ at t = tc Translate ˆ y(t) rightward until lower part of curve matches Exp function (red) Critical point also moves to tc = 61 (31 May, 2000) Critical point New element is the appearance of a critical point at tc Power law goes infinite and spikes at tc = 61 with γ = 0.6421 c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 37 / 44
law 11 3 2000 21 4 2000 21 5 2000 15 6 2000 10 7 2000 0 200000 400000 600000 800000 1. 106 ORA Connections Exponential trend is consistent with data through April 2000 Completely underestimates onset of the “11th hour” spike Completely overestimates decay of traffic load beyond spike Data is already exceeding Exp model during April-May period Power law model predicts all these effects quite well Critical point is inclusion of critical point tc Use Exp model to baseline a power law model c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 38 / 44
the spike coming without knowing tc ? 2 weeks from data start 0.30 0.35 0.40 0.45 9.24 9.26 9.28 9.30 log(scale) log(SDF) SDF H = 0.714 1.5 months from data start 0.4 0.6 0.8 1.0 9.0 9.2 9.4 9.6 9.8 10.0 log(scale) log(SDF) SDF H = 0.687 Estimate H = 1 2 (1 − β) from the slope β of ln[S(f)] vs ln[f] in frequency domain: H ∈ [1 2 , 1) persistent autocorrelations (increase/decrease typically followed by increase/decrease) H = 1 2 statistically independent random fluctuations H ∈ (0, 1 2 ] antipersistent autocorrelations (increase/decrease typically followed by decrease/increase) c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 39 / 44
Works Internet Traffic 2 Applications Word Fractals Fractal Query Times Fractal Time Series 3 Conclusions c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 39 / 44
are not like standard statistical distributions Power laws have fatter tails than standard dsns Fat tail carries the bulk of the information (tale is in the tail) Power laws are easy to demonstrate with log-log plot Looked at 3 examples: Zipf, ORA SQL, and ORA ABN traffic spike Trick is to explain persistent strong correlations May need more data but that’s exactly how it should be c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 40 / 44
to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44
to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes More persistent response degradation than usual c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44
to see them in raw performance data Must transform your data to see them Ranked data appears linear on double-log axes More persistent response degradation than usual Can seriously impact overall database performance c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 42 / 44
H. Freeman, 1983 L. Liebovitch, Fractals and Chaos Simplified for the Life Sciences, Oxford Uni. Press, 1998 K. Park and W. Willinger, Self-Similar Network Traffic and Performance Evaluation, John Wiley, 2000 N. J. Gunther, Guerrilla Capacity Planning, Springer, 2007 www.perfdynamics.com/iBook/gcap.html The R Project Tools for Statistical Computing www.r-project.org Performance Dynamics Educational Services Local and on-site training www.perfdynamics.com/Classes/schedule.html c 2012 Performance Dynamics Oracle Meets Fractals February 21, 2012 43 / 44