
Latency's Worst Nightmare: Performance Tuning Tips and Tricks

Delivering high performance web applications, using Amazon Web Services.

Matt Wood

April 18, 2013


  1. Productivity and response time

    [Figure 3: Interactive user productivity versus computer response time for human-intensive interactions for system A. Series: interactive user productivity (IUP); human-intensive component of IUP; measured data (human-intensive component). Horizontal axis: computer response time (s).] A. J. Thadhani, IBM Systems Journal 20 (4), 1981
  2. Page load time and average daily searches per user

    [Chart: change in average daily searches per user, on a scale from 0.00% to -0.70%, for added delays of 50ms pre-header, 100ms pre-header, 200ms post-header, 200ms post-ads and 400ms post-header.] http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/
  3. Page load delay and business metrics

    [Chart: percent change, on a scale from 0.00% to -5.00%, against added delay (50, 200, 500, 1000, 2000). Series: queries per visitor, query refinement, revenue per visitor, any clicks, satisfaction.] http://www.webperformancetoday.com/2013/04/10/cloud-connect-2013-web-acceleration-and-front-end-optimization-slides/
  4. 2.2s reduction in page load time. 15.4% increase in conversion rate.

    https://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed-%E2%80%93-part-ii/
  5. [Diagram: request timeline. Phases: initial connection, SSL negotiation, time to first byte, content download.

    Labels: %CPU, kbps, network latency, download + negotiation time.]
  6. CloudFront Edge Locations

    Europe: Amsterdam (2), Dublin, Frankfurt (2), London (2), Madrid, Milan, Paris (2), Stockholm. South America: Sao Paulo. North America: Ashburn, VA (2); Dallas, TX (2); Hayward, CA; Jacksonville, FL; Los Angeles, CA (2); Miami, FL; Newark, NJ; New York, NY (3); Palo Alto, CA; Seattle, WA; San Jose, CA; South Bend, IN; St. Louis, MO.
  7. Static and dynamic content. Cache dynamic pages (search results). Use

    query strings or cookies for cache keys. Network and path optimizations accelerate even unique content.
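One way to picture "query strings or cookies for cache keys" is the sketch below. It is an illustration only, not how CloudFront computes keys internally: the function name `cache_key`, the `cookie_names` whitelist and the `|` separator are all assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def cache_key(url, cookies=None, cookie_names=()):
    """Build a cache key from the path, the query string and selected cookies.

    Hypothetical helper: the key layout is an assumption for illustration.
    """
    parts = urlsplit(url)
    # Sort parameters so ?q=cats&page=2 and ?page=2&q=cats share one entry.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    cookies = cookies or {}
    # Only whitelisted cookies vary the key; all other cookies are ignored.
    varied = urlencode(
        [(name, cookies[name]) for name in sorted(cookie_names) if name in cookies]
    )
    return "|".join([parts.path, query, varied])
```

Keeping the cookie whitelist short matters: every extra value that varies the key splits the cache and lowers the hit rate for dynamic pages such as search results.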
  8. Build for horizontal scale. Decrease request contention. Reduce capacity planning

    headaches. Requires a stateless application architecture.
  9. Small things, loosely coupled. Do one thing, and do it

    well. The Unix Way. Take a look at the Unicorn and Rainbows approach. Asynchronous by default (where possible).
  10. Fast booting with EBS-backed instances. Linux is faster to boot

    than Windows. EBS-backed instances are faster to boot than S3-backed instances.
  11. Benchmark on business metrics. Relate application metrics to business metrics.

    Customers supported/instance. Photos processed/dollar.
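The two ratios named on this slide are simple divisions. A minimal sketch follows, with every figure in the usage note invented for illustration; the function names and the hourly-price model are assumptions, not something the deck specifies.

```python
def customers_per_instance(customers_supported, instance_count):
    """Customers supported per instance (hypothetical helper)."""
    return customers_supported / instance_count


def photos_per_dollar(photos_processed, hourly_price_usd, hours):
    """Photos processed per dollar, assuming cost = hourly price x hours."""
    return photos_processed / (hourly_price_usd * hours)
```

With made-up numbers, `customers_per_instance(12000, 8)` gives 1500.0 and `photos_per_dollar(90000, 0.5, 24)` gives 7500.0. Tracking these alongside CPU or latency graphs is what ties an application metric to a business one.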
  12. The Canary in the Coal Mine. Standardize on 64-bit AMIs.

    Deploy across instance types. Evaluate new instance types with real traffic.
  13. Interface with the data store. Faster if you don’t have

    to go to disk. Increased concurrency.
  14. Managing state. Transient data only. Web server state, high score

    tables, etc. Time-consuming task results (many-to-many query results).
  15. Best practices. Assume cold cache latency in application architecture. Set

    appropriate time-to-live. Batch requests rather than issuing sequential single requests. Architect for cache failure.
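The practices on this slide can be sketched together in a few lines. This is an in-process stand-in, not a client for any particular cache service: `TTLCache`, `fetch` and `load_from_db` are hypothetical names, and a real deployment would talk to a networked cache instead of a dictionary.

```python
import time


class TTLCache:
    """Tiny in-memory cache with a time-to-live (illustrative stand-in)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get_many(self, keys):
        # One batched lookup instead of one round trip per key.
        now = self.clock()
        hits = {}
        for key in keys:
            entry = self.store.get(key)
            if entry is not None and entry[1] > now:
                hits[key] = entry[0]
        return hits

    def set_many(self, items):
        expires = self.clock() + self.ttl
        for key, value in items.items():
            self.store[key] = (value, expires)


def fetch(keys, cache, load_from_db):
    """Read through the cache; fall back to the data store on miss or failure."""
    try:
        found = cache.get_many(keys)
    except Exception:
        # Cache failure: behave as if the cache were cold.
        return load_from_db(list(keys))
    missing = [key for key in keys if key not in found]
    if missing:
        loaded = load_from_db(missing)
        found.update(loaded)
        try:
            cache.set_many(loaded)
        except Exception:
            pass  # A failed write-back must not fail the request.
    return found
```

The design choice worth noting is that the data store path is always correct on its own; the cache only shortens it. That is what "assume cold cache latency" implies for capacity planning.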
  16. Vertical scale. More resources for a single DB engine. Add

    memory for DB caches. Add CPU for more intensive queries.
  17. Provisioned throughput is consistent. Consistent, predictable performance. Relational databases with

    RDS. NoSQL stores with DynamoDB. Relational & NoSQL with EC2 and EBS.
  18. Provisioned throughput with RDS. 12.5k IOPS on MySQL. 25k IOPS

    on Oracle. 10k IOPS on SQL Server. Provision up to 30k for reduced latency.
  19. Provisioned throughput and instance types. Optimized for provisioned IO storage:

    m1.large: 500 Mbps m1.xlarge, m2.xlarge, m2.4xlarge: 1000 Mbps
  20. [Agenda: 1 Content delivery. 2 Horizontal scale. 3 Instance

    selection. 4 Caching. 5 Read capacity. (6 and 7 unlabeled.)]
  21. Standard EBS volumes. Moderate or bursty workloads. 100 IOPS, bursting

    to hundreds of IOPS. Bursting is good for boot volumes.
  22. Provisioned IOPS with EBS volumes. Predictable, high performance for IO

    intensive workloads. 2000 IOPS per volume. Stripe volumes for additional IO. Delivers within 10% of the provisioned performance, 99.9% of the time.
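The striping arithmetic can be sketched as follows. It assumes aggregate IOPS scale linearly with the number of striped volumes, which is an idealization; the 2000 IOPS figure is the per-volume number quoted on this slide, and the function names are hypothetical.

```python
import math

IOPS_PER_VOLUME = 2000  # per-volume figure quoted on the slide


def volumes_needed(target_iops, iops_per_volume=IOPS_PER_VOLUME):
    """Smallest number of striped volumes that reaches target_iops."""
    return math.ceil(target_iops / iops_per_volume)


def striped_iops(volume_count, iops_per_volume=IOPS_PER_VOLUME):
    """Idealized aggregate IOPS of a stripe set (assumes linear scaling)."""
    return volume_count * iops_per_volume
```

For example, `volumes_needed(10000)` is 5, and a four-volume stripe would ideally offer `striped_iops(4)`, that is 8000 IOPS, before instance network limits are taken into account.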
  23. High bandwidth networking. cg1.4xlarge, cc2.8xlarge, hi1.4xlarge, hs1.8xlarge and cr1.8xlarge run

    on non-blocking, 10 gigabit networking. Not EBS-Optimized, but can be used with provisioned IOPS volumes.
  24. High I/O instances. Designed for high throughput database workloads.

    2 x 1TB SSDs 2 GB/s for reads 1.1 GB/s for writes
  25. High Storage instances. High sequential IO. 24 x 2TB drives.

    2.4 GB/s of 2MiB sequential reads. 2.6 GB/s for sequential writes.
  26. [Agenda: 1 Content delivery. 2 Horizontal scale. 3 Instance selection.

    4 Caching. 5 Read capacity. 6 Block store. (7 unlabeled.)]
  27. 14 rules for faster loading web sites. http://stevesouders.com/hpws/rules.php

    1. Make fewer HTTP requests. 2. Use a content delivery network. 3. Add an expires header. 4. GZIP components. 5. Put style sheets at the top. 6. Put scripts at the bottom. 7. Avoid CSS expressions. 8. Make JavaScript and CSS external. 9. Reduce DNS lookups. 10. Minify JavaScript. 11. Avoid redirects. 12. Remove duplicate scripts. 13. Configure ETags. 14. Make AJAX cacheable.
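Two of the rules, adding an expires header and gzipping components, can be sketched with the Python standard library alone. The helper names and the `public, max-age` policy below are assumptions for illustration; in practice these settings usually live in the web server or CDN configuration rather than in application code.

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def compress_body(body: bytes) -> bytes:
    """Gzip a response body (rule 4); send it with Content-Encoding: gzip."""
    return gzip.compress(body)


def caching_headers(max_age_seconds, now=None):
    """Build far-future caching headers (rule 3) for a static component."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Repetitive text such as CSS and JavaScript compresses well, so the two rules compound: fewer bytes on the first visit, and no request at all on repeat visits until the header expires.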
  28. [Agenda: 1 Content delivery. 2 Horizontal scale. 3 Instance selection.

    4 Caching. 5 Read capacity. 6 Block store. 7 Front end optimization.]