A/B Testing From The Ground Up (Nov 2017)

maltzj
November 06, 2017

Transcript

  1. Hi! I’m Maltz!
    • Android at Yelp!
    • Now: Building experimentation systems
    • Previously: Full-Stack and data stuff at Eat24.
  2. Specifically
    • What are A/B tests?
    • Why run A/B tests?
    • What infrastructure is needed to A/B test?
    • What can you buy vs. build?
    • What are pitfalls to watch out for?
  3. [Diagram] Bucketing System -> Your App: Sends Allocation Information. Your App -> Metrics Collection Pipeline (on device): Log on-device events. Metrics Collection Pipeline (on device) -> Metrics Service (x3): Forward to metrics service.
  4. [Diagram, as on slide 3] Bucketing System, Your App, Metrics Collection Pipeline (on device), Metrics Service (x3).
  5. [Diagram] A 0 to 100 scale split into Green: 50% and Red: 50%.
    get_bucket(id + salt) % 100
    Examples: get_bucket("my_exp163") % 100, get_bucket(...419) % 100, get_bucket(...802) % 100
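A minimal sketch of the bucketing idea on this slide, under stated assumptions: SHA-256 stands in for whatever hash `get_bucket` actually uses, and `assign_cohort`, the cohort names, and the example salt are hypothetical names chosen for illustration.

```python
import hashlib


def get_bucket(key: str) -> int:
    """Map a string to a stable integer.

    Assumption: SHA-256 is a stand-in; the slide does not say which hash is used.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def assign_cohort(user_id: str, salt: str, green_percent: int = 50) -> str:
    """Place a user on the 0-100 scale and pick a cohort (hypothetical helper)."""
    bucket = get_bucket(user_id + salt) % 100  # always in 0..99
    return "green" if bucket < green_percent else "red"


if __name__ == "__main__":
    # The same id and salt always land in the same cohort.
    print(assign_cohort("163", "my_exp"))
```

Because the hash is deterministic, a user keeps the same bucket for a given salt across sessions and devices, which is what lets the app and the analysis agree on who saw what.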
  6. Common Pitfalls
    [Diagram] Allocation on a 0 to 1 scale over Time: Green: 100%; then Green: 80%, Red: 20%; then Green: 50%, Red: 50%; then Red: 100%.
    get_bucket(id + salt): get_bucket("my_exp163"), get_bucket(...419), get_bucket(...802)
  7. Common Pitfalls cont...
    [Diagram] Allocation on a 0 to 1 scale over Time: Green: 100%; then Green: 90%, Yellow: 5%, Red: 5%; then Green: 80%, Yellow: 10%, Red: 10%.
    get_bucket(id + salt): get_bucket(...261), get_bucket(...591), get_bucket(...812)
  8. Common Pitfalls cont...
    [Diagram] Allocation on a 0 to 1 scale over Time; labels in slide order: Green: 100%; Green: 50%, Red-Buffer (no-op): 20%, Red: 5%; Green: 50%, E: 5%, Red: 25%, Yellow: 25%, Yellow-Buffer (no-op): 20%, Yellow: 5%.
    get_bucket(id + salt): get_bucket(...311), get_bucket(...451), get_bucket(...723)
  9. Common Pitfalls cont...
    [Diagram] get_bucket(id); labels in slide order: Status Quo: 50%, Cohort A: 50%; Experiment 1: Status Quo: 50%, Cohort A: 50%; Experiment 2: Cohort B: 50%, Status Quo: 50%; Experiment 3: Cohort A: 50%.
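One reading of this slide is that bucketing on `get_bucket(id)` alone, with no per-experiment salt, puts the same users in the same half of every experiment. The sketch below illustrates that reading; the hash choice (SHA-256), the helper `in_treatment`, and the salts `exp_1` and `exp_2` are assumptions for illustration only.

```python
import hashlib


def get_bucket(key: str) -> int:
    """Stable bucket in 0..99. Assumption: SHA-256 stands in for the real hash."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100


def in_treatment(user_id: str, salt: str = "") -> bool:
    """True if the user falls in the non-status-quo half (hypothetical helper)."""
    return get_bucket(user_id + salt) >= 50


users = [str(i) for i in range(10_000)]

# No salt: every experiment sees exactly the same split of users.
unsalted_1 = {u for u in users if in_treatment(u)}
unsalted_2 = {u for u in users if in_treatment(u)}

# Per-experiment salt: the splits are effectively independent.
salted_1 = {u for u in users if in_treatment(u, "exp_1")}
salted_2 = {u for u in users if in_treatment(u, "exp_2")}

overlap_unsalted = len(unsalted_1 & unsalted_2) / len(unsalted_1)
overlap_salted = len(salted_1 & salted_2) / len(salted_1)

if __name__ == "__main__":
    print(f"overlap without salt: {overlap_unsalted:.2f}")
    print(f"overlap with salt:    {overlap_salted:.2f}")
```

With identical treatment groups, the effect of one experiment cannot be separated from the effect of another; a per-experiment salt spreads each experiment's treatment group across the others' cohorts.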
  10. Option 1: Config File + Buckets Endpoint
    [Diagram] Mobile App -> Backend: Request Cohorts w/ id. Backend -> YAML File: Parse + Retrieve Cohorts.
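The backend half of this option might look roughly like the sketch below. It is an illustration under assumptions, not the speaker's implementation: the `EXPERIMENTS` dict stands in for the parsed YAML file, the experiment and cohort names are made up, the HTTP layer is omitted, and SHA-256 is an assumed hash.

```python
import hashlib
from typing import Dict, Optional

# Stand-in for the parsed YAML config; layout and names are hypothetical.
EXPERIMENTS = {
    "new_search_ui": {
        "salt": "new_search_ui_v1",
        "cohorts": [("status_quo", 50), ("enabled", 50)],
    },
    "checkout_copy": {
        "salt": "checkout_copy_v1",
        "cohorts": [("status_quo", 80), ("short_copy", 10), ("long_copy", 10)],
    },
}


def cohort_for(user_id: str, experiment: dict) -> Optional[str]:
    """Walk the cumulative percentages until the user's bucket falls in a range."""
    key = user_id + experiment["salt"]
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    upper = 0
    for name, percent in experiment["cohorts"]:
        upper += percent
        if bucket < upper:
            return name
    return None  # only reachable if the percentages sum to less than 100


def get_cohorts(user_id: str) -> Dict[str, Optional[str]]:
    """What a cohorts endpoint could return for one id."""
    return {name: cohort_for(user_id, exp) for name, exp in EXPERIMENTS.items()}


if __name__ == "__main__":
    print(get_cohorts("163"))
```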
  11. Option 1: Config File + Cohorts Endpoint
    Pros
    • Easy to implement
    • Easy to understand
    Cons
    • Cohorts may not load in time
      ◦ Some ways to work around this
    • Hard to run experiments at startup (i.e. onboarding)
    • Hard to handle complex conditions
  12. Option 2: Parameter Experiments
    Pros
    • Easy to implement on clients
      ◦ Just one call!
    • Handles complex conditions well
    Cons
    • Be mindful of parameter resolution speed
    • Many parameters if sent from the server
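To make the "just one call" point concrete, here is a rough sketch of what a client-side parameter lookup could look like. `ParameterStore`, its methods, and the parameter names are hypothetical; the slide does not describe a specific API. Falling back to local defaults is one assumed way to cope with slow parameter resolution.

```python
from typing import Any, Dict


class ParameterStore:
    """Client-side view of experiment parameters (hypothetical sketch)."""

    def __init__(self, defaults: Dict[str, Any]) -> None:
        self._defaults = dict(defaults)
        self._resolved: Dict[str, Any] = {}

    def update(self, resolved: Dict[str, Any]) -> None:
        """Store values resolved by the experiment system, e.g. from the server."""
        self._resolved.update(resolved)

    def get(self, name: str) -> Any:
        """The one call feature code makes; defaults apply until values arrive."""
        return self._resolved.get(name, self._defaults[name])


if __name__ == "__main__":
    params = ParameterStore({"search_button_color": "green", "results_per_page": 10})
    print(params.get("search_button_color"))  # default until resolution finishes
    params.update({"search_button_color": "red"})
    print(params.get("search_button_color"))
```

Feature code asks only for a value, never for a cohort name, so the conditions that decide the value can live entirely in the experiment system.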
  13. Options to reuse
    Traditional Bucketing
    • Optimizely
    • MixPanel
    • Apptimize
    Parameter Based
    • Firebase Remote Config
    • Planout SDKs
  14. [Diagram, as on slide 3] Bucketing System, Your App, Metrics Collection Pipeline (on device), Metrics Service (x3).
  15. [Diagram] EventManager connected to Service 1, Service 2, Service 3, Service 4, Service 5; event: search.bar.text.update.
    Callout: Common metadata (e.g. timestamps) gets attached here
  16. [Diagram] EventManager connected to Service 1, Service 2, Service 3, Service 4, Service 5; event: search.bar.text.update.
    Callout: Event-specific metadata (e.g. search term) gets attached here.
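A small sketch of the split these two slides describe, assuming the common metadata is attached centrally by the EventManager and the event-specific metadata is supplied by the calling service. The class shape, the `log` method, and the injected `clock` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, List


class EventManager:
    """Central entry point for analytics events (hypothetical sketch)."""

    def __init__(self, sink: List[Dict[str, Any]],
                 clock: Callable[[], float] = time.time) -> None:
        self._sink = sink
        self._clock = clock

    def log(self, name: str, **event_metadata: Any) -> Dict[str, Any]:
        # Event-specific metadata comes from the caller.
        event = dict(event_metadata)
        # Common metadata is attached in one place so every event carries it.
        event.update(name=name, timestamp=self._clock())
        self._sink.append(event)
        return event


if __name__ == "__main__":
    manager = EventManager(sink=[])
    # A service only supplies what is specific to this event.
    print(manager.log("search.bar.text.update", search_term="pizza"))
```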
  17. Product Manager: We need to know when the user updated the search bar text
    Engineer: Search for search.bar.text.update
  18. A possible workflow
    [Diagram] Add/update docs -> Machine Readable Documentation (YAML, JSON Schema, etc.) -> Codegen Analytic Definitions (JavaPoet) -> Publish to an internal Maven repo -> Your App
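The workflow on this slide generates Java with JavaPoet; the sketch below only illustrates the same docs-to-code idea in Python so it stays consistent with the other examples here. `EVENT_DOCS` stands in for the machine-readable documentation, and its layout, the field names, and `generate_module` are all hypothetical.

```python
from typing import Dict

# Stand-in for the machine-readable docs (YAML, JSON Schema, etc.); hypothetical layout.
EVENT_DOCS = {
    "search.bar.text.update": {
        "description": "The user updated the search bar text.",
        "fields": ["search_term"],
    },
}


def generate_module(docs: Dict[str, dict]) -> str:
    """Emit source code with one typed-by-name function per documented event."""
    lines = ["# Generated file; do not edit by hand.", ""]
    for event_name, spec in sorted(docs.items()):
        func_name = event_name.replace(".", "_")
        params = ", ".join(spec["fields"])
        payload = ", ".join(f'"{field}": {field}' for field in spec["fields"])
        lines.append(f"def {func_name}({params}):")
        lines.append(f'    """{spec["description"]}"""')
        lines.append(f'    return {{"name": "{event_name}", {payload}}}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_module(EVENT_DOCS))
```

Generating the definitions from the docs keeps the two from drifting apart: an event that is not documented cannot be logged through the generated code.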
  19. [Diagram, as on slide 3] Bucketing System, Your App, Metrics Collection Pipeline (on device), Metrics Service (x3).
  20. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2). Your App -> Your Backend: Send (1) and (2)
  21. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2). Your Backend -> Your App: Successful Response
  22. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2). Your App -> Your Backend: Send (1) and (2)
  23. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2), Business.view (3). Your App -> Your Backend: Send (1) and (2)
  24. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2), Business.view (3). Your Backend -> Your App: Successful Response
  25. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2), Business.view (3). Your App -> Your Backend: Send (1) and (2)
  26. At a High Level
    [Diagram] Analytics Queue in Your App: Search.list.view (1), Search.item.click.1 (2), Business.view (3). Your Backend
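The queue behaviour in these slides can be sketched as follows, assuming events are removed from the queue only after the backend confirms receipt. `AnalyticsQueue` and the `send` callback are hypothetical; a real client would also persist the queue so events survive process death.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class AnalyticsQueue:
    """In-memory queue that drops events only after a successful send (sketch)."""

    def __init__(self, send: Callable[[List[Dict]], bool]) -> None:
        self._pending: Deque[Dict] = deque()
        self._send = send  # returns True on a successful response

    def add(self, event: Dict) -> None:
        self._pending.append(event)

    def flush(self) -> bool:
        batch = list(self._pending)  # snapshot; later events wait for the next flush
        if not batch:
            return True
        if not self._send(batch):
            return False  # keep everything queued so the batch can be retried
        for _ in batch:
            self._pending.popleft()
        return True

    def __len__(self) -> int:
        return len(self._pending)


if __name__ == "__main__":
    queue = AnalyticsQueue(send=lambda batch: True)
    queue.add({"name": "Search.list.view"})
    queue.add({"name": "Search.item.click.1"})
    print(queue.flush(), len(queue))
```

Retrying an unconfirmed batch can deliver the same events twice if the backend received them but the response was lost, so the backend may need to deduplicate.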
  27. Things to Consider
    • How many analytics are you losing?
    • How important are the analytics you’re losing?
    • What’s the cost of gaining back those analytics?
      ◦ Mostly engineering time/complexity
  28. Your building blocks
    • Analytic channels
    • Structured flat files
    • JobManagers
    • SQLite Databases
    • Tape Queues
  29. What does Yelp do?
    • Flush analytics every 30s + 20 analytics
    • Always send analytics via JobManager
    • Flush analytics when app goes into the background
      ◦ Can be detected easily with architecture components!
    • It’s not perfect, but it works well enough
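A rough sketch of a flush policy along these lines. It assumes "every 30s + 20 analytics" means "after 30 seconds or once 20 events are queued, whichever comes first"; the slide does not spell this out. The function name and parameters are hypothetical.

```python
def should_flush(queued_events: int,
                 seconds_since_last_flush: float,
                 app_backgrounded: bool = False,
                 max_events: int = 20,
                 max_age_seconds: float = 30.0) -> bool:
    """Decide whether to send the queued analytics now (hypothetical helper)."""
    if queued_events == 0:
        return False  # nothing to send
    if app_backgrounded:
        return True  # flush before the process may be killed
    # Assumption: size and age thresholds are alternatives, not both required.
    return queued_events >= max_events or seconds_since_last_flush >= max_age_seconds


if __name__ == "__main__":
    print(should_flush(queued_events=20, seconds_since_last_flush=2.0))
    print(should_flush(queued_events=3, seconds_since_last_flush=5.0))
```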
  30. [Diagram, as on slide 3] Bucketing System, Your App, Metrics Collection Pipeline (on device), Metrics Service (x3).
  31. What’s a funnel?
    [Diagram] Home: 100k Users -> Search Page: 80k Users (80%) -> Order Start -> Order Complete
  32. What’s a funnel?
    [Diagram] Home: 100k Users -> Search Page: 80k Users (80%) -> Order Start: 10k Users (12.5%) -> Order Complete
  33. What’s a funnel?
    [Diagram] Home: 100k Users -> Search Page: 80k Users (80%) -> Order Start: 10k Users (12.5%) -> Order Complete: 5k Users (50%)
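The percentages on the funnel slides are step-to-step conversion rates: each step's user count divided by the previous step's. A short sketch using the numbers from the slide; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def funnel_conversion(
    steps: List[Tuple[str, int]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Return (step, users, rate relative to the previous step) for each step."""
    results = []
    previous_users = None
    for name, users in steps:
        rate = None if previous_users is None else users / previous_users
        results.append((name, users, rate))
        previous_users = users
    return results


FUNNEL = [
    ("Home", 100_000),
    ("Search Page", 80_000),
    ("Order Start", 10_000),
    ("Order Complete", 5_000),
]

if __name__ == "__main__":
    for name, users, rate in funnel_conversion(FUNNEL):
        label = "" if rate is None else f" ({rate:.1%})"
        print(f"{name}: {users} users{label}")
```

Note that 12.5% is relative to the 80k users who reached the Search Page, not to the 100k who started at Home.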
  34. Option A: BigQuery Export
    Pros
    • Comes out of the box with Firebase
    • Allows you to join information to get answers
    Cons
    • Can’t be backfilled
    • Still limited by Firebase
    • Probably need to write scripts in order to join your data
  35. Option B: Internal Metrics
    Pros
    • Maximum power + flexibility
    • Can backfill
    • Can execute arbitrary queries with just SQL
    Cons
    • You need to maintain the whole pipeline yourself
    • High cost to reach feature parity with 3rd party services
  36. 3 Things to Take Home
    • A/B testing helps build a data-driven culture, so even if you don’t do it perfectly, you still get many benefits
    • Know the basics of statistical experiments before you start A/B testing
    • Keep engineers involved in analytic definition + analysis
  37. Resources
    Papers + General Reference on Experiments
    • Exp-Platform.com
      ◦ A/B Testing At Scale
    • Twitter Unified Logging
    • SOLID Analytics With RxJava
    Tools on Statistics
    • A Concise Guide to Statistics
    • Causal Inference In Statistics
    • Optimizely Sample Size Calculator