A/B Testing From The Ground Up (Nov 2017)

maltzj
November 06, 2017

Transcript

  1. Jonathan Maltz
    [email protected]/@maltzj
    A/B Testing From The Ground Up

  2. Yelp’s Mission
    Connecting people with great
    local businesses.

  3. Hi! I’m Maltz!
    ● Android at Yelp!
    ● Now: Building experimentation systems
    ● Previously: Full-Stack and data stuff at Eat24.

  4. What Will We Talk
    About Today?

  5. A/B Testing

  6. Specifically
    ● What are A/B tests?
    ● Why run A/B tests?
    ● What infrastructure is needed to A/B test?
    ● What can you buy vs. build?
    ● What are pitfalls to watch out for?

  7. But First...

  8. Why A/B Test?

  9. [Chart: clicks over time for the “Click me!” button]

  10. [Chart: clicks over time for the “Click me!” button]

  11. We should keep the
    button Red!

  12. But wait...

  13. [Chart: clicks over time for the “Click me!” button]

  15. Two variants of the “Click me!” button
    50% of Users vs. 50% of Users

  18. Inform The Hippo!
    (Highest Paid Person’s Opinion)

  20. Bucketing System → Your App: Sends Allocation Information
    Your App → Metrics Collection Pipeline (on device): Log on-device events
    Metrics Collection Pipeline (on device) → Metrics Services: Forward to metrics service

  22. Bucketing System

  23. At a high level
    Identifier → Randomization function (probably a hash) → Bucket

  24. Buckets 0 to 100: Green: 50%, Red: 50%
    get_bucket(id + salt) % 100
    get_bucket(“my_exp163”) % 100
    get_bucket(...419) % 100
    get_bucket(...802) % 100
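
The bucketing idea on slides 23 and 24 can be sketched in a few lines of Python. This is only an illustration: the choice of SHA-256, the function signatures, and the `get_cohort` helper are assumptions, not something the slides specify.

```python
import hashlib


def get_bucket(key: str, num_buckets: int = 100) -> int:
    """Map a string key to a stable bucket in [0, num_buckets)."""
    # Assumption: SHA-256 as the randomization function. Any deterministic,
    # well-mixed hash would serve the same purpose.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def get_cohort(user_id: str, salt: str) -> str:
    """50/50 split: buckets 0-49 get Green, buckets 50-99 get Red."""
    return "green" if get_bucket(user_id + salt) < 50 else "red"
```

Because the hash is deterministic, the same id and salt always land in the same bucket, so a user keeps seeing the same variant without any stored state.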

  26. Common Pitfalls
    Allocation over time (bucket range 0 to 1):
    Green: 100% → Green: 80%, Red: 20% → Green: 50%, Red: 50% → Red: 100%
    get_bucket(id + salt): get_bucket(“my_exp163”), get_bucket(...419), get_bucket(...802)

  27. Common Pitfalls cont...
    Allocation over time (bucket range 0 to 1):
    Green: 100% → Green: 90%, Yellow: 5%, Red: 5% → Green: 80%, Yellow: 10%, Red: 10%
    get_bucket(id + salt): get_bucket(...261), get_bucket(...591), get_bucket(...812)

  28. Common Pitfalls cont...
    Allocation over time (bucket range 0 to 1):
    Green: 100%
    → Green: 50%, Red-Buffer (no-op): 20%, Red: 5%, Yellow-Buffer (no-op): 20%, Yellow: 5%
    → Green: 50%, Red: 25%, Yellow: 25%
    get_bucket(id + salt): get_bucket(...311), get_bucket(...451), get_bucket(...723)
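
One reading of the buffer layout on slide 28 is that each treatment reserves a no-op range next to it, so ramping up only moves buffer users into that treatment and never moves anyone between Green, Red, and Yellow. The sketch below assumes contiguous ranges in that order. The layout constants and `cohort_for` are hypothetical names used for illustration.

```python
from typing import List, Tuple

Layout = List[Tuple[str, int]]  # (cohort name, width in percent)


def cohort_for(bucket: int, layout: Layout) -> str:
    """Return the cohort whose contiguous range contains `bucket` (0-99)."""
    upper = 0
    for cohort, width in layout:
        upper += width
        if bucket < upper:
            return cohort
    raise ValueError("layout must cover all 100 buckets")


# Assumed ordering of the ranges; the percentages come from the slide.
RAMP_START: Layout = [
    ("green", 50),
    ("red", 5), ("red-buffer", 20),
    ("yellow", 5), ("yellow-buffer", 20),
]
RAMP_END: Layout = [("green", 50), ("red", 25), ("yellow", 25)]


def changed_buckets(before: Layout, after: Layout) -> List[int]:
    """Buckets whose cohort differs between two layouts."""
    return [b for b in range(100) if cohort_for(b, before) != cohort_for(b, after)]
```

With these layouts, every bucket that changes cohort starts in a buffer and ends in the treatment next to it, which keeps already-exposed users in the cohort they started in.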

  29. Don’t Forget The Salt!

  30. Common Pitfalls cont...
    Status Quo: 50%
    get_bucket(id)
    Cohort A: 50%
    Experiment 1
    Status Quo: 50% Cohort A: 50%
    Experiment 2
    Cohort B 50%
    Status Quo: 50%
    Experiment 3 Cohort A 50%

  31. getBucket(id + exp_name)
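
To see why the experiment name belongs in the hash input, the following sketch compares an unsalted assignment with a salted one. The function names, cohort labels, and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


def bucket(key: str) -> int:
    """Stable bucket in [0, 100) for a string key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def cohort_unsalted(user_id: str, exp_name: str) -> str:
    # Pitfall: exp_name is ignored, so every experiment splits users the same way.
    return "treatment" if bucket(user_id) < 50 else "status_quo"


def cohort_salted(user_id: str, exp_name: str) -> str:
    # Salting with the experiment name decorrelates assignments across experiments.
    return "treatment" if bucket(user_id + exp_name) < 50 else "status_quo"


def overlap(assign, users, exp_a: str, exp_b: str) -> int:
    """Count users who get the same cohort label in both experiments."""
    return sum(assign(u, exp_a) == assign(u, exp_b) for u in users)
```

Without the salt, `overlap` equals the number of users, meaning the same half of the user base receives every treatment at once. With the salt, roughly half the users match, which is what independent 50/50 splits should produce.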

  32. How To Implement It?

  33. Option 1: Config File +
    Buckets Endpoint

  36. Option 1: Config File + Buckets Endpoint
    Mobile App → Backend: Request Cohorts w/ id
    Backend → YAML File: Parse + Retrieve Cohorts

  38. Option 1: Config File + Cohorts Endpoint
    Pros
    ● Easy to implement
    ● Easy to understand
    Cons
    ● Cohorts may not load in time
    ○ Some ways to work around this
    ● Hard to run experiments at startup (e.g. onboarding)
    ● Hard to handle complex conditions
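
A minimal sketch of what the backend side of Option 1 might do when the app requests cohorts with an id. The `EXPERIMENTS` dict stands in for the parsed YAML file; the experiment names, percentages, and function names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Stand-in for the parsed config file. Names and splits are made up.
EXPERIMENTS: Dict[str, List[Tuple[str, int]]] = {
    "button_color": [("green", 50), ("red", 50)],
    "new_onboarding": [("status_quo", 90), ("enabled", 10)],
}


def get_bucket(key: str) -> int:
    """Stable bucket in [0, 100) for a string key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def cohorts_for(user_id: str) -> Dict[str, str]:
    """Return one cohort per configured experiment for this id."""
    result = {}
    for exp_name, layout in EXPERIMENTS.items():
        bucket = get_bucket(user_id + exp_name)  # salted with the experiment name
        upper = 0
        for cohort, percent in layout:
            upper += percent
            if bucket < upper:
                result[exp_name] = cohort
                break
    return result
```

The app would cache this mapping after the first response, which is also why experiments that run before the response arrives (such as onboarding) are hard with this design.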

  39. Option 2: Parameter
    Experiments

  42. Option 2: Parameter Experiments
    Mobile App → Experimentation System: Request parameter
    Experimentation System → Configuration: Parse + Retrieve param value

  44. Option 2: Parameter Experiments
    Pros
    ● Easy to implement on clients
    ○ Just one call!
    ● Handles complex conditions well
    Cons
    ● Be mindful of parameter resolution speed
    ● Many parameters if sent from the server
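
With parameter experiments the client asks for a value rather than a cohort, and the conditions live in configuration. The sketch below is one hypothetical shape for that: the `PARAMETERS` structure, the rule format, and `get_param` are assumptions for illustration, not the API of any particular tool.

```python
import hashlib
from typing import Dict

# Hypothetical configuration: rules are checked in order, first match wins.
PARAMETERS = {
    "button_color": {
        "default": "green",
        "rules": [
            {
                "when": {"platform": "android"},
                "experiment": "button_color_exp",
                "values": [("green", 50), ("red", 50)],
            },
        ],
    },
}


def get_bucket(key: str) -> int:
    """Stable bucket in [0, 100) for a string key (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def get_param(name: str, user_id: str, context: Dict[str, str]) -> str:
    """The single call a client makes: resolve a parameter value for this user."""
    definition = PARAMETERS[name]
    for rule in definition["rules"]:
        if all(context.get(k) == v for k, v in rule["when"].items()):
            bucket = get_bucket(user_id + rule["experiment"])
            upper = 0
            for value, percent in rule["values"]:
                upper += percent
                if bucket < upper:
                    return value
    return definition["default"]
```

A client would call something like `get_param("button_color", user_id, {"platform": "android"})` and use the result directly, without knowing which experiment produced it.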

  45. Other stuff you’ll want
    ● Defaults
    ● Override capability
    ● Whitelisting
    ○ Employees only
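
Defaults, overrides, and whitelisting can wrap whichever assignment function is in use. The following is a rough sketch; `resolve_cohort` and its parameters are hypothetical, and the order of precedence (override, then whitelist, then assignment) is an assumption.

```python
from typing import Callable, Dict, Optional, Set


def resolve_cohort(
    user_id: str,
    assign: Callable[[str], str],
    default: str,
    overrides: Optional[Dict[str, str]] = None,
    whitelist: Optional[Set[str]] = None,
) -> str:
    """Pick a cohort, honoring overrides and whitelists before bucketing."""
    # 1. Override: force a specific cohort, e.g. for QA or debugging.
    if overrides and user_id in overrides:
        return overrides[user_id]
    # 2. Whitelist: when set, only listed ids (e.g. employees) are bucketed.
    if whitelist is not None and user_id not in whitelist:
        return default
    # 3. Normal assignment, falling back to the default if it fails.
    try:
        return assign(user_id)
    except Exception:
        return default
```

Falling back to the default on any failure means a broken bucketing call degrades to the status quo experience instead of crashing the feature.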

  46. Options to reuse
    Traditional Bucketing
    ● Optimizely
    ● Mixpanel
    ● Apptimize
    Parameter Based
    ● Firebase Remote Config
    ● PlanOut SDKs

  47. On-Device Logging

  50. Main Principles

  51. 1. Define Domain Events
    + Delegate Formatting

  53. EventManager
    search.bar.text.update → EventManager → Service 1, Service 2, Service 3, Service 4, Service 5
    Common metadata (e.g. timestamps) gets attached here

  54. EventManager
    search.bar.text.update → EventManager → Service 1, Service 2, Service 3, Service 4, Service 5
    Event-specific metadata (e.g. search term) gets attached here.
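
One way to read slides 51 to 54: a domain event carries its own event-specific metadata, a central manager adds the common metadata, and each downstream service formats the result for itself. The sketch below follows that reading. The class names, the `metadata()` method, and the callable-based service interface are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchBarTextUpdate:
    """Domain event; it owns its event-specific metadata."""
    search_term: str
    name: str = "search.bar.text.update"

    def metadata(self) -> Dict[str, str]:
        return {"search_term": self.search_term}


class EventManager:
    """Attaches common metadata, then fans the event out to every service."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._services: List[Callable[[dict], None]] = []

    def register(self, service: Callable[[dict], None]) -> None:
        self._services.append(service)

    def track(self, event) -> None:
        payload = {
            "name": event.name,
            "timestamp": self._clock(),  # common metadata, added in one place
            **event.metadata(),          # event-specific metadata
        }
        for service in self._services:
            service(payload)  # each service formats the payload as it needs
```

Call sites only construct the domain event and hand it to `track`, so adding or swapping an analytics service does not touch feature code.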

  55. 2. Developers Own
    Analytic Definitions

  58. Product Manager: Let’s track this as search_bar_text_update
    Engineer: Okay!

  60. Product Manager: We need to know when the user updated the search bar text
    Engineer: Search for search.bar.text.update

  61. 3. Keep Documentation
    Close To Code

  62. A possible workflow
    Add/update docs → Machine Readable Documentation (YAML, JSON Schema, etc.)
    → Codegen Analytic Definitions (JavaPoet)
    → Publish to an internal Maven repo → Your App
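
The slide names JavaPoet for the code generation step. As a language-neutral illustration of the same idea, here is a small Python sketch that turns a machine-readable event definition into class source code. The `DEFINITION` format and `generate_event_class` are hypothetical and not part of any real tool.

```python
# Hypothetical machine-readable definition, as it might look after parsing YAML.
DEFINITION = {
    "name": "search.bar.text.update",
    "description": "The user updated the search bar text.",
    "fields": {"search_term": "str"},
}


def generate_event_class(definition: dict) -> str:
    """Generate Python source for an event class from its documentation."""
    class_name = "".join(part.capitalize() for part in definition["name"].split("."))
    fields = definition["fields"]
    params = "".join(f", {field}: {type_}" for field, type_ in fields.items())
    body = [f"        self.{field} = {field}" for field in fields] or ["        pass"]
    lines = [
        f"class {class_name}:",
        f'    """{definition["description"]}"""',
        f'    NAME = "{definition["name"]}"',
        "",
        f"    def __init__(self{params}):",
        *body,
    ]
    return "\n".join(lines) + "\n"
```

Because the documentation is the input to the generator, the docs and the analytic definitions in the app cannot drift apart.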

  63. Options to reuse
    ● Segment
    ● mParticle

  64. Event Deliverability

  67. Mainly A Problem If You
    Want Internal Collection
    (You’ll want that eventually though)

  68. At a High Level
    Your App → Analytics Queue → Your Backend
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2)

  69. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
    Analytics Queue → Your Backend: Send (1) and (2)

  70. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
    Your Backend → Analytics Queue: Successful Response

  71. At a High Level
    Analytics Queue: (empty)

  72. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2)

  73. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
    Analytics Queue → Your Backend: Send (1) and (2)

  74. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
    Analytics Queue → Your Backend: Send (1) and (2)

  75. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
    Your Backend → Analytics Queue: Successful Response

  76. At a High Level
    Analytics Queue: Business.view (3)

  77. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
    Analytics Queue → Your Backend: Send (1) and (2)

  78. At a High Level
    Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
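
The queue behavior walked through on slides 68 to 78 can be sketched as follows: events leave the queue only after the backend confirms receipt, and events that arrive during a send stay queued. The class and method names are assumptions, and `send` is a stand-in for the real network call.

```python
from typing import Callable, List


class AnalyticsQueue:
    """In-memory sketch; a real queue would persist events on the device."""

    def __init__(self, send: Callable[[List[str]], bool]):
        self._send = send  # returns True on a successful response
        self._events: List[str] = []

    def add(self, event: str) -> None:
        self._events.append(event)

    def pending(self) -> List[str]:
        return list(self._events)

    def flush(self) -> bool:
        batch = list(self._events)  # snapshot of what we are about to send
        if not batch:
            return True
        if not self._send(batch):
            return False  # no successful response: keep everything for a retry
        # Remove only the events that were sent. Anything added while the
        # request was in flight remains in the queue.
        del self._events[:len(batch)]
        return True
```

Keeping events until a successful response trades possible duplicate delivery for not losing events on a failed request, so the backend may need to deduplicate.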

  80. It’s All About Tradeoffs

  81. Things to Consider
    ● How many analytics are you losing?
    ● How important are the analytics you’re losing?
    ● What’s the cost of gaining back those analytics?
    ○ Mostly engineering time/complexity

  82. Your building blocks
    ● Analytic channels
    ● Structured flat files
    ● JobManagers
    ● SQLite Databases
    ● Tape Queues

  83. Example!
    Search.list.view → Event Manager

  84. Example!
    Event Manager → Tape Queue: Write to Queue

  85. Example!
    Event Manager → Tape Queue: Is over flush threshold?

  86. Example!
    Event Manager → Tape Queue: Copy + Clear Contents

  87. Example!
    Event Manager → JobManager: Create Persistent Job with contents

  88. What does Yelp do?
    ● Flush analytics every 30s + 20 analytics
    ● Always send analytics via JobManager
    ● Flush analytics when app goes into the background
    ○ Can be detected easily with architecture components!
    ● It’s not perfect, but it works well enough
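
A flush policy along the lines of slide 88 might look like the sketch below. It assumes "every 30s + 20 analytics" means a flush is triggered when either 30 seconds have passed or 20 events are queued; that reading, and the `FlushPolicy` class itself, are assumptions rather than a description of Yelp's code.

```python
import time
from typing import Callable


class FlushPolicy:
    """Decides when queued analytics should be handed off for sending."""

    def __init__(
        self,
        max_events: int = 20,
        max_age_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._last_flush = clock()

    def should_flush(self, queued_events: int, app_in_background: bool = False) -> bool:
        if queued_events == 0:
            return False  # nothing to send
        if app_in_background:
            return True   # the process may be killed soon, so flush now
        if queued_events >= self._max_events:
            return True   # count threshold reached
        return self._clock() - self._last_flush >= self._max_age  # time threshold

    def mark_flushed(self) -> None:
        self._last_flush = self._clock()
```

Injecting the clock keeps the policy easy to test without waiting for real time to pass.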

  89. Analysis

  91. Two Main Questions

  92. 1. Does Funnel
    Conversion Increase?

  93. What’s a funnel?
    Home
    Search Page
    Order Start
    Order Complete

  97. What’s a funnel?
    Home: 100k Users
    Search Page: 80k Users (80%)
    Order Start: 10k Users (12.5%)
    Order Complete: 5k Users (50%)
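
The percentages on the funnel slide are each step's users divided by the previous step's users. A small sketch of that arithmetic, using the numbers from the slide (the function name and data layout are assumptions):

```python
from typing import List, Tuple

# Step names and user counts taken from the slide.
FUNNEL: List[Tuple[str, int]] = [
    ("Home", 100_000),
    ("Search Page", 80_000),
    ("Order Start", 10_000),
    ("Order Complete", 5_000),
]


def step_conversion(steps: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Conversion rate of each step relative to the step before it."""
    return [
        (name, users / prev_users)
        for (_, prev_users), (name, users) in zip(steps, steps[1:])
    ]


def overall_conversion(steps: List[Tuple[str, int]]) -> float:
    """Share of users at the first step who reach the last step."""
    return steps[-1][1] / steps[0][1]
```

In an A/B test the same calculation would be run per cohort, and the question is whether the step or overall rates differ between cohorts.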

  99. 2. Do Users
    (click/order/watch)
    More?

  100. Lots of options here

  101. Looking for a decent catch-all?

  102. Eventually You’ll Need
    To Join This Data

  103. Option A: BigQuery Export

  104. Option A: BigQuery Export
    Pros
    ● Comes out of the box with Firebase
    ● Allows you to join information to get answers
    Cons
    ● Can’t be backfilled
    ● Still limited by Firebase
    ● Probably need to write scripts in order to join your data

  105. Option B: Internal Metrics

  106. Option B: Internal Metrics
    Pros
    ● Maximum power + flexibility
    ● Can backfill
    ● Can execute arbitrary queries with just SQL
    Cons
    ● You need to maintain the whole pipeline yourself
    ● High cost to reach feature parity with 3rd party services
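
As an illustration of joining allocation data with event data using plain SQL, here is a small sketch with Python's built-in `sqlite3` module. The table names, columns, and rows are made up for the example; a real pipeline would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE allocations (user_id TEXT, experiment TEXT, cohort TEXT);
    CREATE TABLE events (user_id TEXT, name TEXT);
    """
)
# Hypothetical data: which cohort each user was bucketed into...
conn.executemany(
    "INSERT INTO allocations VALUES (?, ?, ?)",
    [
        ("u1", "button_color", "green"),
        ("u2", "button_color", "green"),
        ("u3", "button_color", "red"),
        ("u4", "button_color", "red"),
        ("u5", "button_color", "red"),
    ],
)
# ...and the events logged on their devices.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "order.complete"),
        ("u3", "order.complete"),
        ("u3", "order.complete"),
        ("u4", "search.list.view"),
        ("u5", "order.complete"),
    ],
)

# Users per cohort and how many of them completed at least one order.
QUERY = """
    SELECT a.cohort,
           COUNT(DISTINCT a.user_id) AS users,
           COUNT(DISTINCT e.user_id) AS converted
    FROM allocations AS a
    LEFT JOIN events AS e
      ON e.user_id = a.user_id AND e.name = 'order.complete'
    WHERE a.experiment = 'button_color'
    GROUP BY a.cohort
    ORDER BY a.cohort
"""
rows = conn.execute(QUERY).fetchall()
```

The `LEFT JOIN` keeps users who never converted in the denominator, and `COUNT(DISTINCT ...)` avoids counting a user twice when they have several matching events.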

  107. 3 Things to Take Home
    ● A/B testing helps build a data-driven culture, so even if you don’t do it perfectly, you still get many benefits
    ● Know the basics of statistical experiments before you start A/B testing
    ● Keep engineers involved in analytic definition + analysis

  108. Resources
    Papers + General Reference on Experiments
    ● Exp-Platform.com
    ○ A/B Testing At Scale
    ● Twitter Unified Logging
    ● SOLID Analytics With RxJava
    Tools on Statistics
    ● A Concise Guide to Statistics
    ● Causal Inference In Statistics
    ● Optimizely Sample Size Calculator

  109. Thanks!
    ● My email - [email protected]
    ● My website - maltzj.com
    ● My twitter - @maltzj

  110. We're Hiring!
    www.yelp.com/careers/

  111. Questions?

  112. @YelpEngineering
    fb.com/YelpEngineers
    engineeringblog.yelp.com
    github.com/yelp
