

mercari
November 14, 2025

[mercari GEARS 2025] Running 1000 End-To-End Web Tests Daily



Transcript

  1. Talk Contents
 • End-to-End tests
 • Test speed
 • Test tags
 • Picking tests to run
 • AI for end-to-end testing
  2. Gleb Bahmutov
 
 
 Gleb Bahmutov is a JavaScript ninja, image processing expert, and software quality fanatic.
 • https://gleb.dev
 • https://github.com/bahmutov
 • https://slides.com/bahmutov
 • https://youtube.com/@gleb
 Sr Director of Engineering

  3. Testing
 • Unit: focus on the code; fast
 • API: focus on computer interfaces
 • E2E: focus on the human user; can be fast

  4. Individual Test Speed 🐢 🏎
 Making each test faster
 • Use API to set the state (create seller, item, etc.)
 • Cache data
 • Intelligent waiting (few hardcoded waits)
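
 Two of the ideas on this slide, setting state through an API and caching the result, can be sketched together. This is only an illustration: `cached`, `createSeller`, and `apiCalls` are hypothetical names, and a plain function stands in for the real API request (which in a Cypress suite would typically go through `cy.request`).

```typescript
// Hypothetical cache: run `create` once per key and reuse the result afterwards.
const dataCache: Record<string, unknown> = {};

function cached<T>(key: string, create: () => T): T {
  if (!(key in dataCache)) {
    dataCache[key] = create();
  }
  return dataCache[key] as T;
}

// Stand-in for an expensive API call that creates a seller account.
let apiCalls = 0;
function createSeller(): { id: string } {
  apiCalls += 1;
  return { id: "seller-" + apiCalls };
}

// Two tests asking for the same seller trigger only one API call.
const firstSeller = cached("seller", createSeller);
const secondSeller = cached("seller", createSeller);
console.log(firstSeller === secondSeller, apiCalls); // true 1
```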
  5. Test Speed 🐢 🏎
 • 1 E2E Cypress test = 1 second to 4 minutes
 • 1000 E2E Cypress tests = TOO SLOW TO RUN ON EACH COMMIT
 • Run all tests every 8 hours
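
 A rough back-of-the-envelope check of why 1000 tests are too slow for every commit, using only the per-test range from the slide. The helper `totalRunHours` and its `machines` parameter are illustrative assumptions, and the two results are bounds (every test at 1 second, every test at 4 minutes), not measurements from the talk.

```typescript
// Hypothetical helper: wall-clock hours to run `tests` tests that each take
// `secondsPerTest`, split evenly across `machines` parallel CI machines.
function totalRunHours(tests: number, secondsPerTest: number, machines = 1): number {
  return (tests * secondsPerTest) / machines / 3600;
}

const fastest = totalRunHours(1000, 1);      // every test takes 1 second
const slowest = totalRunHours(1000, 4 * 60); // every test takes 4 minutes

console.log(fastest.toFixed(2)); // 0.28 hours, about 17 minutes
console.log(slowest.toFixed(2)); // 66.67 hours on a single machine
```

 Even the lower bound is long for a per-commit check, which is consistent with running the full suite on a schedule instead.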
  6. Feature-specific testing
 • “I am working on items search, what tests do we have?”
 • “We received a bug report, is the search service working correctly?”
 • “I have opened a PR for the search service, I need to test it”

  7. Test Tags
 Feature tags
 • @search
 • @sell
 • @offer
 • …
 Group tags
 • @sanity
 • @regression
 • @mobile
 • …
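
 The way tags select tests can be sketched as a simple filter. The `TaggedTest` shape, the `pickByTags` function, and the sample test titles are assumptions for illustration; the slides do not show how tags are stored or matched.

```typescript
// Hypothetical shape of a tagged test; the talk does not show its data model.
interface TaggedTest {
  title: string;
  tags: string[];
}

// Return the tests that carry at least one of the requested tags.
function pickByTags(tests: TaggedTest[], wanted: string[]): TaggedTest[] {
  return tests.filter((t) => t.tags.some((tag) => wanted.indexOf(tag) !== -1));
}

const sampleTests: TaggedTest[] = [
  { title: "finds items by keyword", tags: ["@search", "@sanity"] },
  { title: "lists a new item", tags: ["@sell", "@regression"] },
  { title: "makes an offer on mobile", tags: ["@offer", "@mobile"] },
];

// A feature tag picks one feature; group tags cut across features.
console.log(pickByTags(sampleTests, ["@search"]).map((t) => t.title));
console.log(pickByTags(sampleTests, ["@sanity", "@mobile"]).map((t) => t.title));
```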
  8. Pick Tests To Run on Pull Request (Testing Repo)
 Runs a few tests across all features
 And all tests for the given tag(s)

  9. Pick Tests To Run on Pull Request (Web Repo)
 Dev can run E2E tests using tags
 115 specs have tests tagged @sanity or @profile

  10. Pick Tests To Run on Pull Request (Any Repo)
 Main API and service repos can test PRs using the “/cypress” comment

  11. Flexibility And Power
 • Pick tests using test tags: /cypress tags=@sanity
 • Pick tests visiting specific page: /cypress url=/mypage/purchase/in_progress/
 • Pick tests calling specific API: /cypress graphql=newLister
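
 One way such a comment could be turned into options is sketched below. The parser `parseCypressComment` is hypothetical and only covers the three `key=value` forms shown on the slide; how the real workflow reads comments is not described in the deck.

```typescript
// Hypothetical parser for a "/cypress key=value ..." pull request comment.
// Returns null when the comment is not a /cypress command.
function parseCypressComment(comment: string): Record<string, string> | null {
  const parts = comment.trim().split(/\s+/);
  if (parts[0] !== "/cypress") {
    return null;
  }
  const options: Record<string, string> = {};
  for (const part of parts.slice(1)) {
    const eq = part.indexOf("=");
    if (eq > 0) {
      // Split on the first "=" only, so values may contain "=" themselves.
      options[part.slice(0, eq)] = part.slice(eq + 1);
    }
  }
  return options;
}

console.log(parseCypressComment("/cypress tags=@sanity"));
console.log(parseCypressComment("/cypress url=/mypage/purchase/in_progress/"));
console.log(parseCypressComment("/cypress graphql=newLister"));
console.log(parseCypressComment("looks good to me")); // null
```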
 

  12. Pick Tests Using Changed Source Code (Web Repo)
 1 spec that exercises the elements with test ids “MenuItems” and “MenuItemsComp”
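
 The idea of going from changed source code to specs via test ids can be sketched as two steps: collect the `data-testid` values in the changed source, then look up specs that use them. Everything here is an assumption for illustration: the function names, the `SpecIndex` lookup, the sample markup, and the spec file names do not come from the talk.

```typescript
// Collect the distinct data-testid values that appear in changed source text.
function findTestIds(source: string): string[] {
  const pattern = /data-testid=["']([^"']+)["']/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const id = match[1];
    if (id !== undefined && ids.indexOf(id) === -1) {
      ids.push(id);
    }
  }
  return ids;
}

// Hypothetical index: spec file name -> test ids that the spec interacts with.
type SpecIndex = Record<string, string[]>;

// Pick every spec that touches at least one of the changed test ids.
function pickSpecs(index: SpecIndex, changedIds: string[]): string[] {
  return Object.keys(index).filter((spec) =>
    (index[spec] || []).some((id) => changedIds.indexOf(id) !== -1)
  );
}

const changedSource = `
  <ul data-testid="MenuItems">
    <li data-testid="MenuItemsComp">Profile</li>
  </ul>`;

const specIndex: SpecIndex = {
  "menu.cy.ts": ["MenuItems", "MenuItemsComp"],
  "search.cy.ts": ["SearchInput"],
};

const changedIds = findTestIds(changedSource);
console.log(changedIds);                       // [ 'MenuItems', 'MenuItemsComp' ]
console.log(pickSpecs(specIndex, changedIds)); // [ 'menu.cy.ts' ]
```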

  13. Picking Test Tags Using AI
 AI suggests test tags based on the PR title and body text vs test tag descriptions
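
 To make the "PR text vs tag descriptions" comparison concrete, here is a deliberately simple stand-in that uses word overlap instead of an AI model. It is not the method from the talk: the tag descriptions, the `tokenize`, `overlap`, and `suggestTags` functions, and the `minOverlap` threshold are all hypothetical, and a real model would handle synonyms and phrasing that plain word matching misses.

```typescript
// Hypothetical tag descriptions; the real descriptions are not shown in the slides.
const tagDescriptions: Record<string, string> = {
  "@search": "search for items by keyword, filters and sorting of results",
  "@sell": "listing a new item for sale as a seller",
  "@offer": "making and accepting price offers on an item",
};

// Lowercase words of 3+ letters, used as a crude text fingerprint.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z]{3,}/g) || [];
}

// Count how many distinct description words also appear in the PR text.
function overlap(prText: string, description: string): number {
  const prWords = tokenize(prText);
  const shared: string[] = [];
  for (const word of tokenize(description)) {
    if (prWords.indexOf(word) !== -1 && shared.indexOf(word) === -1) {
      shared.push(word);
    }
  }
  return shared.length;
}

// Suggest tags whose description shares at least `minOverlap` words with the PR.
function suggestTags(prText: string, minOverlap = 2): string[] {
  return Object.keys(tagDescriptions).filter(
    (tag) => overlap(prText, tagDescriptions[tag] || "") >= minOverlap
  );
}

console.log(suggestTags("Fix sorting of search results when filters are applied"));
// [ '@search' ]
```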

  14. When Someone Opens An Issue Tagged “bug” 🐞
 Picked tests to run based on the bug title and description

  15. copilot-instructions.md file
 When performing a code review:
 
 - confirm that there are no hard-coded magic numbers.
 Prefer using named constants.
 - do not allow unreachable code
 - check each HTML element that shows any unique application data,
 like prices, values, names, address, etc to have a `data-testid`
 attribute to be used in end-to-end tests. If the attribute is missing,
 add a `data-testid` attribute with a meaningful value.
 Also add `data-testid` attributes to the top level forms, pages,
 large components.

  16. Using AI for QA: What we learned so far
 Generation
 • Sometimes works for smaller inline coding
 • Hard to express all application knowledge as context for each prompt
 • Long waiting loops
 • Human review is hard

  17. Using AI for QA: What we learned so far
 Generation
 • Sometimes works for smaller inline coding
 • Hard to express all application knowledge as context for each prompt
 • Long waiting loops
 • Human review is hard
 Review
 • AI can help debug simple problems
 • Works well explaining and fixing edge cases with specific tools and languages
 • Copilot reviews work well

  18. Using AI for QA: What we learned so far
 Generation
 • Sometimes works for smaller inline coding
 • Hard to express all application knowledge as context for each prompt
 • Long waiting loops
 • Human review is hard
 Review
 • AI can help debug simple problems
 • Works well explaining and fixing edge cases with specific tools and languages
 • Copilot reviews work well
 “Picking”
 • Works really well for “pick the text most similar to X”, example is picking the appropriate test tags

  19. Conclusions
 
 • End-to-end testing is extremely important
 • Do not run ALL the tests ALL the time
 • Let AI help