[mercari GEARS 2025] Running 1000 End-To-End Web Tests Daily

Running 1000 End-To-End Web Tests Daily  Gleb Bahmutov  Mercari US
/ Sr Director of Engineering 

Talk Contents
• End-to-end tests
• Test speed
• Test tags
• Picking tests to run
• AI for end-to-end testing

Gleb Bahmutov  Sr Director of Engineering
Gleb Bahmutov is a JavaScript ninja, image processing expert, and software quality fanatic.
• https://gleb.dev
• https://github.com/bahmutov
• https://slides.com/bahmutov
• https://youtube.com/@gleb

Mercari.com 

Example Mercari web E2E test 📺

Testing
• Unit: focus on the code; fast
• API: focus on computer interfaces
• E2E: focus on the human user; can be fast

E2E Test Size At Mercari.com 

Writing E2E Cypress Tests With Copilot’s Help  GitHub Copilot inline
code suggestion 👍👍👍

Example Cypress E2E Test
• test tags
• page objects
• accessing page elements

Individual Test Speed 🐢 🏎  Making each test faster
• Use the API to set the state (create seller, item, etc.)
• Cache data
• Intelligent waiting (few hardcoded waits)
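One way to picture the "cache data" bullet is to create expensive test data once and reuse it in later tests. The sketch below is plain JavaScript, not Mercari's implementation: `getOrCreate` and `createSeller` are hypothetical names, and in a real suite the creator function would presumably call the backend API.

```javascript
// Hypothetical in-memory cache for test data; all names are illustrative.
const dataCache = new Map()

// Returns the cached value for `key`, or calls `create` once and stores the result.
function getOrCreate(key, create) {
  if (!dataCache.has(key)) {
    dataCache.set(key, create())
  }
  return dataCache.get(key)
}

// Stand-in for a slow API call that creates a seller (assumption).
let apiCalls = 0
function createSeller() {
  apiCalls += 1
  return { id: `seller-${apiCalls}` }
}

const firstSeller = getOrCreate('seller', createSeller)
const secondSeller = getOrCreate('seller', createSeller)
console.log(firstSeller === secondSeller, apiCalls)
```

The second call returns the same object without invoking the creator again, which is the saving the bullet points at.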

Test Speed 🐢 🏎
• 1 E2E Cypress test = 1 second to 4 minutes
• 1000 E2E Cypress tests = TOO SLOW TO RUN ON EACH COMMIT
• Run all tests every 8 hours

Test Speed 🐢 🏎  39 min across 20 machines using Cypress Cloud
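The slide does not say how specs are divided among the 20 machines. As a rough illustration of the general idea only, here is a longest-first greedy split: take the slowest specs first and give each one to the machine with the least work so far. This is a generic heuristic, not a description of how Cypress Cloud balances specs; the function name and the spec durations are made up.

```javascript
// Greedy longest-first split: each spec goes to the least-loaded machine.
function splitSpecs(specs, machineCount) {
  const machines = Array.from({ length: machineCount }, () => ({ total: 0, specs: [] }))
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...specs].sort((a, b) => b.seconds - a.seconds)
  for (const spec of sorted) {
    // Pick the machine with the smallest total duration so far.
    const target = machines.reduce((min, m) => (m.total < min.total ? m : min))
    target.specs.push(spec.name)
    target.total += spec.seconds
  }
  return machines
}

// Made-up spec durations in seconds (assumption).
const exampleSpecs = [
  { name: 'checkout.cy.js', seconds: 240 },
  { name: 'search.cy.js', seconds: 120 },
  { name: 'profile.cy.js', seconds: 90 },
  { name: 'home.cy.js', seconds: 30 },
]
const plan = splitSpecs(exampleSpecs, 2)
console.log(plan)
```

With these numbers one machine runs the single 4-minute spec while the other runs the three shorter ones, so both finish at about the same time.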

Test Speed 🐢 🏎  Vary the number of machines 
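A back-of-the-envelope way to reason about the machine count: 39 minutes on 20 machines is about 39 × 20 = 780 machine-minutes of work. The sketch below divides that total by other machine counts. It assumes perfect load balancing and no per-machine startup cost, so real runs will be slower, and no run can finish faster than its slowest single spec. The function name is illustrative.

```javascript
// Baseline figures from the slide: 39 minutes across 20 machines.
const baseline = { minutes: 39, machines: 20 }
const machineMinutes = baseline.minutes * baseline.machines

// Ideal wall-clock time, assuming the work divides evenly (an optimistic assumption).
function estimateWallClockMinutes(totalMachineMinutes, machineCount) {
  if (!Number.isInteger(machineCount) || machineCount < 1) {
    throw new RangeError('machineCount must be a positive integer')
  }
  return totalMachineMinutes / machineCount
}

for (const n of [10, 20, 40]) {
  console.log(`${n} machines: about ${estimateWallClockMinutes(machineMinutes, n)} minutes`)
}
```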

Do We Need To Run All The Tests All The
Time?

Feature-specific testing
• “I am working on items search, what tests do we have?”
• “We received a bug report, is the search service working correctly?”
• “I have opened a PR for the search service, I need to test it”

Test Tags 

Test Tags
Feature tags
• @search
• @sell
• @offer
• …
Group tags
• @sanity
• @regression
• @mobile
• …
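Selecting tests by tag can be pictured as a simple filter: keep every test that carries at least one of the requested tags. The sketch below is a plain JavaScript illustration with made-up test titles; it does not show how the Mercari suite or any Cypress plugin stores or matches tags.

```javascript
// Keep the tests that have at least one of the wanted tags.
function pickTestsByTags(tests, wantedTags) {
  const wanted = new Set(wantedTags)
  return tests.filter((test) => test.tags.some((tag) => wanted.has(tag)))
}

// Hypothetical test list; titles and tag assignments are invented.
const allTests = [
  { title: 'finds items by keyword', tags: ['@search', '@sanity'] },
  { title: 'lists a new item', tags: ['@sell', '@regression'] },
  { title: 'makes an offer', tags: ['@offer'] },
]

const pickedTests = pickTestsByTags(allTests, ['@search', '@offer'])
console.log(pickedTests.map((test) => test.title))
```

A feature tag such as @search and a group tag such as @sanity are treated the same way here; a test is picked if any of its tags matches.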

Test Tags 

Pick Tests To Run on Pull Request (Testing Repo)  Runs a few tests across all features, plus all tests for the given tag(s)

Pick Tests To Run on Pull Request (Web Repo)  Dev
opens a web pull request 

Pick Tests To Run on Pull Request (Web Repo)  Preview
environment is deployed 

Pick Tests To Run on Pull Request (Web Repo)  Dev can run E2E tests using tags  115 specs have tests tagged @sanity or @profile

Pick Tests To Run on Pull Request (Any Repo)  Main API and service repos can test PRs using the “/cypress” comment

Flexibility And Power
• Pick tests using test tags: /cypress tags=@sanity
• Pick tests visiting specific page: /cypress url=/mypage/purchase/in_progress/
• Pick tests calling specific API: /cypress graphql=newLister
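The three comment forms above share a `key=value` shape, so a bot reading PR comments could turn them into options with a small parser. The sketch below is a guess at what such parsing might look like, not the actual Mercari workflow; `parseCypressComment` is a hypothetical name, and accepting several `key=value` pairs in one comment is an assumption.

```javascript
// Parse a "/cypress key=value ..." comment into an options object.
// Returns null when the comment is not a /cypress command.
function parseCypressComment(comment) {
  const text = comment.trim()
  if (!/^\/cypress(\s|$)/.test(text)) {
    return null
  }
  const options = {}
  // Skip the "/cypress" word itself, then read each key=value part.
  for (const part of text.split(/\s+/).slice(1)) {
    const eq = part.indexOf('=')
    if (eq <= 0) continue // ignore parts without a key
    options[part.slice(0, eq)] = part.slice(eq + 1)
  }
  return options
}

console.log(parseCypressComment('/cypress tags=@sanity'))
console.log(parseCypressComment('/cypress url=/mypage/purchase/in_progress/'))
console.log(parseCypressComment('/cypress graphql=newLister'))
```

Splitting on the first `=` only keeps values that themselves contain slashes or other punctuation intact.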

Even Better: Target The Web Tests To Changes 🎯 

Web Pull Request: the changed frontend source file
<div data-testid="Greeting">
  … // code changes
</div>

E2E specs that use the “Greeting” test id
cy.byTestId('Greeting').should('be.visible') …  new-user.cy.js
cy.byTestId('Greeting') …  home.cy.js

Pick Tests Using Changed Source Code (Web Repo)  1 spec that exercises the elements with the test ids “MenuItems” and “MenuItemsComp”
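The matching step can be sketched as two small functions: collect the `data-testid` values found in the changed source file, then keep the specs whose text refers to any of them. This is a simplified illustration using regular expressions and in-memory strings. The real tooling is not shown in the slides; the function names and the `search.cy.js` spec are hypothetical.

```javascript
// Collect the data-testid values that appear in a changed source file.
function extractTestIds(source) {
  const ids = new Set()
  for (const match of source.matchAll(/data-testid=["']([^"']+)["']/g)) {
    ids.add(match[1])
  }
  return [...ids]
}

// Return the names of specs whose text calls byTestId with one of the ids.
function findSpecsUsingTestIds(specs, testIds) {
  return Object.keys(specs).filter((name) =>
    testIds.some((id) => specs[name].includes(`byTestId('${id}')`)),
  )
}

const changedSource = '<div data-testid="Greeting">Hello</div>'
// Spec contents reduced to one line each for the example.
const specSources = {
  'new-user.cy.js': "cy.byTestId('Greeting').should('be.visible')",
  'home.cy.js': "cy.byTestId('Greeting')",
  'search.cy.js': "cy.byTestId('SearchBox')",
}

const changedIds = extractTestIds(changedSource)
const specsToRun = findSpecsUsingTestIds(specSources, changedIds)
console.log(changedIds, specsToRun)
```

A plain text search like this misses test ids built dynamically or wrapped in page objects, so it is best read as a starting point.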

Which Test Tag(s)? 

Picking Test Tags Using AI  Which E2E tests should we
run for this service PR? 

Picking Test Tags Using AI  AI suggests test tags by comparing the PR title and body text against the test tag descriptions

Picking Test Tags Using AI  Confidence with each picked test tag
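To show the shape of the input and output only, the sketch below scores each tag by how many words of its description also appear in the PR text and reports that fraction as a confidence. This word-overlap score is a deliberately crude stand-in for the AI model used in the talk, whose prompt and model are not shown in the slides. The tag descriptions and the PR title are invented.

```javascript
// Lowercased set of the words in a text.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || [])
}

// Score each tag by the fraction of its description words found in the PR text.
function scoreTags(prText, tagDescriptions) {
  const prWords = wordSet(prText)
  return Object.entries(tagDescriptions)
    .map(([tag, description]) => {
      const descWords = [...wordSet(description)]
      const hits = descWords.filter((word) => prWords.has(word)).length
      return { tag, confidence: descWords.length ? hits / descWords.length : 0 }
    })
    .filter((result) => result.confidence > 0)
    .sort((a, b) => b.confidence - a.confidence)
}

// Hypothetical tag descriptions (assumption).
const tagDescriptions = {
  '@search': 'search items by keyword',
  '@sell': 'list a new item',
  '@offer': 'make an offer to the seller',
}

const pickedTags = scoreTags('Fix keyword search ranking for items', tagDescriptions)
console.log(pickedTags)
```

Word overlap breaks down on synonyms and on common words shared by unrelated texts, which is where a language model comparing meanings would be expected to do better.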

Bonus: Automated Testing  for GitHub Bug Reports 

When Someone Opens An Issue Tagged “bug” 🐞  Tests to run are picked based on the bug title and description

When Someone Opens An Issue Tagged “bug” 🐞  Completed test
run status: all current tests ✅ 

GitHub Copilot PR Reviews 👀 

copilot-instructions.md file
When performing a code review:
- confirm that there are no hard-coded magic numbers. Prefer using named constants.
- do not allow unreachable code
- check each HTML element that shows any unique application data, like prices, values, names, address, etc to have a `data-testid` attribute to be used in end-to-end tests. If the attribute is missing, add a `data-testid` attribute with a meaningful value. Also add `data-testid` attributes to the top level forms, pages, large components.

Copilot review can detect page elements without “data-testid” attributes and
even suggest good attribute names 

Using AI for QA: What we learned so far
Generation
• Sometimes works for smaller inline coding
• Hard to express all application knowledge as context for each prompt
• Long waiting loops
• Human review is hard
Review
• AI can help debug simple problems
• Works well explaining and fixing edge cases with specific tools and languages
• Copilot reviews work well
“Picking”
• Works really well for “pick the text most similar to X”, example is picking the appropriate test tags

Conclusions
• End-to-end testing is extremely important
• Do not run ALL the tests ALL the time
• Let AI help

Thank You!  Gleb Bahmutov  Mercari US / Sr Director of
Engineering  gleb.dev 
