Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[mercari GEARS 2025] Exploring LLM-Driven Forma...

Avatar for mercari mercari PRO
November 14, 2025

[mercari GEARS 2025] Exploring LLM-Driven Formal Verification for Robust Continuous Integration of Services

Avatar for mercari

mercari PRO

November 14, 2025
Tweet

More Decks by mercari

Other Decks in Technology

Transcript

  1. Product engineer (BE) since 2019
 R4D-supported Ph.d. student since 2022


    
 Graduate School of Mathematics, Nagoya University
 
 A newbie father since Jan 2025
 42
  2. The title has lots of buzzwords, but actually…
 LLM-Driven
 ≈


    Always using LLMs to automate boring, heavy-labor tasks
 Formal Verification
 ≈
 Check specifications logically with a machine
 Continuous Integration
 ≈
 Continuously check every new change
 44
  3. Once upon a time, there was an incident
 Our data

    storage of item attributes randomly dropped attributes
 It only occurred on production
 There were no logs or errors at all 
 Actual: random attributes are dropped silently
 System writes the change to the database
 No logs or errors occur
 Color
 Dark blue
 +Authentication
 Verified
 Color
 Dark blue
 Authentication
 Verified
 45
  4. Finding and fixing the issue took 180+ days
 
 It

    is a production-only bug.
 There were no errors and no logs when the drop happened.
 Finding issues is costly, and each time only a few items were impacted.
 The issue was resolved after the incident was identified.
 46
  5. Bug detection from 180 days to 1 day
 48 Setup


    1. Check out the commit before the bug fix
 2. No hint
 3. Give symptoms
 4. Give possible files
 5. Ask the LLM service to build the formal spec to reveal the bug

  6. Why formal models (specs)?
 
 No logs or errors occur

    Code tests can’t identify specification logic issues
 Formal verification ≈ checking specifications logically by machine
 Natural-language spec
 (ambiguous)
 Formal spec
 (TLA+)
 TLC model checker
 Invariants (requirements) hold
 Bug found
 Implementation code
 ≈ state machine
 49
  7. Reality
 Natural-language spec
 (ambiguous)
 Formal spec
 (TLA+)
 TLC model checker


    Invariants (requirements) hold
 Bug found
 Implementation code
 • No time to learn new concepts
 • No time to apply the knowledge
 • Duplicated spec writing
 • Extra work before coding • Delivery time gets longer • Code may be buggy still • Tests are still required 51
  8. Therefore: Coding first, then verification in CI Flow Natural-language spec


    (ambiguous)
 Formal spec
 (TLA+)
 ModelFuzz*
 Model-guided fuzz test
 Invariants (requirements) hold
 Bug found
 Implementation code
 Git commit
 JIRA ticket
 Prompts
 To LLM service
 • Spec per change
 • Less duplication work
 Coverage for every change guaranteed by fuzz test
 * Gulcan, E. B., Ozkan, B. K., Majumdar, R., & Nagendra, S. (2025). Model-guided Fuzzing of Distributed Systems.
 https://arxiv.org/abs/2410.02307
 52
  9. Current progress
 For Bugs
 A TLA+ spec for the silent

    attribute dropping bug shows the invariant violation, fully by a LLM service within one day
 
 For Features
 Another TLA+ spec also fully by a LLM service for a API flow change to prove its integrity
 
 
 Next Step
 Complete the ModelFuzz setup and run it for the whole attributes data store service to showcase the method
 54
  10. Takeaway 55 🧩 Formal verification isn’t just theory ⚙ LLM

    + TLA+ = Faster, logical debugging 🧠 Automation can make correctness part of the workflow 🚀 Code first, verify continuously 🌱 Next step: make “robust CI” the default.