
Trailblaze - Droidcon NYC 2025

Presented at Droidcon NYC in June 2025.

https://github.com/block/trailblaze

Sam Edwards

July 22, 2025

Transcript

  1. AI Driven Mobile Testing: Blazing the Trail at Block

    Brian Gardner, Sam Edwards
    Droidcon NYC - June 25, 2025
  2. What does "AI Driven" mean?

    Given instructions in natural language, the AI verifies the expected app behavior.
  3. Agent loop

    // For every instruction
    do {
      val request = requestFor(task)
      val response = askLLM(request)
      handleToolCalls(response, task)
    } while (!task.isFinished())
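
The loop on this slide can be made concrete with a small, self-contained sketch. Everything here other than the names `requestFor`, `askLLM`, `handleToolCalls`, and `isFinished` is an assumption for illustration: the data classes, the scripted fake model, and the `maxSteps` guard are not taken from Trailblaze.

```kotlin
// Hypothetical stand-ins: none of these types come from Trailblaze itself.
data class ToolCall(val name: String, val args: Map<String, String> = emptyMap())
data class LlmRequest(val instruction: String, val history: List<String>)
data class LlmResponse(val toolCalls: List<ToolCall>)

class Task(val instruction: String) {
    val history = mutableListOf<String>()
    private var finished = false
    fun isFinished() = finished
    fun markFinished() { finished = true }
}

fun requestFor(task: Task) = LlmRequest(task.instruction, task.history.toList())

// Scripted fake in place of a real model call: tap first, then halt.
fun askLLM(request: LlmRequest): LlmResponse =
    if (request.history.isEmpty()) {
        LlmResponse(listOf(ToolCall("tap", mapOf("text" to "Sign in"))))
    } else {
        LlmResponse(listOf(ToolCall("halt")))
    }

fun handleToolCalls(response: LlmResponse, task: Task) {
    for (call in response.toolCalls) {
        task.history += call.name // remembered so the next request can include it
        if (call.name == "halt") task.markFinished()
    }
}

// Same do/while shape as the slide, plus an assumed step limit as a safety net.
fun runInstruction(task: Task, maxSteps: Int = 10): Int {
    var steps = 0
    do {
        val request = requestFor(task)
        val response = askLLM(request)
        handleToolCalls(response, task)
        steps++
    } while (!task.isFinished() && steps < maxSteps)
    return steps
}
```

Calling `runInstruction(Task("Sign in"))` with this fake model takes two iterations: one tap, then a halt that ends the loop.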
  4. LLM Request: Model, Chat messages, Tools

    Provide tools the LLM can "use"
    • Tap
    • Input text
    • Erase text
    • Scroll
  5. LLM Request: Model, Chat messages, Tools

    Provide tools the LLM can "use"
    • Tap
    • Input text
    • Erase text
    • Scroll
    • Halt
  6. LLM Request: System Prompt, Instructions, History, Screen state

    You are an assistant managing the users android phone that they do not have access to. You will autonomously complete complex tasks on the phone and report back when done. You will be provided with high-level instructions…
  7. LLM Request: System Prompt, Instructions, History, Screen state

    The LLM does not remember anything between messages. Each response saves:
    • Tool call used
    • Tool result
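
Because the model keeps no state between messages, each request has to replay the saved tool calls and results. The sketch below shows one way that could look. The `Conversation`, `HistoryEntry`, and `ChatMessage` types and the role names are assumptions for illustration, not the Trailblaze models or any vendor's chat API.

```kotlin
// Hypothetical types for illustration; not the Trailblaze or any vendor API.
data class HistoryEntry(val toolCall: String, val toolResult: String)
data class ChatMessage(val role: String, val content: String)

class Conversation(private val systemPrompt: String, private val instruction: String) {
    private val history = mutableListOf<HistoryEntry>()

    // Save the tool call that was used and its result after each response.
    fun record(toolCall: String, toolResult: String) {
        history += HistoryEntry(toolCall, toolResult)
    }

    // The model is stateless, so every request carries the full history again,
    // followed by the current screen state.
    fun buildMessages(screenState: String): List<ChatMessage> = buildList {
        add(ChatMessage("system", systemPrompt))
        add(ChatMessage("user", instruction))
        for (entry in history) {
            add(ChatMessage("assistant", entry.toolCall))
            add(ChatMessage("tool", entry.toolResult))
        }
        add(ChatMessage("user", screenState))
    }
}
```

The first request holds three messages; each recorded step adds two more to every later request.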
  8. LLM Request: System Prompt, Instructions, History, Screen state

    Provide the Screen State
    • View Hierarchy
    • Screenshot
  9. LLM Request: System Prompt, Instructions, History, Screen state

    Provide the current screen state
    • View Hierarchy
    • Screenshot
  10. Maestro

    • Maestro - Host Mode
      ◦ Android
      ◦ iOS
      ◦ Web
    • Maestro - On-Device
      ◦ 🤔
  11. On-Device Android Maestro Driver

    Maestro needs a hosted test environment
    • Machine runs the test and sends commands to device
    Created an on-device version for Android
    • Lets us run on Test Farm
  12. 1st Attempt - Manual Tools

    private fun tapOnElement(args: JsonObject) {
      val text = args.getValue("text").jsonPrimitive.content
      onView(withText(text)).perform(click())
    }

    • Espresso, ADB
  13. 2nd Attempt - Typesafe Tools

    data class TapOnElement(
      val text: String,
      val longPress: Boolean,
    ) : TrailblazeTool
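
The move from hand-parsed arguments to a typed tool can be sketched as follows. This is a minimal illustration under stated assumptions: the slides parse a kotlinx.serialization `JsonObject`, while this sketch uses a plain `Map` so it runs with the standard library alone. `TrailblazeTool` is redeclared locally as a stand-in, `decodeTapOnElement` is a hypothetical helper, and defaulting `longPress` to `false` is a choice made here.

```kotlin
// Local stand-in for the marker interface named on the slide.
interface TrailblazeTool

data class TapOnElement(
    val text: String,
    val longPress: Boolean,
) : TrailblazeTool

// Hypothetical decoder: validates untyped arguments once, at the boundary,
// so the rest of the code only ever sees a well-formed TapOnElement.
fun decodeTapOnElement(args: Map<String, Any?>): TapOnElement {
    val text = args["text"] as? String
        ?: throw IllegalArgumentException("'text' must be a string")
    // Assumed default: a missing or non-boolean longPress is treated as false.
    val longPress = args["longPress"] as? Boolean ?: false
    return TapOnElement(text = text, longPress = longPress)
}
```

With this shape, a malformed tool call fails at decoding time with a clear message instead of deep inside a UI action.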
  14. 3rd Impl - Maestro Directly as Tools

    • Give the Maestro API Surface to the LLM as a Tool

    data class TapOnElementCommand(
      val selector: ElementSelector,
      val retryIfNoChange: Boolean? = null,
      val waitUntilVisible: Boolean? = null,
      val longPress: Boolean? = null,
      val repeat: TapRepeat? = null,
      val waitToSettleTimeoutMs: Int? = null,
      override val label: String? = null,
      override val optional: Boolean = false,
    ) : maestro.orchestra.Command
  15. Final - Typesafe Tools, Maestro

    data class TapOnElementWithTextTrailblazeTool(
      val text: String,
      val longPress: Boolean = false,
    ) : MapsToMaestroCommands() {
      override fun toMaestroCommands(): List<Command> = listOf(
        TapOnElementCommand(
          selector = ElementSelector(
            textRegex = text,
          ),
          longPress = longPress,
        ),
      )
    }
  17. Agnostic, Typesafe Tools

    • TrailblazeTool
      ◦ ExecutableTrailblazeTool
      ◦ DelegatingTrailblazeTool
    • TrailblazeExecutionContext
      ◦ TrailblazeAgent
      ◦ ScreenState
      ◦ LLM Request ID
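
One plausible reading of the executable/delegating split is sketched below. The type names mirror the slide, but every signature is an assumption for illustration, and `TapOnText`, `InputText`, `FillField`, and `runTool` are hypothetical. The real Trailblaze interfaces may look different.

```kotlin
// Names mirror the slide; every signature below is an assumption for illustration.
interface TrailblazeTool

class TrailblazeExecutionContext(val llmRequestId: String) {
    // Stand-in for real device interaction: records what would have been done.
    val log = mutableListOf<String>()
}

// A tool that performs its own work.
interface ExecutableTrailblazeTool : TrailblazeTool {
    fun execute(context: TrailblazeExecutionContext)
}

// A tool that expands into other, simpler tools.
interface DelegatingTrailblazeTool : TrailblazeTool {
    fun toExecutableTools(): List<ExecutableTrailblazeTool>
}

data class TapOnText(val text: String) : ExecutableTrailblazeTool {
    override fun execute(context: TrailblazeExecutionContext) {
        context.log += "tap:$text"
    }
}

data class InputText(val text: String) : ExecutableTrailblazeTool {
    override fun execute(context: TrailblazeExecutionContext) {
        context.log += "input:$text"
    }
}

// Hypothetical app-specific tool composed from the two primitives above.
data class FillField(val label: String, val value: String) : DelegatingTrailblazeTool {
    override fun toExecutableTools(): List<ExecutableTrailblazeTool> =
        listOf(TapOnText(label), InputText(value))
}

fun runTool(tool: TrailblazeTool, context: TrailblazeExecutionContext) {
    when (tool) {
        is ExecutableTrailblazeTool -> tool.execute(context)
        is DelegatingTrailblazeTool -> tool.toExecutableTools().forEach { it.execute(context) }
        else -> error("Unsupported tool: $tool")
    }
}
```

The appeal of this split is that app-specific tools stay declarative: they describe which primitives to run, and only the primitives touch the device.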
  18. Challenges - Debugging

    • Low memory/CPU footprint
    • Avoid on-device storage & disk writes
    • Reports locally and on CI
    • Platform agnostic
  19. Challenges - No Tool Calls Message

    If no tool call is returned, just force one on the next chat request
    ToolChoice.mode("required")
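
The fallback described here can be sketched as a small policy object: start in automatic mode, and switch to required tool use for the next request whenever a response arrives without a tool call. The `ToolChoiceMode` enum, the request and response classes, and `ToolChoicePolicy` are assumptions for illustration. They do not reproduce the `ToolChoice.mode(...)` API shown on the slide or any specific LLM client.

```kotlin
// Hypothetical request/response models; real LLM client libraries differ.
enum class ToolChoiceMode { AUTO, REQUIRED }

data class ChatRequest(val prompt: String, val toolChoice: ToolChoiceMode)
data class ChatResponse(val text: String?, val toolCalls: List<String>)

class ToolChoicePolicy {
    private var forceNext = false

    // Build the next request, forcing a tool call if the last turn had none.
    fun nextRequest(prompt: String) =
        ChatRequest(prompt, if (forceNext) ToolChoiceMode.REQUIRED else ToolChoiceMode.AUTO)

    // A response with no tool call triggers forcing on the next turn only;
    // a response that does contain a tool call returns the policy to AUTO.
    fun observe(response: ChatResponse) {
        forceNext = response.toolCalls.isEmpty()
    }
}
```

Forcing only after a miss keeps the default behavior flexible while still guaranteeing the agent loop makes progress.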
  20. Challenges - LLM Picks Wrong Views

    • Tap on X,Y works GREAT for device control
      ◦ X,Y does not scale for recordings
    • Tap on text or ID
      ◦ Can be ambiguous
      ◦ Repeated text on screen
      ◦ Icons with no accessibility text
  21. Set of Mark Prompting

    • Draw bounding boxes around each interactable element
    • Filter out the elements that aren't on top or visible on screen
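
The filtering and labeling steps can be sketched without any drawing code. This is a simplified illustration under stated assumptions: `Rect`, `ViewNode`, `Mark`, and `assignMarks` are hypothetical, the `z` field stands in for draw order, and an element counts as hidden only when a single higher element fully covers it. Drawing the boxes onto the screenshot is left out.

```kotlin
// Hypothetical view model; a real view hierarchy carries many more attributes.
data class Rect(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    fun intersects(o: Rect) =
        left < o.right && o.left < right && top < o.bottom && o.top < bottom

    fun contains(o: Rect) =
        left <= o.left && top <= o.top && right >= o.right && bottom >= o.bottom
}

// Higher z means drawn later, i.e. closer to the user (an assumption of this sketch).
data class ViewNode(val id: String, val bounds: Rect, val clickable: Boolean, val z: Int)

// A numbered label that the model can refer to instead of text or coordinates.
data class Mark(val label: Int, val node: ViewNode)

fun assignMarks(nodes: List<ViewNode>, screen: Rect): List<Mark> =
    nodes
        // Keep interactable elements that are at least partly on screen.
        .filter { it.clickable && it.bounds.intersects(screen) }
        // Simplification: drop a node only when a single higher node fully covers it.
        .filter { node -> nodes.none { it.z > node.z && it.bounds.contains(node.bounds) } }
        // Number the survivors starting from 1.
        .mapIndexed { index, node -> Mark(index + 1, node) }
```

Each surviving element gets a stable number for the current screen, which sidesteps repeated text and icons that have no accessibility label.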
  22. Trailblaze Architecture (diagram labels)

    Test Definition; Kotlin Code; Recordings; Agent + LLM 🤖; Tool Calls; Prompts, MCP; Trailblaze Tools (appears twice)
  23. Trailblaze Architecture (diagram labels)

    Test Definition; Kotlin Code; Recordings; Agent + LLM 🤖; Tool Calls; Prompts, MCP; Maestro Command Models; Your App Specific Code; Trailblaze Tools (appears twice)
  24. Trailblaze Architecture (diagram labels)

    Executing Device; Maestro UI Driver Interface; Test Definition; Kotlin Code; Recordings; Agent + LLM 🤖; Tool Calls; Prompts, MCP; UI Interaction; Trailblaze On-Device Android Maestro Driver; Maestro Command Models; Platform Specific Host Maestro Drivers; Your App Specific Code; Trailblaze Tools (appears twice); Platform APIs; Screen State; Screenshot + View Hierarchy
  25. Trailblaze Architecture (diagram labels)

    Executing Device; Maestro UI Driver Interface; Test Definition; Kotlin Code; Recordings; Agent + LLM 🤖; Tool Calls; Prompts, MCP; Test Results and Logging; Debug Logging; Test Farm - CI/CD; UI Interaction; Trailblaze On-Device Android Maestro Driver; Maestro Command Models; Platform Specific Host Maestro Drivers; Your App Specific Code; Trailblaze Tools (appears twice); Platform APIs; Screen State; Screenshot + View Hierarchy
  26. Trailblaze Architecture (diagram labels)

    Maestro UI Driver Interface; Test Definition; Kotlin Code; Recordings; Agent + LLM 🤖; Tool Calls; Prompts, MCP; Test Results and Logging; Debug Logging; Test Farm - CI/CD; Maestro Command Models; Your App Specific Code; Trailblaze Tools (appears twice); Screen State; Screenshot + View Hierarchy
  27. Adoption

    AI driven testing is a paradigm shift in mobile E2E testing
    • Reduction in manual testing effort
    • More resilient against UI changes
    Novel test cases uncover new requirements
  28. Test Authoring vs Execution

    • Test Setup
      ◦ Test Account Provisioning
      ◦ Authentication
      ◦ Feature Flags
    • Custom App Interactions
      ◦ Draw Signature
      ◦ Connect Virtual Card Reader
  29. Investment

    Open source foundation
    MCP authoring improvements
    Beyond Android
    • iOS and Web support?
    • Screen state and device drivers are a little different
    • Reuse the same prompts across platforms
  30. Call To Action

    Engage/Contribute to Open Source
    • https://github.com/block/trailblaze
    Try it out in your current test environment
    • Incremental adoption is key
    Show us your AI tools!