Trailblaze - Droidcon NYC 2025

AI Driven Mobile Testing
Blazing the Trail at Block
Brian Gardner, Sam Edwards
Droidcon NYC - June 25, 2025

Agenda
• Intro and How it Started
• Productionizing
• Future Plans

Intro

github.com/block/trailblaze

Open Source Demo

Droidcon NYC 2024

DragonCrawl April 2024

DragonCrawl September 2024

The way things were

The way things were
Automated Testing
• Espresso
• UiAutomator
• ComposeTestRule
Manual testing

What does “AI Driven” mean?

What does “AI Driven” mean?
Given instructions in natural language, the AI verifies the expected app behavior

Start small

LLM Request
• Model
• Chat messages
• Tools

LLM Response
• Message
• Tool calls

Agent loop

    // For every instruction
    do {
        val request = requestFor(task)
        val response = askLLM(request)
        handleToolCalls(response, task)
    } while (!task.isFinished())
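The loop above can be sketched end to end with a scripted stand-in for the model. Every name here (`SketchTask`, `SketchToolCall`, `scriptedLlm`, `runAgentLoop`) and the step cap are assumptions made for illustration, not Trailblaze's API; the sketch only shows the request, response, tool-call cycle.

```kotlin
// Hypothetical types for illustration; not Trailblaze's actual API.
data class SketchToolCall(val name: String, val args: Map<String, String> = emptyMap())

class SketchTask(val instruction: String) {
    val history = mutableListOf<String>() // names of the tool calls handled so far
    var finished = false
        private set

    fun record(call: SketchToolCall) {
        history += call.name
        if (call.name == "halt") finished = true // "halt" ends the task
    }
}

// Stand-in for a real model call: replays a fixed script, then answers "halt".
fun scriptedLlm(script: List<SketchToolCall>): (SketchTask) -> List<SketchToolCall> = { task ->
    listOf(script.getOrElse(task.history.size) { SketchToolCall("halt") })
}

// Same shape as the slide's loop, plus a step cap (an assumption) so a
// confused model cannot loop forever. Returns the number of iterations run.
fun runAgentLoop(
    task: SketchTask,
    askLlm: (SketchTask) -> List<SketchToolCall>,
    maxSteps: Int = 20,
): Int {
    var steps = 0
    do {
        val calls = askLlm(task)    // the request is derived from the task and its history
        calls.forEach(task::record) // "handle" each returned tool call
        steps++
    } while (!task.finished && steps < maxSteps)
    return steps
}
```

With a two-step script, the loop would run three iterations: the two scripted calls, then the fallback `halt` that marks the task finished.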

LLM Request
• Model: GPT 4o-mini
• Chat messages
• Tools

LLM Request
• Model
• Chat messages
• Tools: provide tools the LLM can “use”
  ◦ Tap
  ◦ Input text
  ◦ Erase text
  ◦ Scroll
  ◦ Halt

LLM Request
• Model
• Chat messages
  ◦ System Prompt
  ◦ Instructions
  ◦ History
  ◦ Screen state
• Tools

LLM Request
• System Prompt
  ◦ You are an assistant managing the users android phone that they do not have access to. You will autonomously complete complex tasks on the phone and report back when done. You will be provided with high-level instructions…
• Instructions
• History
• Screen state

LLM Request
• System Prompt
• Instructions
  ◦ Objectives for the current test
• History
• Screen state

LLM Request
• System Prompt
• Instructions
• History
  ◦ The LLM does not remember anything between messages.
  ◦ Each response saves the tool call used and the tool result.
• Screen state

LLM Request
• System Prompt
• Instructions
• History
• Screen state: provide the current screen state
  ◦ View Hierarchy
  ◦ Screenshot
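Because the model keeps no memory between calls, each request has to carry the whole conversation again. A minimal sketch of assembling the four parts of the chat messages, assuming a simplified text-only message model (`ChatMessage`, `ToolRecord`, and `buildMessages` are hypothetical names, and the screenshot is left out because it would travel as an image part in a real chat API):

```kotlin
// Hypothetical message model for illustration; real chat APIs differ in detail.
data class ChatMessage(val role: String, val content: String)

// One history entry: the tool call the model chose and what it returned.
data class ToolRecord(val toolCall: String, val result: String)

fun buildMessages(
    systemPrompt: String,
    instructions: String,
    history: List<ToolRecord>,
    viewHierarchy: String,
): List<ChatMessage> = buildList {
    add(ChatMessage("system", systemPrompt))
    add(ChatMessage("user", "Objective: $instructions"))
    // The model is stateless, so every earlier tool call and result is replayed.
    for (record in history) {
        add(ChatMessage("assistant", "Tool call: ${record.toolCall}"))
        add(ChatMessage("tool", "Result: ${record.result}"))
    }
    // Only the latest screen state is sent, since older screens are stale.
    add(ChatMessage("user", "Current view hierarchy:\n$viewHierarchy"))
}
```

The message list grows by two entries per completed tool call, which is one reason long tests get more expensive per request.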

ai-test-agent November 2024

🧭 Trailblaze March 2025

Trailblaze June 2025

Productionizing
Beyond the Basics

How do we…
• Scale?
• Debug?
• Record?

Scaling - CI/CD
• Reusing Existing Test Farm
  ◦ Scaling
  ◦ Support

Scaling - UI Interactions
• Espresso
• ADB
• UiAutomator
• Maestro

https://docs.maestro.dev/api-reference/commands

Maestro
• Maestro - Host Mode
  ◦ Android
  ◦ iOS
  ◦ Web
• Maestro - On-Device
  ◦ 🤔

Deep Dive https://handstandsam.com/2024/11/18/demystifying-maestros-ui-testing-implementation/

On-Device Android Maestro Driver
Maestro needs a hosted test environment
• Machine runs the test and sends commands to device
Created an on-device version for Android
• Lets us run on Test Farm

1st Attempt - Manual Tools

    private fun tapOnElement(args: JsonObject) {
        val text = args.getValue("text").jsonPrimitive.content
        onView(withText(text)).perform(click())
    }

• Espresso, ADB

2nd Attempt - Typesafe Tools

    data class TapOnElement(
        val text: String,
        val longPress: Boolean,
    ) : TrailblazeTool
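One benefit of the typed version is that argument validation happens once, at the boundary, instead of inside every tool handler. A rough sketch of that boundary, using a plain `Map` in place of a JSON object so it stays free of dependencies (in practice a serialization library would likely do this step). `SketchTypedTool`, `SketchTapOnElement`, and `decodeTapOnElement` are hypothetical names, not part of Trailblaze.

```kotlin
// Hypothetical marker interface and tool, mirroring the shape on the slide.
interface SketchTypedTool

data class SketchTapOnElement(
    val text: String,
    val longPress: Boolean,
) : SketchTypedTool

// Turns untyped tool-call arguments into a typed tool, failing fast on bad input.
fun decodeTapOnElement(args: Map<String, String>): SketchTapOnElement {
    // A missing required argument throws IllegalArgumentException.
    val text = requireNotNull(args["text"]) { "Missing required argument: text" }
    // Optional flag: absent or unparseable values fall back to false.
    val longPress = args["longPress"]?.toBooleanStrictOrNull() ?: false
    return SketchTapOnElement(text, longPress)
}
```

Once decoded, the rest of the code only ever sees `SketchTapOnElement`, so a malformed tool call from the model surfaces as one clear error rather than a failure deep inside a UI interaction.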

3rd Impl - Maestro Directly as Tools
• Give the Maestro API Surface to the LLM as a Tool

    data class TapOnElementCommand(
        val selector: ElementSelector,
        val retryIfNoChange: Boolean? = null,
        val waitUntilVisible: Boolean? = null,
        val longPress: Boolean? = null,
        val repeat: TapRepeat? = null,
        val waitToSettleTimeoutMs: Int? = null,
        override val label: String? = null,
        override val optional: Boolean = false,
    ) : maestro.orchestra.Command

Final - Typesafe Tools, Maestro

    data class TapOnElementWithTextTrailblazeTool(
        val text: String,
        val longPress: Boolean = false,
    ) : MapsToMaestroCommands() {
        override fun toMaestroCommands(): List<Command> = listOf(
            TapOnElementCommand(
                selector = ElementSelector(
                    textRegex = text,
                ),
                longPress = longPress,
            ),
        )
    }
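Returning a list of commands means one small, LLM-friendly tool can expand into several low-level steps. The sketch below illustrates that with local stand-in types so it compiles on its own; `SketchCommand`, `SketchTap`, `SketchInputText`, `SketchMapsToCommands`, and `TypeIntoFieldWithText` are all hypothetical and are not Maestro's or Trailblaze's classes. It also escapes the text before using it as a pattern, on the assumption that a field named `textRegex` is interpreted as a regular expression.

```kotlin
// Local stand-ins so the sketch is self-contained; NOT Maestro's real classes.
sealed interface SketchCommand
data class SketchTap(val textRegex: String, val longPress: Boolean = false) : SketchCommand
data class SketchInputText(val text: String) : SketchCommand

// Mirrors the shape of the slide's MapsToMaestroCommands base class.
abstract class SketchMapsToCommands {
    abstract fun toCommands(): List<SketchCommand>
}

// Hypothetical higher-level tool: one tool call, two low-level commands.
data class TypeIntoFieldWithText(
    val fieldText: String,
    val input: String,
) : SketchMapsToCommands() {
    override fun toCommands(): List<SketchCommand> = listOf(
        // Escape the literal so text such as "Total (USD)" is matched verbatim.
        SketchTap(textRegex = Regex.escape(fieldText)),
        SketchInputText(input),
    )
}
```

The model only has to supply two strings; the retry, wait, and selector details of the underlying commands stay out of the tool schema it sees.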

Agnostic, Typesafe Tools
• TrailblazeTool
  ◦ ExecutableTrailblazeTool
  ◦ DelegatingTrailblazeTool
• TrailblazeExecutionContext
  ◦ TrailblazeAgent
  ◦ ScreenState
  ◦ LLM Request ID
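One plausible reading of the executable/delegating split is that some tools act directly while others expand into further tools. The sketch below illustrates that reading only; the interfaces, the `expand` method, and `runTool` are assumptions inspired by the names on the slide, and the real Trailblaze interfaces may look quite different.

```kotlin
// Hypothetical execution context: carries the request ID and collects a log.
data class SketchContext(
    val llmRequestId: String,
    val log: MutableList<String> = mutableListOf(),
)

interface SketchTool

// A tool that performs its action directly.
fun interface SketchExecutableTool : SketchTool {
    fun execute(context: SketchContext)
}

// A tool that expands into other tools, which may themselves delegate.
fun interface SketchDelegatingTool : SketchTool {
    fun expand(context: SketchContext): List<SketchTool>
}

// Runs a tool, recursively flattening delegating tools in order.
fun runTool(tool: SketchTool, context: SketchContext) {
    when (tool) {
        is SketchExecutableTool -> tool.execute(context)
        is SketchDelegatingTool -> tool.expand(context).forEach { runTool(it, context) }
    }
}
```

Under this reading, an app-specific "log in" tool could be a delegating tool that expands into generic tap and input-text tools, so the platform-specific work stays in a small set of executable tools.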

Challenges

Challenges
• Debugging
• No Tool Calls
• LLM Picks Wrong Views

Challenges - Debugging
• Low memory/CPU footprint
• Avoid on-device storage & disk writes
• Reports locally and on CI
• Platform agnostic

Challenges - No Tool Calls
Message
If no tool call is returned, just force one on the next chat request:
ToolChoice.mode("required")
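The retry rule above is small enough to write as a pure function. This is a sketch with hypothetical names (`SketchToolChoice`, `SketchResponse`, `nextToolChoice`); how the chosen mode is passed to the model depends on the client library in use.

```kotlin
// Hypothetical model of the tool-choice setting sent with each request.
enum class SketchToolChoice { AUTO, REQUIRED }

// Hypothetical response shape: an optional text message plus any tool calls.
data class SketchResponse(val message: String?, val toolCalls: List<String>)

// If the previous response came back with no tool call, require one next time;
// otherwise let the model decide.
fun nextToolChoice(previous: SketchResponse?): SketchToolChoice =
    if (previous != null && previous.toolCalls.isEmpty()) {
        SketchToolChoice.REQUIRED
    } else {
        SketchToolChoice.AUTO
    }
```

Only forcing a tool call after a miss, rather than on every request, leaves the model free to answer normally when it is behaving.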

Challenges - LLM Picks Wrong Views
• Tap on X,Y works GREAT for device control
  ◦ X,Y does not scale for recordings
• Tap on text or ID
  ◦ Can be ambiguous
  ◦ Repeated text on screen
  ◦ Icons with no accessibility text

Set of Mark Prompting https://github.com/microsoft/SoM

Set of Mark Prompting
• Draw bounding boxes around each interactable element
• Filter out the elements that aren't on top or visible on screen
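The filtering step can be sketched as follows. This is a simplified illustration, not the approach Trailblaze or the SoM project necessarily uses: it assumes each node has a z-order, treats a node as hidden only when a higher node fully covers it, and all names (`Bounds`, `UiNode`, `markElements`) are hypothetical. Drawing the numbered boxes onto the screenshot is left out.

```kotlin
data class Bounds(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val isEmpty get() = right <= left || bottom <= top
    fun intersects(o: Bounds) =
        left < o.right && o.left < right && top < o.bottom && o.top < bottom
    fun contains(o: Bounds) =
        left <= o.left && top <= o.top && right >= o.right && bottom >= o.bottom
}

// Hypothetical flattened view-hierarchy node; a higher z is drawn on top.
data class UiNode(val label: String, val bounds: Bounds, val clickable: Boolean, val z: Int)

// Returns mark number -> node for every element the model may act on.
fun markElements(nodes: List<UiNode>, screen: Bounds): Map<Int, UiNode> =
    nodes
        // Keep interactable nodes that have a non-empty box on screen.
        .filter { it.clickable && !it.bounds.isEmpty && it.bounds.intersects(screen) }
        // Drop nodes completely covered by something drawn above them.
        .filter { node ->
            nodes.none { other -> other.z > node.z && other.bounds.contains(node.bounds) }
        }
        // Number the survivors from 1 in hierarchy order.
        .mapIndexed { index, node -> (index + 1) to node }
        .toMap()
```

The model can then answer with a mark number instead of text or coordinates, which sidesteps repeated text and icons without accessibility labels.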

Trailblaze Architecture

Trailblaze Architecture
[Diagram built up across six slides. Labels in the fullest version:]
• Test Definition: Kotlin Code, Recordings, Prompts, MCP
• Agent + LLM 🤖, Tool Calls
• Trailblaze Tools, Maestro Command Models, Your App Specific Code
• Maestro UI Driver Interface, Trailblaze On-Device Android Maestro Driver, Platform Specific Host Maestro Drivers
• Executing Device, UI Interaction, Platform APIs
• Screen State: Screenshot + View Hierarchy
• Test Results and Logging, Debug Logging, Test Farm - CI/CD

Future Plans

Adoption
AI driven testing is a paradigm shift in mobile E2E testing
• Reduction in manual testing effort
• More resilient against UI changes
Novel test cases uncover new requirements

Test Authoring vs Execution
• Test Setup
  ◦ Test Account Provisioning
  ◦ Authentication
  ◦ Feature Flags
• Custom App Interactions
  ◦ Draw Signature
  ◦ Connect Virtual Card Reader

Investment
Open source foundation
MCP authoring improvements
Beyond Android
• iOS and Web support?
• Screen state and device drivers are a little different
• Reuse the same prompts across platforms

Goose
• https://github.com/block/goose

Future Demo

Call To Action
Engage/Contribute to Open Source
• https://github.com/block/trailblaze
Try it out in your current test environment
• Incremental adoption is key
Show us your AI tools!
