
Don't Prompt Harder, Structure Better


Yusuke Kita

April 14, 2026


Transcript

  1. What LLM generates without context

     Prompt: "Write a login test for our iOS driver app"

     func testLogin() {
         let app = XCUIApplication()
         app.textFields["username"].tap()        // No such identifier
         app.textFields["username"].typeText("test")
         app.buttons["Log in"].tap()             // Actually "Sign in"
         app.staticTexts["Welcome"].tap()        // No wait — not loaded yet
     }
  2. Three Failure Modes

     Mode 1 — Hallucinated selectors: IDs that don't exist
     Mode 2 — Wrong text: "Log in" when the button actually says "Sign in"
     Mode 3 — No wait strategy: race conditions
  3. How do we give the LLM enough context?

     Not by writing better prompts — by structuring the context it receives.
  4. Three Layers of Structured Context

     Layer                            | Role           | Provides
     IR (Intermediate Representation) | "What to test" | Spec intent, machine-readable
     Scenarios + Definitions          | "What exists"  | Step order, selectors, timeouts
     Templates                        | "How to write" | Code patterns, conventions

     Pipeline: PRD/Figma → IR → Code
  5. IR — "What to test"

     ir:
       name: Driver Login
       preconditions:
         - Operator account exists
       input_fields:
         - { name: operator_name, type: select }
         - { name: password, type: password }
         - { name: branch_office, type: select }
         - { name: radio_number, type: text }
         - { name: driver_number, type: text }

     IR captures intent separately from implementation → prevents hallucination.
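     Because the IR is machine-readable, it can be decoded into typed structures and validated before any prompt is built. A minimal sketch, assuming hypothetical type names and using JSON as a stand-in for the YAML above (a real pipeline would likely use a YAML library such as Yams):

```swift
import Foundation

// Hypothetical types mirroring the IR schema above; the names are
// assumptions for illustration, not from the deck's actual tooling.
struct InputField: Codable {
    let name: String
    let type: String
}

struct IR: Codable {
    let name: String
    let preconditions: [String]
    let inputFields: [InputField]

    enum CodingKeys: String, CodingKey {
        case name, preconditions
        case inputFields = "input_fields"
    }
}

// JSON stand-in for the YAML spec above.
let json = """
{
  "name": "Driver Login",
  "preconditions": ["Operator account exists"],
  "input_fields": [
    { "name": "operator_name", "type": "select" },
    { "name": "driver_number", "type": "text" }
  ]
}
""".data(using: .utf8)!

let ir = try! JSONDecoder().decode(IR.self, from: json)
print(ir.name)               // Driver Login
print(ir.inputFields.count)  // 2
```

     Decoding up front means a malformed spec fails loudly in tooling rather than silently producing a wrong test.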
  6. Scenarios — "What steps in what order"

     name: ride-complete
     parameters:
       passengerName: { type: string, default: "anonymous" }
       paymentAmount: { type: number, default: 50 }
     variants:
       anonymous: {}
       designated: { driverNumber: "${DRIVER_NUMBER}" }
     steps:
       - performFullLogin
       - acceptRideRequest
       - completeRideJourney: { passengerName: "${passengerName}" }
       - enterPaymentAmount: { amount: "${paymentAmount}" }
       - confirmPayment
       - endWorkDay

     Fixed steps → no hallucinated flows. Variants → multiple tests.
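     The mechanical part of "variants → multiple tests" is plain parameter substitution. A minimal sketch, with a hypothetical helper name and a made-up "Alice" variant for illustration: variant values override the scenario's declared defaults, then "${param}" placeholders in each step are filled in.

```swift
import Foundation

// Hypothetical helper (not from the deck): resolve "${param}" placeholders
// in a scenario step using variant values, falling back to defaults.
func resolve(step: String,
             defaults: [String: String],
             variant: [String: String]) -> String {
    // Variant values win over defaults.
    let params = defaults.merging(variant) { _, override in override }
    var result = step
    for (key, value) in params {
        result = result.replacingOccurrences(of: "${\(key)}", with: value)
    }
    return result
}

let defaults = ["passengerName": "anonymous", "paymentAmount": "50"]
let step = "completeRideJourney(passengerName: \"${passengerName}\")"

// "anonymous" variant: no overrides, defaults apply.
let anon = resolve(step: step, defaults: defaults, variant: [:])
print(anon)  // completeRideJourney(passengerName: "anonymous")

// A hypothetical variant that overrides the passenger name.
let named = resolve(step: step, defaults: defaults,
                    variant: ["passengerName": "Alice"])
print(named)  // completeRideJourney(passengerName: "Alice")
```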
  7. Definitions — "What exists"

     ios:
       helpers:
         performFullLogin:
           signature: "(app: XCUIApplication) -> Void"
         acceptRideRequest:
           signature: "(app: XCUIApplication) -> Void"
         completeRideJourney:
           signature: "(app, passengerName: String?) -> Void"
       timeouts:
         login_process: 30
       accessibility_ids:
         button_login: "button-login"
         button_confirm: "button-confirm"

     LLM gets: real selectors, reusable helpers, proven timeouts — no guessing.
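     One way to make "no guessing" concrete on the code side: model the definitions file as exhaustive Swift types, so a test can only reference selectors and timeouts that actually exist. A minimal sketch; the type names are assumptions, not the deck's actual tooling.

```swift
import Foundation

// Hypothetical mirror of the definitions file. An unknown accessibility ID
// becomes a compile error instead of a runtime flake.
enum AccessibilityID: String {
    case buttonLogin = "button-login"
    case buttonConfirm = "button-confirm"
}

enum Timeout {
    /// Proven value from the definitions file, not an arbitrary guess.
    static let loginProcess: TimeInterval = 30
}

print(AccessibilityID.buttonLogin.rawValue)  // button-login
print(Timeout.loginProcess)                  // 30.0
```

     A generated test would then write `app.buttons[AccessibilityID.buttonLogin.rawValue]` rather than a hand-typed string.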
  8. Templates — "How to write"

     final class RideUITests: XCTestCase {
         func testAcceptAndCompleteRide() throws {
             performFullLogin(app: app)
             acceptRideRequest(app: app)
             completeRideJourney(app: app, passengerName: "anonymous")
         }
     }

     // Selector priority: accessibilityIdentifier > label > predicate
     app.buttons["button-login"].tap()
     app.staticTexts["Enter driver number"].waitForExistence(timeout: 30)

     // Wait: waitForExistence for UI, sleep only for backend (with reason)
     // reason: "Backend processes state change"
     Thread.sleep(forTimeInterval: 5)
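     The template's selector-priority rule (accessibilityIdentifier > label > predicate) can itself be encoded as a tiny decision function, so tooling or reviewers apply it consistently. A minimal sketch under assumed names, not the deck's actual implementation:

```swift
// Hypothetical encoding of the selector-priority convention.
enum SelectorKind: String {
    case accessibilityIdentifier, label, predicate
}

// Given what a screen element exposes, pick the most stable query kind.
func preferredSelector(available: Set<SelectorKind>) -> SelectorKind? {
    // Ordered by stability, per the template convention.
    let priority: [SelectorKind] = [.accessibilityIdentifier, .label, .predicate]
    return priority.first { available.contains($0) }
}

print(preferredSelector(available: [.label, .predicate])!.rawValue)  // label
print(preferredSelector(available: [.predicate])!.rawValue)          // predicate
```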
  9. How it works in practice

     Step 1 — Scenario YAML: define WHAT to test
     Step 2 — Definitions: define WHAT exists (helpers, selectors)
     Step 3 — Templates: define HOW to write (patterns)
     Step 4 — Test Cases: compose into tests

     AI follows this chain every time.
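     Step 4, "compose into tests", can be sketched as a tiny renderer: walk the scenario's fixed step list and emit one helper call per step inside the template's test-method shape. In the real pipeline the LLM plays this role; the point of the sketch (all names assumed) is that every emitted line traces back to a definition, leaving nothing to hallucinate.

```swift
// Hypothetical composer: scenario steps in, templated test method out.
func renderTest(named name: String, steps: [String]) -> String {
    let body = steps.map { "    \($0)(app: app)" }.joined(separator: "\n")
    return """
    func \(name)() throws {
    \(body)
    }
    """
}

let steps = ["performFullLogin", "acceptRideRequest", "confirmPayment"]
let rendered = renderTest(named: "testRideComplete", steps: steps)
print(rendered)
```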
  10. Same prompt, different context

     Prompt: "Write a ride completion test"

     Without context → LLM guesses everything:
     app.buttons["login"].tap()            // wrong ID
     sleep(3)                              // arbitrary wait
     app.staticTexts["Welcome"].exists     // wrong text

     With IR + Scenarios + Definitions + Templates → LLM uses real values:
     performFullLogin(app: app)            // from scenarios
     app.buttons["button-login"].tap()     // from definitions
     app.staticTexts["Enter driver number"]
         .waitForExistence(timeout: 30)    // from definitions
     // reason: "Backend processes state change"
     Thread.sleep(forTimeInterval: 5)      // from templates
  11. The Formula

     Reliable AI-generated test code =
         IR (what to test)
       + Scenarios (what steps)
       + Definitions (what exists)
       + Templates (how to write)
  12. Three Things to Take Home

     1. Structure > Prompts: give the LLM selectors, definitions, and patterns — not just instructions.
     2. Single Source of Truth: update once → all tests benefit. Humans and AI read the same source.
     3. Context Prevents Hallucination: the AI knows what to test, what exists, and how to write → no guessing.
  13. One More Thing

     Don't build for AI. Build for your team. The context that helps your team write consistent code is the same context AI needs.
  14. Why it works for your team

     Reviewable: the context can be reviewed before any code is generated.
     Single source of truth: update one file, and all tests stay consistent.
     Onboarding: new members read structured files instead of asking around.