Code Is Cheap. Software Isn’t.

#1 AI-assisted coding in real Java systems
Code Is Cheap. Software Isn't.
AI can generate code faster than most developers can review it. That sounds like progress. In many teams, it is not.

#2 Act 1 / The seduction
The first week feels like magic:
• Boilerplate disappears
• Tests show up faster
• Unfamiliar APIs become usable
• Small refactors feel effortless

#3 Act 2 / The collision
Then you aim it at a real codebase.
LOCAL DEMO: simple task, isolated code path. Fast success; the pattern is visible.
SYSTEM BOUNDARY: real Java codebase, with its module graph, team conventions, and legacy behavior.
HIDDEN CONTEXT: hidden constraints, legacy decisions, cross-module coupling.
Unexpected breakage: local change, system failure.

#4 The problem was never writing code
Easy:
• Syntax
• Boilerplate
• API lookup
• Repetitive changes
• Code completion
Hard:
• Intent
• Invariants
• Compatibility
• System behavior under change
• Implicitly encoded knowledge

#5 Where systems actually fail
Failures do not hide in the obvious lines. They hide:
• In hidden assumptions
• In changed behavior
• In broken contracts
• In subtle coupling
• In patterns
• In encoded best practices
• In operational contexts
• In production requirements
Examples:
• Cleanup breaks transaction behavior
• Modernization breaks serialization
• Query optimization changes paging semantics
• Optimizations for downstream systems (DBs) get removed
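The paging example can be reproduced without a database. The sketch below is hypothetical and assumes Java 17. A `TreeSet` of row ids stands in for a table ordered by a unique id. Keyset paging (`WHERE id > :lastSeen ORDER BY id LIMIT :size`) and offset paging (`ORDER BY id OFFSET :offset LIMIT :size`) return the same first page, so rewriting one as the other can look like a harmless cleanup. Once another transaction inserts a row between page requests, the offset version serves a row twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a table ordered by a unique id.
class PagingSemantics {

    // Keyset paging: WHERE id > :lastSeen ORDER BY id LIMIT :size
    static List<Integer> keysetPage(TreeSet<Integer> table, int lastSeen, int size) {
        List<Integer> page = new ArrayList<>();
        for (int id : table.tailSet(lastSeen, false)) {
            if (page.size() == size) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Offset paging: ORDER BY id OFFSET :offset LIMIT :size
    // Returns the same first page, but page boundaries shift when rows are inserted.
    static List<Integer> offsetPage(TreeSet<Integer> table, int offset, int size) {
        return table.stream().skip(offset).limit(size).toList();
    }

    public static void main(String[] args) {
        TreeSet<Integer> table = new TreeSet<>(List.of(10, 20, 30, 40));
        List<Integer> firstPage = keysetPage(table, Integer.MIN_VALUE, 2); // [10, 20] with either strategy

        table.add(5); // another transaction inserts a row before page 2 is requested

        int lastSeen = firstPage.get(firstPage.size() - 1);
        System.out.println("keyset page 2: " + keysetPage(table, lastSeen, 2)); // [30, 40]
        System.out.println("offset page 2: " + offsetPage(table, 2, 2));       // [20, 30]: 20 is served twice
    }
}
```

Which behavior is correct depends on the contract. The point is that the two queries are not interchangeable, and a diff that swaps one for the other deserves a test that inserts rows between pages.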

#6 Act 3 / The diagnosis
Missing context looks like hallucination.
INPUT: underspecified intent. What is missing is invisible.
COMPLETION: the AI completes it with plausible characters.
SYSTEM FIT: wrong fit. Assumptions collide.
"Hallucination" is a label for the symptom, not a diagnosis.
The model completes the picture we failed to provide.

#7 Sameness is not meaning
Good enough for boilerplate. Risky for architecture.
Boilerplate can ride on sameness.
• Architecture cannot
• Compatibility cannot
• Risk decisions cannot
• Non-functional requirements cannot

#8 Ambiguity turns into assumptions
Ambiguity appears when:
• Intent is implicit
• Structure is inconsistent
• Conventions vary by module
• Rules live only in people's heads
You ask follow-up questions. Models guess.

#9 Act 4 / The discipline
Clever prompting is not enough.
Better prompts help the next task. Structure helps the next hundred.
What scales:
• Explicit constraints
• Small tasks
• Visible decisions
• Repeatable workflows
Structure beats cleverness when the system gets large.

#10 Separate the what from the how
HUMAN defines the WHAT: goals, constraints, invariants.
AGENT executes the HOW: implementation, refactoring, tests.
ENGINEERING verifies and decides: evidence before merge.
Safe delegation starts when the WHAT is explicit and the HOW is reviewable.

#11 Bad AI requests mix everything together
Bad: "Modernize this service, improve performance, and clean it up."
Why it fails:
• Intent is unclear
• Trade-offs are hidden
• Success criteria are missing

#12 Good AI requests define boundaries first
• Keep the REST API stable
• Preserve transaction boundaries
• Do not add dependencies
• Refactor this method only
• Generate tests for changed behavior
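These boundaries can be carried as structure instead of as prose buried in a chat. The following is a minimal sketch that assumes Java 17. The `ChangeRequest` record and its `render()` format are hypothetical and do not belong to any assistant's API. The record only guarantees that constraints, a single task, and a verification step are all stated, and that constraints come first.

```java
import java.util.List;

// Hypothetical request shape: boundaries first, then one task, then how to verify it.
record ChangeRequest(List<String> constraints, String task, String verification) {

    ChangeRequest {
        constraints = List.copyOf(constraints); // immutable copy; rejects null elements
        if (constraints.isEmpty() || task.isBlank() || verification.isBlank()) {
            throw new IllegalArgumentException("constraints, task, and verification must all be stated");
        }
    }

    // Renders plain text that can be pasted into an assistant, constraints before the task.
    String render() {
        StringBuilder text = new StringBuilder("Constraints:\n");
        for (String constraint : constraints) {
            text.append("- ").append(constraint).append('\n');
        }
        text.append("Task: ").append(task).append('\n');
        text.append("Verification: ").append(verification).append('\n');
        return text.toString();
    }

    public static void main(String[] args) {
        ChangeRequest request = new ChangeRequest(
                List.of("Keep the REST API stable",
                        "Preserve transaction boundaries",
                        "Do not add dependencies"),
                "Refactor this method only",
                "Generate tests for changed behavior");
        System.out.print(request.render());
    }
}
```

Keeping constraints as data also makes them reusable: the same list can be attached to every request against the same module.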

#13 Constraints beat clever prompts
Constraints protect:
• Compatibility
• Architecture
• Performance
• Security
• Team conventions
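Compatibility is the easiest of these to turn into a mechanical check. The sketch below assumes Java 17 and uses a hypothetical `OrderApi` interface as a stand-in for a public Java contract. For a REST API, the OpenAPI document would be the thing to diff instead. The code snapshots public method signatures through reflection. If an assistant renames a method or changes a parameter type, the check fails instead of the change slipping through review.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ApiSnapshot {

    // One entry per public method, "name(paramTypes)->returnType", sorted for stable diffs.
    static Set<String> signatures(Class<?> type) {
        return Arrays.stream(type.getMethods())
                .map(ApiSnapshot::describe)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Simple names keep the snapshot readable but ignore packages and generic type arguments.
    static String describe(Method method) {
        String params = Arrays.stream(method.getParameterTypes())
                .map(Class::getSimpleName)
                .collect(Collectors.joining(","));
        return method.getName() + "(" + params + ")->" + method.getReturnType().getSimpleName();
    }

    public static void main(String[] args) {
        // Baseline recorded when the contract was last approved; in practice it would live in version control.
        Set<String> baseline = Set.of("findOrder(long)->String", "listOrders(int,int)->List");
        Set<String> current = signatures(OrderApi.class);
        if (!current.equals(baseline)) {
            throw new IllegalStateException("Public API changed: " + current);
        }
        System.out.println("API unchanged: " + current);
    }
}

// Hypothetical public contract that must not change without an explicit decision.
interface OrderApi {
    String findOrder(long id);

    List<String> listOrders(int page, int size);
}
```

Because it uses simple names, the check does not notice a change from `List<String>` to `List<Integer>`. A stricter version would compare fully qualified and generic signatures.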

#14 Big tasks fail for the same reason big migrations fail
• Too many unknowns
• Too many moving parts
• Too little verification
• Too much guesswork

#15 The big-bang fantasy is emotional, not technical
Why it feels attractive:
• Clean slate
• Visible progress
• No legacy smell
• No compromise
Why it fails:
• Hidden edge cases disappear
• Integration knowledge gets erased
• Stable weirdness gets removed
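Stable weirdness survives a cleanup only if it is pinned down first. A common way to do that is a characterization test, which records what the code does today rather than what it should do. The sketch below assumes Java 17 and uses a hypothetical `LegacyCurrency` lookup. Its quirks (trimming, case-insensitivity, an empty string instead of null) are exactly the kind of behavior a well-meaning modernization removes. Plain checks replace a test framework to keep the example self-contained.

```java
import java.util.Locale;
import java.util.Map;

// Characterization checks: pin current behavior before an assistant is allowed to refactor.
class LegacyCurrencyCharacterization {

    static void check(String input, String expected) {
        String actual = LegacyCurrency.displayName(input);
        if (!expected.equals(actual)) {
            throw new AssertionError("displayName(" + input + ") changed: expected '"
                    + expected + "', got '" + actual + "'");
        }
    }

    public static void main(String[] args) {
        check("EUR", "Euro");
        check(" usd ", "US Dollar"); // trimming and case-insensitivity are de facto contract now
        check("XYZ", "");            // unknown code: empty string, not an exception
        check(null, "");             // null input: empty string, not null
        System.out.println("Legacy behavior pinned.");
    }
}

// Hypothetical legacy code whose "stable weirdness" callers are assumed to rely on.
class LegacyCurrency {
    private static final Map<String, String> NAMES = Map.of("EUR", "Euro", "USD", "US Dollar");

    static String displayName(String code) {
        if (code == null) {
            return ""; // assumed: callers rely on a non-null result
        }
        String name = NAMES.get(code.trim().toUpperCase(Locale.ROOT));
        return name == null ? "" : name; // unknown codes also yield ""
    }
}
```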

#16 Decomposition still wins
BIG-BANG ASK: "Migrate module". Too many unknowns.
DECOMPOSED WORK: 1. annotations, 2. config, 3. client, 4. tests, 5. edge cases.
SAFE DELEGATION: reviewable diff, rollback path.
AI speeds up the steps. Engineering still defines the cut lines.

#17 One useful Monday-morning rule
If the task cannot fit into:
• One clear description
• One reviewable diff
• One verification path
...it is too large for safe delegation.
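The Monday-morning rule is concrete enough to write down as a check. The sketch below assumes Java 17. The `DelegationCandidate` record, the choice to count goals as a list, and the ten-file threshold for "one reviewable diff" are all hypothetical choices a team would tune. None of them is a standard.

```java
import java.util.List;

// Hypothetical description of a task before it is handed to an assistant.
record DelegationCandidate(List<String> goals, int expectedChangedFiles, List<String> verificationPaths) {

    // Assumed team threshold for what still counts as one reviewable diff.
    static final int MAX_FILES_PER_DIFF = 10;

    // One clear description, one reviewable diff, one verification path.
    boolean fitsSafeDelegation() {
        boolean oneClearDescription = goals.size() == 1;
        boolean oneReviewableDiff = expectedChangedFiles > 0 && expectedChangedFiles <= MAX_FILES_PER_DIFF;
        boolean oneVerificationPath = verificationPaths.size() == 1;
        return oneClearDescription && oneReviewableDiff && oneVerificationPath;
    }

    public static void main(String[] args) {
        // The "bad" request from earlier: three goals, an open-ended diff, no agreed verification.
        DelegationCandidate bigBang = new DelegationCandidate(
                List.of("Modernize this service", "Improve performance", "Clean it up"),
                40, List.of());
        // One hypothetical step from a decomposed migration.
        DelegationCandidate step = new DelegationCandidate(
                List.of("Migrate persistence annotations in the order module"),
                6, List.of("Run the existing repository tests"));
        System.out.println("big bang fits: " + bigBang.fitsSafeDelegation()); // false
        System.out.println("single step fits: " + step.fitsSafeDelegation()); // true
    }
}
```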

#18 Prompts are moments. Systems compound.
One-off prompting gives:
• Local wins
• Inconsistent behavior
• Repeated explanation
Systematic AI work needs:
• Persistent context
• Explicit rules
• Reusable workflows

#19 Turn chat into an operating system
From: one clever conversation
To:
• Rules
• Skills
• Commands
• ADRs
• Tests
• Team-level reuse

#20 Architecture has to become explicit
Architects shift from reviewing outcomes to:
• Defining intent
• Encoding constraints
• Making trade-offs visible
Artifacts: ADRs / architecture notes / policy docs / service boundaries

#21 ADRs, agent rules, and tests are not the same thing
• ADR: decision rationale (WHY)
• Agent rules: allowed assistant behavior (HOW)
• Tests: expected behavior (WHAT)
Do not ask one artifact to do all three jobs.

#22 Not all context helps an agent
Useful:
• Invariants
• Forbidden changes
• Approved patterns
• Sensitive modules
• Compatibility rules
Not automatically useful:
• Giant wiki dumps
• Stale diagrams
• Every dependency in the system
• Generic architecture prose

#23 Compress context into rules of change
Good context answers:
• What must never break
• What should usually be preferred
• Where the system is fragile
• Which trade-offs are already decided
Anything else is noise at the point of work.
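The four questions above map onto a small, typed structure that can be rendered into whatever context file an assistant reads. The sketch below assumes Java 17. The `RuleKind` categories mirror the slide. The example rules and the output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class RulesOfChange {

    // Groups rules by kind (in enum order) and renders a compact plain-text context file.
    static String render(List<ChangeRule> rules) {
        Map<RuleKind, List<String>> byKind = new EnumMap<>(RuleKind.class);
        for (ChangeRule rule : rules) {
            byKind.computeIfAbsent(rule.kind(), k -> new ArrayList<>()).add(rule.text());
        }
        StringBuilder out = new StringBuilder();
        byKind.forEach((kind, texts) -> {
            out.append(kind.heading).append(":\n");
            texts.forEach(t -> out.append("- ").append(t).append('\n'));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Example rules are hypothetical.
        System.out.print(render(List.of(
                new ChangeRule(RuleKind.FRAGILE, "Billing shares a transaction with order creation"),
                new ChangeRule(RuleKind.NEVER_BREAK, "Public REST paths and JSON field names"),
                new ChangeRule(RuleKind.PREFER, "Constructor injection over field injection"),
                new ChangeRule(RuleKind.DECIDED, "Exports use keyset paging, not offset paging"))));
    }
}

// The four questions good context answers, in the order they matter at the point of work.
enum RuleKind {
    NEVER_BREAK("Must never break"),
    PREFER("Usually prefer"),
    FRAGILE("Fragile areas"),
    DECIDED("Already decided");

    final String heading;

    RuleKind(String heading) {
        this.heading = heading;
    }
}

record ChangeRule(RuleKind kind, String text) {}
```

Sorting by kind rather than by insertion order means "must never break" always comes first, however the rules were collected.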

#24 Standards must move from memory to mechanism
Tribal knowledge is not enough.
• Modes
• Commands
• Checks
• Templates
• Review gates
If the rule matters, encode it.
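A check is the simplest of these mechanisms. The sketch below assumes Java 17 and a hypothetical team rule: a module already migrated to Jakarta EE must not import `javax.persistence` again. It scans source text line by line and reports the file and line number of each violation. A real setup would more likely run an architecture-testing library in the build. The principle is the same either way: the rule fails the build instead of living in someone's memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical team rule: migrated modules must not import forbidden package prefixes.
class ForbiddenImportCheck {

    // Returns "File.java:line import ..." for every import that starts with a forbidden prefix.
    static List<String> violations(Map<String, String> sourcesByFile, List<String> forbiddenPrefixes) {
        List<String> found = new ArrayList<>();
        Map<String, String> sorted = new TreeMap<>(sourcesByFile); // stable, sorted file order in the report
        sorted.forEach((file, text) -> {
            String[] lines = text.split("\n");
            for (int i = 0; i < lines.length; i++) {
                String line = lines[i].strip();
                String imported = line.startsWith("import static ") ? line.substring("import static ".length())
                        : line.startsWith("import ") ? line.substring("import ".length())
                        : null;
                if (imported == null) {
                    continue; // not an import line
                }
                for (String prefix : forbiddenPrefixes) {
                    if (imported.startsWith(prefix)) {
                        found.add(file + ":" + (i + 1) + " " + line);
                        break;
                    }
                }
            }
        });
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> sources = Map.of(
                "Order.java", "package shop;\nimport jakarta.persistence.Entity;\nimport javax.persistence.Table;\n",
                "Invoice.java", "package shop;\nimport jakarta.persistence.Entity;\n");
        // In a build, a non-empty result would fail the step.
        violations(sources, List.of("javax.persistence.")).forEach(System.out::println);
        // prints: Order.java:3 import javax.persistence.Table;
    }
}
```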

#25 Modes, commands, and context files solve different problems
• Modes: control how the assistant may act
• Commands: run repeatable operations
• Context files: store project knowledge
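One way to read "modes control how the assistant may act" is as a permission set that is checked before every action. The sketch below assumes Java 17. The mode names and actions are hypothetical and are not taken from any particular assistant.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical modes: each one is an explicit allow-list of actions.
enum AgentMode {
    ASK(EnumSet.of(AgentAction.READ_FILES)),
    REVIEW(EnumSet.of(AgentAction.READ_FILES, AgentAction.RUN_COMMANDS)),
    CODE(EnumSet.allOf(AgentAction.class));

    private final Set<AgentAction> allowed;

    AgentMode(Set<AgentAction> allowed) {
        this.allowed = allowed;
    }

    // Checked before the assistant performs any action.
    boolean permits(AgentAction action) {
        return allowed.contains(action);
    }

    public static void main(String[] args) {
        for (AgentMode mode : values()) {
            System.out.println(mode + " may edit files: " + mode.permits(AgentAction.EDIT_FILES));
        }
    }
}

// Hypothetical actions an assistant may request.
enum AgentAction { READ_FILES, EDIT_FILES, RUN_COMMANDS }
```

Commands and context files would then sit on top of a mode. A command runs inside the permissions of the current mode instead of granting new ones.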

#26 Skills turn prompting into engineering
A skill:
• Gathers the right context
• Follows team constraints
• Performs a repeatable workflow
• Proposes verification steps
• Stops before unsafe assumptions

#27 Example: a safe refactor skill
INPUTS: target + boundaries (class, method, nearby tests)
Inspect usage → apply minimal change → generate tests → summarize assumptions → human review
A skill is a repeatable workflow, not a longer prompt.
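Because a skill is a workflow and not a longer prompt, it can be sketched as ordered steps over shared inputs, any of which may stop the run. The sketch below assumes Java 17. `Target`, the step names, and the stop conditions are hypothetical stand-ins for what a real skill would delegate to an assistant and to the build. The edit, test, and summary steps are placeholders. What the sketch does show is the "stops before unsafe assumptions" behavior: when the facts gathered so far do not support a safe change, it hands back to a human instead of guessing.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

class SafeRefactorSkill {

    // Hypothetical inputs: the target method, the files it may touch, and the files that call it.
    record Target(String method, Set<String> allowedFiles, Set<String> callerFiles) {}

    // A named step that either passes (empty) or returns a reason to stop and ask a human.
    record Step(String name, Function<Target, Optional<String>> check) {}

    static final List<Step> WORKFLOW = List.of(
            new Step("inspect usage", t -> t.callerFiles().isEmpty()
                    ? Optional.of("no callers found; usage is unclear")
                    : Optional.empty()),
            new Step("stay within boundaries", t -> t.allowedFiles().containsAll(t.callerFiles())
                    ? Optional.empty()
                    : Optional.of("callers outside the allowed files; a signature change would cross the boundary")),
            new Step("apply minimal change", t -> Optional.empty()),   // placeholder for the assistant's edit
            new Step("generate tests", t -> Optional.empty()),         // placeholder for test generation
            new Step("summarize assumptions", t -> Optional.empty())); // placeholder for the summary

    // Runs steps in order; returns the stop reason, or empty when the result is ready for human review.
    static Optional<String> run(Target target) {
        for (Step step : WORKFLOW) {
            Optional<String> stop = step.check().apply(target);
            if (stop.isPresent()) {
                return Optional.of(step.name() + ": " + stop.get());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Target inside = new Target("PriceCalculator.round",
                Set.of("PriceCalculator.java", "CheckoutService.java"), Set.of("CheckoutService.java"));
        Target leaking = new Target("PriceCalculator.round",
                Set.of("PriceCalculator.java"), Set.of("CheckoutService.java", "InvoiceExporter.java"));
        System.out.println(run(inside).orElse("ready for human review"));
        System.out.println(run(leaking).orElse("ready for human review"));
    }
}
```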

#28 One small command can encode big expectations
A command can:
• Inspect architecture
• Apply standards
• Flag suspicious changes
• Force tests before edits
Example use: "analyze module and identify where current complexity bypasses our standards"

#29 Code context is not enough
Before asking AI to change or review code, gather:
• ADRs
• OpenAPI or protocol definitions
• Issue discussions
• Runbooks
• Incident notes
• Migration guides

#30 MCP: Docs MCP Server
Ground the agent in the docs your system actually uses.
• Index official docs, GitHub repos, local folders, PDFs, and internal guides
• Ask the assistant to search docs before touching unfamiliar APIs
• Use for internal frameworks, migration notes, ADRs, and platform rules
• Risk control: version the corpus; stale docs are still stale
Practical rule: if a task depends on library behavior, require one cited docs lookup before code generation.
github.com/arabold/docs-mcp-server

#31 MCP: Quarkus Agent MCP
Give Java agents Quarkus-native workflow tools.
• Create and manage Quarkus apps from a separate MCP process
• Search Quarkus docs and read extension-specific skills before coding
• Start dev mode, inspect logs, call Dev UI tools, and run tests
• Use it when the task is Quarkus-specific, not generic Java
Practical rule: before writing Quarkus code, have the agent load extension skills and search the matching docs version.
github.com/quarkusio/quarkus-agent-mcp

#32 MCP: Context7
Pull version-specific docs into the prompt at the moment of work.
• Fetches up-to-date documentation and code examples from source projects
• Use exact library IDs when you know them; specify versions when it matters
• Good default for public library APIs, setup, and configuration tasks
• Risk control: docs are still external context; verify generated behavior
Practical rule: add a project rule to use Context7 for library/API work unless local docs are more authoritative.
github.com/upstash/context7

#33 MCP: Quarkus Dev MCP
Expose the running app as context, not just the repository.
• In Quarkus dev mode, the app can expose an MCP endpoint
• Agents can access Dev UI tools and MCP resources from the live app
• Use for endpoints, config, exceptions, tests, logs, and extension-provided tools
• Risk control: dev-only, experimental, and should not be a production backdoor
Practical rule: when debugging, ground the model in the running dev app before it proposes fixes.
quarkus.io/guides/dev-mcp

#34 Personal memory is also grounding
Make durable preferences and project habits explicit.
• Use a team wiki or Notion for decisions, runbooks, service maps, and glossary terms
• Use personal memory for stable preferences: review style, risk tolerance, communication norms
• Use skill systems such as Superpowers to turn habits into repeatable workflows
• Risk control: memory must be inspectable, editable, and easy to override
Practical rule: memory should not be magic. Treat it like configuration: explicit, reviewed, and scoped.
github.com/obra/superpowers / Notion / team wiki

#35 Why shallow AI review misses the bugs that matter
Shallow review catches:
• Syntax issues
• Style problems
• Obvious defects
Intent-aware review catches:
• Wrong behavior
• Broken assumptions
• Requirement mismatches

#36 Use phases, not one giant session
1. Explore code + docs
2. Derive requirements
3. Review structure
4. Check behavior
5. Verify tests
6. Fix and re-check
Each pass has one job. That is how review stays sharp.

#37 Treat generated code like a pull request you did not write
• Pass 1: structural correctness
• Pass 2: requirement fit
• Pass 3: consistency with the system
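The three passes can be kept separate in tooling as well as in habit. Each pass looks at the change for one reason and labels its own findings. The sketch below assumes Java 17. The `GeneratedChange` fields and the individual checks are hypothetical simplifications of what each pass would actually examine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ReviewPasses {

    // Hypothetical summary of a generated change, as a reviewer would see it.
    record GeneratedChange(boolean compiles, boolean testsPass,
                           List<String> requirementsAddressed, List<String> requirementsExpected,
                           List<String> newDependencies) {}

    // A pass has exactly one job and reports findings only for that job.
    record Pass(String name, Function<GeneratedChange, List<String>> review) {}

    static final List<Pass> PASSES = List.of(
            new Pass("Pass 1: structural correctness", c -> {
                List<String> findings = new ArrayList<>();
                if (!c.compiles()) {
                    findings.add("does not compile");
                }
                if (!c.testsPass()) {
                    findings.add("tests fail");
                }
                return findings;
            }),
            new Pass("Pass 2: requirement fit", c -> c.requirementsExpected().stream()
                    .filter(r -> !c.requirementsAddressed().contains(r))
                    .map(r -> "not addressed: " + r)
                    .toList()),
            new Pass("Pass 3: consistency with the system", c -> c.newDependencies().stream()
                    .map(d -> "new dependency despite 'Do not add dependencies': " + d)
                    .toList()));

    // Runs every pass separately and labels each finding with the pass that produced it.
    static List<String> review(GeneratedChange change) {
        List<String> report = new ArrayList<>();
        for (Pass pass : PASSES) {
            for (String finding : pass.review().apply(change)) {
                report.add(pass.name() + " -> " + finding);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        GeneratedChange change = new GeneratedChange(true, true,
                List.of("keep API stable"), List.of("keep API stable", "preserve paging order"),
                List.of("some-new-library"));
        review(change).forEach(System.out::println);
    }
}
```

A change can pass structurally and still fail the other two passes. Keeping the passes separate stops that from being hidden behind a green build.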

#38 Act 5 / The new job
Verification becomes the bottleneck.
Before: writing code was expensive.
Now: generating code is cheap; understanding and verification are expensive.
If you cannot verify it, do not delegate it.

#39 The hidden cost is cognitive overload
• Too many candidate changes
• Too many micro-decisions
• Shallow review pressure
• Fatigue disguised as productivity

#40 Protect the human, not just the system
Mitigations:
• Smaller tasks
• Review pauses
• Separate generation from judgment
• Explicit ownership
• Fewer concurrent change streams

#41 Why Java teams are better positioned than they think
Java already values:
• Contracts
• Explicit APIs
• Modularity
• Compatibility
• Testing
• Operational discipline
These are not old habits. They are AI advantages.

#42 A practical operating model
1. Intent
2. Constraints
3. Decompose
4. Context
5. Rules + skills
6. Draft
7. Verify
8. Capture learning
The loop only works if learning feeds back into the next task.

#43 Code is cheap. Intent is hard. Verification is everything.
AI did not replace engineering. It made engineering visible again.

#44 Final takeaway
Winning teams will not be the fastest at generating code. They will:
• Shape intent clearly
• Bound change deliberately
• Verify without compromise
• Protect human judgment

#45 THANK YOU!
Code is cheap. Intent is hard. Verification is everything.

#46 Follow Bob on YouTube
Continue with Bob. Learn more about Bob! Visit the team at the booth this week!
Explore Bob the way that fits your next step: ask questions, learn the product, or go deeper with hands-on content.
youtube.com/@ibm-bob / bob.ibm.com

More Decks by Markus Eisele
