Getting to Know Your Legacy (System) with AI-Driven Software Archeology (WeAreDevelopers World Congress 2025)

Getting to Know Your Legacy (System) with AI-Driven Software Archeology
WeAreDevelopers World Congress 2025, July 10, 2025
Markus Harrer, Software Evolutionist

I’m sorry! This talk’s idea is over [...] AI years old!

What happened in between?
Jan ‘25 (initial idea) to Jul ‘25 (conference): Devstral, ChatGPT Deep Research, GPT-4.5, o3 / o4-mini, AlphaEvolve, Veo 3, Claude 4, Context Engineering
(Only a few things stayed the same: Apple not delivering AI …)

All the cool kids are playing with AI
(b/w drawing by Daniel Storii {turnoff.us}, thanks to Michael Tharrington)

Legacy systems are different
We have to deal with the real world here, AI!
(Drawing by Daniel Storii {turnoff.us}, thanks to Michael Tharrington)

ArchAIology
Archeological techniques applied to legacy systems in the age of AI

Archeology?
Images: https://en.wikipedia.org/wiki/File:Archaeology.rome.arp.jpg, https://en.wikipedia.org/wiki/File:Dolina-Pano-3.jpg

Archeology
It's about getting inside the heads of our ancestors.

Archeologist: Detectives of the past
“Archaeologists try to find the clues left by people who lived before us, and they try to make sense of them.”
“Archaeology at work”, English Heritage Education Service, https://www.youtube.com/watch?v=TFejIkYDH9Q

Archeologist / Legacy Developer
What is there? What is it? How was it made?

Modern archeology techniques (amount of AI usage increasing from little to high)
- Excavation: What is there?
- Typology: What is it?
- Chaîne Opératoire: How was it made?

Excavation: What is there?
(scale: amount of AI usage, from little to high)

Excavation
E.g. using the Wheeler-Kenyon Method
Left image: https://en.wikipedia.org/wiki/Wheeler%E2%80%93Kenyon_method#/media/File:Moza-449.jpg

Excavation
Wheeler-Kenyon Method for code, using directories, file sizes and Git history with tree maps
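
As a rough sketch of how such an excavation could be prepared (not the scripts used in the talk), the following Python snippet aggregates file sizes per directory and counts changes per directory from Git history. The function names are hypothetical, and the Git input is assumed to come from `git log --name-only --pretty=format:`. The two resulting counters could then feed a tree map, for example with size as tile area and change count as color.

```python
import os
from collections import Counter
from pathlib import PurePosixPath


def directory_sizes(root):
    """Sum file sizes (in bytes) per directory, keyed by path relative to root."""
    sizes = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git internals so they do not distort the picture.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        for name in filenames:
            sizes[rel] += os.path.getsize(os.path.join(dirpath, name))
    return sizes


def changes_per_directory(git_log_output):
    """Count changed files per directory.

    Assumes the text was produced by
    `git log --name-only --pretty=format:` (one path per line,
    blank lines between commits).
    """
    changes = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:
            changes[str(PurePosixPath(path).parent)] += 1
    return changes


# Hypothetical sample of Git output: two commits touching three paths.
sample_log = "model/Owner.java\nweb/OwnerController.java\n\nmodel/Owner.java\n"
print(changes_per_directory(sample_log).most_common())
```

Directories that are both large and frequently changed are usually the first places worth digging into.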

Excavation

The elephant in the room
Where is AI?

The AI in the room helped me write analysis scripts

I had my fun (and LLMs know that!)
My Software Analytics repositories on GitHub: pandas, matplotlib, Plotly, Jupyter Notebook

Excavation: What is there?
(scale: amount of AI usage, from little to high)

Typology: What is it?
(scale: amount of AI usage, from little to high)

Typology
Classification of artifacts by sorting by shapes and kinds

Typology
Classification of artifacts by sorting by names and kinds

Typology
Sorting can get very tedious and repetitive
Image: https://pixabay.com/de/photos/%C3%A4gypten-tempel-hieroglyphen-pharao-1197835/ (NadineDoerle)

Typology creation with LLMs
Using existing naming schemas
.
├── model
│   ├── Owner.java
│   ├── Person.java
│   ├── Pet.java
│   ├── Specialty.java
│   ├── Vet.java
│   └── Visit.java
├── repository
│   ├── OwnerRepository.java
│   ├── PetRepository.java
│   ├── VetRepository.java
│   └── VisitRepository.java
└── web
    ├── OwnerController.java
    ├── PetController.java
    ├── PetValidator.java
    ├── VetController.java
    └── VisitController.java
Repository Pattern
repository/*Repository.java
Abstracts data access by encapsulating the logic needed to retrieve, store, and query data.

A typology prompt for Claude Code
Analyze the production Java code in this codebase and extract distinct concepts. Categorize them into two groups:
- technical_concepts: architectural patterns, design tactics, or technical structures
- business_concepts: domain-relevant ideas, rules, or terms that represent key business logic
For each concept, provide:
- name: a short, descriptive name
- explanation: a concise description of what the concept is
- rationale: why this concept likely exists in the codebase (technical or domain motivation)
- file_globs: glob-style patterns of the used naming conventions to identify where this concept appears in the codebase (e.g., **/**Service.java, **/invoicing/**.java)
Output the result as a well-structured YAML file with two top-level sections: technical_concepts and business_concepts.
Focus only on Java production code (exclude test files, scripts, and configuration files).

The expected result from an LLM (glob file patterns + some manual editing …)
technical_concepts:
  - name: "Boundary"
    description: "Defines the interfaces for communication between the core business logic (Interactors) and the outer layers (e.g., UI, web services). It includes Request and Response Models."
    rationale: "Separates the core application from the delivery mechanism, allowing the presentation layer to change independently of the business rules."
    file_globs:
      - "**/boundary/*.java"
  ...
business_concepts:
  - name: "Site"
    description: "A 'Site' represents a distinct container or context for content like comments, files, and schedules. Most other business concepts are scoped within a specific site."
    rationale: "The concept of a 'Site' allows for multi-tenancy or partitioning of data, where different users or groups can have their own isolated space within the application."
    file_globs:
      - "**/site/**/*.java"

Typology evaluation
See how many files can be associated with a concept
If you know one file that implements a concept, you know all the other files that implement the same concept!
Distribution for a small software system (~300 source code files)
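
One way such an evaluation could be scripted is sketched below: each concept's `file_globs` are translated into regular expressions and matched against the list of source files. This is an illustration under assumptions, not the code from the talk. `glob_to_regex` and `files_per_concept` are hypothetical helpers, the concepts are given as plain Python dicts instead of being loaded from YAML, and the glob dialect is simplified (`**` crosses directories, `*` stays within one path segment).

```python
import re


def glob_to_regex(pattern):
    """Translate a glob with `**` support into a compiled regex (simplified dialect)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # any number of directories, including none
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, across directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # exactly one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def files_per_concept(concepts, paths):
    """Map each concept name to the paths matched by any of its file_globs."""
    result = {}
    for concept in concepts:
        regexes = [glob_to_regex(g) for g in concept["file_globs"]]
        result[concept["name"]] = [
            p for p in paths if any(r.match(p) for r in regexes)
        ]
    return result


# Example based on the directory listing from the naming-schema slide.
concepts = [
    {"name": "Repository Pattern", "file_globs": ["repository/*Repository.java"]},
]
paths = [
    "model/Owner.java",
    "repository/OwnerRepository.java",
    "repository/PetRepository.java",
    "web/OwnerController.java",
]
matches = files_per_concept(concepts, paths)
print({name: len(files) for name, files in matches.items()})
# prints {'Repository Pattern': 2}
```

Counting the matches per concept gives the kind of distribution the slide refers to.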

Typology coverage analysis
See where you need to dig deeper into the code base to get more familiar with it

Typology
Color key:
▪ Both concept types
▪ Technical concept type only
▪ Business concept type only
▪ No member of any concept type
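
The four categories of the color key can be derived with simple set logic once it is known which files match any technical concept and which match any business concept. The sketch below assumes those two file lists already exist; `classify_coverage` is a hypothetical helper name and the sample paths are made up.

```python
def classify_coverage(all_files, technical_files, business_files):
    """Label each file with one of the four coverage categories from the color key."""
    technical = set(technical_files)
    business = set(business_files)
    labels = {}
    for path in all_files:
        in_technical = path in technical
        in_business = path in business
        if in_technical and in_business:
            labels[path] = "both concept types"
        elif in_technical:
            labels[path] = "technical concept type only"
        elif in_business:
            labels[path] = "business concept type only"
        else:
            labels[path] = "no member of any concept type"
    return labels


# Hypothetical sample: files without any concept are where to dig deeper.
labels = classify_coverage(
    all_files=["site/SiteRepository.java", "util/Strings.java"],
    technical_files=["site/SiteRepository.java"],
    business_files=["site/SiteRepository.java"],
)
uncovered = [p for p, label in labels.items() if label == "no member of any concept type"]
print(uncovered)
```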

Advanced Typology
Evaluating the conceptual integrity
Source: L. Adams Gilmour, Early Medieval Pottery from Flaxengate Lincoln.
Image: https://pixabay.com/de/photos/arch%C3%A4ologie-arch%C3%A4ologische-ausgrabung-59150/

Advanced Typology
Detect files within concepts that don’t do what others do
Example: Concept “Service”: Serve cold beers
CService, BService, AService, DService

Advanced Typology
Evaluating the conceptual integrity using LLMs
[...] Please analyze the following source code and assess how well it implements the specified concept. [...]
Demo: https://github.com/feststelltaste/software-analytics/blob/master/demos/20250710_WADWC/Conceptual%20Integrity%20Analysis.ipynb

Advanced Typology
Evaluating the conceptual integrity using LLMs
EXPLANATION: The code [...] perfectly aligns with [...] key component of the Boundary layer. [...]
SCORE: 1.0
Demo: https://github.com/feststelltaste/software-analytics/blob/master/demos/20250710_WADWC/Conceptual%20Integrity%20Analysis.ipynb
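
To work with such replies in bulk, the explanation and score have to be extracted from the free text. The following sketch assumes the reply contains an `EXPLANATION:` part followed by a `SCORE:` part with a decimal number, as in the excerpt above. The real prompt and parsing live in the linked notebook and may differ; `parse_assessment`, `low_integrity_files` and the threshold of 0.5 are hypothetical.

```python
import re

# Assumed reply layout: "EXPLANATION: <text> SCORE: <number>"
_ASSESSMENT = re.compile(
    r"EXPLANATION:\s*(?P<explanation>.*?)\s*SCORE:\s*(?P<score>\d+(?:\.\d+)?)",
    re.DOTALL,
)


def parse_assessment(reply):
    """Return (explanation, score) from an LLM reply, or None if the layout differs."""
    match = _ASSESSMENT.search(reply)
    if match is None:
        return None
    return match.group("explanation"), float(match.group("score"))


def low_integrity_files(scores, threshold=0.5):
    """List files whose score falls below the (hypothetical) threshold."""
    return sorted(path for path, score in scores.items() if score < threshold)


reply = "EXPLANATION: The code aligns with the Boundary layer.\nSCORE: 1.0"
print(parse_assessment(reply))
```

Returning `None` for unexpected replies keeps malformed answers visible instead of silently turning them into a score.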

Conceptual integrity analysis: grouped by concepts

Conceptual integrity analysis: grouped by dirs and files

Typology: What is it?
(scale: amount of AI usage, from little to high)

Chaîne Opératoire: How was it made?
(scale: amount of AI usage, from little to high)

Chaîne opératoire
The operating chain of all the steps of the lifecycle of an artifact
1. Creation
2. Usage
3. Maintenance
4. Repair
5. Deposition
Image: https://fr.wikipedia.org/wiki/Cha%C3%AEne_op%C3%A9ratoire#/media/Fichier:Cha%C3%AEne_op%C3%A9ratoire.png

Chaîne Opératoire
The operating chain of all of the steps of the lifecycle of an artifact
Steps: create public class Customer, add testNameCheck(), change to BusinessPartner, fix tech debt, add calculateBonus(), refactor testNameCheck(), delete BusinessPartner
public class BusinessPartner {
    private String name;
    private double bonus;
    public BusinessPartner(String name) {
        this.name = name;
    }
    public double calculateBonus() {
        ...
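
A file's operating chain can be approximated from its Git history. The sketch below parses text assumed to come from `git log --follow --name-status --date=short --pretty=format:commit%x09%h%x09%ad%x09%s -- <path>` and turns it into a chronological list of lifecycle events. The function name, the step labels, and the sample commits (hashes and dates) are made up for illustration; only the commit subjects echo the slide.

```python
# Map Git status letters to lifecycle steps (labels are illustrative).
STEP = {"A": "created", "M": "changed", "R": "renamed", "C": "copied", "D": "deleted"}


def lifecycle(git_log_output):
    """Turn tab-separated `git log --name-status` output into chronological events."""
    events = []
    header = None
    for line in git_log_output.splitlines():
        fields = line.split("\t")
        if fields[0] == "commit" and len(fields) >= 4:
            # Header line: commit <hash> <date> <subject>
            header = (fields[1], fields[2], "\t".join(fields[3:]))
        elif header and len(fields) >= 2 and fields[0][:1] in STEP:
            # Status line: <status> <path> or <status> <old path> <new path>
            commit, date, subject = header
            events.append({
                "date": date,
                "commit": commit,
                "step": STEP[fields[0][:1]],
                "path": fields[-1],
                "subject": subject,
            })
    events.reverse()  # git log lists the newest commit first
    return events


sample = (
    "commit\tc3\t2024-03-01\tdelete BusinessPartner\n"
    "D\tsrc/BusinessPartner.java\n"
    "\n"
    "commit\tc2\t2023-05-10\tchange to BusinessPartner\n"
    "R090\tsrc/Customer.java\tsrc/BusinessPartner.java\n"
    "\n"
    "commit\tc1\t2022-01-15\tcreate public class Customer\n"
    "A\tsrc/Customer.java\n"
)
for event in lifecycle(sample):
    print(event["date"], event["step"], event["path"])
```

Such an event list is the kind of structured context that could be handed to an LLM together with the question of how the artifact came to be.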

But before we move on to the last demo:
Do we even need human-written glue code for ArchAIology?

because of bad prompting technique

because I clearly stated my intent

Chaîne Opératoire: How was it made?
(scale: amount of AI usage, from little to high)

ArchAIology
It works, if you know how to dig deeper into your legacy (system)!

Thank you very much!
Questions, discussions, suggestions, networking
#archaiology

More on the topic
My collection about the topic: https://github.com/feststelltaste/awesome-software-analytics

Bonus Material
You’ve scrolled so far, you deserve a present

AI for Legacy Modernization: When and How to Use (or not)
Light Version 1.1a, Markus Harrer
General Idea
- Manual Work: Developers manually analyze, reason about, and fix issues (based on deep domain and system knowledge)
- Transformation Tools: Human-based creation of formal rules and recipes to perform consistent, automated code transformations
- Guided AI: Human-led detection of issues or anti-patterns, followed by localized AI-generated fixes within defined areas
- AI assistants: Human-guided AI-based task execution for fixing code in smaller areas / clearly scoped contexts
- AI agents: Autonomous systems orchestrate analysis, transformation and validation of modernization workflows
Typical Use Cases
- Manual Work: Special issues like redesign of critical parts of business logic or performance optimization
- Transformation Tools: Framework migrations, API upgrades, bulk renames, restructurings
- Guided AI: Identifying systemic issues and using AI to propose or apply localized solutions
- AI assistants: Summarizing code, generating tests & comments, renaming identifiers, writing code snippets
- AI agents: Cleanup ideation, automated, multistep refactoring, bug fixing across multiple code bases
Ratings (Manual Work / Transformation Tools / Guided AI / AI assistants / AI agents)
- Control (how much humans can be in the loop): ++ / + / o / - / --
- Risk (how likely changes go wrong): -- / - / - / + / ++
- Breadth (how wide the method can operate): -- / - / - / o / ++
- Accuracy (how well problematic spots are addressed): ++ / ++ / + / o / o
- Traceability (how well actions can be tracked): o / ++ / ++ / o / -
- Efforts (how much work setup and use need): ~ / - / o / o / o
- Volume (how much can be processed): -- / ++ / o / - / +

AI for Legacy Modernization: When and How to Use (or not)
Full Version 1.1a, Markus Harrer
General Idea
- Manual Work: Developers manually analyze, reason about, and fix issues (based on deep domain and system knowledge)
- Transformation Tools [1]: Human-based creation of formal rules and recipes to perform consistent, automated code transformations
- Guided AI: Human-led detection of issues or anti-patterns, followed by localized AI-generated fixes within defined areas
- AI assistants: Human-guided AI-based task execution for fixing code in smaller areas / clearly scoped contexts
- AI agents: Autonomous systems orchestrate analysis, transformation and validation of modernization workflows
Typical Use Cases
- Manual Work: Special issues like redesign of critical parts of business logic or performance optimization
- Transformation Tools: Framework migrations, API upgrades, bulk renames, restructurings
- Guided AI: Identifying systemic issues and using AI to propose or apply localized solutions
- AI assistants: Summarizing code, generating tests & comments, renaming identifiers, writing code snippets
- AI agents: Cleanup ideation, automated, multistep refactoring, bug fixing across multiple code bases
Control (how much humans can be in the loop)
- Manual Work: Very High (humans drive everything)
- Transformation Tools: High (humans define transformation logic, execution is automatic)
- Guided AI: Medium (humans guide focus, agents generate and apply solutions)
- AI assistants: Low (humans initiate, roughly guide and review AI’s results)
- AI agents: Very Low (agents make decisions and act with minimal intervention)
Risk (how likely changes go wrong)
- Manual Work: Low to Medium (may suffer from outdated assumptions, overconfidence or unclear goals)
- Transformation Tools: Medium (when creating recipes) to none (during execution, but also depends on recipe quality)
- Guided AI: Low (with good problem localization that allows suggestions in limited contexts)
- AI assistants: High to medium (depends on scope and tasks)
- AI agents: Very high to high (esp. with broad tasks and high autonomy + wrong tool use)
Breadth (how wide the method can operate)
- Manual Work: Very narrow (limited by developers’ cognitive capacities)
- Transformation Tools: Narrow (limits defined by AST, LST or recipe capabilities)
- Guided AI: Narrow (scoped to recognizable patterns or metrics)
- AI assistants: Limited (current file, code block or interaction context)
- AI agents: Very broad (across files, services and task types)
Accuracy (how well problematic spots are addressed)
- Manual Work: Human-level quality (varies by experience)
- Transformation Tools: High (precise and deterministic)
- Guided AI: High (during analysis), medium (during fixing)
- AI assistants: Medium (but error-prone outside narrow, familiar contexts / training)
- AI agents: Medium (depends on prompt quality, feedback loops, available tools)
Traceability (how well actions can be tracked)
- Manual Work: High (with peer review and diffs)
- Transformation Tools: Very high (rules, recipes, diffs)
- Guided AI: High (analysis steps, reports [3], diffs)
- AI assistants: Medium (prompt history, diffs)
- AI agents: Medium (prompts, execution paths, diffs)
Efforts (how much work setup and use require)
- Manual Work: Variable (depends on task difficulty)
- Transformation Tools: Low to medium [4] (depends on reusing existing recipes or creating new ones)
- Guided AI: Medium (because data analysis needed)
- AI assistants: Medium (prompt writing, instruction definition, model tuning)
- AI agents: First low (“it’s just prompts”), later high (MCPs, orchestration, validation, security, …)
Volume (how much can be processed)
- Manual Work: Limited due to the need for deep contextual understanding
- Transformation Tools: High-volume, homogeneous code bases
- Guided AI: Mid-sized codebases (with structural issues)
- AI assistants: Localized impact (limited by context window)
- AI agents: Large, heterogeneous systems (with recurring issues)
Footnotes
[1] e.g. Codemods, OpenRewrite, Rector
[2] e.g. using jQAssistant, Semgrep, CodeScene
[3] e.g. using Jupyter Notebooks
[4] for new recipes, AI might be used
Abbreviations
MCP: Model Context Protocol
AST: Abstract Syntax Tree
LST: Lossless Semantic Tree
