Source Code Diff Revolution

SANER 2025 Keynote
Despite the tremendous impact of Large Language Models on many software engineering tasks, change comprehension remains very challenging. Software developers spend a significant portion of the workday trying to understand and review their teammates’ code changes. Currently, most code reviewing and change comprehension is done using textual diff tools, such as the commit diff provided by GitHub or Gerrit. Such diff tools are insufficient, especially for complex changes that move code within the same file or between different files. Abstract Syntax Tree (AST) diff tools brought several improvements that make source code changes easier to understand. However, they still have constraints and limitations that negatively affect their accuracy. In this keynote, I will demonstrate these limitations using real case studies from open-source projects. At the same time, I will show how the AST diff generated by our tool addresses these limitations. Moreover, I will introduce a benchmark we created based on commits from the Defects4J and Refactoring Oracle datasets and present the precision and recall of state-of-the-art AST diff tools on our benchmark. Finally, I will use examples from open-source projects to show that source code diff is considerably more challenging for test code, and present our recent progress on documenting and detecting test-specific refactorings. I will conclude the keynote with some interesting future research directions. Vive la révolution!

Nikolaos Tsantalis

May 14, 2026

Transcript

  1. Encyclopedia 3000: The picture shows humans, known as “software engineers”, using source code diff tools to perform a “code review” in the year 2025.
  2. How frequently do we use diff tools?
    • 41 minutes of code reviewing per day: Global Code Time Report, 2022 (250K+ developers), https://www.software.com/reports/code-time-report
    • M. Codoban, S. S. Ragavan, D. Dig, and B. Bailey, “Software history under the lens: A study on why and how developers examine it,” ICSME 2015. Survey with 217 developers:
      • 85% of them consider software history important to their development activities
      • 61% need to refer to history at least several times a day
  3. Implications of low-quality diff
    ✘ Makes blame fail
      ✘ git log for block statements has 73% precision and 83% recall [Hasan et al., TSE 2024]
      ✘ The SZZ algorithm, which finds the change that introduced a bug, has over 9 variations fixing issues related to blame
    ✘ Makes code reviewing slower
      ✘ “Understanding the code takes most of the reviewing time” [Bacchelli and Bird, ICSE 2013]
      ✘ “Understanding the code’s purpose, the motivations for the change, and how the change was implemented” [MacLeod et al., IEEE Software 2017]
    ✘ Makes merge conflict resolution a nightmare
      ✘ Structure-aware and refactoring-aware merging tools [IntelliMerge, OOPSLA 2019] [RefMerge, TSE 2022] [JDime, ASE 2017]
  4. Abstract Syntax Tree diff
    • Fine-grained diff between AST nodes (not just lines)
    • Supports moves and updates (not just additions and deletions)
    • Still has limitations (coming up soon…)
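The contrast this slide draws between line-based and AST-based diff can be illustrated with a toy Python sketch. It is not the algorithm of GumTree, RefactoringMiner, or any other tool named in the talk. It only assumes that a moved declaration keeps an identical subtree, and it uses Python's standard `ast` and `difflib` modules. The snippets `OLD` and `NEW` and the helpers `line_diff` and `match_moved_functions` are hypothetical names introduced for illustration.

```python
import ast
import difflib

# Two hypothetical versions of a file: the functions are only reordered.
OLD = """\
def area(w, h):
    return w * h

def greet(name):
    print("hello", name)
"""

NEW = """\
def greet(name):
    print("hello", name)

def area(w, h):
    return w * h
"""


def line_diff(old, new):
    """Return the lines a textual diff reports as removed ('- ') or added ('+ ')."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in delta if line.startswith(("- ", "+ "))]


def match_moved_functions(old, new):
    """Pair top-level functions whose subtrees are identical but whose position changed.

    ast.dump() omits line and column attributes by default, so two functions
    with the same structure produce the same key wherever they appear.
    """
    def index(source):
        body = ast.parse(source).body
        return {
            ast.dump(node): (position, node.name)
            for position, node in enumerate(body)
            if isinstance(node, ast.FunctionDef)
        }

    old_index, new_index = index(old), index(new)
    moves = []
    for key, (old_position, name) in old_index.items():
        if key in new_index and new_index[key][0] != old_position:
            moves.append((name, old_position, new_index[key][0]))
    return sorted(moves)


# The textual diff reports deletions and additions for a pure reordering.
print("\n".join(line_diff(OLD, NEW)))
# The subtree comparison reports the same change as two moves.
print(match_moved_functions(OLD, NEW))
```

Matching only identical top-level subtrees is a deliberate simplification. It cannot handle a function that is moved and edited at the same time, which is the kind of case where the tools discussed in the talk differ.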
  5. Timeline of AST diff tools
    • 2007 Change Distiller (Fluri et al.)
    • 2014 GumTree (Falleri et al.)
    • 2016 MTDiff (Dotzler et al.)
    • 2018 IJM (Frick et al.)
    • 2023 iASTMapper (Zhang et al.)
    • 2024 RMiner 3 (Alikhanifard et al.)
    Language aware Partial matching
    Language independent Largest identical subtrees
    Language independent Move action optimizations
    Tree Matching Statement Mapping
  6. GumTree Greedy
    • Type matched with variable
    • Variable matched with method call
    • Variable matched with lambda parameter
    • For body block matched with method body block
  7. GumTree Simple, RefactoringMiner
    • Rename object to item
    • Extract variable itemKey
    • object matched with itemKey
    • object matched with item
  8. The first AST Diff benchmark
    • Process (6 months):
      1. Run all ASTDiff tools (GumTree 3.0, GumTree 2.1, IJM, MTDiff, RMiner)
      2. Manually validate the diffs
      3. Construct the “perfect” diff
    • Datasets:
      • 800 bug fixing commits from Defects4J
      • 187 refactoring commits from Refactoring Oracle
    Pouria Alikhanifard
  9. RefactoringMiner (diagram labels): version1, version2; Statement mappings; Program declaration mappings; Import declaration mappings; Refactoring mappings based on mechanics; Tree Matcher; Tree Matcher; AST mappings; AST mappings; Overwrite conflicting mappings; Final AST mappings; Edit script
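  10. AST mapping accuracy (precision and recall, %)

    | Dataset | RMiner 3 Precision | RMiner 3 Recall | GumTree greedy Precision | GumTree greedy Recall | GumTree simple Precision | GumTree simple Recall | iASTMapper Precision | iASTMapper Recall |
    |---|---|---|---|---|---|---|---|---|
    | Defects4J | 99.7 | 99.3 | 97.5 | 93.1 | 98.4 | 97.8 | 98.5 | 99 |
    | Refactoring | 99.6 | 99.2 | 84.1 | 70.2 | 86.7 | 72.4 | 91.8 | 79.2 |
    | Overall | 99.7 | 99.3 | 93.8 | 86.1 | 95.2 | 90 | 96.7 | 92.9 |

    Ranking based on F-score:
    1. RMiner 3.0: 99.5% (refactoring only: 99.4%)
    2. iASTMapper: 94.8% (refactoring only: 85.0%)
    3. GumTree simple: 92.6% (refactoring only: 78.9%)
    4. GumTree greedy: 89.8% (refactoring only: 76.5%)

    Tree Matching, Statement Mapping; ±1-6%, ±8-29%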
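The F-scores on this slide can be related to the precision and recall columns with a short Python sketch. It assumes the slide uses the balanced F-score (the harmonic mean of precision and recall), which the slide does not state explicitly. The function name `f_score` and the dictionary `REFACTORING_ROW` are hypothetical, and the numbers are copied from the slide's "Refactoring" row.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, copied from the "Refactoring" row of the slide.
REFACTORING_ROW = {
    "RMiner 3": (99.6, 99.2),
    "iASTMapper": (91.8, 79.2),
    "GumTree simple": (86.7, 72.4),
    "GumTree greedy": (84.1, 70.2),
}

# Rank the tools by F-score, highest first, rounded to one decimal place.
ranking = sorted(
    ((round(f_score(p, r), 1), tool) for tool, (p, r) in REFACTORING_ROW.items()),
    reverse=True,
)

for score, tool in ranking:
    print(f"{tool}: {score}%")
```

Under that assumption the computed values reproduce the slide's "Refactoring only" figures. Recomputing the overall ranking from the rounded table entries can differ from the slide in the last digit, presumably because the slide was computed from unrounded values.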
  11. How can I use your tool?
    1. dependency
    2. Command line tool
    3. Docker image
    4. git rmd
    5. GitHub action
    APIs:
    1. With a commit of a locally cloned git repository
    2. With a commit fetched directly from GitHub
    3. With the files changed in a GitHub Pull Request
    4. With two directories
    https://github.com/tsantalis/RefactoringMiner
  12. Refactorings shown on the slide:
    1. Extract and Move Method
    2. Move Attributes
    3. Local variable theRecord renamed to result
    4. Inherited attribute allFields renamed to allReportedFields
    5. Moved attribute UNKOWN_FIELD_AT_ENTRY_TYPE_CELL_ENTRY renamed to fix typo
    6. String literal “-” extracted to an attribute
  13. 55 test-specific refactorings
    • 31 totally new
    • 10 found by all 3 methods
    • 30 found by at least 2 different methods
  14. The two extremes in AST diff
    • Language specific (RefactoringMiner): high accuracy, complex algorithm, hard to generalize
    • Language independent: lower accuracy, simple algorithm, blind matching
  15. Variable url is extracted to construct the pagination url based on the value of the pageNumber parameter. This line has been added to the extracted method to return whether the coursesContainer includes more pages with courses. This newly added while loop calls the extracted method by incrementing the pageNumber argument by 1 in each iteration and terminates when the extracted method returns false (i.e., there are no more pages left).
  16. DiffBenchmark
    1. Generate “perfect diff” programmatically
      • Combining diffs from different tools
      • Discarding and injecting mappings
    2. Translate the output diff of any tool to a common format based on offset information (e.g., gumtree-spoon)
    3. Extend existing tools with missing features
      • enable multi-mappings
      • enable inter-file mappings
      • force semantically compatible matches
    4. Compute precision/recall based on “perfect diff”
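The last step on this slide, computing precision and recall against a “perfect diff”, can be sketched in Python as a set comparison. This is an illustration under stated assumptions, not the DiffBenchmark implementation: the slide only says that the common format is based on offset information, so the `Mapping` fields, the file names, and the offsets below are all hypothetical.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One matched pair of AST nodes, identified by file path and character offsets.

    The field layout is an assumption made for illustration.
    """
    src_file: str
    src_start: int
    src_end: int
    dst_file: str
    dst_start: int
    dst_end: int


def precision_recall(tool_mappings, perfect_mappings):
    """Compare a tool's mappings with the reference mappings as sets."""
    tool, perfect = set(tool_mappings), set(perfect_mappings)
    true_positives = len(tool & perfect)
    precision = true_positives / len(tool) if tool else 0.0
    recall = true_positives / len(perfect) if perfect else 0.0
    return precision, recall


# Hypothetical reference diff: three intra-file mappings and one inter-file mapping.
perfect = [
    Mapping("A.java", 0, 40, "A.java", 0, 40),
    Mapping("A.java", 50, 90, "A.java", 120, 160),
    Mapping("A.java", 100, 110, "A.java", 60, 70),
    Mapping("A.java", 200, 260, "B.java", 10, 70),
]

# Hypothetical tool output: two correct mappings and one wrong match.
tool = [
    Mapping("A.java", 0, 40, "A.java", 0, 40),
    Mapping("A.java", 50, 90, "A.java", 120, 160),
    Mapping("A.java", 100, 110, "A.java", 300, 310),
]

precision, recall = precision_recall(tool, perfect)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating each mapping as a set element lets one source node appear in several mappings and lets source and destination files differ, so multi-mappings and inter-file mappings need no special handling in this sketch.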