
CSER 2024 Keynote

Software developers spend a significant portion of the workday trying to understand and review their teammates' code changes. Currently, most code review and change comprehension is done with textual diff tools, such as the commit diff in GitHub or Gerrit. Such tools are insufficient, especially for complex changes that move code within the same file or between different files. Abstract Syntax Tree (AST) diff tools have brought several improvements that make source code changes easier to understand. However, they still have constraints and limitations that negatively affect their accuracy. In this keynote, I will demonstrate these limitations using real case studies from open-source projects. At the same time, I will show how the AST diff generated by our tool addresses these limitations. Finally, I will introduce the benchmark we created based on commits from the Defects4J and Refactoring Oracle datasets, and present the precision and recall of state-of-the-art AST diff tools on our benchmark. Vive la révolution!

https://www.cser.ca/2024s/#keynotes

Nikolaos Tsantalis

June 22, 2024

Transcript

  1. Abstract Syntax Tree diff

    • Fine-grained diff between AST nodes (not just lines)
    • Supports moves and updates (not just additions and deletions)
    • Still has limitations (coming up soon…)
  2. 2007: Change Distiller (Fluri et al.)

    2014: GumTree (Falleri et al.)
    2016: MTDiff (Dotzler et al.)
    2018: IJM (Frick et al.)
    2023: iASTMapper (Zhang et al.)
    2024: RMiner 3 (Alikhanifard et al.)

    • Language aware, partial matching
    • Language independent, largest identical subtrees
    • Language independent, move action optimizations
    • Tree Matching
    • Statement Mapping
  3. GumTree Greedy

    • Type matched with variable
    • Variable matched with method call
    • Variable matched with lambda parameter
    • For body block matched with method body block
  4. GumTree Simple / RefactoringMiner

    • Rename object to item
    • Extract variable itemKey
    • object matched with itemKey
    • object matched with item
  5. AST Diff benchmark

    • Process (6 months):
      1. Run all AST diff tools (GumTree 3.0, GumTree 2.1, IJM, MTDiff, RMiner)
      2. Manually validate the diffs
      3. Construct the “perfect” diff
    • Datasets:
      • 800 bug-fixing commits from Defects4J
      • 187 refactoring commits from Refactoring Oracle
  6. RefactoringMiner

    [Pipeline diagram. Labels: version1, version2; Statement mappings; Program declaration mappings; Import declaration mappings; Refactoring mappings based on mechanics; Tree Matcher (twice); AST mappings (twice); Overwrite conflicting mappings; Final AST mappings; Edit script]
  7. AST mapping accuracy

    | Dataset | RMiner 3 Precision | RMiner 3 Recall | GumTree greedy Precision | GumTree greedy Recall | GumTree simple Precision | GumTree simple Recall | iASTMapper Precision | iASTMapper Recall |
    |---|---|---|---|---|---|---|---|---|
    | Defects4J | 99.7 | 99.3 | 97.5 | 93.1 | 98.4 | 97.8 | 98.5 | 99 |
    | Refactoring | 99.6 | 99.2 | 84.1 | 70.2 | 86.7 | 72.4 | 91.8 | 79.2 |
    | Overall | 99.7 | 99.3 | 93.8 | 86.1 | 95.2 | 90 | 96.7 | 92.9 |

    Ranking based on F-score (overall, then refactoring only):
    1. RMiner 3.0: 99.5%, 99.4%
    2. iASTMapper: 94.8%, 85.0%
    3. GumTree simple: 92.6%, 78.9%
    4. GumTree greedy: 89.8%, 76.5%

    Other labels on the slide: Tree Matching; Statement Mapping; ±1-6%; ±8-29%
  8. How can I use your tool?

    1. dependency
    2. Command line tool
    3. Docker image
    4. git rmd

    APIs:
    1. With a commit of a locally cloned git repository
    2. With a commit fetched directly from GitHub
    3. With the files changed in a GitHub Pull Request
    4. With two directories

    https://github.com/tsantalis/RefactoringMiner
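
The first slide contrasts line-based diffs with AST diffs that support moves. The toy sketch below illustrates only that contrast. It is not how GumTree or RefactoringMiner work, and it uses Python's standard `ast` and `difflib` modules on Python source purely for convenience, whereas the tools in the talk target Java. The `OLD`/`NEW` snippets and the helper names `changed_lines` and `moved_functions` are hypothetical.

```python
import ast
import difflib

# Hypothetical before/after versions: the two functions only swap places.
OLD = """\
def helper(x):
    return x + 1

def main():
    return helper(41)
"""

NEW = """\
def main():
    return helper(41)

def helper(x):
    return x + 1
"""


def changed_lines(old, new):
    """Added/removed lines of a textual diff (file headers excluded)."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


def moved_functions(old, new):
    """Top-level functions with an identical AST but a different position."""
    def index(source):
        body = ast.parse(source).body
        # ast.dump() omits line numbers by default, so a relocated but
        # otherwise unchanged function yields the same string.
        return {node.name: (pos, ast.dump(node))
                for pos, node in enumerate(body)
                if isinstance(node, ast.FunctionDef)}

    before, after = index(old), index(new)
    return sorted(name for name in before.keys() & after.keys()
                  if before[name][1] == after[name][1]
                  and before[name][0] != after[name][0])


print(changed_lines(OLD, NEW))    # textual diff: removed and added lines
print(moved_functions(OLD, NEW))  # AST comparison: relocated functions
```

The textual diff can only express the swap as deletions plus additions, while comparing subtrees shows that nothing inside either function changed. A real AST diff tool would go further and report a single move action with fine-grained node mappings.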
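
The benchmark and accuracy slides report precision, recall, and an F-score ranking, but the transcript does not spell out the formulas. The sketch below assumes the usual definitions: a tool's mappings are compared as a set against the manually constructed “perfect” diff, and the F-score is the harmonic mean of precision and recall (F1). The mapping pairs are hypothetical placeholders; the `OVERALL` figures are the "Overall" precision and recall values from the accuracy slide.

```python
def precision_recall(tool_mappings, perfect_mappings):
    """Precision and recall of a tool's mappings against the ground truth."""
    tool, perfect = set(tool_mappings), set(perfect_mappings)
    correct = len(tool & perfect)
    precision = correct / len(tool) if tool else 0.0
    recall = correct / len(perfect) if perfect else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall (assumed to be F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (source node, target node) mappings, for illustration only.
perfect = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")}
tool = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")}
p, r = precision_recall(tool, perfect)
print(p, r)  # 3 of 4 reported mappings are correct; 3 of 5 expected ones found

# "Overall" precision/recall (in percent) as listed on the accuracy slide.
OVERALL = {
    "RMiner 3": (99.7, 99.3),
    "GumTree greedy": (93.8, 86.1),
    "GumTree simple": (95.2, 90.0),
    "iASTMapper": (96.7, 92.9),
}

ranking = sorted(OVERALL, key=lambda name: f_score(*OVERALL[name]), reverse=True)
for name in ranking:
    print(f"{name}: {f_score(*OVERALL[name]):.1f}")
```

Under these assumptions the computed order matches the ranking on the slide, and the values agree with the slide's percentages to within about 0.1 points. The small differences are consistent with the slide's inputs being rounded.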