2014)
• First phase: top-down AST matching that iteratively finds the largest identical subtrees (the AST hash value is based on node label and value).
• Second phase: matching of previously unmatched AST nodes that have a fair amount of their children matched (dice function: the ratio of common descendants between two nodes must be greater than or equal to 0.5).
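The second-phase criterion can be sketched as follows. This is a minimal illustration, not GumTree's actual code: the `Node` class, the `descendants` helper, and the representation of mappings as a dictionary are all hypothetical, and the formula assumes the standard Dice coefficient (twice the number of common descendants divided by the total number of descendants of both nodes).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    """Hypothetical AST node: a label (node type), a value, and children."""
    label: str
    value: str = ""
    children: list = field(default_factory=list)


def descendants(node):
    """Return every node strictly below `node`."""
    found = []
    stack = list(node.children)
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(current.children)
    return found


def dice(t1, t2, mappings):
    """Dice coefficient over descendants; `mappings` maps source nodes to destination nodes.

    Assumed formula: 2 * |common| / (|desc(t1)| + |desc(t2)|), where a descendant
    of t1 is "common" if it is already mapped to a descendant of t2.
    """
    d1 = descendants(t1)
    d2 = set(descendants(t2))
    common = sum(1 for n in d1 if mappings.get(n) in d2)
    total = len(d1) + len(d2)
    return 2 * common / total if total else 0.0


# Two blocks with three children each, two of which were matched in the first phase.
a1, b1, c1 = Node("SimpleName", "a"), Node("SimpleName", "b"), Node("SimpleName", "c")
a2, b2, d2 = Node("SimpleName", "a"), Node("SimpleName", "b"), Node("SimpleName", "d")
block1 = Node("Block", children=[a1, b1, c1])
block2 = Node("Block", children=[a2, b2, d2])
first_phase = {a1: a2, b1: b2}

similarity = dice(block1, block2, first_phase)  # 2 * 2 / (3 + 3)
is_candidate = similarity >= 0.5                # threshold stated on the slide
```

In this example the two blocks share two of their three descendants, so they pass the 0.5 threshold and would be considered for matching.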
optimizations to improve the accuracy of the generated edit script, specifically for the Move actions, which make the edit scripts shorter.
• IJM - Iterative Java Matcher (Frick et al., 2018): partial matching, i.e., applying GumTree to selected parts of the source code (import declarations, methods with the same signature).
• All papers focus their evaluation on which tool generates the shortest edit script.
MTDiff and IJM generate inaccurate mappings for 20%–29%, 25%–36% and 21%–30% of the file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvements.”
mappings
• Sorting criteria for composite statement mappings
• Multi-mapping support for duplicated code moved out of or moved into conditionals
• Statement mapping scope based on call sites
from one leaf to the root of the subtree
• GumTree default threshold = 2
• Why? It avoids matching remaining leaf expressions with height 1 (e.g., SimpleName nodes) that coincidentally have the same value.
• Since we give a pair of matched statements as input, we configure minHeight = 1
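The effect of the threshold can be illustrated with a small sketch. It assumes, as the bullets above suggest, that a leaf has height 1 and that a subtree is eligible for matching when its height is at least the threshold. The tuple-based tree encoding and both function names are hypothetical and do not come from GumTree.

```python
def height(node):
    """Height of a (label, children) tree, counting a leaf as height 1 (assumption)."""
    _label, children = node
    return 1 + max((height(child) for child in children), default=0)


def candidate_subtrees(root, min_height):
    """Collect the subtrees whose height is at least `min_height`."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if height(node) >= min_height:
            candidates.append(node)
        stack.extend(node[1])
    return candidates


# Simplified tree for a statement such as `x = y`.
statement = ("Assignment", [("SimpleName", []), ("SimpleName", [])])

default_candidates = candidate_subtrees(statement, min_height=2)    # leaves excluded
statement_candidates = candidate_subtrees(statement, min_height=1)  # leaves included
```

With the threshold at 2, only the `Assignment` subtree is eligible. Lowering it to 1 also admits the two `SimpleName` leaves, which is reasonable when the enclosing statements are already known to match.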
accuracy:
• True Positive: a mapping given by a tool that exists in the benchmark
• False Positive: a mapping given by a tool that does not exist in the benchmark
• False Negative: a mapping that exists in the benchmark but was not reported by a tool
• Semantically incompatible mappings:
  • M = (m1, m2) returned by a tool
  • m1 and m2 have the same AST type
  • The parents of m1 and m2 have different AST types
  • M is not included in the benchmark
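These definitions translate directly into set operations. The sketch below assumes each mapping is a pair of node identifiers and that AST types are available through lookup dictionaries. All names are hypothetical, and precision and recall use their standard definitions, TP / (TP + FP) and TP / (TP + FN).

```python
def confusion_counts(tool_mappings, benchmark):
    """Return (TP, FP, FN) for a set of tool mappings against the benchmark set."""
    true_positives = len(tool_mappings & benchmark)
    false_positives = len(tool_mappings - benchmark)
    false_negatives = len(benchmark - tool_mappings)
    return true_positives, false_positives, false_negatives


def precision_recall(tool_mappings, benchmark):
    """Standard precision and recall; 0.0 when a denominator is zero."""
    tp, fp, fn = confusion_counts(tool_mappings, benchmark)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def is_semantically_incompatible(mapping, benchmark, node_type, parent_type):
    """Same AST type, different parent AST types, and absent from the benchmark."""
    m1, m2 = mapping
    return (
        node_type[m1] == node_type[m2]
        and parent_type[m1] != parent_type[m2]
        and mapping not in benchmark
    )


# Hypothetical mappings between node ids of two revisions.
benchmark = {("a", "a'"), ("b", "b'"), ("c", "c'"), ("d", "d'")}
tool = {("a", "a'"), ("b", "b'"), ("c", "x'")}

tp, fp, fn = confusion_counts(tool, benchmark)       # 2 TP, 1 FP, 2 FN
precision, recall = precision_recall(tool, benchmark)

# ("c", "x'") pairs two SimpleName nodes whose parents have different AST types.
node_type = {"c": "SimpleName", "x'": "SimpleName"}
parent_type = {"c": "MethodInvocation", "x'": "InfixExpression"}
flagged = is_semantically_incompatible(("c", "x'"), benchmark, node_type, parent_type)
```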
both benchmarks.
• The accuracy improvements are more evident in the Refactoring benchmark.
• GumTree 3.0 (simple) has better precision and recall than GumTree 3.0 (greedy) when sub-expression mappings are considered.
• RefactoringMiner and IJM excel at accurately matching program elements (i.e., method and field declarations).
• GumTree (greedy) and MTDiff generate the largest numbers of semantically incompatible mappings.
• RefactoringMiner’s execution time is of the same order of magnitude as that of the faster tools.