Assessing the Threat of Untracked Changes in Software Evolution (ICSE 2018)

While refactoring is extensively performed by practitioners, many Mining Software Repositories (MSR) approaches neither detect nor keep track of refactorings when performing source code evolution analysis. In the best case, keeping track of refactorings could be unnecessary work; in the worst case, these untracked changes could significantly affect the performance of MSR approaches. Since the extent of the threat is unknown, the goal of this paper is to assess whether it is significant. Based on an extensive empirical study, we answer positively: we found that between 10% and 21% of changes at the method level in 15 large Java systems are untracked. This results in a large proportion (25%) of entities that may have their histories split by these changes, and a measurable effect on at least two MSR approaches. We conclude that handling untracked changes should be systematically considered by MSR studies.

ASERG, DCC, UFMG

June 01, 2018

Transcript

  1. Assessing the Threat of Untracked Changes in Software Evolution

    André Hora, Danilo Silva, Marco Tulio Valente, Romain Robbes
    ICSE 2018
  2. MSR Examples

    • Library migration
    • Change prediction
    • Bug fixing
    • Warnings prioritization
    • Code expert computation
    • …
  4. MSR researchers are aware of this “threat”, but they often do not assess it

    “Our tool is unable to verify if an entity in revision n has been renamed in revision n+1” [48]
    “The development history of a file can be lost in case of renaming operations, copy or file split” [3]
    “It is possible to miss bug-introducing changes when a file changes its name since the approach does not track such name changes” [38]
    “We detect renamed or moved units as units that are removed first and added later” [50]

    [2, 5, 6, 7, 12, 22, 26, 27, 28, 29, 34, 36, 42, 45, 53, 54, 59, 61, 62, 66, 67, 68…]
  5. Tracked and Untracked Changes

    version 1: public void foo() { obj.print() }
    version 2: public void foo() { obj.println() }
    version 3: public void bar() { obj.println() }

    tracked change: preserves the entity name and modifies its source code
    untracked change: modifies the entity name, and may also modify its source code
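The distinction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: `MethodVersion`, `classify_change`, and the `"unchanged"` label are hypothetical names introduced here, and the sketch assumes the two versions are already known to be the same logical entity (establishing that link is the hard part a refactoring detector has to solve).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodVersion:
    """One version of a method (hypothetical model for illustration)."""
    name: str  # entity name as seen by the mining tool, e.g. "foo"
    body: str  # source code of the method body


def classify_change(before: MethodVersion, after: MethodVersion) -> str:
    """Label the change between two versions of the same logical entity."""
    if before.name != after.name:
        # Name changed: a name-based tool sees a deletion plus an addition.
        return "untracked"
    if before.body != after.body:
        # Same name, modified source code: the history stays connected.
        return "tracked"
    return "unchanged"  # assumption: not a category used on the slide


# The three versions shown on the slide.
v1 = MethodVersion("foo", "obj.print()")
v2 = MethodVersion("foo", "obj.println()")
v3 = MethodVersion("bar", "obj.println()")

print(classify_change(v1, v2))  # tracked
print(classify_change(v2, v3))  # untracked
```

A tool that keys histories on the entity name would treat `bar` in version 3 as a brand-new method and lose the link to `foo`.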
  6. Change Graph

    version 1: class Foo { mA() {…} }  class Bar { mB() {…} mC() {…} }
    version 2: class Foo { mA() {…} }  class Bar { mX() {…} mC() {…} }
    version 3: class Foo { mA() {…} }  class Bar { mX() {…} }  class Qux { mC() {…} }
    version 4: class Foo { mA() {…} }  class Baz { mY() {…} }  class Qux { mC() {…} mE() {…} }

    Legend: tracked change, untracked change
  7. Research Questions

    • RQ1. What is the frequency of untracked changes?
    • RQ2. What is the extent of untracked changes?
    • RQ3. What is the impact of untracked changes on existing MSR-based approaches?
  8. Tracked and Untracked Changes Computation

    Refactoring resolution
    • RefDiff [Silva et al., MSR 2017]
    • Precision: 85.6% - 100%
    • Recall: 89.8% - 93.9%

    1. Rename Class
    2. Move Class
    3. Extract Superclass
    4. Move and Rename Class
    5. Extract Interface
    6. Rename Method
    7. Move Method
    8. Extract Method
    9. Inline Method
    10. Pull Up Method
    11. Push Down Method
  12. RQ1. What is the frequency of untracked changes? (example)

    version 1: class Foo { mA() {…} }  class Bar { mB() {…} mC() {…} }
    version 2: class Foo { mA() {…} }  class Bar { mX() {…} mC() {…} }
    version 3: class Foo { mA() {…} }  class Bar { mX() {…} }  class Qux { mC() {…} }
    version 4: class Foo { mA() {…} }  class Baz { mY() {…} }  class Qux { mC() {…} mE() {…} }

    17 changes: 12 tracked changes, 5 untracked changes

    Not desirable: relevant data may be missed!
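The counts in this example can be reproduced with a small sketch. The edge list below is an assumption: it is a reading of the flattened figure in which `mB` is renamed to `mX`, `mC` moves from `Bar` to `Qux`, `Bar` and `mX` are renamed to `Baz` and `mY`, and `mE` is extracted from `mC`. That reading is consistent with the 17/12/5 totals stated on the slide, but the original figure may draw the edges differently. `STEPS`, `kind`, and `count_changes` are hypothetical names.

```python
from collections import Counter

# One list of (source, target) edges per version step. Entity names are
# qualified as "Class" or "Class.method". Reconstructed from the slide
# (assumption, see the note above).
STEPS = [
    # version 1 -> version 2: mB renamed to mX
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Bar"),
     ("Bar.mB", "Bar.mX"), ("Bar.mC", "Bar.mC")],
    # version 2 -> version 3: mC moved from Bar to Qux
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Bar"),
     ("Bar.mX", "Bar.mX"), ("Bar.mC", "Qux.mC")],
    # version 3 -> version 4: Bar -> Baz, mX -> mY, mE extracted from mC
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Baz"),
     ("Bar.mX", "Baz.mY"), ("Qux", "Qux"),
     ("Qux.mC", "Qux.mC"), ("Qux.mC", "Qux.mE")],
]


def kind(src: str, dst: str) -> str:
    """A change is tracked when the qualified name is preserved."""
    return "tracked" if src == dst else "untracked"


def count_changes(steps):
    """Count tracked and untracked edges over all version steps."""
    return Counter(kind(src, dst) for step in steps for src, dst in step)


counts = count_changes(STEPS)
total = sum(counts.values())
print(total, counts["tracked"], counts["untracked"])  # 17 12 5
print(f"untracked share: {counts['untracked'] / total:.0%}")  # 29%
```

A name-based miner would see only the 12 tracked edges; the 5 untracked ones are the data the slide warns may be missed.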
  13. RQ1. What is the frequency of untracked changes?

    Untracked changes
    • Classes: 2% to 15%
    • Methods: 10% to 21%

    Untracked changes are frequent
  15. RQ1. What is the frequency of untracked changes?

    Untracked changes
    • Rename method: 26%
    • Extract method: 23%
    • Move method: 22%
    • Move class: 12%

    Keeping track of renamings is not enough
  19. RQ2. What is the extent of untracked changes? (example)

    version 1: class Foo { mA() {…} }  class Bar { mB() {…} mC() {…} }
    version 2: class Foo { mA() {…} }  class Bar { mX() {…} mC() {…} }
    version 3: class Foo { mA() {…} }  class Bar { mX() {…} }  class Qux { mC() {…} }
    version 4: class Foo { mA() {…} }  class Baz { mY() {…} }  class Qux { mC() {…} mE() {…} }

    7 paths
    • 3 paths: only tracked changes
    • 4 paths: at least one untracked change

    Not desirable: their histories may be split!
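The path counts on this slide can be checked by walking the change graph from each entity's first appearance to its last. As before, this is a sketch under assumptions: the edge list is a reading of the flattened figure that matches the stated totals, and `STEPS`, `enumerate_paths`, and `has_untracked` are hypothetical names, not part of any tool named in the deck.

```python
from collections import defaultdict

# (source, target) edges per version step, reconstructed from the slide
# (assumption). Names are qualified as "Class" or "Class.method".
STEPS = [
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Bar"),
     ("Bar.mB", "Bar.mX"), ("Bar.mC", "Bar.mC")],
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Bar"),
     ("Bar.mX", "Bar.mX"), ("Bar.mC", "Qux.mC")],
    [("Foo", "Foo"), ("Foo.mA", "Foo.mA"), ("Bar", "Baz"),
     ("Bar.mX", "Baz.mY"), ("Qux", "Qux"),
     ("Qux.mC", "Qux.mC"), ("Qux.mC", "Qux.mE")],
]


def enumerate_paths(steps):
    """Return every source-to-sink path; nodes are (version, name) pairs."""
    successors = defaultdict(list)
    has_incoming = set()
    for version, step in enumerate(steps, start=1):
        for src, dst in step:
            successors[(version, src)].append((version + 1, dst))
            has_incoming.add((version + 1, dst))
    # A path starts where an entity first appears (no incoming edge).
    sources = [node for node in successors if node not in has_incoming]

    paths = []

    def walk(node, path):
        next_nodes = successors.get(node, [])
        if not next_nodes:  # sink: the entity's last known version
            paths.append(path)
            return
        for nxt in next_nodes:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


def has_untracked(path):
    """True if any consecutive pair of nodes changes the qualified name."""
    return any(a[1] != b[1] for a, b in zip(path, path[1:]))


paths = enumerate_paths(STEPS)
split = [p for p in paths if has_untracked(p)]
print(len(paths), len(paths) - len(split), len(split))  # 7 3 4
```

Under this reading, only the histories of `Foo`, `Foo.mA`, and `Qux` survive intact for a name-based tool; the other four are split at the first untracked change.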
  20. RQ2. What is the extent of untracked changes?

    18% to 41% of entities have at least one untracked change in their histories
  22. RQ2. What is the extent of untracked changes?

    22% to 58% of entities have at least one untracked change in their histories
    (only considering the most changed entities)

    Untracked changes cause splits in entity histories
  24. RQ3. What is the impact of untracked changes on existing MSR-based approaches?

    • Approaches
      • API evolution mining rule (e.g., Vector -> List)
      • API co-usage mining rule (e.g., Map -> HashMap)
    • Results
      • Amount of mined rules: usually improves when taking into account untracked changes (median: 0% to +7%)
      • Quality of mined rules: slightly improves when including untracked changes (median: -2% to +2%)

    The impact of untracked changes is difficult to predict, and needs to be evaluated on a case-by-case basis
  25. Untracked changes are frequent (10-21% at method level)

    MSR studies should resolve untracked changes to access potentially relevant new mining data

    Keeping track of renamings is not enough (≈26%)
    MSR studies should address “extraction” and “moving” for a more complete resolution of untracked changes

    Untracked changes cause splits in entity histories (18-41%)
    MSR studies should resolve untracked changes when performing traceability analysis, for more precise entity lifespans
  26. Assessing the Threat of Untracked Changes in Software Evolution

    André Hora, Danilo Silva, Marco Tulio Valente, Romain Robbes
    ICSE 2018