
Developer Productivity Engineering: What's in it for me?

_It may surprise you to learn that we developers are a patient, tolerant species. People pay us to do what we enjoy - write code and create working applications. In return, we will put up with all sorts of blockages and toil that get in the way of this - long build times, flaky tests, hard-to-debug toolchain failures and so on._

_Is this truly the price we need to pay? Could there be a better world, where the build is as fast as it could possibly be? A world where problems that affect many developers are quickly identified, and fixed?_

Welcome to the world of Developer Productivity Engineering, where we can get computers to do what they’re good at (automation) to make developers’ lives easier, and make us more effective at our jobs. And while developer joy may be a difficult thing to sell to decision makers, effective developers who are making the best use of their time, and their hardware, have a direct impact on an organization’s ROI. What’s not to love?

In this talk, Trisha will explore what DPE is, give you some practical ways to get started, and discuss ways to help the leaders in your organisation to understand the enormous value DPE could unlock.

Trisha Gee

April 17, 2023

Transcript

  1. Enterprise
    Developer Productivity Engineering


    What’s in it for me?

    View Slide

  2. Trisha Gee
    ⬢ Lead Developer Advocate
    ⬢ Java Champion
    ⬢ 20+ years Java experience
    ⬢ …and author

    View Slide

  3. https://trishagee.com/books/

    View Slide

  4. View Slide

  5. View Slide

  6. But Bottlenecks to Productivity are Everywhere
    Code
    Code
    Wait Time for Local Build
    Debug Build Failure
    Lunch
    Code
    Wait Time for Local Build
    Investigate/Fix Flaky Tests
    Sprint
    Waiting time for CI Build

    View Slide

  7. “Bottlenecks in the toolchain are holding back the
    rockstar 10x developers”
    Pete Smoot, Software Architect, Dell Technologies

    View Slide

  8. View Slide

  9. View Slide

  10. The “best” programmers outperformed
    the worst by roughly a 10:1 ratio

    View Slide

  11. What Mattered?

    View Slide

  15. What Mattered?
    ⬢ Paired programmers performed at roughly the same level
    ⬢ The average difference was only 21% between paired participants
    ⬢ They didn’t work together on the task, but they came from the same
    organization
    ⬢ The best organization performed 11.1x better than the worst

    View Slide

  16. “While this productivity differential among
    programmers is understandable, there is also a 10 to 1
    difference in productivity among software
    organizations.”
    Software Productivity in the Enterprise


    Harlan (HD) Mills


    https://trace.tennessee.edu/cgi/viewcontent.cgi?article=1010&context=utk_harlan

    View Slide

  17. “The bald fact is that many companies provide
    developers with a workplace that is so crowded, noisy,
    and interruptive as to fill their days with frustration.
    That alone could explain reduced efficiency as well as a
    tendency for good people to migrate elsewhere.”
    Peopleware: Productive Projects and Teams, Third Edition


    Tom DeMarco, Tim Lister

    View Slide

  18. Though the phrase had not yet been coined, increased
    productivity came down to developer experience.

    View Slide

  19. Gradle is Pioneering DPE
    DPE is a new software development
    practice used by leading software
    development organizations to
    maximize developer productivity
    and happiness.

    View Slide

  20. What Problems Does DPE Solve?

    View Slide

  21. View Slide

  22. DevOps, 12-Factor, Agile, etc., have still not
    captured all bottlenecks, friction, and obstacles
    to throughput.
    Many are hiding in plain sight, in the developer
    experience itself.

    View Slide

  23. A 10x organization should be reducing
    build and test feedback times and
    improving the consistency and
    reliability of builds

    View Slide

  24. Pain Point:
    Waiting for Builds &
    Tests to Complete

    View Slide

  25. Are you tracking local build and test
    times?

    View Slide

  26. View Slide

  27. View Slide

  28. The only initiatives that will positively
    impact performance are ones which
    increase throughput while
    simultaneously decreasing cost

    View Slide

  29. Faster Builds Improve Creative Flow
                           Team 1    Team 2
    No. of Devs            11        6
    Build Time             4 mins    1 min
    No. of local builds    850       1010

    View Slide
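A rough calculation helps put the table on slide 29 in perspective. This is only a sketch: it assumes both build counts cover the same period (the slide does not state one), and the function and variable names are made up for illustration.

```python
def build_stats(devs, build_minutes, local_builds):
    """Return (builds per developer, total minutes spent waiting on local builds)."""
    builds_per_dev = local_builds / devs
    total_wait_minutes = local_builds * build_minutes
    return builds_per_dev, total_wait_minutes


# Figures taken from the slide; the period they cover is an unknown.
team1 = build_stats(devs=11, build_minutes=4, local_builds=850)
team2 = build_stats(devs=6, build_minutes=1, local_builds=1010)

print(f"Team 1: {team1[0]:.1f} builds/dev, {team1[1]} min waiting")
print(f"Team 2: {team2[0]:.1f} builds/dev, {team2[1]} min waiting")
```

Under that assumption, each developer on Team 2 runs roughly twice as many local builds, while the team as a whole spends less than a third of the time waiting for them.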

  30. Very Fast Feedback Is Important

    View Slide

  31. Solution: Acceleration Technologies

    View Slide

  32. Build Caching Speeds up Builds and Tests

    View Slide

  33. Build Caching
    ⬢ Introduced to the Java world by Gradle in 2017
    ⬢ Used by leading technology companies like Google and Facebook
    ⬢ Supports both local (per-user) and remote caching for distributed
    teams

    View Slide

  34. Build Caching
    When the inputs have not changed, the outputs can be reused from a previous run.

    View Slide

  35. Demo: Build Cache for Maven and Gradle

    View Slide

  36. Remote Build Cache
    ⬢ Shared among different machines


    ⬢ Speeds up development for the whole team


    ⬢ Reuses build results among CI agents/jobs and individual developers

    View Slide
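The idea behind slide 36 can be sketched as a two-tier lookup: check the local cache, then the shared one, and only do the work if both miss. This is a toy model, not how Gradle or Maven implement it. The class, the dictionary-backed stores, and the key string are all hypothetical.

```python
class TwoTierCache:
    """Checks a local cache first, then a shared remote cache."""

    def __init__(self, remote):
        self.local = {}        # stands in for the per-machine cache
        self.remote = remote   # stands in for the cache shared with CI and teammates

    def run(self, key, task):
        """Return (output, source), executing `task` only on a full cache miss."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.remote:
            output = self.remote[key]
            self.local[key] = output      # keep a local copy for next time
            return output, "remote"
        output = task()                   # nobody has built these inputs yet
        self.local[key] = output
        self.remote[key] = output         # assumption: every client may write
        return output, "executed"


shared = {}                               # hypothetical shared remote store
ci_agent = TwoTierCache(shared)
developer = TwoTierCache(shared)

print(ci_agent.run("compile:abc123", lambda: "classes"))   # ('classes', 'executed')
print(developer.run("compile:abc123", lambda: "classes"))  # ('classes', 'remote')
print(developer.run("compile:abc123", lambda: "classes"))  # ('classes', 'local')
```

The developer never executes the task: the CI agent already produced the output for the same key, which is the reuse "among CI agents/jobs and individual developers" the slide describes.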

  37. Test Distribution Parallelizes Test Execution

    View Slide

  38. Existing solutions: Single machine parallelism
    Parallelism in Gradle is controlled by these flags:
    --parallel / org.gradle.parallel
        Controls project parallelism, defaults to false
    --max-workers / org.gradle.workers.max
        Controls the maximum number of workers, defaults to the number of processors/cores
    test.maxParallelForks
        Controls how many VMs are forked by an individual test task, defaults to 1
    See https://guides.gradle.org/performance/#parallel_execution for more information

    View Slide

  39. Existing solutions: CI fanout
    See https://builds.gradle.org/project/Gradle for an example of this strategy
    Test execution is distributed by manually partitioning the test set and then running partitions in
    parallel on several CI nodes.
    pipeline {
        stage('compile') { ... }
        parallelStage('test') {
            step {
                sh './gradlew :testGroup1'
            }
            step {
                sh './gradlew :testGroup2'
            }
            step {
                sh './gradlew :testGroup3'
            }
        }
    }

    View Slide
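The manual partitioning step that CI fanout depends on might look something like the sketch below. It splits test names round-robin into a fixed number of groups. The test names are invented, and the `testGroup` labels simply mirror the task names in the pipeline on slide 39.

```python
def partition(tests, groups):
    """Split `tests` round-robin into `groups` lists of near-equal size."""
    if groups < 1:
        raise ValueError("groups must be at least 1")
    buckets = [[] for _ in range(groups)]
    # Sorting first makes the split deterministic across CI runs.
    for index, test in enumerate(sorted(tests)):
        buckets[index % groups].append(test)
    return buckets


tests = ["OrderTest", "CartTest", "UserTest", "PaymentTest", "SearchTest"]
for number, group in enumerate(partition(tests, 3), start=1):
    print(f"testGroup{number}: {group}")
```

Splitting by count ignores how long each test takes, so one group can easily end up much slower than the others. Keeping the groups balanced by hand is part of the manual effort this approach requires.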

  40. Assessment of existing solutions
    ⬢ Build Caching is great in many cases but
    doesn’t help when test inputs have changed.
    ⬢ Single machine parallelism is limited by that
    machine’s resources.
    ⬢ CI fanout does not help during local
    development and requires manual setup, test
    partitioning, and result collection/aggregation.

    View Slide

  41. Test Distribution in Gradle Enterprise

    View Slide

  42. Test Distribution Results
    Doubling the number of executors cuts build time in half
    [Chart: three successive reductions of ~50%]
    Measurements from the demo project
    View Slide
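The halving shown on slide 42 matches an idealized model in which test work divides evenly across executors with no coordination overhead. The sketch below encodes only that model. The 40-minute starting point is a hypothetical figure, not a measurement from the talk, and real projects will usually scale less cleanly.

```python
def ideal_test_time(serial_minutes, executors):
    """Test time if work splits perfectly evenly with no coordination overhead."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return serial_minutes / executors


serial = 40.0  # hypothetical single-executor test time in minutes
for executors in (1, 2, 4, 8):
    print(f"{executors} executors: {ideal_test_time(serial, executors):.1f} min")
```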

  43. Netflix reduced a 62-minute test cycle time down to just under 5 minutes!

    View Slide

  44. Machine learning leads to greater efficiencies

    View Slide

  45. Predictive Test Selection
    01 Instead of trying to analyze which tests could possibly be impacted by
    developer changes, Predictive Test Selection looks at the history of changes
    and what has happened to tests in the past
    02 When tests complete, they can either FAIL, SUCCEED, or be FLAKY.
    Predictive Test Selection will predict the outcome of the test based on the
    history it is analyzing
    03 PTS will recommend skipping tests that are predicted to succeed, and will only run tests
    that are likely to provide valuable feedback
    https://arxiv.org/pdf/1810.05286.pdf

    View Slide
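To make the idea of history-based selection concrete, here is a deliberately simple stand-in. It is not the machine learning model that Predictive Test Selection uses: it swaps in a plain pass-rate threshold and ignores the code change itself. The thresholds, test names, and history data are all invented for illustration.

```python
def select_tests(history, min_runs=5, max_pass_rate=0.99):
    """Pick tests worth running, based only on their past outcomes.

    `history` maps a test name to a list of past outcomes:
    "SUCCEED", "FAIL" or "FLAKY".
    """
    selected = []
    for test, outcomes in sorted(history.items()):
        if len(outcomes) < min_runs:
            selected.append(test)          # too little history to predict
            continue
        pass_rate = outcomes.count("SUCCEED") / len(outcomes)
        if pass_rate <= max_pass_rate:
            selected.append(test)          # has failed or flaked before
    return selected


history = {
    "CartTest": ["SUCCEED"] * 20,
    "PaymentTest": ["SUCCEED"] * 17 + ["FAIL"] * 3,
    "SearchTest": ["SUCCEED"] * 2,
    "UserTest": ["SUCCEED"] * 9 + ["FLAKY"],
}
print(select_tests(history))  # ['PaymentTest', 'SearchTest', 'UserTest']
```

`CartTest` is skipped because it has a long, clean record. The other three run because their history is either too short or contains failures.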

  46. Force multiplier when used in combination
    1. Build Cache. Avoid unnecessarily running
    components of builds and tests whose inputs
    have not changed.
    2. Predictive Test Selection. Run only the
    relevant subset of test tasks likely to provide
    useful feedback.
    3. Test Distribution. Speed up the execution
    of the necessary and relevant remaining
    tests by running them in parallel.
    4. Performance Continuity. Sustain Test
    Distribution and other performance
    improvements over time with data analytics
    and performance profiling capabilities.

    View Slide

  47. Is the build and test cycle fast enough?

    View Slide

  49. Is the build and test cycle as fast as it
    can possibly be?

    View Slide

  50. Pain Point:
    Inefficient
    troubleshooting of
    broken builds

    View Slide

  51. “You can observe a lot by just watching.”
    Yogi Berra, Catcher and Philosopher

    View Slide

  52. Build Scan: scans.gradle.com

    View Slide

  53. Learn more
    https://bit.ly/grdl-scan

    View Slide

  54. DPE Organizations Track Failure Rates

    View Slide

  55. Pain Point:
    Flaky Tests & Other
    Avoidable Failures

    View Slide

  56. Flaky builds and tests are maddening

    View Slide

  57. The test is flaky. What do you do now?
    ⬢ Try it again
    ⬢ Re-run it
    ⬢ Re-run it again
    ⬢ Ignore it and approve PR
    ⬢ All of the above

    View Slide

  58. Identify and Track Flaky Tests

    View Slide
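One way to identify a flaky test is to look for mixed results on identical code, for example a failure followed by a pass on retry within the same build. The sketch below assumes that definition. The data layout and function names are hypothetical, and the outcome labels reuse the FAIL / SUCCEED / FLAKY terms from slide 45.

```python
def classify(attempts):
    """Classify one test from its attempts on the same code.

    `attempts` is a list of booleans, True meaning the attempt passed.
    """
    if not attempts:
        raise ValueError("at least one attempt is required")
    if all(attempts):
        return "SUCCEED"
    if any(attempts):
        return "FLAKY"      # mixed results on identical inputs
    return "FAIL"


def flaky_rate(runs):
    """Fraction of recorded runs in which the test was classified FLAKY."""
    labels = [classify(attempts) for attempts in runs]
    return labels.count("FLAKY") / len(labels)


# Hypothetical history: each inner list is one build's attempts for a test.
runs = [[True], [False, True], [True], [False, False], [True]]
print([classify(r) for r in runs])
print(f"flaky in {flaky_rate(runs):.0%} of runs")
```

Tracking this rate per test over time turns "re-run it again" into data that shows which tests are worth fixing first.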

  59. https://youtu.be/vHBzZHE4tJ0

    View Slide

  60. Pain Point:
    No Metric/KPI
    Observability

    View Slide

  61. Without focus, problems can sneak
    back in

    View Slide

  62. Continuous Improvement: It doesn’t really matter what you
    improve as long as you are constantly improving something,
    because entropy means that if you aren’t doing
    anything, you’re always getting worse.

    View Slide

  63. “The tools, services, and environments that developers
    need to do their jobs should be treated with
    production-level SLAs. The development platform is
    the production environment for the job of creating
    software”
    Release It! Second Edition


    Michael Nygard


    View Slide

  64. Pain Point:
    Inefficient use of CI
    Resources

    View Slide

  65. All Of This Will Improve CI

    View Slide

  66. In Summary

    View Slide

  72. In Summary
    ⬢ 10x Developers might be a myth, but 10x Organisations are real
    ⬢ Developer Productivity is deeply linked to Developer Experience
    ⬢ If you do nothing about productivity, life will get worse
    ⬢ Fast feedback, efficient troubleshooting, and reliable cycles are key
    ⬢ Start with observation, and then take action on data
    ⬢ Proactively solve problems for the whole team

    View Slide

  73. Source: TechValidate. TVID: 066-EEE-DB1

    View Slide

  74. DPE Transforms Every Business Layer

    View Slide

  75. Next Steps

    View Slide

  76. https://bit.ly/speed-build
    Build speed challenge

    View Slide

  77. There’s a Book for This

    View Slide

  78. View Slide

  79. https://bit.ly/dpe-4me

    View Slide

  80. Thank you!

    View Slide

  81. How it works…
    1. When a test run starts, the build tool
    submits a test input snapshot and test
    set to a machine learning model.


    2. PTS automatically develops a test
    selection strategy by learning from
    historical code changes and test
    outcomes from your Build Scan data to
    predict a subset of relevant tests, which
    are then executed by your build.


    3. Code change and test result data are
    processed immediately after a Build
    Scan is uploaded to PTS, and the test
    selection strategy is updated based on the
    new results.

    View Slide

  82. Cache Key/Value Calculation


    The cacheKey for Gradle Tasks/Maven Goals is based on the Inputs:


    cacheKey(javaCompile) = hash(sourceFiles, jdk version, classpath, compiler args)


    The cacheEntry contains the output:


    cacheEntry[cacheKey(javaCompile)] = fileTree(classFiles)


    For more information, see:


    https://docs.gradle.org/current/userguide/build_cache.html


    View Slide
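The cache key formula on slide 82 can be illustrated with a small sketch. It is a simplified model, not Gradle's or Maven's actual implementation. In particular, it hashes classpath entries by name, which is an assumption made for brevity, and the function name, parameters, and sample inputs are hypothetical.

```python
import hashlib


def cache_key(source_files, jdk_version, classpath, compiler_args):
    """Hash every input that can affect the compiler's output.

    `source_files` maps a relative path to the file's bytes.
    """
    digest = hashlib.sha256()
    for path in sorted(source_files):                 # independent of dict order
        digest.update(path.encode() + b"\0")
        digest.update(hashlib.sha256(source_files[path]).digest())
    # Label each section so values cannot shift between input categories.
    sections = (("jdk", [jdk_version]),
                ("classpath", classpath),
                ("args", compiler_args))
    for label, values in sections:
        digest.update(label.encode() + b"\0")
        for value in values:
            digest.update(value.encode() + b"\0")
    return digest.hexdigest()


sources = {"src/App.java": b"class App {}"}
key1 = cache_key(sources, "17", ["lib/a.jar"], ["-parameters"])
key2 = cache_key(sources, "17", ["lib/a.jar"], ["-parameters"])
key3 = cache_key(sources, "21", ["lib/a.jar"], ["-parameters"])

print(key1 == key2)  # True: identical inputs give the same key
print(key1 == key3)  # False: a different JDK version changes the key
```

Because the key depends only on the inputs, any machine that computes the same key can safely reuse the stored class files instead of compiling again.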