Can We Measure Developer Productivity?

Often, the goal of architecture is to improve developer productivity. But what does it mean for a developer to be more productive? Can we measure it? Should we? And if we don’t, how can we make any progress?
McKinsey claimed that they could finally measure developer productivity. This was followed by extensive criticism from notable figures such as Daniel Terhorst-North, Kent Beck, and Gergely Orosz. We will look at the different viewpoints and explore whether productivity can be measured and whether it should be.

Eberhard Wolff

November 13, 2024

Transcript

  1. Today I wrote just 10 lines of code. …because I spent so much time deploying software. → Need to fix deployment!
  2. Today I wrote 1,000 lines of code. …but that was really yak shaving. → No business value.
  3. Goodhart’s Law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
  4. Goodhart’s Law: Test Coverage • Test coverage: a good measure for the quality of tests. • Higher coverage: more parts of the code are executed, so the tests can catch more errors.
  5. Goodhart’s Law: Test Coverage • Set a test coverage goal, and the metric will be manipulated. • E.g. focus on trivial parts of the code. • E.g. don’t check any results. • E.g. just make sure no exception is thrown. • … (see the sketch below)
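A minimal sketch of such a gamed test (hypothetical code, not from the deck): both tests below reach full line and branch coverage of apply_discount, but only the honest one would catch a broken discount factor.

    # Hypothetical example of coverage gaming in Python.
    def apply_discount(price: float, premium: bool) -> float:
        """Premium customers get 10% off."""
        if premium:
            return price * 0.9
        return price

    def test_gamed():
        # Executes both branches, so 100% coverage is reported ...
        apply_discount(100.0, True)
        apply_discount(100.0, False)
        # ... but with no assertions, this test also passes if the logic is wrong.

    def test_honest():
        # Checks the results and would catch a wrong discount factor.
        assert apply_discount(100.0, True) == 90.0
        assert apply_discount(100.0, False) == 100.0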
  6. Goodhart’s Law: Test Coverage • Test coverage increases. • But the tests won’t really catch more problems. • Test coverage is no longer a good metric for the quality of the tests!
  7. Goodhart’s Law: Test Coverage Solution? • Be smarter about what you measure. • E.g. mutation testing (sketched below). • E.g. review test code. • … • IMHO: this is a pointless arms race. • So don’t manage for metrics?
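The idea behind mutation testing, as a hand-rolled sketch (real tools such as mutmut or PIT automate this): inject a small defect and check whether the tests notice. A surviving mutant means the tests are weaker than their coverage suggests.

    # Hand-rolled mutation-testing sketch in Python.
    def add(a, b):
        return a + b

    def add_mutant(a, b):
        return a * b  # mutation: '+' replaced by '*'

    def tests_pass(fn) -> bool:
        try:
            assert fn(2, 2) == 4  # weak test: true for '+' and for '*'
            return True
        except AssertionError:
            return False

    assert tests_pass(add)  # the original implementation passes
    if tests_pass(add_mutant):
        print("mutant survived: the tests are too weak")
    else:
        print("mutant killed")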
  8. Goodhart’s Law: Solution • Purpose matters. • The team tries to optimize itself: probably not a case for Goodhart’s Law. • Management measures quality into software / people: probably a case for Goodhart’s Law.
  9. Dealing with Goodhart’s Law • Let the team decide whether / how they want to improve! • Help and support, i.e. pave the road. • Provide support: techniques and technologies.
  10. DORA: 4 Key Metrics • Change lead time • Deployment frequency • Change fail percentage • Failed deployment recovery time • https://dora.dev/ (a toy calculation follows below)
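As a rough sketch of how the four key metrics could be computed (the record format and numbers are invented for illustration):

    from datetime import datetime
    from statistics import median

    # Hypothetical deployment log: commit time, deploy time, outcome,
    # and recovery time for failed deployments.
    deployments = [
        {"committed": datetime(2024, 11, 1, 9), "deployed": datetime(2024, 11, 1, 15),
         "failed": False, "recovered": None},
        {"committed": datetime(2024, 11, 4, 10), "deployed": datetime(2024, 11, 5, 11),
         "failed": True, "recovered": datetime(2024, 11, 5, 12)},
        {"committed": datetime(2024, 11, 6, 8), "deployed": datetime(2024, 11, 6, 10),
         "failed": False, "recovered": None},
    ]
    days_observed = 30

    # Change lead time: from commit to running in production (median).
    lead_time = median(d["deployed"] - d["committed"] for d in deployments)
    # Deployment frequency: deployments per day.
    frequency = len(deployments) / days_observed
    # Change fail percentage: share of deployments that cause a failure.
    fail_pct = 100 * sum(d["failed"] for d in deployments) / len(deployments)
    # Failed deployment recovery time: from failure to restored service (median).
    recovery = median(d["recovered"] - d["deployed"]
                      for d in deployments if d["failed"])
    print(lead_time, frequency, fail_pct, recovery)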
  11. DORA • Good empirical evidence • The metrics have many positive consequences: • More time for new features • Less burnout • More economic success
  12. Another Good Metric: Business Value • I.e. outcome • Ideally a $ / € value • How do you measure business value? • Some organizations require a business case for a software project, i.e. they can predict business value. • Business case: a starting point to find business value?
  13. Empirical? • Empirical research in our field is generally hard. • Empirical conclusions about specific metrics? • But we must improve somehow. • Gut feeling?
  14. SPACE • SPACE is a framework of metrics. • Choose a specific set of metrics to understand a specific problem.
  15. SPACE: Matrix of Metrics by Levels & Areas • A table with areas as columns (Area 1, Area 2, …) and levels as rows (Level 1, Level 2, …); each cell holds a concrete metric. (A hypothetical filled-in selection follows below.)
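One way to sketch such a selection in code (the levels, areas, and metrics are hypothetical; SPACE prescribes no concrete set):

    # Hypothetical (level, area) -> metric selection for one specific problem.
    space_matrix = {
        ("individual", "satisfaction & well-being"): "developer satisfaction survey",
        ("team", "performance"): "code review velocity",
        ("team", "activity"): "code reviews completed",
        ("system", "efficiency & flow"): "change lead time",
    }
    for (level, area), metric in space_matrix.items():
        print(f"{level:<10} | {area:<26} | {metric}")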
  16. SPACE: Areas • Satisfaction & well-being: e.g. developer satisfaction / retention • Performance (outcome): e.g. code review velocity • Activity (count of actions): e.g. code review scores • Plus communication & collaboration and efficiency & flow
  17. SPACE: Recommendations • Multiple metrics across various dimensions • At least 3, but not too many • At least one perceptual (survey) metric • What gets measured shows what is relevant • Hard to game
  18. SPACE: Conclusion • A comprehensive and sensible framework • Many metrics • Must be tailored to the environment • Broad (e.g. communication, collaboration, satisfaction) • Goes beyond pure performance
  19. McKinsey Matrix • Columns: outcome focus, optimization focus, opportunity focus. • Rows: system level, team level, individual level. • Cells: selected DORA / SPACE metrics, plus McKinsey’s opportunity-focused metrics combined with some SPACE metrics in the opportunity column.
  20. McKinsey Matrix (the same matrix, shown again).
  21. McKinsey & DORA / SPACE •Predefined set of SPACE metrics

    for projects •I.e. no customization per organization •Eliminates tailoring …and therefore the discussion about “why?”
  22. McKinsey Matrix (the same matrix, shown again).
  23. Contribution Analysis • Measuring individual contributions to the backlog using JIRA and custom tools. • Managers can manage expectations and improve performance this way. • IMHO problematic: is this a sensible metric? • What about a task that is important but time-consuming? • What if you don’t do tickets but support other people? • Shouldn’t contribution be about created business value? (see the sketch below)
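To make the objection concrete, a deliberately naive sketch (entirely hypothetical) of the kind of ticket counting such contribution analysis can degenerate into; mentoring and support work never show up in it:

    from collections import Counter

    # Hypothetical export of resolved tickets from an issue tracker.
    tickets = [
        {"assignee": "ada"},  # one hard, high-value task
        {"assignee": "bob"},  # three trivial tasks follow
        {"assignee": "bob"},
        {"assignee": "bob"},
    ]
    contribution = Counter(t["assignee"] for t in tickets)
    print(contribution.most_common())  # [('bob', 3), ('ada', 1)]
    # bob looks three times as "productive", although ada's single task may
    # have created far more business value; any time ada spent unblocking
    # colleagues is invisible to this metric entirely.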
  24. Inner / Outer Loop: Time Spent • Inner loop: code, test, build. • Outer loop: integrate, deploy at scale, security and compliance, meetings.
  25. Inner / Outer Loop: Time Spent (same diagram) • Optimize for time in the inner loop! • Hacking away instead of a meeting to understand the problem? Really? (A toy calculation follows below.)
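A toy calculation of the inner-loop share that is supposed to be optimized (activity categories from the slide, hours invented):

    # Hypothetical week of one developer, in hours per activity.
    INNER = {"code", "test", "build"}
    hours = {"code": 10, "test": 6, "build": 2,
             "integrate": 4, "deploy at scale": 3,
             "security and compliance": 2, "meetings": 13}

    inner = sum(h for activity, h in hours.items() if activity in INNER)
    print(f"inner-loop share: {inner / sum(hours.values()):.0%}")  # 45%
    # Cutting the 13 meeting hours raises the share, but it may also cut the
    # shared understanding of the problem, which is exactly the objection above.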
  26. Developer Velocity Index • 46 drivers in 13 capability areas • Technology (architecture, public cloud, test automation) • Working practices (engineering practices, e.g. tech debt) • Organizational enablement (e.g. culture, talent management) • (A sketch of such an aggregation follows below.)
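The index boils many drivers down to scores; a rough sketch of what such an aggregation might look like (the areas, weights, and scores here are invented, not McKinsey’s actual model):

    # Hypothetical weighted aggregation of capability-area scores.
    capability_areas = {
        "architecture": {"weight": 0.4, "score": 3.0},  # scores on a 1-5 scale
        "test automation": {"weight": 0.3, "score": 4.0},
        "culture": {"weight": 0.3, "score": 2.5},
    }
    dvi = sum(a["weight"] * a["score"] for a in capability_areas.values())
    print(f"velocity index: {dvi:.2f} / 5")  # 0.4*3.0 + 0.3*4.0 + 0.3*2.5 = 3.15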
  27. Developer Velocity Index • A good foundation for an elaborate consulting project • Does it help? • Benchmarking? By industry? • Every project is different. • Can you arrive at results more pragmatically and quickly? • E.g. interviews
  28. Talent Capability Score • Individual skills • Diamond shape: on a scale of increasing skill, the majority sits in the middle. • Example: too many inexperienced individuals → training • Why not aim for the best?
  29. McKinsey Example I • Developers spend too much time on design and managing dependencies • Clarify roles • Result: more code produced • Pro: managing dependencies is annoying • Con: design can be useful • Might be a good idea!
  30. McKinsey Example II • New employees don’t achieve as much • So: better onboarding and mentoring • IMHO a good idea • High potential for poor metrics: mentors perform poorly with regard to the Developer Velocity Index
  31. McKinsey: Recommended Approach • Learn the basics for communication with the C-level • Assess your systems (e.g. to measure test coverage) • Build a plan with a concrete goal • Remember that measuring productivity is contextual: it’s about getting better.
  32. McKinsey: Conclusion • SPACE should be customized • The new metrics are questionable • In my experience, you can find the main challenges more quickly, e.g. with interviews. • However, the examples and general recommendations make sense. • It doesn’t seem to aim at identifying people to fire.
  33. Criticism • The paper has sparked quite some criticism. • The next slides show some highlights. • Not a comprehensive discussion!
  34. Dan North’s Criticism: Highlights • Contribution Analysis measures the wrong thing. • Does the outer loop really have low value? • Talent capability: depends on the organization.
  35. Dan North’s Recommendation • Theory of Constraints: identify the bottleneck, utilize it fully • Lead time or flow • I.e. Lean / DORA • If you hire the best, productivity is a problem of the organization, not the individual. • Coaching & peer feedback
  36. From “spot customer pain point” to “ship a solution”: design docs → code → feature in prod → customers behave differently → value generated.
  37. Kent Beck & Gergely Orosz: Highlights • “Absurdly naïve” • Ignores software development teams • It’s about individual performance • The CEO/CFO will override the CTO to implement the McKinsey framework • Unethical CEOs and CTOs are the target audience • Then it destroys the organization
  38. Kent Beck & Gergely Orosz: Highlights • The criticism doesn’t match what the paper says. • The paper has completely different examples and recommendations. • The criticism might be caused by the scandals around McKinsey. • Prejudice?
  39. Kent Beck & Gergely Orosz: Advice • Understand why you’re measuring and recognize power relationships. • Promote self-measurement: teams should analyze their own data. • Trust your judgement: rely on explanations that resonate and take responsibility for decisions. • Productivity metrics are misleading. • Focus on real accountability: prioritize consistent delivery of customer-valued outcomes. • IMHO a great idea!
  40. Conclusion • Beware of Goodhart’s Law! • Use metrics to support teams! • Therefore: create your own custom metrics for the problem at hand. • SPACE is a great starting point.
  41. Send an email to [email protected] → Slides + Service Mesh Primer (EN) + Microservices Primer (DE / EN) + Microservices Recipes (DE / EN) + sample of the Microservices Book (DE / EN) + sample of Practical Microservices (DE / EN) + sample of the Continuous Delivery Book (DE). Powered by Amazon Lambda & microservices. Email address logged for 14 days; wrongly addressed emails handled manually.