Arie van Deursen
May 03, 2025

Empirical Software Engineering in the Public Sector

Keynote delivered at the 2nd International Workshop on Methodological Issues with Empirical Studies in Software Engineering held at the ACM/IEEE International Conference on Software Engineering (ICSE) in Ottawa, Canada, on May 3, 2025.

Abstract
The public sector is a challenging software engineering domain. Government software systems are highly diverse, covering all aspects of society, including tax collection, social benefits, health care insurance, traffic management, infrastructure, and more. They have a wide variety of (conflicting) stakeholders, and are held publicly accountable.

In the Netherlands, complex Information Technology (IT) projects of the government must be evaluated by a dedicated "Advisory Council on IT Assessment" (AcICT). Over the past decade, more than 125 public reports have appeared. Each report describes the risks and chances of success of a given IT project, and includes recommendations to ministers and parliament on how to proceed. In this presentation I will discuss what we can learn from these 125 reports, and which repertoire of empirical software engineering methods is or is not suitable for this task.

Transcript

  1. David Notkin (1955-2013)
    “… the intent is to make the engineering of software more effective so that society can benefit even more from the amazing potential of software.”
    “If we care about influence, as I hope we do, then adding value to society is the real measure we should pursue.”
    ACM TOSEM Editorial, 2013
  2. Design Science Methodology
    • Starting point: A problem context
    • Design steps (“change” the world)
      § Design an artifact to improve the problem context
      § Decompose problem, integrate other artifacts
    • Investigation steps (“understand” the world)
      § Answer knowledge questions about the artifact in question
      § Formulate “theories” that inform design steps
      § Can contribute to “the world’s knowledge” on similar problems
    Roel Wieringa
  3. Tech sector
      • $$$ income from ads
      • Billions of users
      • 10s of 1000s of developers (large tech corps)
      • Metrics driven
      • High developer autonomy
    Financial sector
      • €€€ interest on assets and liabilities
      • Billions of transactions
      • 1000s of developers (medium-sized bank)
      • Critical to society
      • Highly regulated
    Public sector
      • €€€ taxes
      • Billions of transactions
      • 1000s of developers (ministry)
      • All aspects of society
      • Public accountability
    Img: Maeslantkering, Rijkswaterstaat
  4. Public Sector Challenges
    • Complex (political) decision making
    • High demands on privacy, availability, transparency, inclusion, accessibility, …
    • Long (system, data) lifetimes
    • Accountability to parliament, voters
    • Each government system is different
    Poor government digitalization undermines societal trust and democracy
  5. Advisory Council for IT Assessment (AcICT)
    • Dutch independent council (2015)
    • Advises ministers and parliament on risks and chances of success in complex government IT systems
    • Enshrined in law since 2024:
      § Govt obliged to submit systems for assessment, collaborate, and respond
    • All reports are public
    https://www.adviescollegeicttoetsing.nl/
  6. Council Organization
    • Five cabinet-appointed council members
      § Experts from society, industry and academia
    • Supported by office of ~25 assessors
    • Assessment takes around six months
    • Data collection, interviews, analysis and advice formulation, fact checking, response from minister, …
    • Outcome: 8-10 page advice to minister
  7. A Word of Caution
    • Council focuses on large (> €5M) projects
      § From these, council selects high risk projects
      § This gives biased, distorted view of government IT
    • All sectors have failures, but these are less … public
    • The public sector also has plenty of successes
    • The public sector is full of hard-working, amazing professionals dedicated to making society better
    Council members, 2024
  8. Assessment Framework Risk Areas
    1. Business case, benefits, finance
    2. Project organization and ownership
    3. Risk management and project dependencies
    4. Alignment business processes and IT solution
    5. Scope control
    6. Architecture, functional feasibility, technical realizability
    7. Planning and realization
    8. Procurement, tendering
    9. Acceptance and transition to line
    https://www.adviescollegeicttoetsing.nl/onze-werkwijze/documenten/publicaties/2021/12/01/toetskader-acict
  9. Levels of Impact
    Manual inspection of 100 reports
    Identified 3 types of impact
    1. MINOR: Continue, with suggestions for improvement
    2. REVISE: Continue, with urgent interventions
    3. MAJOR: Abort or major interventions
  10. Project Types
    • Build new system
    • Replace existing system
    • Adjust existing system substantially
    • Engage in major new procurement agreement
  11. Impact Differences per Project Type
    Green field / replacement:
      • 35-40% revise
      • 35-40% major
    Evolution: 72% (13/18) minor impact
  12. Example 1: OpenVMS
    • Unemployment benefit systems S1 & S2:
      § From: Cobol + Codasyl on Itanium/OpenVMS
      § To: Java + relational database on Linux
    • Automated code conversion
    • 2020-2025, budget €36M
    • Assessment in 2023
      § Half (€19M) of budget spent
  13. Advice
    • Halfway through the assessment, the organization itself decided to terminate the project
    • Hard-to-maintain code explosion: “JOBOL”
    Advice:
    1. S1 (no Codasyl): Replatform to Linux
    2. S2: Develop multiple alternative scenarios
    3. Draw lessons from failed conversion
    https://www.adviescollegeicttoetsing.nl/documenten/publicaties/2023/12/21/bit-advies-programma-openvms
  14. Example 2: Traffic Management
    • Replace 26 traffic management systems
    • Adopt & customize solution based on commercial product
    • 2015-2026, budget €166M (originally €35M)
    • Assessment in 2022: Half (€83M) of budget spent
  15. Advice
    1. Align ambition and capabilities
    2. Take lead over supplier
    3. Prioritize moving maintenance and operations to line
    4. Organize fallback scenario
  16. Example 3: Funding Education
    • Distribute €32B/year over all Dutch educational institutions
    • Modernize current .NET systems
      § Target: Rule-based platform + Java
    • 2019-2024, budget €18M
      § €12M spent in 2022
    • Assessments in 2021 and 2023
  17. Advice
    • Professionalize culture:
      § Governance & finance
      § Development & maintenance
    • Terminate use of (niche) rule-based platform
    • Invest in current .NET systems to safeguard continuity
    • Plan for stepwise modernization
    https://www.adviescollegeicttoetsing.nl/onderzoeken/documenten/publicaties/2023/09/25/bit-advies-doorontwikkelen-applicatielandschap-bekostiging-2
  18. Recurring Advice
    Based on manual analysis of reports:
    1. Reduce risk: Make it smaller
    2. Articulate needs
    3. Strengthen governance
    4. Define mitigation measures
    5. Invest in own capabilities
    Img: Egon Schiele, Leutnant Heinrich Wagner, Wikipedia
  19. The US Treasury
    • Handling $5.45 trillion in federal payments
    • Managing U.S. government debt
    • Collecting federal taxes (via Internal Revenue Service IRS)
    • Enforcing tax laws
    • Currency circulation
    • Oversight on financial sector
    Alexander Hamilton
  20. IRS IT in 2023
    Inflation Reduction Act:
    • 10 years, $80 billion extra for IRS
    • Become a “digital first” tax collector
    • Reduce $7 trillion of uncollected taxes
    • Allocate $4.8 billion for IT modernization
    • Grow IRS from 80,000 to 165,000 employees
    Janet Yellen
  21. US Treasury IT in 2025
    • DOGE team gained access to Treasury’s data and code, jeopardizing privacy and integrity
    • Intent to fire half of 100,000 IRS employees
    “But the most alarming aspect [is …] the systematic dismantling of security measures that would detect and prevent misuse […] by removing the career officials in charge of those security measures and replacing them with inexperienced operators.”
  22. “Thoughts and Prayers”
    • Support civil servants who are keeping critical systems up and running
    • Train our students to act in accordance with codes of ethics and professional conduct (e.g. from Assoc. Computing Machinery ACM)
    • Resist the illusion that a posse of whiz kids can quickly solve deeply complex (IT) problems
    • Undermining government IT can bring a nation down
    Timothy Snyder & Nora Krug. On Tyranny. Twenty Lessons from the Twentieth Century. Graphic Edition, 2021
  23. 28

  24. Artifacts Under Analysis
    • System under study in report
      § Capabilities of system created
      § Creation process / methodology
    • The AcICT assessment procedure
      § The assessment framework, the steps, …
    • The legal context of AcICT
      § Is it a useful instrument?
      § Is it applicable in other contexts?
  25. Theory Building
    Theory: a belief about a pattern in phenomena that has survived testing against empirical facts and critical reviews by peers.
    • A conceptual framework, used to frame a research problem, describe phenomena, and analyze their structure
    • Generalizations about patterns in phenomena, used to explain the causes, mechanisms or reasons for phenomena.
      § Used to predict phenomena, or to justify artifact design
  26. Advice as “theory” about a single system
    Theory about assessment procedures
    Theory about regulations on government IT quality
  27. Report Structure: Pyramid
    • Straight to the point:
      § Starts with the main message – the conclusion.
    • Title = Main message
    • Background / problem statement
    • 3-4 risks identified, one section each
      § Section title = risk
    • 3-4 recommendations, one section each
      § Section title = recommendation.
    • Apply recursively for more details
  28. A Boxed Summary on Page One
    Project success under pressure due to unexpected costs and duration
    A. Fine-grained solution takes too much time
    B. Development process is inefficient
    C. Costs of operations and maintenance underestimated
    We recommend targeted adjustments to:
    1. Identify a simpler solution
    2. Optimize the development process
    3. Recalibrate the expected costs of operations and maintenance
    https://www.adviescollegeicttoetsing.nl/onderzoeken/documenten/publicaties/2022/06/29/bit-advies-moderniseren-examens
  29. Report Structure as Design Science
    • Background:
      § Conceptual framework, empirical facts
    • Risks:
      § Mostly empirical factual observations about past
    • Recommendations:
      § “Theories” predicting effect of interventions
    • All rigorously structured to identify most urgent interventions
  30. “Methodological Issues”
    • Goal: Identify risks and interventions
    • Assessment framework provides conceptual framework
    • Substantive data: (project) documentation at all levels, interviews, presentations, incident reporting, source code, …
    • Research methods:
      § Data analysis, iterative hypothesis refinement, discussions with stakeholders, discussions among assessment team
      § Guided by assessment framework
    Scope / generalizations: Single system
  31. Drawing Lessons from Reports (I)
    • Several ministries receive 2-3 reports per year
    • Are there recurring problems?
    • Should the ministry intervene at a higher level?
    • Quarterly discussion with ministry
    • Annual report with recurring themes
    Scope / generalizations: Series of systems
  32. Drawing Lessons from Reports (II)
    • And what if we look at all 125 reports?
    • Reports are public – wonderful data set
    • Today: Initial analysis of these reports
    • Methods used:
      § Manual labeling of reports
      § Descriptive statistics
      § Series of case studies on (recurring) problems and advice
      § Member checking with assessors
    Scope / generalizations: Public sector IT
  33. Key to Success
    • Never take anything for granted
      § “Dig a little deeper!”
    • Take a step back; force yourself to see what has been missed
      § Key factors that bring everything together
    • Own it
    • Time box it
    Essential traits of the empirical researcher
  34. AI to the Rescue?
    • Capabilities of foundation models are mind blowing
    • This will affect many aspects of society, including government IT
    • The ambitions of Artificial General Intelligence reach even higher
    • Will (generative) AI solve our problems?
    Sam Altman (img src Wikipedia)
  35. Good Process Data?
    • Rich, consistent, public, historical data about government software development projects?
      § “Mining software repositories” for gold
      § An AI powerhouse
    • Unfortunately:
      § Government projects are very diverse
      § Very little consistent data across projects
      § Most projects developed closed source
    Img: Florence Nightingale, 1855, “Causes of mortality in the army in the east”
  36. Coding Productivity?
    • Improving productivity is of little use … if you don’t know what you want
    • Benefits come when combined with shorter feedback cycles
    • More work for the “problem owner!”
      § Give feedback, take into production, manage organizational change
    • Done well, the benefits then become:
      § Value delivered earlier; lower costs of failure
  37. Language Models for the Government?
    • Government and its (IT) projects are document-heavy
    • LLMs can easily generate plausible-sounding text no one takes responsibility for
      § This will slow down rather than speed up government (IT) projects
    • What’s needed instead:
      § Clear sense of direction and crisp documentation
      § Consensus building and accountability
    Img: JuBi, Wikipedia
  38. Focused AI
    Timnit Gebru (SaTML 2023): “We should build smaller-scale systems (that are well-scoped and well-defined) for which we can provide specifications for expected behavior, tolerance and safety protocols.”
    Timnit Gebru, img Wikipedia
    https://x.com/NicolasPapernot/status/1623885641380425728
  39. Modernization
    • IT systems have life spans of 30+ years
    • Constant change in user expectations, law, system landscape, technology, …
    • Large-scale “modernizations” high risk
    • Council recommendations:
      1. Invest in understanding current landscape
      2. Modernize in small steps
      3. Avoid by pro-active improvements
  40. Philology
    • The love of words
    • In-depth qualitative analysis
    • LTI – The Language of the Third Reich
      § Evolution of use of words 1920-1945
      § Diary notes
      § Newspaper clips, movies, ads, books, …
    • Deeply personal qualitative analysis
    Victor Klemperer (1881-1960)
  41. Key Takeaways
    1. Government digitalization is facing huge challenges that demand our attention
    2. The challenges are multi-disciplinary and transcend developer productivity
    3. Well-functioning IT systems require long-term stable leadership and investments
    4. Problem context and theory building guide the choice of empirical methods
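
The slides on levels of impact and on drawing lessons from all reports mention manual labeling followed by descriptive statistics. A minimal sketch of that tallying step in Python is given below. The function name `impact_shares` and the placeholder label `NOT_MINOR` are assumptions made for illustration and do not come from the talk; the only figures taken from the slides are the 13 out of 18 evolution projects with minor impact.

```python
from collections import Counter


def impact_shares(labels):
    """Return the share of each impact label as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# From the slides: 13 of 18 "evolution" projects received a MINOR-impact advice.
# The split of the remaining 5 is not given, so they share one hypothetical
# placeholder label here.
evolution_labels = ["MINOR"] * 13 + ["NOT_MINOR"] * 5

shares = impact_shares(evolution_labels)
print(f"MINOR: {shares['MINOR']:.0%}")
```

With real data, each report would contribute one manually assigned label (MINOR, REVISE or MAJOR) per project type, and the same function could be applied per group.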