Uwe L. Korn
September 30, 2025

Navigating the security compliance maze of an ML service

While everyone is talking about the m(e/a)ss of bureaucracy, we want to show you hands-on what you may need to do to operate an ML service. We will give an overview of things like ISO-27001 certification, the Cyber Resilience Act, and AIBOMs. We want to highlight their impact and intention and give advice on how to integrate them into your development workflow.

This talk is written from a practitioner's perspective and will help you set up your project to make your compliance department happy. It isn't meant as a deep dive into the individual standards.

Transcript

  1. Navigating the security compliance maze of an ML service

    Uwe Korn - PyData Paris – September 30, 2025
  2. About me – Uwe

    • CTO at EconAI startup QuantCo
    • Technical background from Data Engineering & ML
    • Involved in Apache {Arrow, Parquet} and conda-forge
    • PyData Südwest co-organizer
    • Recently went through ISO-27001 certification
    • Currently dealing with CRA (Cyber Resilience Act)

    Personal Blog · LinkedIn · xhochy on GitHub · @xhochy on BlueSky
  3. What is “compliance”?

    • Follow regulations/rules/laws/… that guide certain behaviour.
    • This should reduce risk and build trust for a certain aspect.
    • Compliance is checked by…
      ◦ “Compliance officer” at your own company
      ◦ An external auditor
      ◦ If something goes wrong…
  4. Why care about this?

    • Most companies have staff for these topics (CISO, …), why should a developer care?
      ◦ “Compliance officers” care about the fulfilment of the criteria, not the implementation
      ◦ Taking these things into account during the design will ease fulfilment
      ◦ Knowing the background/context of these requests makes them easier to fulfil
    • If you write code that gets used, you are affected by some law/regulation
    • Being proactive will make it easier for you
    • Don't let people scare you. The lockdown solutions will fulfil the requirements but are not necessary
      ◦ They also create things like shadow IT, which can be even more critical
  5. Care about what?

    As we're in the EU, we will only look at the most common ones here:
    • ISO-27001 (see also SOC-2), BSI C5: “Information Security Management System”; ensures you think about InfoSec sufficiently
    • DORA (“digital operational resilience for the financial sector”)
    • NIS 2 (“network and information systems”, “…measures for a high common level of cybersecurity across the Union”); applies to critical infrastructure (e.g. energy, transport, …)
    • CRA (“Cyber Resilience Act”): “Products with digital elements” (from smart bulbs to smartphones (apps))
    • (AI Act, “AI models in high risk applications”)
    • …more to come.
  6. Risk Assessment

    Find out what might impact you and how severely, then mitigate the risk.
    • There are standards/guides for this, e.g. ISO-27005 or NIST 800-30
    • Go through:
      ◦ Identification (“What could go wrong?”)
      ◦ Analysis (“How likely & bad?”; likelihood × impact matrix)
      ◦ Evaluation (“Is it serious (enough)?”)
      ◦ Treatment (“What to do about it?”)

    This should give you a prioritised list of things to work/focus on. You can also use this as a framework to argue why you don’t do something.
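
The analysis and evaluation steps can be sketched in a few lines of Python. This is only an illustration, not something prescribed by ISO-27005 or NIST 800-30: the three-level scale, the multiplication of the two scores, and the threshold of 4 are assumptions made for the example, and the third risk in the list is hypothetical.

```python
from dataclasses import dataclass

# Assumed three-level ordinal scale; real schemes often use more levels.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: str
    impact: str

    @property
    def score(self) -> int:
        # Analysis: one cell of the likelihood x impact matrix.
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def prioritise(risks, threshold=4):
    """Evaluation: keep risks at or above the (assumed) threshold, highest first."""
    serious = [r for r in risks if r.score >= threshold]
    return sorted(serious, key=lambda r: r.score, reverse=True)


risks = [
    Risk("Malicious or malformed CSV data", "medium", "high"),
    Risk("Poor versioning / lack of traceability", "high", "medium"),
    Risk("Typo in internal documentation", "low", "low"),  # hypothetical
]

for risk in prioritise(risks):
    print(f"{risk.score}: {risk.name}")
```

Risks that fall below the threshold are not lost: keeping them in the list with their score is what lets you argue later why you chose not to treat them.
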
  7. (slide without text)

  8. Software Supply Chain

    This is a risk for any type of project, since all are based on third-party software.
    • This can lead to an incident even if you don’t have a (world-reachable) service
    • Attacks include
      ◦ Malicious Package Update
      ◦ Typosquatting
      ◦ Source Repository Takeovers
      ◦ Build Pipeline Compromise
    • This applies to all dependencies (prod & dev)
    • You have hundreds of dependencies, incl. Phantom Dependencies
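
To make the typosquatting attack concrete, here is a minimal sketch that flags a requested package name when it is suspiciously close to, but not identical with, a package you already trust. The allow-list, the function name `possible_typosquat`, and the similarity cutoff of 0.8 are assumptions for illustration; a real check would compare against a much larger list of popular packages.

```python
import difflib

# Hypothetical allow-list of packages the project intends to depend on.
KNOWN_PACKAGES = ["polars", "lightgbm", "scikit-learn", "jupyterlab"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the package we meant
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for candidate in ["polars", "polras", "requests"]:
    lookalike = possible_typosquat(candidate)
    if lookalike:
        print(f"{candidate!r} looks like {lookalike!r}: check before installing")
```
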
  9. Software Supply Chain – Common Mitigations (I/II)

    You can protect your project a bit better by:
    • Generate & Ingest SBOMs (“software bill of materials”)
      ◦ Make sure you know what you run (& get alerts)
      ◦ Generate SBOMs with e.g. syft
    • Dependency Pinning and Automated Updates
      ◦ Use pixi, Dependabot, …
    • Vulnerability Scans
      ◦ …with scanners like grype or Trivy
  10. Software Supply Chain – Common Mitigations (II/II)

    You can protect your project a bit better by:
    • Dependency Vetting: Don’t just trust any package
      ◦ Manually review your most important ones
      ◦ Use heuristics, e.g. look up their OpenSSF Scorecard scores
      ◦ Model Cards for third-party models
    • Incident Response Plans
      ◦ Vulnerabilities happen; make sure you can react fast and avoid exploitation
    • Who is writing your code? Understand the communities that shape your ecosystem
  11. Your own code is your greatest liability/risk

    “Surprisingly 90% of the bugs we discover are in our code”
    • Bugs ~ Risk
    • Dependencies are often well-tested & get external scrutiny
    • Your code is at the forefront of your application

    Thus:
    • Have clear processes like mandated code review
    • Secure Coding Standards and Training
    • For (externally) reachable services: Pentests
    • Automated tooling like linting or static analysis

    🥰 Auditors love this!
  12. Simple tabular ML pipeline

    “Let’s train a model on some CSV files”

    pixi init
    pixi add jupyterlab polars lightgbm scikit-learn
    pixi list | wc -l
    # ~> 160 lines, i.e. 159 dependencies

    Let’s guess: Are there really 159 dependencies?
  13. Simple tabular ML pipeline – SBOM

    Run syft to generate an SBOM

    syft dir:.pixi/envs/default -o cyclonedx-json=sbom.cdx.json

    2382 packages
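
Once the SBOM exists, a short script can show what is actually being counted. The sketch below groups the entries of a CycloneDX JSON document by their `type` field. It relies only on the `components` array and the per-component `type` key from the CycloneDX format; the function name `summarise_sbom` and the tiny hand-written sample are assumptions, standing in for the real `sbom.cdx.json`.

```python
import json
from collections import Counter


def summarise_sbom(sbom: dict) -> Counter:
    """Count SBOM components by their CycloneDX `type` field."""
    return Counter(c.get("type", "unknown") for c in sbom.get("components", []))


# Hand-written stand-in for the contents of sbom.cdx.json (not real scan output).
sample = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "name": "polars", "version": "0.0.0"},
    {"type": "library", "name": "lightgbm", "version": "0.0.0"},
    {"type": "file", "name": "example.so"}
  ]
}
""")

for component_type, count in summarise_sbom(sample).most_common():
    print(f"{component_type}: {count}")
```

For the real file, replace the sample with `json.load(open("sbom.cdx.json"))`.
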
  14. Simple tabular ML pipeline – Vulnerabilities?

    Run grype to check for vulnerabilities

    grype dir:.pixi/envs/default

    It’s a fresh environment, so there are probably none…
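
If you want to feed the scan result into alerts or a CI gate, it helps to reduce it to a severity summary. The sketch below assumes grype's JSON output (`-o json`) has a top-level `matches` list whose entries carry `vulnerability.severity`; treat that layout as an assumption and check it against the output of your grype version. The sample report and its identifiers are placeholders, not real findings.

```python
from collections import Counter


def severity_counts(report: dict) -> Counter:
    """Count findings per severity in a grype-style JSON report."""
    return Counter(
        match["vulnerability"]["severity"] for match in report.get("matches", [])
    )


def should_fail_build(report: dict, blocking=("Critical", "High")) -> bool:
    """Hypothetical policy: fail when any finding has a blocking severity."""
    counts = severity_counts(report)
    return any(counts[level] > 0 for level in blocking)


# Placeholder report in the assumed layout (not real scan output).
sample_report = {
    "matches": [
        {"vulnerability": {"id": "EXAMPLE-0001", "severity": "High"},
         "artifact": {"name": "example-lib", "version": "1.0.0"}},
        {"vulnerability": {"id": "EXAMPLE-0002", "severity": "Low"},
         "artifact": {"name": "other-lib", "version": "2.3.0"}},
    ]
}

print(dict(severity_counts(sample_report)))
print("fail build:", should_fail_build(sample_report))
```
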
  15. Trust your software?

    Either manually check every dependency…
    …or at least trust some heuristics, e.g. the OpenSSF Scorecard: https://scorecard.dev/viewer/?uri=github.com/Quantco/pixi-pack
  16. Risk Assessment

    What could go wrong? Don’t constrain this to the compliance regime you want to fulfil!
    • Ingestion of malicious or malformed CSV data: data poisoning, injection, malformed entries leading to crashes or misbehaviour
    • Model poisoning or adversarial manipulation: training on poisoned or manipulated data
    • Use of vulnerable versions of Polars, LightGBM, or other dependencies: third-party code vulnerabilities
    • Poor versioning / lack of traceability: data, code, model
  17. Risk Assessment

    Ingestion of malicious or malformed CSV data

    Likelihood: Depends on the data source (low to medium)
    Impact: High (could lead to corrupted model, security vulnerability)
    Treatment:
    • Validate and sanitize inputs
    • Use schema validation (types, ranges, …)
    • Limit privileges of the process (least privilege)
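
Schema validation of this kind can be sketched with the standard library alone. The column names, types, and ranges in `SCHEMA` are hypothetical; in the pipeline from this talk you would more likely express the same checks with your dataframe library or a dedicated validation package.

```python
import csv
import io

# Hypothetical schema: column name -> (type, minimum, maximum)
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e9),
}


def validate_rows(lines, schema=SCHEMA):
    """Yield (line number, message) for every value that violates the schema."""
    reader = csv.DictReader(lines)
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        for column, (cast, low, high) in schema.items():
            raw = row[column]
            try:
                value = cast(raw)
            except (TypeError, ValueError):
                yield reader.line_num, f"{column}: {raw!r} is not a valid {cast.__name__}"
                continue
            # NaN fails this comparison as well, so it is reported too.
            if not low <= value <= high:
                yield reader.line_num, f"{column}: {value} outside [{low}, {high}]"


data = io.StringIO("age,income\n34,52000.5\n-3,41000\nabc,1000\n")
errors = list(validate_rows(data))
for line_number, message in errors:
    print(f"line {line_number}: {message}")
```

Rejecting or quarantining the offending rows before training keeps a single malformed entry from crashing the pipeline or silently skewing the model.
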
  18. Risk Assessment

    Poor versioning / lack of traceability

    Likelihood: High
    Impact: Medium (could lead to unexplainable results)
    Treatment:
    • Version code (e.g. git), models (e.g. MLflow) and data (use a catalogue)
    • Keep data snapshots for released models
    • Log data lineage
    • Ensure reproducible builds/pipelines
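
A lineage log can start very small: for each trained model, record which code version and which exact data it was built from. The sketch below is one possible shape for such a record, not a standard format; the field names, the function `lineage_record`, and the sample values are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Content hash that identifies a data snapshot independent of its file name."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(data: bytes, code_version: str, model_name: str) -> dict:
    """Build one lineage entry linking a model to its code and data."""
    return {
        "model": model_name,
        "code_version": code_version,  # e.g. the git commit hash of the training code
        "data_sha256": sha256_of(data),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical values; in practice read the bytes from the training data file.
snapshot = b"age,income\n34,52000.5\n"
record = lineage_record(snapshot, code_version="abc1234", model_name="demo-model")
print(json.dumps(record, indent=2))
```

Appending these records to a file stored next to the released model is enough to answer "which data produced this result?" later on.
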
  19. Secure your own code

    “Surprisingly 90% of the bugs we discover are in our code”

    Let’s fix this!
    • Ensure sufficient test coverage, run the tests regularly
    • Extend your tests, e.g. use hypothesis (or fuzzing)
    • pre-commit hooks
    • Encode best practices with automation, e.g. by using branch protections in GitHub
    • Document your design decisions
    • Pin dependencies and get automated updates (PRs)
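
The idea behind property-based testing and fuzzing can be shown with the standard library: throw many generated inputs at a function and assert a property that must always hold. The loop below is only a stand-in for what hypothesis does far more thoroughly (smarter input generation, shrinking of failing cases); `parse_amount` is a hypothetical function under test.

```python
import math
import random
import string


def parse_amount(text: str) -> float:
    """Hypothetical function under test: parse a non-negative, finite amount."""
    value = float(text.strip())
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"invalid amount: {text!r}")
    return value


def fuzz_parse_amount(runs: int = 1000, seed: int = 0) -> int:
    """Feed random strings to parse_amount and return how many were accepted.

    Property: every input is either rejected with ValueError or yields a
    finite, non-negative float. Any other exception fails the test.
    """
    rng = random.Random(seed)
    alphabet = string.digits + ".-+eE naif"
    accepted = 0
    for _ in range(runs):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            value = parse_amount(text)
        except ValueError:
            continue
        assert math.isfinite(value) and value >= 0, text
        accepted += 1
    return accepted


print("accepted inputs:", fuzz_parse_amount())
```
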