CHINOG 12 - Building Trustworthy Network Automation, From Principles to Practice

Trust is essential for successful network automation adoption.

When automation platforms exhibit predictable behaviors and transparent processes, teams can confidently delegate critical network operations. Building trustworthy automation doesn't happen by itself; it needs to be baked into the design of every workflow. This technical session examines core principles that build trust, including idempotency, declarative workflows, and robust version control. Using practical examples from production environments, we'll analyze how specific technical decisions affect automation reliability and team confidence. The presentation covers key implementation patterns like state verification, diff-based changes, and failure handling. Attendees will learn concrete approaches for building automation platforms that network teams can trust and rely on daily.

Damien Garros

May 16, 2025

Transcript

  1. Damien Garros - CHINOG 12 - May 2025

    Building Trustworthy Network Automation, From Principles to Practice
  2. About me

    Damien Garros
    • Co-Founder and CEO of OpsMill
    • Creator of Infrahub, a next generation Infrastructure data management platform (Source of Truth)
    • Focused on Infrastructure as Code, Automation & Observability for 12+ years
    • Previously leading Technical Architecture at Network to Code
    • @dgarros / damiengarros
  3. Trust is essential for successful network automation adoption.
  4. Press the button to upgrade your network

    Figure labels: Software Upgrade; Automation User
  5. Software Upgrade

    The person who developed it probably has a completely different perspective on it.

    Figure label: Automation Developer
  6. Effort to build automation workflows

    • What we often focus on: Working Playbook
    • What is required to build Trust: Predictable, Manageable, Transparent, Simple, Reliable, Human Friendly
  7. Which cars do you trust the most?

    Another perspective on this topic: Predictable, Manageable, Reliable, Human Friendly
  8. Main Principles to build Trust

    • Predictable: Automation should produce consistent and repeatable outcomes every time it runs.
    • Manageable: Systems and workflows should be easy to configure, control, and update without hidden complexity.
    • Transparent: Automation should clearly show what it will do and what it has done, with no surprises.
    • Simple: Solutions should avoid unnecessary complexity, making them easier to understand, audit, and maintain.
    • Reliable: Automation must handle failures gracefully and ensure that critical operations complete successfully.
    • Human Friendly: Interfaces and experiences should be designed with people in mind: intuitive, safe, and supportive of decision-making.

    Trust comes from visibility, control, and graceful failure handling, not just from correct execution.
  9. Built on Mistakes. Refined by Experience.

    This presentation presents some hard-earned knowledge based on years of trying and making mistakes. Building automation that’s predictable, manageable, transparent, and reliable isn’t easy. It takes time, and it takes care, but every step forward matters.
  10. Idempotency

    Definition: running the same operation multiple times has the same effect as running it once.

    Idempotency is one of the cornerstones of reliability and simplicity in automation systems.
  11. Example of Idempotency in networking

    NOT idempotent
    • I need an IP address -> <- 10.0.0.1
    • I need an IP address -> <- 10.0.0.2
    • I need an IP address -> <- 10.0.0.3

    Idempotent
    • I need an IP address -> <- 10.0.0.1
    • I need an IP address -> <- 10.0.0.1
    • I need an IP address -> <- 10.0.0.1
  12. Example of Idempotency in networking

    Idempotency uses a declarative approach to move the complexity of managing the state from the client to the server.

    • My name is Bob and I need an IP address -> <- 10.0.0.1
    • My name is Bob and I need an IP address -> <- 10.0.0.1

    This example works because the server keeps the information that Bob was previously allocated the IP address 10.0.0.1 (Bob = 10.0.0.1). The laptop doesn’t need to know the current state of the system. The complexity of working out what needs to be done is managed within the server.
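
The Bob exchange can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the class name `IdempotentAllocator`, the in-memory dictionary, and the 10.0.0.0/29 pool are all assumptions made for the example.

```python
import ipaddress


class IdempotentAllocator:
    """Hypothetical server side: one address per client name, remembered."""

    def __init__(self, cidr: str) -> None:
        # Iterator over the usable host addresses of the pool.
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self._by_client: dict[str, str] = {}

    def allocate(self, client: str) -> str:
        # Only consume a new address when the client has none recorded.
        if client not in self._by_client:
            self._by_client[client] = str(next(self._free))
        return self._by_client[client]


allocator = IdempotentAllocator("10.0.0.0/29")
first = allocator.allocate("bob")    # "10.0.0.1"
second = allocator.allocate("bob")   # same answer, no new allocation
other = allocator.allocate("alice")  # a different client gets the next address
```

The client only states who it is and what it needs; remembering earlier answers is entirely the server's job, so repeating the request is safe. A real service would persist the mapping and handle an exhausted pool, which this sketch leaves out.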
  13. Dry Runs

    Definition: Show users exactly what will change before anything is executed. This builds confidence and reduces fear of unintended consequences.
  14. Dry Run mode (AKA check mode)

    Before executing any changes, the automation shows exactly what it would do, without actually doing it. This gives the operator a chance to review, approve, and catch mistakes early.

    “Here’s the diff - do you want to proceed?”

    Figure labels: Dry Run; Apply
  15. Dry Run mode - examples

    • Ansible includes 2 options: --diff and --check. Check each module for support.

        @@ -7,7 +7,7 @@
         access-list 101 permit tcp any host 192.168.1.1 eq 80
         access-list 101 permit tcp any host 192.168.1.1 eq 443
        -access-list 101 permit ip any any
        +access-list 101 deny ip any any
         access-list 101 remark End of ACL

    • Terraform plan, a built-in feature that is supported on all providers.

        # aws_instance.example will be created
        + ami = "ami-abc123"
        + instance_type = "t3.micro"

    • kubectl diff or ArgoCD show diffs between the current cluster state and the desired YAML.

        spec: replicas: 2 -> 3
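
The same "show the diff first" pattern can be sketched with Python's standard `difflib`. This is an illustrative sketch only: the `plan` and `apply` functions and the dictionary standing in for a device are assumptions, not part of Ansible, Terraform, or kubectl.

```python
import difflib


def plan(current: list[str], desired: list[str]) -> list[str]:
    """Return a unified diff describing what an apply would change."""
    return list(
        difflib.unified_diff(
            current, desired, fromfile="running", tofile="intended", lineterm=""
        )
    )


def apply(device: dict, desired: list[str], check: bool = True) -> list[str]:
    """Always compute the diff; only write to the device when check is False."""
    diff = plan(device["config"], desired)
    if not check and diff:
        device["config"] = list(desired)
    return diff


# Hypothetical device holding a single ACL line.
device = {"config": ["access-list 101 permit ip any any"]}
desired = ["access-list 101 deny ip any any"]

preview = apply(device, desired)              # dry run: nothing is written
after_preview = list(device["config"])
applied = apply(device, desired, check=False)  # same diff, now written
rerun = apply(device, desired, check=False)    # nothing left to change
```

Because `check` defaults to `True`, forgetting the flag produces a preview rather than a change, and an empty diff on the rerun confirms the device already matches the intent.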
  16. Transactional

    Definition: Group changes so they either all succeed or can be rolled back cleanly if something fails. This prevents partial or broken changes.
  17. Transactional

    Transactional automation means grouping a set of changes so they either:
    • All succeed (commit) → the system moves to the new desired state
    • Or none are applied (rollback) → the system is left unchanged if something fails

    If a failure occurs partway through, the automation ensures no “half-applied” or “broken” states remain. Rollback capabilities extend this by allowing the system to revert changes after they have been committed if issues are detected later.
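
A minimal sketch of the all-or-nothing behaviour, assuming the managed state fits in a Python dictionary. The `Transaction` class and the two step functions are hypothetical names for illustration; real network devices would rely on features such as candidate configurations or commit/rollback support instead of an in-memory snapshot.

```python
import copy


class Transaction:
    """Applies a group of changes to a state dict: all of them, or none."""

    def __init__(self, state: dict) -> None:
        self.state = state

    def run(self, changes) -> bool:
        snapshot = copy.deepcopy(self.state)  # taken before the first change
        try:
            for change in changes:
                change(self.state)
        except Exception:
            # Roll back in place so no half-applied state remains.
            self.state.clear()
            self.state.update(snapshot)
            return False
        return True


def set_vlan(state: dict) -> None:
    state["vlan"] = 20


def failing_step(state: dict) -> None:
    raise RuntimeError("device rejected the change")


state = {"vlan": 10}
ok = Transaction(state).run([set_vlan, failing_step])  # second step fails
state_after_failure = dict(state)                       # back to vlan 10
committed = Transaction(state).run([set_vlan])          # succeeds, vlan 20
```

The first run changes the VLAN and then hits a failing step, so the snapshot is restored and the caller sees the original state.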
  18. Design Principles to build Trust

    • Main Principles: Predictable, Manageable, Transparent, Simple, Reliable, Human Friendly
    • Design Principles: Idempotent, Dry Run, Transactional
  19. Virtuous circle of Design Principles

    Idempotent, Dry Run, Transactional
  20. Tools and Technologies that enable Trustworthy Automation
  21. Tools and Technologies to build Trust

    • Main Principles: Predictable, Manageable, Transparent, Simple, Reliable, Human Friendly
    • Design Principles: Idempotent, Dry Run, Transactional
    • Tools and Technologies: Declarative Vs Imperative, Version Control, Testing
  22. Declarative Vs Imperative

    • Imperative - HOW: Focuses on actions
    • Declarative - WHAT: Focuses on outcomes
  23. Declarative Vs Imperative

    Imperative - HOW (focuses on actions)
    • Manually describe the step-by-step recipe.
    • If something goes wrong halfway, state may be inconsistent.

        configure terminal
        interface GigabitEthernet0/1
        switchport access vlan 10
        exit
        exit
        write memory

    Declarative - WHAT (focuses on outcomes)
    • You describe the desired end state, not how to get there.
    • Easier to make idempotent and retry safely.

        interface:
          name: GigabitEthernet0/1
          vlan: 10
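
One way to see why the declarative form is easier to retry: a small reconcile function can compare the desired state with the current one and emit only the commands that are still needed. The sketch below is an assumption-laden illustration; `reconcile` and the interface-to-VLAN dictionaries are hypothetical, and the emitted lines simply mirror the CLI from the slide.

```python
def reconcile(current: dict[str, int], desired: dict[str, int]) -> list[str]:
    """Return the CLI lines needed to move access VLANs from current to desired."""
    commands: list[str] = []
    for name, vlan in sorted(desired.items()):
        # Skip interfaces that already match the intent.
        if current.get(name) != vlan:
            commands += [f"interface {name}", f" switchport access vlan {vlan}"]
    return commands


current = {"GigabitEthernet0/1": 1}
desired = {"GigabitEthernet0/1": 10}

commands = reconcile(current, desired)
# Once the device matches the intent, a second run has nothing to do.
noop = reconcile(desired, desired)
```

The user only edits `desired`; the "how" is derived from the difference, so running the workflow twice produces no extra commands.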
  24. Declarative Vs Imperative

    • Declarative: Configs
    • Imperative: CLI
  25. Imperative Method

    • Workflows: Workflow A, Workflow B, Workflow C, Workflow D
    • Targets: Switch Vendor C, Cloud G, Firewall Vendor F, Router Vendor J, Firewall Vendor P
  26. Declarative Vs Imperative

    Imperative workflows are composed of multiple steps; the more steps, the higher the complexity.

    Figure axis: Number of steps in a workflow
  27. Declarative Method

    • Workflows: Workflow A, Workflow B, Workflow C, Workflow D
    • Intent Store / Source of Truth
    • Agents: one Agent per target
    • Targets: Switch Vendor C, Cloud G, Firewall Vendor F, Router Vendor J, Firewall Vendor P
  28. Version Control

    Version control allows changes to be:
    • Prepared in isolation
    • Safely validated
    • Reviewed and only then integrated into the main automation environment.
  29. Changes are done in a branch

    Figure: two branches, each going through Change, Test / Verify, and Review before Deploy.
  30. Main benefits of Version Control

    Auditability and Traceability
    • See who changed what, when, and why.
    • Essential for post-mortems and compliance
    • Makes operations more transparent and safe

    Collaboration and Review (Change Management)
    • Team members can propose changes via PR
    • Prevents risky or unreviewed changes from being pushed directly into production.

    CI/CD Pipelines
    • Automation workflows can be triggered automatically
    • Changes can be tested and validated automatically before being deployed

    Atomic changes
    • Changes are grouped and committed as a single unit.
    • There is no “partial change” state
  31. Testing

    Testing pushes you to design applications and workflows that are modular, observable, and deterministic. It encourages clear boundaries, clean inputs and outputs, and repeatable behaviors.

    Testable systems are a design choice.
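
As a small illustration of "clean inputs and outputs", the sketch below keeps config rendering in a pure function so it can be unit tested without touching a device. The function `render_access_port` and its test class are hypothetical names chosen for this example; only the standard `unittest` module is used.

```python
import unittest


def render_access_port(interface: str, vlan: int) -> list[str]:
    """Pure function: the same inputs always produce the same config lines."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"invalid VLAN id: {vlan}")
    return [f"interface {interface}", f" switchport access vlan {vlan}"]


class RenderAccessPortTest(unittest.TestCase):
    def test_renders_expected_lines(self):
        self.assertEqual(
            render_access_port("GigabitEthernet0/1", 10),
            ["interface GigabitEthernet0/1", " switchport access vlan 10"],
        )

    def test_rejects_invalid_vlan(self):
        with self.assertRaises(ValueError):
            render_access_port("GigabitEthernet0/1", 5000)


# Run the suite programmatically so the example works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RenderAccessPortTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Separating rendering from delivery means most of the workflow logic is covered by fast unit tests, leaving only the device interaction for integration or end-to-end tests.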
  32. Testing

    • Unit tests: Function
    • Integration tests: Workflow / API
    • End 2 End tests: UI, Devices
  33. Automation workflow testing

    • Reduce the complexity of your workflow
    • Increase the test coverage
  34. Select the right stack

    Ensure the libraries / tools you depend on provide:
    • Programmable interfaces
    • Declarative behavior
    • Developer Experience
    • Idempotency
    • Test friendly interfaces
    • Traceability & Logging
  35. The 3 primary attributes, classify your data

    • Role: Capture the primary function of an object
    • Status: Capture all the stages of the lifecycle of an object
    • Kind: Capture the nature of an object
  36. Enforce business processes as part of your automation workflows

    Maintenance windows are designed to ensure that no disruptive actions will be applied during business hours. Similar rules should be embedded directly within your playbook.

    Ideally, filter the valid target devices at the inventory level:
    - Only Arista devices
    - that are in maintenance mode
  37. Enforce business processes as part of your automation workflows

    Option 1 - Limited Inventory

        ---
        - name: "Upgrade Software image on Arista Devices"
          hosts: platform_arista:&status_maintenance
          gather_facts: false
          tasks:
            - name: "Upgrade Software image"
              ...

    Option 2 - Inline Validation

        ---
        - name: "Upgrade Software image on Arista Devices"
          hosts: platform_arista
          gather_facts: false
          tasks:
            - name: "Validate if the device is in maintenance mode"
              meta: "end_play"
              run_once: true
              when:
                - "device.status != 'maintenance'"
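
The inventory-level filter can also be expressed outside Ansible. The following Python sketch assumes a simple in-memory inventory; the `Device` dataclass, its field names, and the sample devices are hypothetical and only mirror the platform and status attributes used in the playbooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    platform: str  # e.g. "arista"
    status: str    # lifecycle stage, e.g. "active" or "maintenance"


def upgrade_targets(inventory: list[Device]) -> list[Device]:
    """Keep only Arista devices that are currently in maintenance mode."""
    return [
        d for d in inventory
        if d.platform == "arista" and d.status == "maintenance"
    ]


inventory = [
    Device("leaf1", "arista", "maintenance"),
    Device("leaf2", "arista", "active"),
    Device("fw1", "fortinet", "maintenance"),
]
targets = [d.name for d in upgrade_targets(inventory)]
```

Filtering before the workflow starts means a device outside its maintenance window is never a candidate, so the upgrade logic does not have to re-check the rule.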
  38. Provide safe default options

    Create different playbooks for the same workflow but with different outcomes.

        ─ fortinet
          ├── pb.policies.apply.yml
          ├── pb.policies.check.yml
        ─ load_balancers_external
          ├── pb.config.vips.apply.yml
          ├── pb.config.vips.check.yml

    - Call out safe playbooks explicitly
    - Ensure default values are always Safe
    - Activate diff mode by default

    Prepare the change

        ansible-playbook pb.policies.yml --check --diff

    Apply the change

        ansible-playbook pb.policies.yml
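
The "default values are always safe" idea carries over to any command-line tool. Below is a minimal sketch using Python's standard `argparse`, assuming a hypothetical tool whose `--apply` flag is the only way to make a real change; the flag name and the returned strings are illustrative.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Push firewall policies")
    # Safe default: without --apply the tool only reports what it would change.
    parser.add_argument(
        "--apply", action="store_true", help="actually push the change"
    )
    return parser.parse_args(argv)


def run(argv: list[str]) -> str:
    args = parse_args(argv)
    return "applied" if args.apply else "dry-run"


default_mode = run([])            # no flags: nothing is changed
explicit_mode = run(["--apply"])  # the operator opted in explicitly
```

With this design the destructive path requires a deliberate extra argument, which matches the intent of shipping separate check and apply playbooks.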