How LINE’s Network Orchestrator Was Renewed / LINE’s new Network Orchestrator

Presentation slides for the JANOG49 Meeting.
https://www.janog.gr.jp/meeting/janog49/lineorc/
Speaker: Subaru Fukuda
Verda Platform Department, Network Development Team

LINE Developers

January 27, 2022

Transcript

  1. About Me

    Subaru Fukuda
    Periods shown: 2016.Apr - 2018.Apr, 2018.Mar - 2019.Sep, 2019.Oct - 2020.Oct, 2020.Nov - NOW
    @LINE
    • Network Orchestrator development
    • White box NOS development
    • Telecom Infra Project
    • IoT Gateway firmware development
    • IoT protocol stack development
    • Enterprise NOS test and release engineering
    • Test automation system development
  2. What is Verda?

    • 80,000+ Virtual Machine
    • 40,000+ Baremetal
    • 6,000+ Hypervisor
    Shown on the slide: NAT, Load Balancer, VM / Baremetal, MySQL, Elasticsearch, Image Repo, Shared Filesystem, DNS, App engine (like heroku), Controller, and more…
  3. Problem

    • The load on the Ansible server is high.
    • It takes a long time.
    • Manual operations are required:
        • to update the database
        • to generate the inventory
        • to run Ansible
  4. Agent Sync Config

    Config Update Process
    0) The agent watches the DB.
    1) The operator updates the DB.
    2) The agent detects the change.
    3) The agent updates the config (runs Ansible).
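
The watch-driven update process can be sketched as follows. This is a minimal illustration, not the talk's implementation: `ParamDB` is a hypothetical in-memory stand-in for the config parameter DB, the `SWITCH001/...` key layout is borrowed from the Config Parameter Sheet slides, and the Ansible run is replaced by a placeholder that only records the changed key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ParamDB:
    """Hypothetical in-memory stand-in for the config parameter DB."""
    data: Dict[str, str] = field(default_factory=dict)
    watchers: Dict[str, List[Callable[[str, str], None]]] = field(default_factory=dict)

    def watch(self, prefix: str, callback: Callable[[str, str], None]) -> None:
        # 0) an agent registers interest in every key under `prefix`
        self.watchers.setdefault(prefix, []).append(callback)

    def put(self, key: str, value: str) -> None:
        # 1) the operator updates the DB
        self.data[key] = value
        for prefix, callbacks in self.watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)  # 2) the agent detects the change


class SyncAgent:
    def __init__(self, hostname: str, db: ParamDB) -> None:
        self.hostname = hostname
        self.applied: List[str] = []
        db.watch(f"{hostname}/", self.on_change)

    def on_change(self, key: str, value: str) -> None:
        # 3) placeholder for "update config (run Ansible)"
        self.applied.append(key)


db = ParamDB()
agent = SyncAgent("SWITCH001", db)
db.put("SWITCH001/INTERFACE", '{"mtu": 9000}')
db.put("SWITCH002/INTERFACE", '{"mtu": 1500}')  # another switch's key: ignored
print(agent.applied)  # ['SWITCH001/INTERFACE']
```

Pushing changes to a watcher removes the manual "generate inventory, run Ansible" steps listed under Problem: the operator's only action is the DB update.
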
  8. Agent Deployment Process

    PROVISION
    • Some initial setup
    • Install Docker
    • Deploy agent
    ZTP SCRIPT
    • Setup for SSH
    • Provision request
  9. Ansible Tag

    Environment variable: nos={cumulus | arista}
    • Cumulus Linux: target: localhost, tag: cumulus
    • ARISTA: target: localhost, tag: arista
    Example tasks:
        - name: example-task1
          XXX:
            XXXARG: "example"
          tags: cumulus
        - name: example-task1
          XXX:
            XXXARG: "example"
          tags: arista
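
One way the `nos` environment variable could select the tagged tasks is sketched below. The function name `build_ansible_command` and the playbook name `site.yml` are assumptions made for illustration; only the `ansible-playbook` options `-i`, `--connection`, and `--tags` are standard CLI flags.

```python
from typing import List, Mapping

# Values taken from the slide: nos={cumulus | arista}
SUPPORTED_NOS = ("cumulus", "arista")


def build_ansible_command(env: Mapping[str, str], playbook: str = "site.yml") -> List[str]:
    """Build an ansible-playbook command that runs only the tasks tagged for this NOS."""
    nos = env.get("nos", "")
    if nos not in SUPPORTED_NOS:
        raise ValueError(f"unsupported nos: {nos!r}")
    # The agent runs on the switch itself, so the target is localhost.
    return [
        "ansible-playbook",
        "-i", "localhost,",
        "--connection", "local",
        "--tags", nos,
        playbook,
    ]


print(build_ansible_command({"nos": "cumulus"}))
```

In an agent, `os.environ` would be passed as `env`, so a single playbook can serve both vendors.
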
  11. Config Parameter Sheet

    1. SWITCH: hostname, os-version, server-room, etc.
    2. INTERFACE: mac, speed, mtu, ip, etc.
    3. BGP: AS, neighbor, peer-group
    4. QOS: config for shaping
    5. ROUTEMAP: ingress/egress routemap
    6. PREFIXLIST: IPv4/IPv6 prefixlist
  12. Config Parameter Sheet

    EX) ROUTEMAP PARAMETER SHEET
        {
          "routemap-001": {
            "entries": [
              {
                "action": "permit",
                "sequence": 10,
                "set_actions": [
                  { "action": "as-path prepend", "value": "auto auto auto auto auto" }
                ]
              }
            ]
          }
          …
        }
  13. Config Parameter Sheet

    KEY                   VALUE (JSON FORMAT)
    SWITCH001/SWITCH      ...
    SWITCH001/INTERFACE   ...
    SWITCH001/BGP         ...
    SWITCH001/QOS         ...
    SWITCH001/ROUTEMAP    ...
    SWITCH001/PREFIXLIST  ...
    ...                   ...
    SWITCH002/SWITCH      ...
    SWITCH002/INTERFACE   ...
    SWITCH002/BGP         ...
    ...                   ...

    The SWITCH001/* entries belong to switch001.
  14. SYNC-AGENT Handle Config Parameter Sheet

    KEY                   VALUE (JSON FORMAT)
    SWITCH001/SWITCH      ...
    SWITCH001/INTERFACE   ...
    SWITCH001/BGP         ...
    SWITCH001/QOS         ...
    SWITCH001/ROUTEMAP    ...
    SWITCH001/PREFIXLIST  ...
    ...                   ...
    SWITCH002/SWITCH      ...
    SWITCH002/INTERFACE   ...
    SWITCH002/BGP         ...
    ...                   ...

    switch001 watches the key-value store.
  15. SYNC-AGENT Handle Config Parameter Sheet

    Same key-value table, watched by switch001. Sheets shown on switch001: SWITCH, INTERFACE, ROUTEMAP, PREFIXLIST, BGP, QOS.
  16. SYNC-AGENT Handle Config Parameter Sheet

    switch001 watches its keys (SWITCH001/SWITCH, SWITCH001/INTERFACE, SWITCH001/BGP, SWITCH001/QOS, SWITCH001/ROUTEMAP, SWITCH001/PREFIXLIST). Sheets shown on switch001: SWITCH, INTERFACE, ROUTEMAP, PREFIXLIST, BGP, QOS.
    0) SWITCH001/INTERFACE is updated.
    1) The sync-agent detects the change and gets the INTERFACE config param sheet.
    2) The sync-agent updates the switch config (runs Ansible).
  19. SYNC-AGENT Handle Config Parameter Sheet

    Sheets on switch001: SWITCH, INTERFACE, ROUTEMAP, PREFIXLIST, BGP, QOS, stored as CFG_PARAM_PATH/XXX.json.
    Playbook:
        - name: include config params
          include_vars:
            dir: "CFG_PARAM_PATH"
        ・・・
    include_vars imports every config parameter sheet as Ansible vars.
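
A sketch of how an agent might write a fetched sheet into the directory that `include_vars` reads. The file naming (`interface.json`) and the choice to wrap each sheet under a top-level variable named after the sheet are assumptions; the slide only shows `CFG_PARAM_PATH/XXX.json`.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple

# The six sheet types listed in the deck.
SHEET_TYPES = {"SWITCH", "INTERFACE", "BGP", "QOS", "ROUTEMAP", "PREFIXLIST"}


def split_key(key: str) -> Tuple[str, str]:
    """Split a key such as 'SWITCH001/INTERFACE' into (hostname, sheet type)."""
    hostname, _, sheet = key.partition("/")
    if not hostname or sheet not in SHEET_TYPES:
        raise ValueError(f"unexpected key: {key!r}")
    return hostname, sheet


def store_sheet(cfg_param_path: Path, key: str, value: str) -> Path:
    """Write one sheet as <sheet>.json so include_vars (dir=...) can load it."""
    _, sheet = split_key(key)
    params = json.loads(value)  # reject malformed JSON before touching the disk
    target = cfg_param_path / f"{sheet.lower()}.json"
    # Assumption: one top-level variable per sheet, so sheets cannot collide.
    target.write_text(json.dumps({sheet.lower(): params}, indent=2))
    return target


with tempfile.TemporaryDirectory() as tmp:
    written = store_sheet(Path(tmp), "SWITCH001/INTERFACE", '{"mtu": 9000}')
    print(written.name, json.loads(written.read_text()))
```

Because `include_vars` with `dir` loads every JSON or YAML file in the directory, rewriting a single file is enough for the next playbook run to see the change.
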
  20. Vendor Agnostic?

    • Ensure operations
        • on Arista and Cumulus
        • on LINE’s network
    • Need to change the schema
        • when we introduce new vendor switches
        • when we change our network architecture drastically
  21. Yang Schema

    RFC7951: JSON Encoding of Data Modeled with YANG
    YANG defines the schema of the JSON parameter sheets.
    EXAMPLE SCHEMA
        module interface {
          import ietf-inet-types { prefix "inet"; }
          import ietf-yang-types { prefix "yang"; }
          ...
          leaf mac_address { type yang:mac-address; }
          ...
          leaf-list ipv4 { type inet:ipv4-prefix; min-elements 0; }
          ...
        }
  22. Schema Driven Development

    CONFIG-PARAMETER CHANGE PROCESS
    1. Update the schemas in YANG.
    2. Generate JSON schemas from the YANG schemas.
    3. Deploy the generated JSON schemas to the API server.
    The API-SERVER validates the data just before updating etcd.
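
The "validate just before update" step can be illustrated with a hand-written check. This is only a stand-in: the deck generates JSON schemas from YANG, whereas the sketch below hard-codes the two example leaves (`mac_address`, `ipv4`), and a plain dict plays the role of etcd. The MAC pattern is the one RFC 6991 defines for `yang:mac-address`; the function names are hypothetical.

```python
import ipaddress
import re
from typing import Any, Dict, List

# Pattern of yang:mac-address as defined in RFC 6991.
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def _is_ipv4_prefix(value: Any) -> bool:
    """True for strings like '192.0.2.0/24' (address plus prefix length)."""
    if not isinstance(value, str) or "/" not in value:
        return False
    try:
        return ipaddress.ip_network(value, strict=False).version == 4
    except ValueError:
        return False


def validate_interface(sheet: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the sheet is valid."""
    errors: List[str] = []
    mac = sheet.get("mac_address")
    if mac is not None and not (isinstance(mac, str) and MAC_RE.match(mac)):
        errors.append(f"mac_address: {mac!r} is not a yang:mac-address")
    for prefix in sheet.get("ipv4", []):  # leaf-list with min-elements 0
        if not _is_ipv4_prefix(prefix):
            errors.append(f"ipv4: {prefix!r} is not an inet:ipv4-prefix")
    return errors


def put_validated(store: Dict[str, Any], key: str, sheet: Dict[str, Any]) -> None:
    """Validate first, and write to the store only when the sheet is valid."""
    errors = validate_interface(sheet)
    if errors:
        raise ValueError("; ".join(errors))
    store[key] = sheet


store: Dict[str, Any] = {}
put_validated(store, "SWITCH001/INTERFACE",
              {"mac_address": "00:11:22:33:44:55", "ipv4": ["192.0.2.1/24"]})
print(sorted(store))
```

Rejecting bad data at the API server matters here because every accepted write is picked up by a watching agent and applied to a switch.
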
  23. DHCP Option

    dhcpd.conf is updated from the config parameter sheet and answers the DHCP requests of Cumulus Linux and ARISTA switches.
    Config parameter sheet (switch):
        { ... "hostname": "SWITCH00X", "network_os": "cumulus", ... }
    Config parameter sheet (management interface):
        { ... "name": "eth0", "type": "management", "mac_address": "xxxx.xxxx.xxxx", "ip": ["X.X.X.X/X"], ... }
    dhcpd.conf:
        ... "mac": "xxxx.xxxx.xxxx", "ip": "X.X.X.X/X", "ztp-script option code": "XX", ...
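
As an illustration of deriving a dhcpd.conf entry from the two sheet fragments, the sketch below renders an ISC dhcpd `host` declaration. The function names are hypothetical, and the ZTP script option is deliberately left out because the slide elides its option code.

```python
import re
from typing import Any, Dict


def normalize_mac(mac: str) -> str:
    """Convert 'xxxx.xxxx.xxxx' or 'xx:xx:...' into the colon form dhcpd expects."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def dhcpd_host_entry(switch: Dict[str, Any], mgmt_if: Dict[str, Any]) -> str:
    """Render a dhcpd.conf host declaration for the switch's management interface."""
    mac = normalize_mac(mgmt_if["mac_address"])
    address = mgmt_if["ip"][0].split("/")[0]  # fixed-address takes a bare address
    return (
        f"host {switch['hostname']} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {address};\n"
        f"}}\n"
    )


entry = dhcpd_host_entry(
    {"hostname": "SWITCH001", "network_os": "cumulus"},
    {"name": "eth0", "type": "management",
     "mac_address": "0011.2233.4455", "ip": ["192.0.2.10/24"]},
)
print(entry)
```

A real generator would also emit the vendor-specific ZTP option, chosen by `network_os`.
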
  24. Operator Trigger

    Config Update Process
    0) The agent watches the DB.
    1) The operator updates the DB.
    2) The agent detects the change.
    3) The agent updates the config (runs Ansible).
  25. Connection Trigger

    CONFIG PARAM DB (server-config parameter):
    KEY        VALUE
    ...        ...
    SERVER001  { "hostname": "SERVER001", "ipv4_prefixes": ["192.0.2.0/24"], "ipv6_prefixes": [] }
    ...        ...

    1) Prefixes for SERVER001 are stored in the CONFIG PARAM DB.
    2) The SERVER connects to a switch.
    3) The SYNC AGENT detects the connection of SERVER001 by LLDP.
    4) The SYNC AGENT gets the prefixes for SERVER001.
    5) The SYNC AGENT updates the whitelist (runs Ansible).
    6) The SYNC AGENT watches SERVER001.
  27. Connection Trigger

    Same table and steps as the first Connection Trigger slide. Highlighted (step 3): LLDP, detect SERVER001.
  28. Connection Trigger

    Same table and steps. Highlighted (step 4): Get (key=SERVER001).
  29. Connection Trigger

    Same table and steps. Highlighted (step 5): add "192.0.2.0/24" to the prefix-list.
  30. Connection Trigger

    Same table and steps. Highlighted (step 6): the SYNC AGENT watches the SERVER001 entry in addition to LLDP.
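
The connection-triggered flow (steps 3 to 6) can be sketched as below. The class name `ConnectionTrigger` is hypothetical, a dict stands in for the CONFIG PARAM DB, and the Ansible run that would rewrite the prefix-list is reduced to updating an in-memory set.

```python
from typing import Dict, Set


class ConnectionTrigger:
    """Reacts to LLDP neighbor events on one switch (illustrative only)."""

    def __init__(self, param_db: Dict[str, dict]) -> None:
        self.param_db = param_db
        self.whitelist: Set[str] = set()
        self.watched: Set[str] = set()

    def on_lldp_neighbor(self, neighbor_hostname: str) -> None:
        # 3) LLDP reports that a server has been connected
        entry = self.param_db.get(neighbor_hostname)  # 4) Get (key=hostname)
        if entry is None:
            return  # unknown server: leave the whitelist untouched
        # 5) add the server's prefixes; the real agent would run Ansible here
        self.whitelist.update(entry["ipv4_prefixes"])
        self.whitelist.update(entry["ipv6_prefixes"])
        # 6) keep watching the entry so that later changes are picked up
        self.watched.add(neighbor_hostname)


# 1) prefixes for SERVER001 are stored in advance
db = {"SERVER001": {"hostname": "SERVER001",
                    "ipv4_prefixes": ["192.0.2.0/24"],
                    "ipv6_prefixes": []}}
trigger = ConnectionTrigger(db)
trigger.on_lldp_neighbor("SERVER001")  # 2) the server connects to the switch
print(trigger.whitelist, trigger.watched)
```
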
  31. Group Config Parameter Sheet

    KEY           VALUE
    SW001/SWITCH  { "hostname": "SW001", "switch_groups": ["GRP-A"], … }
    ...           ...
    /SWGRP/GRP-A  { "ipv4_prefixes": [{"action": "deny", "prefix": "192.0.2.0/24"}], … }

    • Introduce a new config-param; type=group.
    • Multiple switches watch the entry, but batch change is dangerous.
    Diagram labels: watch x 3, watch x 1.
  32. Sync-Group Config Parameter Sheet

    KEY                    VALUE
    SW001/SWITCH           ...
    ...                    ...
    /SWGRP/GRP-A           ...
    ...                    ...
    /SWGRP/GRP-A/SYNC_GRP  …
    ...                    ...

    • Introduce a new config-param; type=group.
    • Also, introduce a new config-param; type=sync-group.
    • Multiple switches watch the sync-group entry.
  33. Sync-Group Config Parameter Sheet

    JSON = config parameter and group-sync-state-machine (a state-machine for the switches in the group)
    STATE
    • DONE
    • NOT-YET
    • SYNC
  34. Group Sync

    0) Every switch’s state is DONE.
    1) The operator updates SWG-A’s config param.
    2) The API-SERVER sets every switch’s state to NOT-YET.
    3) The API-SERVER sets SW001’s state to SYNC.
    4) The sync-agent fetches SWG-A’s config param.
    5) The sync-agent updates SW001’s config.
    6) The sync-agent sets SW001’s state to DONE.
    7) The operator sets the others to SYNC.
    8) The sync-agents fetch SWG-A’s config param.
    9) The sync-agents update their switches’ configs.
    10) The sync-agents set both states to DONE.
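
The staged rollout above maps onto a small state machine. The sketch is an assumption-laden illustration: `SyncGroup`, `update_param`, `release_rest`, and `agent_sync` are hypothetical names, and applying the config is reduced to copying the parameter into a dict.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    DONE = "DONE"
    NOT_YET = "NOT-YET"
    SYNC = "SYNC"


class SyncGroup:
    def __init__(self, switches: List[str]) -> None:
        self.param: dict = {}
        # 0) every switch's state is DONE
        self.states: Dict[str, State] = {sw: State.DONE for sw in switches}

    def update_param(self, param: dict, first: str) -> None:
        """Steps 1-3: store the new param, hold everyone back, release one switch."""
        if first not in self.states:
            raise KeyError(first)
        self.param = param
        for sw in self.states:
            self.states[sw] = State.NOT_YET
        self.states[first] = State.SYNC

    def release_rest(self) -> None:
        """Step 7: the operator lets the remaining switches sync."""
        for sw, state in self.states.items():
            if state is State.NOT_YET:
                self.states[sw] = State.SYNC


def agent_sync(group: SyncGroup, switch: str, applied: Dict[str, dict]) -> bool:
    """A sync-agent acts only while its own state is SYNC."""
    if group.states[switch] is not State.SYNC:
        return False
    applied[switch] = dict(group.param)  # fetch the param and update the config
    group.states[switch] = State.DONE    # report completion
    return True


group = SyncGroup(["SW001", "SW002", "SW003"])
applied: Dict[str, dict] = {}
group.update_param({"ipv4_prefixes": []}, first="SW001")
print(agent_sync(group, "SW002", applied))  # False: SW002 is still NOT-YET
print(agent_sync(group, "SW001", applied))  # True: SW001 goes first
group.release_rest()
for sw in ("SW002", "SW003"):
    agent_sync(group, sw, applied)
print({sw: st.value for sw, st in group.states.items()})
```

Holding the other switches in NOT-YET until the first one reports DONE is what avoids the dangerous batch change mentioned for plain group parameters.
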
  40. How To Join Group

    1) The operator updates SW004’s switch_groups.
    2) The sync-agent fetches the SWG-A param.
    3) The sync-agent updates the switch config.
    4) The sync-agent adds SW004’s state machine.
    5) The sync-agent watches the sync-group parameter.
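
The join procedure can be sketched as a single function run by the new switch's sync-agent. All names are hypothetical, plain dicts and sets stand in for the DB and the watch registry, and the initial state `DONE` is an assumption (the switch has just applied the current group param). The watched key follows the `/SWGRP/GRP-A/SYNC_GRP` layout from the Sync-Group slide.

```python
from typing import Dict, Set


def join_group(
    hostname: str,
    group: str,
    group_params: Dict[str, dict],
    group_states: Dict[str, Dict[str, str]],
    applied: Dict[str, dict],
    watched: Set[str],
) -> None:
    """Run after the operator has added `group` to the switch's switch_groups."""
    param = group_params[group]        # 2) fetch the group param
    applied[hostname] = dict(param)    # 3) update the switch config (placeholder)
    # 4) add this switch's state machine; starting at DONE is an assumption
    group_states.setdefault(group, {})[hostname] = "DONE"
    # 5) watch the sync-group parameter for future staged updates
    watched.add(f"/SWGRP/{group}/SYNC_GRP")


group_params = {"GRP-A": {"ipv4_prefixes": [{"action": "deny", "prefix": "192.0.2.0/24"}]}}
group_states = {"GRP-A": {"SW001": "DONE", "SW002": "DONE", "SW003": "DONE"}}
applied: Dict[str, dict] = {}
watched: Set[str] = set()
join_group("SW004", "GRP-A", group_params, group_states, applied, watched)
print(group_states["GRP-A"]["SW004"], watched)
```
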
  44. Monitoring

    • Every application we develop includes a Prometheus exporter function.
    • Service discovery by Consul
    • Slack notification
  45. Future Work

    • Rollback feature
    • Dry-Run feature
    • Introduce k8s CR
    Timeline labels: 2020.May, 2021.Feb, NOW; phases: DEVELOPMENT, MAINTENANCE.