
MetalLB+FRR-k8s BGP Advertisement

orimanabu
November 23, 2025

This deck shows how to advertise LoadBalancer Services over BGP on OpenShift using MetalLB and FRR-k8s.

Transcript

  1. Introduction
      ▸ This deck is based on information about MetalLB + FRR-k8s as of November 2025.
      ▸ Everything was verified on OpenShift v4.20 with the MetalLB Operator installed.
        ・ Apart from EgressService, the same setup should behave the same way on non-OpenShift Kubernetes environments.
      ▸ The upstream documentation describes FRR-k8s as "Experimental", but installing the MetalLB Operator on OpenShift v4.17 or later selects the FRR-k8s backend by default.
        https://metallb.io/concepts/bgp/#frr-k8s-mode
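  2. MetalLB
      ▸ A mechanism for using Services of type: LoadBalancer in environments that have no cloud load balancer service.
      ▸ Two operating modes
        ・ L2 mode:
          ・ Attracts traffic for the External IP by sending GARP.
          ・ A single node handles the Service's External IP.
        ・ BGP mode:
          ・ Advertises the External IP over BGP.
          ・ When several nodes advertise the same IP over BGP, the peer router can load-balance across them with ECMP.
      This deck covers BGP mode.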
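To make the ECMP point concrete, here is a minimal Python sketch of how a peer router might spread flows across several nodes that advertise the same External IP. The hash function, the `pick_next_hop` helper, and the example flow are illustrative assumptions. Real routers use their own (often hardware) hash, so this only shows the principle that one flow always maps to the same node while different flows may land on different nodes.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Pick one next hop for a flow by hashing its 5-tuple (illustrative only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Two nodes advertising the same Service IP (addresses as used later in this deck).
next_hops = ["172.18.20.113", "172.18.20.114"]

# (source IP, source port, destination IP, destination port, protocol)
flow = ("172.18.10.90", 47212, "172.19.20.181", 80, "tcp")

# The same flow always yields the same next hop, so packets of one
# TCP connection are not sprayed across nodes.
print(pick_next_hop(flow, next_hops))
```

Because the choice depends only on the flow tuple, adding or removing an advertising node can remap existing flows, which is one reason to keep the set of advertising nodes stable.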
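  3. MetalLB BGP mode
      ▸ Choose one of three BGP backends
        ・ Native
          ・ MetalLB's original implementation; rarely used today.
        ・ FRR
          ・ Uses FRRouting as the BGP speaker.
          ・ frr runs as a sidecar container in the speaker DaemonSet Pods.
        ・ FRR-K8s
          ・ A newer mechanism that separates frr from the speaker Pod so that components other than MetalLB can also use frr.
          ・ Internally, the MetalLB Operator has two ways to deploy it:
            ・ The MetalLB Operator deploys frr-k8s directly.
            ・ On a recent enough version of OpenShift, it asks the OpenShift Cluster Network Operator to deploy frr-k8s.
      This deck covers the latter deployment method (via the Cluster Network Operator).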
  4. Components
      [Diagram: the components involved and how they relate. Legend: Custom Resource, Pod, Namespace, Container, DaemonSet. Several arrows are labeled "Manage".]
      ▸ Namespaces: metallb-system, openshift-frr-k8s, openshift-network-operator, openshift-ovn-kubernetes
      ▸ Pods and containers: MetalLB Operator, controller, speaker, Cluster Network Operator, frr-k8s (with the frr container and its frr config), ovnkube-node
      ▸ Custom resources: MetalLB, IPAddressPool, BGPPeer, BGPAdvertisement, FRRConfiguration, Network
      ※ FRRConfigurations generated by OVN-Kubernetes are not covered in this deck.
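  5. Test environment
      [Diagram: network topology. r1 (AS 65101) connects to net10 (172.18.10.0/24), net12 (172.18.12.0/24) and net13 (172.18.13.0/24). r2 (AS 65102) connects net12 to net20 (172.18.20.0/24), where the OpenShift nodes cp0..2 and wk0..4 live. r3 (AS 65103) connects net13 to net30 (172.18.30.0/24). testvm0 is 172.18.10.90 on net10. Legend: Router, NAT, Switch, OpenShift node, VM, Container.]
      ▸ The OpenShift nodes (cp0..2, wk0..4) sit behind router r2 and reach the Internet via r2 → r1. For the VRF-lite tests, an additional NIC is connected to r3.
        ・ cp[0-2]: 172.18.20.10[0-2]
        ・ wk[0-4]: 172.18.20.11[0-4]
      ▸ r1, r2 and r3 peer over BGP using their loopback addresses.
        ・ r1: 172.18.0.1
        ・ r2: 172.18.0.2
        ・ r3: 172.18.0.3
      ▸ 172.18.99.0/24 is the out-of-band management network.
      ▸ AS numbers and the remaining addresses are as shown in the diagram.
      ▸ r1, r2 and r3 run VyOS 1.5-stream-2025-Q2.
      ▸ OpenShift is v4.20.2.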
  6. VyOS configs
      r1:
          set interfaces ethernet eth0 address '172.18.99.101/24'
          set interfaces ethernet eth1 address '172.18.10.2/24'
          set interfaces ethernet eth2 address '172.18.12.1/24'
          set interfaces ethernet eth3 address '172.18.13.1/24'
          set interfaces loopback lo address '172.18.0.1/32'
          set nat source rule 100 outbound-interface name 'eth1'
          set nat source rule 100 source address '0.0.0.0/0'
          set nat source rule 100 translation address 'masquerade'
          set protocols bgp address-family ipv4-unicast
          set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast default-originate
          set protocols bgp neighbor 172.18.0.2 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.2 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.2 remote-as '65102'
          set protocols bgp neighbor 172.18.0.2 update-source 'lo'
          set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast default-originate
          set protocols bgp neighbor 172.18.0.3 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.3 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.3 remote-as '65103'
          set protocols bgp neighbor 172.18.0.3 update-source 'lo'
          set protocols bgp parameters router-id '172.18.0.1'
          set protocols bgp system-as '65101'
          set protocols ospf area 0.0.0.0 network '172.18.0.1/32'
          set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
          set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
          set protocols ospf interface lo passive
          set protocols ospf parameters router-id '172.18.0.1'
          set protocols static route 0.0.0.0/0 next-hop 172.18.10.1
      r2:
          set interfaces ethernet eth0 address '172.18.99.102/24'
          set interfaces ethernet eth1 address '172.18.12.2/24'
          set interfaces ethernet eth2 address '172.18.20.1/24'
          set interfaces loopback lo address '172.18.0.2/32'
          set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
          set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.1 remote-as '65101'
          set protocols bgp neighbor 172.18.0.1 update-source 'lo'
          set protocols bgp parameters router-id '172.18.0.2'
          set protocols bgp system-as '65102'
          set protocols ospf area 0.0.0.0 network '172.18.0.2/32'
          set protocols ospf area 0.0.0.0 network '172.18.12.0/24'
          set protocols ospf interface lo passive
          set protocols ospf parameters router-id '172.18.0.2'
      r3:
          set interfaces ethernet eth0 address '172.18.99.103/24'
          set interfaces ethernet eth1 address '172.18.13.2/24'
          set interfaces ethernet eth2 address '172.18.30.1/24'
          set interfaces ethernet eth2 vif 2001 address '172.19.21.1/24'
          set interfaces ethernet eth2 vif 2002 address '172.19.22.1/24'
          set interfaces loopback lo address '172.18.0.3/32'
          set protocols bgp address-family ipv4-unicast network 172.18.30.0/24
          set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.1 remote-as '65101'
          set protocols bgp neighbor 172.18.0.1 update-source 'lo'
          set protocols bgp parameters router-id '172.18.0.3'
          set protocols bgp system-as '65103'
          set protocols ospf area 0.0.0.0 network '172.18.0.3/32'
          set protocols ospf area 0.0.0.0 network '172.18.13.0/24'
          set protocols ospf interface lo passive
          set protocols ospf parameters router-id '172.18.0.3'
  7. BGP neighbor status and BGP tables on each router
      r1:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.2     4  65102    11251    11255    1942    0     0  6d13h01m             1       3  N/A
          172.18.0.3     4  65103    11361    11248    1942    0     0  6d13h01m             1       3  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 172.18.20.0/24     172.18.0.2           0               0  65102 i
          *> 172.18.30.0/24     172.18.0.3           0               0  65103 i
      r2:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    11255    11250    1940    0     0  6d13h01m             2       3  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     0.0.0.0              0           32768  i
          *> 172.18.30.0/24     172.18.0.1                           0  65101 65103 i
      r3:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    11247    11361   12427    0     0  6d13h01m             2       3  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     172.18.0.1                           0  65101 65102 i
          *> 172.18.30.0/24     0.0.0.0              0           32768  i
  8. OpenShift environment
      ▸ v4.20.2, UPI on libvirt VMs
        ・ (Note) The BGP feature is supported only on bare metal.
      ▸ Additional Operators
        ・ MetalLB Operator
        ・ Nmstate
      ▸ The Cluster Network Operator configuration was changed as follows:
          $ oc get network.operator cluster -o yaml | yq .spec
          additionalRoutingCapabilities:
            providers:
              - FRR
          clusterNetwork:
            - cidr: 10.128.0.0/16
              hostPrefix: 24
          defaultNetwork:
            ovnKubernetesConfig:
              egressIPConfig: {}
              gatewayConfig:
                ipForwarding: Global
                ipv4: {}
                ipv6: {}
                routingViaHost: true
              genevePort: 6081
              ipsecConfig:
                mode: Disabled
              mtu: 1400
              policyAuditConfig:
                destination: "null"
                maxFileSize: 50
                maxLogFiles: 5
                rateLimit: 20
                syslogFacility: local0
              routeAdvertisements: Enabled
            type: OVNKubernetes
          deployKubeProxy: false
          disableMultiNetwork: false
          disableNetworkDiagnostics: false
          logLevel: Normal
          managementState: Managed
          observedConfig: null
          operatorLogLevel: Normal
          serviceNetwork:
            - 10.200.0.0/16
          unsupportedConfigOverrides: null
          useMultiNetworkPolicy: false
      ▸ Callouts on the slide:
        ・ "Required to use BGP (the MetalLB Operator sets this automatically)" (appears twice)
        ・ "Required for the VRF configuration"
  9. OpenShift configuration (1)
      Section: BGP Adv: MetalLB
      [Diagram: the topology from slide 5, with the OpenShift nodes in AS 65801 advertising the Service hello-lb-l3.]
      Namespace:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: proj1
      Service:
          apiVersion: v1
          kind: Service
          metadata:
            labels:
              app: hello
            name: hello-lb-l3
            annotations:
              metallb.io/address-pool: pool-l3
              metallb.io/loadBalancerIPs: 172.19.20.181
          spec:
            ports:
              - port: 80
                protocol: TCP
                targetPort: 8080
            selector:
              deployment: hello
            type: LoadBalancer
      Deployment:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            labels:
              app: hello
            name: hello
          spec:
            replicas: 1
            selector:
              matchLabels:
                deployment: hello
            template:
              metadata:
                labels:
                  deployment: hello
              spec:
                containers:
                  - image: quay.io/manabu.ori/hello
                    name: hello
                nodeSelector:
                  node-role.kubernetes.io/worker-virt: ""
  10. OpenShift configuration (2)
      BGPPeer:
          apiVersion: metallb.io/v1beta2
          kind: BGPPeer
          metadata:
            name: bgppeer-r2
            namespace: metallb-system
          spec:
            myASN: 65801
            peerAddress: 172.18.20.1
            peerASN: 65102
            #ebgpMultiHop: true
            nodeSelectors:
              - matchLabels:
                  node-role.kubernetes.io/worker-virt: ""
      BGPAdvertisement:
          apiVersion: metallb.io/v1beta1
          kind: BGPAdvertisement
          metadata:
            name: bgpadv1
            namespace: metallb-system
          spec:
            ipAddressPools:
              - pool-l3
            peers:
              - bgppeer-r2
            nodeSelectors:
              - matchLabels:
                  node-role.kubernetes.io/worker-virt: ""
      IPAddressPool:
          apiVersion: metallb.io/v1beta1
          kind: IPAddressPool
          metadata:
            namespace: metallb-system
            name: pool-l3
          spec:
            addresses:
              - 172.19.20.181-172.19.20.189
            autoAssign: false
  11. FRRConfigurations generated by MetalLB
      ▸ MetalLB generates one FRRConfiguration per node, for every node.
          $ oc -n openshift-frr-k8s get frrconfiguration
          NAME          AGE
          metallb-cp0   4h30m
          metallb-cp1   4h30m
          metallb-cp2   4h30m
          metallb-wk0   4h30m
          metallb-wk1   4h30m
          metallb-wk2   4h30m
          metallb-wk3   4h21m
          metallb-wk4   4h30m
      ▸ For nodes that do not advertise MetalLB routes, the spec is empty.
          $ oc -n openshift-frr-k8s get frrconfiguration metallb-wk0 -o yaml
          apiVersion: frrk8s.metallb.io/v1beta1
          kind: FRRConfiguration
          metadata:
            ...
          spec:
            bgp:
              routers: []
            nodeSelector:
              matchLabels:
                kubernetes.io/hostname: wk0
            raw: {}
  12. FRRConfigurations generated by MetalLB (continued)
      ▸ FRRConfiguration of a node that advertises MetalLB routes:
          $ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml
          apiVersion: frrk8s.metallb.io/v1beta1
          kind: FRRConfiguration
          metadata:
            ...
          spec:
            bgp:
              routers:
                - asn: 65801
                  neighbors:
                    - address: 172.18.20.1
                      asn: 65102
                      disableMP: false
                      dualStackAddressFamily: false
                      passwordSecret: {}
                      port: 179
                      toAdvertise:
                        allowed:
                          mode: filtered
                          prefixes:
                            - 172.19.20.181/32
                      toReceive:
                        allowed:
                          mode: filtered
                  prefixes:
                    - 172.19.20.181/32
            nodeSelector:
              matchLabels:
                kubernetes.io/hostname: wk3
            raw: {}
  13. Configuration of the peer router r2
      ▸ Peer settings from r2 to each node were added (the neighbors 172.18.20.113 and 172.18.20.114 below).
          $ show configuration commands | match bgp
          set protocols bgp address-family ipv4-unicast network 172.18.20.0/24
          set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.1 remote-as '65101'
          set protocols bgp neighbor 172.18.0.1 update-source 'lo'
          set protocols bgp neighbor 172.18.20.113 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.20.113 remote-as '65801'
          set protocols bgp neighbor 172.18.20.114 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.20.114 remote-as '65801'
          set protocols bgp parameters router-id '172.18.0.2'
          set protocols bgp system-as '65102'
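With more worker nodes, typing the per-node neighbor commands by hand gets tedious. The following Python sketch generates the same two `set protocols bgp neighbor ...` lines per node that appear in the r2 configuration. The function name `vyos_neighbor_commands` is hypothetical, and the output is meant to be reviewed and pasted into a VyOS configuration session rather than applied automatically.

```python
def vyos_neighbor_commands(node_ips, remote_as):
    """Build the VyOS 'set' commands that add one eBGP neighbor per node."""
    commands = []
    for ip in node_ips:
        commands.append(
            f"set protocols bgp neighbor {ip} "
            "address-family ipv4-unicast soft-reconfiguration inbound"
        )
        commands.append(f"set protocols bgp neighbor {ip} remote-as '{remote_as}'")
    return commands

# The two nodes that peer with r2 in this deck, in AS 65801.
for command in vyos_neighbor_commands(["172.18.20.113", "172.18.20.114"], 65801):
    print(command)
```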
  14. BGP neighbor status and BGP tables on each router
      ▸ The /32 route for the Service (172.19.20.181/32) now appears on every router.
      r1:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.2     4  65102    25711    25705    2017    0     0  02w2d13h             2       4  N/A
          172.18.0.3     4  65103    25814    25697    2017    0     0  02w2d13h             1       4  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 172.18.20.0/24     172.18.0.2           0               0  65102 i
          *> 172.18.30.0/24     172.18.0.3           0               0  65103 i
          *> 172.19.20.181/32   172.18.0.2                           0  65102 65801 i
      r2:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    25705    25710    2028    0     0  02w2d13h             2       4  N/A
          172.18.20.113  4  65801     1758     1797    2028    0     0  00:28:20             1       4  N/A
          172.18.20.114  4  65801     1754     1785    2028    0     0  00:28:20             1       4  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     0.0.0.0              0           32768  i
          *> 172.18.30.0/24     172.18.0.1                           0  65101 65103 i
          *= 172.19.20.181/32   172.18.20.113        0               0  65801 i
          *>                    172.18.20.114        0               0  65801 i
      r3:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    25696    25814   12505    0     0  02w2d13h             3       4  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     172.18.0.1                           0  65101 65102 i
          *> 172.18.30.0/24     0.0.0.0              0           32768  i
          *> 172.19.20.181/32   172.18.0.1                           0  65101 65102 65801 i
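The Path column for 172.19.20.181/32 grows by one AS number at each eBGP hop. As a small worked example, assuming plain eBGP with no policy that rewrites the path, the following Python sketch reproduces the paths seen on r2, r1 and r3. The `advertise` helper is illustrative only.

```python
def advertise(as_path, sender_asn):
    """eBGP: the sender prepends its own AS number when advertising to a peer."""
    return [sender_asn] + as_path

# The route originates on the OpenShift nodes (AS 65801), so the path is empty there.
path_on_node = []
path_at_r2 = advertise(path_on_node, 65801)   # node (65801) -> r2
path_at_r1 = advertise(path_at_r2, 65102)     # r2 (65102)   -> r1
path_at_r3 = advertise(path_at_r1, 65101)     # r1 (65101)   -> r3

print(path_at_r2)  # path as seen on r2
print(path_at_r1)  # path as seen on r1
print(path_at_r3)  # path as seen on r3
```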
  15. frr running-config
      ▸ The running-config of frr on each node can be read from the FRRNodeState custom resource.
          $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
          Building configuration...
          ...
          !
          router bgp 65801
           no bgp ebgp-requires-policy
           no bgp default ipv4-unicast
           bgp graceful-restart preserve-fw-state
           no bgp network import-check
           neighbor 172.18.20.1 remote-as 65102
           !
           address-family ipv4 unicast
            network 172.19.20.181/32
            neighbor 172.18.20.1 activate
            neighbor 172.18.20.1 route-map 172.18.20.1-in in
            neighbor 172.18.20.1 route-map 172.18.20.1-out out
           exit-address-family
          exit
          !
          ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 deny any
          ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 172.19.20.181/32
          !
          ...
          !
          route-map 172.18.20.1-out permit 1
           match ip address prefix-list 172.18.20.1-allowed-ipv4
          exit
          !
          ...
          !
          route-map 172.18.20.1-in permit 3
           match ip address prefix-list 172.18.20.1-inpl-ipv4
          exit
          !
          route-map 172.18.20.1-in permit 4
           match ipv6 address prefix-list 172.18.20.1-inpl-ipv4
          exit
          !
      ▸ Callouts on the slide:
        ・ neighbor 172.18.20.1 remote-as 65102: the peering with the peer router r2
        ・ network 172.19.20.181/32: advertises the Service's External IP
        ・ ip prefix-list 172.18.20.1-inpl-ipv4 ... deny any: routes advertised from outside are not accepted
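The two prefix-lists in the running-config decide what the node sends and accepts. Below is a minimal Python sketch of first-match prefix-list evaluation, restricted to the two forms used here (`any` and an exact prefix without `ge`/`le`). It is a simplified model for reasoning about the config, not FRR's implementation. A route-map entry whose `match` refers to a prefix-list that denies the route does not match, so the route-map's implicit deny applies.

```python
import ipaddress

def prefix_list_permits(entries, prefix):
    """First-match evaluation of a prefix-list without ge/le modifiers."""
    candidate = ipaddress.ip_network(prefix)
    for action, pattern in entries:
        if pattern == "any" or ipaddress.ip_network(pattern) == candidate:
            return action == "permit"
    return False  # implicit deny when no entry matches

# ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 deny any
inbound = [("deny", "any")]
# ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 172.19.20.181/32
outbound = [("permit", "172.19.20.181/32")]

print(prefix_list_permits(outbound, "172.19.20.181/32"))  # the Service IP is advertised
print(prefix_list_permits(outbound, "172.19.20.182/32"))  # other prefixes are not
print(prefix_list_permits(inbound, "0.0.0.0/0"))          # nothing is accepted from r2
```

This matches the vtysh output on the node, where the peer shows zero received prefixes.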
  16. Running vtysh in frr on node wk3
          $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
          BGP table version is 1, local router ID is 172.18.20.113, vrf id 0
          Default local pref 100, local AS 65801
          Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                         i internal, r RIB-failure, S Stale, R Removed
          Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
          Origin codes:  i - IGP, e - EGP, ? - incomplete
          RPKI validation codes: V valid, I invalid, N Not found

             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 172.19.20.181/32   0.0.0.0              0           32768  i

          Displayed 1 routes and 1 total paths
      ▸ Peering status with the peer router r2:
          $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'

          IPv4 Unicast Summary (VRF default):
          BGP router identifier 172.18.20.113, local AS number 65801 vrf-id 0
          BGP table version 1
          RIB entries 1, using 192 bytes of memory
          Peers 1, using 725 KiB of memory

          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.20.1    4  65102       52       45       0    0     0  00:40:07             0       1  N/A

          Total number of neighbors 1
  17. Accessing the Service IP address directly
      ▸ Pod IP address:
          $ oc get pod -o wide
          NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
          hello-c84644886-zkwst   1/1     Running   2          9d    10.128.7.14   wk3    <none>           <none>
      ▸ Service IP address:
          $ oc get svc
          NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
          hello-lb-l3   LoadBalancer   10.200.182.169   172.19.20.181   80:30224/TCP   141m
      ▸ tcptraceroute to the Service IP address:
          [test@testvm0 ~]$ sudo tcptraceroute -n 172.19.20.181 80
          Running:
                  traceroute -T -O info -n -p 80 172.19.20.181
          traceroute to 172.19.20.181 (172.19.20.181), 30 hops max, 60 byte packets
           1  172.18.10.2  0.289 ms  0.259 ms  0.246 ms
           2  172.18.12.2  0.691 ms  0.675 ms  0.654 ms
           3  172.19.20.181  1.869 ms  1.851 ms  1.837 ms
           4  * * *
           5  * * *
           6  172.19.20.181  12.712 ms  10.887 ms  12.527 ms
           7  172.19.20.181 <syn,ack>  12.696 ms  12.423 ms  12.352 ms
      ▸ Direct access to the Service IP address:
          [test@testvm0 ~]$ curl http://172.19.20.181
          Hello, World!
          Timestamp: 2025/11/21 02:33:12
          Hostname: hello-c84644886-zkwst
          LocalAddress: 10.128.7.14
          Gateway: 10.128.7.1
          Headers:
          Accept: [*/*]
          User-Agent: [curl/7.76.1]
          Host: 172.19.20.181
          RemoteAddress: 100.64.0.8:47212
  18. OpenShift configuration (1)
      Section: MetalLB+VRF
      [Diagram: the topology from slide 5, with the OpenShift nodes (AS 65801) additionally attached to r3 over vlan1001 (172.19.11.0/24) in VRF vrf1, advertising the Service lbsvc-vrf1.]
      Namespace:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: proj1
      Service:
          apiVersion: v1
          kind: Service
          metadata:
            labels:
              app: hello
            name: lbsvc-vrf1
            annotations:
              metallb.io/address-pool: pool-vrf1
              metallb.io/loadBalancerIPs: 172.19.20.190
          spec:
            ports:
              - port: 80
                protocol: TCP
                targetPort: 8080
            selector:
              deployment: hello
            type: LoadBalancer
      Deployment:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            labels:
              app: hello
            name: hello
          spec:
            replicas: 1
            selector:
              matchLabels:
                deployment: hello
            template:
              metadata:
                labels:
                  deployment: hello
              spec:
                containers:
                  - image: quay.io/manabu.ori/hello
                    name: hello
                nodeSelector:
                  node-role.kubernetes.io/worker-virt: ""
  19. OpenShift configuration (2)
      BGPPeer:
          apiVersion: metallb.io/v1beta2
          kind: BGPPeer
          metadata:
            name: bgppeer-vrf1
            namespace: metallb-system
          spec:
            myASN: 65801
            peerAddress: 172.19.11.1
            peerASN: 65103
            vrf: vrf1
            #ebgpMultiHop: true
            nodeSelectors:
              - matchLabels:
                  node-role.kubernetes.io/worker-virt: ""
      BGPAdvertisement:
          apiVersion: metallb.io/v1beta1
          kind: BGPAdvertisement
          metadata:
            name: bgpadv-vrf1
            namespace: metallb-system
          spec:
            ipAddressPools:
              - pool-vrf1
            peers:
              - bgppeer-vrf1
            nodeSelectors:
              - matchLabels:
                  node-role.kubernetes.io/worker-virt: ""
      IPAddressPool:
          apiVersion: metallb.io/v1beta1
          kind: IPAddressPool
          metadata:
            namespace: metallb-system
            name: pool-vrf1
          spec:
            addresses:
              - 172.19.20.190-172.19.20.194
            autoAssign: false
  20. OpenShift configuration (3)
      NodeNetworkConfigurationPolicy:
          apiVersion: nmstate.io/v1
          kind: NodeNetworkConfigurationPolicy
          metadata:
            name: enp3s0-wk3
          spec:
            nodeSelector:
              kubernetes.io/hostname: wk3
            desiredState:
              interfaces:
                - name: vrf1
                  type: vrf
                  state: up
                  vrf:
                    port:
                      - vlan1001
                    route-table-id: 11
                  ipv4:
                    dhcp: false
                    enabled: false
                - name: vlan1001
                  type: vlan
                  state: up
                  ipv4:
                    address:
                      - ip: 172.19.11.113
                        prefix-length: 24
                    dhcp: false
                    enabled: true
                  vlan:
                    base-iface: enp3s0
                    id: 1001
                - name: enp3s0
                  type: ethernet
                  state: up
                  ipv4:
                    dhcp: false
                    enabled: false
              routes:
                config:
                  - destination: 0.0.0.0/0
                    metric: 150
                    next-hop-address: 172.19.11.1
                    next-hop-interface: vlan1001
                    table-id: 11
              route-rules:
                config:
                  - ip-to: 10.200.0.0/16
                    priority: 998
                    route-table: 254
                  - ip-to: 10.128.0.0/16
                    priority: 998
                    route-table: 254
                  - ip-to: 169.254.0.0/17
                    priority: 998
                    route-table: 254
  21. FRRConfiguration generated by MetalLB
      ▸ The second router entry (vrf: vrf1) is the configuration for vrf1.
          $ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml
          apiVersion: frrk8s.metallb.io/v1beta1
          kind: FRRConfiguration
          metadata:
            creationTimestamp: "2025-11-21T00:20:43Z"
            generation: 25
            name: metallb-wk3
            namespace: openshift-frr-k8s
            resourceVersion: "27291209"
            uid: 5950248d-a6d8-4ee9-bed0-247e6daf24ef
          spec:
            bgp:
              routers:
                - asn: 65801
                  neighbors:
                    - address: 172.18.20.1
                      asn: 65102
                      disableMP: false
                      dualStackAddressFamily: false
                      passwordSecret: {}
                      port: 179
                      toAdvertise:
                        allowed:
                          mode: filtered
                          prefixes:
                            - 172.19.20.181/32
                      toReceive:
                        allowed:
                          mode: filtered
                  prefixes:
                    - 172.19.20.181/32
                - asn: 65801
                  neighbors:
                    - address: 172.19.11.1
                      asn: 65103
                      disableMP: false
                      dualStackAddressFamily: false
                      passwordSecret: {}
                      port: 179
                      toAdvertise:
                        allowed:
                          mode: filtered
                          prefixes:
                            - 172.19.20.190/32
                      toReceive:
                        allowed:
                          mode: filtered
                  prefixes:
                    - 172.19.20.190/32
                  vrf: vrf1
            nodeSelector:
              matchLabels:
                kubernetes.io/hostname: wk3
            raw: {}
  22. Configuration of the peer router r3
      ▸ Peer settings from r3 to each node were added (the neighbors 172.19.11.113 and 172.19.11.114 below).
          $ sh conf comm | match bgp
          set protocols bgp address-family ipv4-unicast network 172.18.30.0/24
          set protocols bgp address-family ipv4-unicast network 172.19.11.0/24
          set protocols bgp neighbor 172.18.0.1 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.18.0.1 ebgp-multihop '2'
          set protocols bgp neighbor 172.18.0.1 remote-as '65101'
          set protocols bgp neighbor 172.18.0.1 update-source 'lo'
          set protocols bgp neighbor 172.19.11.113 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.19.11.113 remote-as '65801'
          set protocols bgp neighbor 172.19.11.114 address-family ipv4-unicast soft-reconfiguration inbound
          set protocols bgp neighbor 172.19.11.114 remote-as '65801'
          set protocols bgp parameters router-id '172.18.0.3'
          set protocols bgp system-as '65103'
          $ sh conf comm | match vif
          set interfaces ethernet eth2 vif 1001 address '172.19.11.1/24'
  23. BGP neighbor status and BGP tables on each router
      ▸ The /32 routes for both Services (172.19.20.181/32 and 172.19.20.190/32) appear on every router.
      r1:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.2     4  65102    27509    27500    2025    0     0  02w3d18h             2       6  N/A
          172.18.0.3     4  65103    27609    27492    2025    0     0  02w3d18h             3       6  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 172.18.20.0/24     172.18.0.2           0               0  65102 i
          *> 172.18.30.0/24     172.18.0.3           0               0  65103 i
          *> 172.19.11.0/24     172.18.0.3           0               0  65103 i
          *> 172.19.20.181/32   172.18.0.2                           0  65102 65801 i
          *> 172.19.20.190/32   172.18.0.3                           0  65103 65801 i
      r2:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    27500    27508    2044    0     0  02w3d18h             4       6  N/A
          172.18.20.113  4  65801      472      477    2044    0     0  07:48:18             1       6  N/A
          172.18.20.114  4  65801      472      477    2044    0     0  07:48:18             1       6  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     0.0.0.0              0           32768  i
          *> 172.18.30.0/24     172.18.0.1                           0  65101 65103 i
          *> 172.19.11.0/24     172.18.0.1                           0  65101 65103 i
          *= 172.19.20.181/32   172.18.20.114        0               0  65801 i
          *>                    172.18.20.113        0               0  65801 i
          *> 172.19.20.190/32   172.18.0.1                           0  65101 65103 65801 i
      r3:
          $ show bgp summary
          ...
          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.18.0.1     4  65101    27491    27609   12513    0     0  02w3d18h             3       6  N/A
          172.19.11.113  4  65801     1019     1024   12513    0     0  16:50:26             1       6  N/A
          172.19.11.114  4  65801     1019     1024   12513    0     0  16:50:27             1       6  N/A
          $ show ip bgp
          ...
             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 0.0.0.0/0          172.18.0.1           0               0  65101 i
          *> 172.18.20.0/24     172.18.0.1                           0  65101 65102 i
          *> 172.18.30.0/24     0.0.0.0              0           32768  i
          *> 172.19.11.0/24     0.0.0.0              0           32768  i
          *> 172.19.20.181/32   172.18.0.1                           0  65101 65102 65801 i
          *> 172.19.20.190/32   172.19.11.113        0               0  65801 i
          *=                    172.19.11.114        0               0  65801 i
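To check quickly which prefixes have more than one next hop (the `*=` multipath rows), the `show ip bgp` rows can be grouped by prefix. The Python sketch below is a rough helper written for this deck's output; it assumes the status codes are separated from the prefix by whitespace and that a row without a prefix continues the previous one. Other FRR output variants may need a stricter parser.

```python
def next_hops_by_prefix(table_text):
    """Group 'show ip bgp' rows by prefix, following continuation rows."""
    routes = {}
    current = None
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("*"):
            continue  # skip headers, blank lines and non-valid routes
        if "/" in fields[1]:
            current = fields[1]      # row carries its own prefix
            next_hop = fields[2]
        else:
            next_hop = fields[1]     # continuation row for the previous prefix
        routes.setdefault(current, []).append(next_hop)
    return routes

# Rows copied from r2's table in the VRF scenario.
r2_table = """\
*> 0.0.0.0/0 172.18.0.1 0 0 65101 i
*> 172.18.20.0/24 0.0.0.0 0 32768 i
*= 172.19.20.181/32 172.18.20.114 0 0 65801 i
*> 172.18.20.113 0 0 65801 i
*> 172.19.20.190/32 172.18.0.1 0 65101 65103 65801 i
"""

routes = next_hops_by_prefix(r2_table)
multipath = {prefix: hops for prefix, hops in routes.items() if len(hops) > 1}
print(multipath)
```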
  24. frr running-config
      ▸ The second "router bgp" block (vrf vrf1) is the configuration for vrf1.
          $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
          Building configuration...
          ...
          !
          router bgp 65801
           no bgp ebgp-requires-policy
           no bgp default ipv4-unicast
           bgp graceful-restart preserve-fw-state
           no bgp network import-check
           neighbor 172.18.20.1 remote-as 65102
           !
           address-family ipv4 unicast
            network 172.19.20.181/32
            neighbor 172.18.20.1 activate
            neighbor 172.18.20.1 route-map 172.18.20.1-in in
            neighbor 172.18.20.1 route-map 172.18.20.1-out out
           exit-address-family
          exit
          !
          router bgp 65801 vrf vrf1
           no bgp ebgp-requires-policy
           no bgp default ipv4-unicast
           bgp graceful-restart preserve-fw-state
           no bgp network import-check
           neighbor 172.19.11.1 remote-as 65103
           !
           address-family ipv4 unicast
            network 172.19.20.190/32
            neighbor 172.19.11.1 activate
            neighbor 172.19.11.1 route-map 172.19.11.1-vrf1-in in
            neighbor 172.19.11.1 route-map 172.19.11.1-vrf1-out out
           exit-address-family
          exit
          !
          ...
  25. Running vtysh in frr on node wk3
          $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp vrf vrf1'
          BGP table version is 3, local router ID is 172.19.11.113, vrf id 465
          Default local pref 100, local AS 65801
          Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
                         i internal, r RIB-failure, S Stale, R Removed
          Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
          Origin codes:  i - IGP, e - EGP, ? - incomplete
          RPKI validation codes: V valid, I invalid, N Not found

             Network            Next Hop        Metric  LocPrf  Weight  Path
          *> 172.19.20.190/32   0.0.0.0              0           32768  i

          Displayed 1 routes and 1 total paths
      ▸ Peering status with the peer router r3:
          $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp vrf vrf1 summary'

          IPv4 Unicast Summary (VRF vrf1):
          BGP router identifier 172.19.11.113, local AS number 65801 vrf-id 465
          BGP table version 3
          RIB entries 1, using 192 bytes of memory
          Peers 1, using 725 KiB of memory

          Neighbor       V     AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
          172.19.11.1    4  65103       32       24       0    0     0  00:15:10             0       1  N/A

          Total number of neighbors 1
  26. Accessing the Service IP address directly
      ▸ Pod IP address:
          $ oc get pod -o wide
          NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
          hello-c84644886-zkwst   1/1     Running   2          11d   10.128.7.14   wk3    <none>           <none>
      ▸ Service IP address:
          $ oc get svc
          NAME          TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
          hello-lb-l3   LoadBalancer   10.200.182.169   172.19.20.181   80:30224/TCP   32h
          lbsvc-vrf1    LoadBalancer   10.200.202.186   172.19.20.190   80:32612/TCP   12m
      ▸ tcptraceroute to the Service IP address:
          [test@testvm0 rpmbuild]$ sudo tcptraceroute 172.19.20.190 80
          Running:
                  traceroute -T -O info -p 80 172.19.20.190
          traceroute to 172.19.20.190 (172.19.20.190), 30 hops max, 60 byte packets
           1  _gateway (172.18.10.2)  0.313 ms  0.269 ms *
           2  172.18.13.2 (172.18.13.2)  0.590 ms * *
           3  172.19.20.190 (172.19.20.190)  1.001 ms * *
           4  * * *
           5  * * *
           6  172.19.20.190 (172.19.20.190) <syn,ack>  10.571 ms * *
      ▸ Direct access to the Service IP address:
          [test@testvm0 rpmbuild]$ curl http://172.19.20.190
          Hello, World!
          Timestamp: 2025/11/22 08:30:27
          Hostname: hello-c84644886-zkwst
          LocalAddress: 10.128.7.14
          Gateway: 10.128.7.1
          Headers:
          Accept: [*/*]
          User-Agent: [curl/7.76.1]
          Host: 172.19.20.190
          RemoteAddress: 100.64.0.9:35036
  27. Routing tables on node wk3
      ▸ Interface IP addresses in the default VRF:
          [core@wk3 ~]$ ip -4 -br addr show
          lo               UNKNOWN        127.0.0.1/8
          ovn-k8s-mp0      UNKNOWN        10.128.7.2/24
          br-ex            UNKNOWN        172.18.20.113/24 169.254.0.2/17
          vlan1001@enp3s0  UP             172.19.11.113/24
      ▸ Interface IP addresses in VRF vrf1:
          [core@wk3 ~]$ ip -4 -br addr show vrf vrf1
          vlan1001@enp3s0  UP             172.19.11.113/24
      ▸ Routing table of the default VRF:
          [core@wk3 ~]$ ip route show
          default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
          10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
          10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
          10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
          169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
          169.254.0.1 dev br-ex src 172.18.20.113
          169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
          172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
      ▸ Routing table of VRF vrf1:
          [core@wk3 ~]$ ip route show vrf vrf1
          default via 172.19.11.1 dev vlan1001 proto static metric 150
          172.19.11.0/24 dev vlan1001 proto kernel scope link src 172.19.11.113 metric 400
      ▸ Routing rules:
          [core@wk3 ~]$ ip rule show
          0:      from all lookup local
          30:     from all fwmark 0x1745ec lookup 7
          998:    from all to 10.128.0.0/16 lookup main proto static
          998:    from all to 10.200.0.0/16 lookup main proto static
          998:    from all to 169.254.0.0/17 lookup main proto static
          1000:   from all lookup [l3mdev-table]
          5999:   from all fwmark 0x3f0 lookup main
          32766:  from all lookup main
          32767:  from all lookup default
      ▸ VRF:
          [core@wk3 ~]$ ip vrf show
          Name              Table
          -----------------------
          vrf1                11
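The interplay between the priority-998 rules and the l3mdev rule at priority 1000 can be easier to follow as a small model. The Python sketch below walks a reduced copy of the `ip rule show` output in priority order and reports which table the first matching rule selects. It is a deliberate simplification and rests on assumptions: the local table and the fwmark rules are omitted, a rule is treated as final once its selector matches (the kernel would continue to the next rule if the table had no matching route), and `in_vrf` stands for "the packet is bound to the vrf1 l3mdev device".

```python
import ipaddress

# (priority, destination selector or None, only for VRF-bound packets, table)
RULES = [
    (998, "10.128.0.0/16", False, "main"),
    (998, "10.200.0.0/16", False, "main"),
    (998, "169.254.0.0/17", False, "main"),
    (1000, None, True, "vrf1 (table 11)"),   # from all lookup [l3mdev-table]
    (32766, None, False, "main"),
]

def select_table(destination, in_vrf):
    """Return (priority, table) of the first rule whose selectors match."""
    address = ipaddress.ip_address(destination)
    for priority, to_prefix, needs_vrf, table in sorted(RULES, key=lambda r: r[0]):
        if to_prefix is not None and address not in ipaddress.ip_network(to_prefix):
            continue
        if needs_vrf and not in_vrf:
            continue
        return priority, table
    return None

# Traffic that arrived through vrf1 and is headed for the Pod network
# is looked up in the main table before the l3mdev rule is reached.
print(select_table("10.128.7.14", in_vrf=True))
# Other traffic bound to vrf1 uses the VRF's own table.
print(select_table("172.18.10.90", in_vrf=True))
# Traffic in the default VRF falls through to the main table.
print(select_table("172.18.10.90", in_vrf=False))
```

Under these assumptions, the three `route-rules` entries in the NodeNetworkConfigurationPolicy appear to exist so that cluster-internal destinations stay reachable for traffic that entered through vrf1.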
  28. Beware of asymmetric routing
      [Two diagrams of the VRF topology (nodes in AS 65801 attached to r3 over vlan1001, 172.19.11.0/24, VRF vrf1; Service lbsvc-vrf1), one per traffic direction.]
      ▸ Traffic from outside to the Service goes through vrf1.
      ▸ Traffic from a Pod to the outside goes through the default VRF.
  29. Traffic leaving a Pod goes through the default VRF
      ▸ traceroute (tracepath) from the Pod to testvm0's IP address:
          [root@wk3 ~]# nsenter -t $(crictl inspect $(crictl ps | awk '/hello/ {print $1}') | jq -r .info.pid) -n tracepath -n 172.18.10.90
           1?: [LOCALHOST]                      pmtu 1400
           1:  172.18.10.90                                          1.686ms asymm  2
           1:  172.18.10.90                                          0.675ms asymm  2
           2:  10.128.7.2                                            1.180ms
           3:  172.18.20.1                                           1.303ms
           4:  172.18.12.1                                           1.042ms
           5:  172.18.10.90                                          1.277ms reached
               Resume: pmtu 1400 hops 5 back 5
  30. Traffic leaving a Pod goes through the default VRF (continued)
      ▸ tcpdump on the peer routers while running curl http://testvm0 from the Pod
        [Screenshots: tcpdump on router r2 and tcpdump on router r3.]
      ▸ The source address is the node's address.
  31. Traffic leaving the Pod goes through the default VRF (MetalLB+VRF) [root@wk3 /]# tcpdump -nni any port

    80 and host 172.18.10.90 ... 14:19:27.829086 ad92f172b775705 P IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [S], seq 2879890650, win 65280, options [mss 1360,sackOK,TS val 1356028543 ecr 0,nop,wscale 7], length 0 14:19:27.829946 ovn-k8s-mp0 In IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [S], seq 2879890650, win 65280, options [mss 1360,sackOK,TS val 1356028543 ecr 0,nop,wscale 7], length 0 14:19:27.829979 br-ex Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [S], seq 2879890650, win 65280, options [mss 1360,sackOK,TS val 1356028543 ecr 0,nop,wscale 7], length 0 14:19:27.830058 enp1s0 Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [S], seq 2879890650, win 65280, options [mss 1360,sackOK,TS val 1356028543 ecr 0,nop,wscale 7], length 0 14:19:27.830763 enp1s0 In IP 172.18.10.90.80 > 172.18.20.113.58644: Flags [S.], seq 3293201688, ack 2879890651, win 65160, options [mss 1460,sackOK,TS val 120455014 ecr 1356028543,nop,wscale 7], length 0 14:19:27.830776 br-ex In IP 172.18.10.90.80 > 172.18.20.113.58644: Flags [S.], seq 3293201688, ack 2879890651, win 65160, options [mss 1460,sackOK,TS val 120455014 ecr 1356028543,nop,wscale 7], length 0 14:19:27.830796 ovn-k8s-mp0 Out IP 172.18.10.90.80 > 10.128.7.14.58644: Flags [S.], seq 3293201688, ack 2879890651, win 65160, options [mss 1460,sackOK,TS val 120455014 ecr 1356028543,nop,wscale 7], length 0 14:19:27.831292 ad92f172b775705 Out IP 172.18.10.90.80 > 10.128.7.14.58644: Flags [S.], seq 3293201688, ack 2879890651, win 65160, options [mss 1460,sackOK,TS val 120455014 ecr 1356028543,nop,wscale 7], length 0 14:19:27.831332 ad92f172b775705 P IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1356028546 ecr 120455014], length 0 14:19:27.831371 ad92f172b775705 P IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1356028546 ecr 120455014], length 76: HTTP: GET / HTTP/1.1 14:19:27.831674 ovn-k8s-mp0 In IP 10.128.7.14.58644 > 
172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1356028546 ecr 120455014], length 76: HTTP: GET / HTTP/1.1
    14:19:27.831686 br-ex Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1356028546 ecr 120455014], length 76: HTTP: GET / HTTP/1.1
    14:19:27.831701 enp1s0 Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1356028546 ecr 120455014], length 76: HTTP: GET / HTTP/1.1
    Captured with tcpdump on the node: packets leave via br-ex with the node address 172.18.20.113 as the source address.
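To see where the source address is rewritten, the capture can be reduced to one (interface, source IP) pair per hop. This is a minimal sketch under the assumption that the lines follow the `tcpdump -nni any` layout shown on the slide (timestamp, interface, direction, `IP`, source, `>`, destination); the function name `outbound_sources` is hypothetical and the sample lines are abbreviated from the slide.

```python
import re

# One packet line of `tcpdump -nni any`: timestamp, interface, direction, addresses.
PKT_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<iface>\S+)\s+(?P<dir>In|Out|P)\s+IP\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+)\s+>\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)


def outbound_sources(capture, server_ip):
    """List (interface, source IP) pairs for packets sent towards server_ip.

    Pairs are de-duplicated and kept in order of first appearance.
    """
    seen = []
    for line in capture.splitlines():
        match = PKT_RE.match(line.strip())
        if match and match["dst"] == server_ip:
            pair = (match["iface"], match["src"])
            if pair not in seen:
                seen.append(pair)
    return seen


# Abbreviated from the capture on the slide (TCP options trimmed).
SAMPLE = """\
14:19:27.829086 ad92f172b775705 P IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [S], length 0
14:19:27.829946 ovn-k8s-mp0 In IP 10.128.7.14.58644 > 172.18.10.90.80: Flags [S], length 0
14:19:27.829979 br-ex Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [S], length 0
14:19:27.830058 enp1s0 Out IP 172.18.20.113.58644 > 172.18.10.90.80: Flags [S], length 0
14:19:27.830763 enp1s0 In IP 172.18.10.90.80 > 172.18.20.113.58644: Flags [S.], length 0
"""

for iface, src in outbound_sources(SAMPLE, "172.18.10.90"):
    print(f"{iface:16} {src}")
```

The Pod address 10.128.7.14 is visible up to ovn-k8s-mp0, and the node address 172.18.20.113 appears from br-ex onward, which is the default-VRF behaviour this slide describes.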
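  32. Egress Service

    ▸ An OVN-Kubernetes feature that addresses the problem that, even when MetalLB traffic is separated into a VRF, packets leaving the Pod still go through the default VRF
    ▸ Create an EgressService for a type: LoadBalancer Service that is advertised via BGP
      ・ An EgressService configures the following:
        ・ the Service to bind to
        ・ which VRF (routing table ID) the traffic goes through
        ・ what to use as the packet's source address (the Service's External IP or the node's address)
    ▸ Technology Preview in OpenShift v4.20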
  33. Egress Service

    ▸ .metadata.name and .metadata.namespace must be identical to those of the LoadBalancer Service to bind to
    ▸ .spec.network takes the routing table ID of the VRF in which the LoadBalancer Service is advertised (vrf1 uses table 11 here)
    ▸ .spec.sourceIPBy determines the source IP address
      ・ sourceIPBy: LoadBalancerIP ➔ the Service's External IP becomes the source IP address
      ・ sourceIPBy: Network ➔ the node's IP address on that network (here, the vlan1001 address in vrf1) becomes the source IP address

    Service:
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: hello
      name: lbsvc-vrf1
      annotations:
        metallb.io/address-pool: pool-vrf1
        metallb.io/loadBalancerIPs: 172.19.20.190
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        deployment: hello
      type: LoadBalancer

    EgressService:
    apiVersion: k8s.ovn.org/v1
    kind: EgressService
    metadata:
      name: lbsvc-vrf1
    spec:
      sourceIPBy: "Network"
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker-virt: ""
      network: "11"
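Because the EgressService must carry the same name and namespace as the Service, it can help to derive one manifest from the other. The sketch below is an illustration only: the helper `egress_service_for` is hypothetical, the `"default"` namespace fallback is an assumption (the slide's manifests omit the namespace), and the accepted `sourceIPBy` values `LoadBalancerIP` and `Network` follow the upstream OVN-Kubernetes CRD as far as I know. It emits JSON, which `oc apply -f -` accepts as well as YAML.

```python
import json

# Value names per the upstream OVN-Kubernetes EgressService CRD (assumption: unchanged in v4.20).
ALLOWED_SOURCE_IP_BY = ("LoadBalancerIP", "Network")


def egress_service_for(service, network, source_ip_by, node_labels=None):
    """Build an EgressService manifest bound to the given LoadBalancer Service dict."""
    if service["spec"].get("type") != "LoadBalancer":
        raise ValueError("an EgressService needs a type: LoadBalancer Service")
    if source_ip_by not in ALLOWED_SOURCE_IP_BY:
        raise ValueError(f"sourceIPBy must be one of {ALLOWED_SOURCE_IP_BY}")
    meta = service["metadata"]
    # network is the routing table ID of the VRF, passed as a string in the manifest.
    spec = {"sourceIPBy": source_ip_by, "network": str(network)}
    if node_labels:
        spec["nodeSelector"] = {"matchLabels": dict(node_labels)}
    return {
        "apiVersion": "k8s.ovn.org/v1",
        "kind": "EgressService",
        # Same name and namespace as the Service; "default" is a fallback assumption.
        "metadata": {"name": meta["name"], "namespace": meta.get("namespace", "default")},
        "spec": spec,
    }


service = {"metadata": {"name": "lbsvc-vrf1"}, "spec": {"type": "LoadBalancer"}}
manifest = egress_service_for(
    service, 11, "Network", {"node-role.kubernetes.io/worker-virt": ""}
)
print(json.dumps(manifest, indent=2))
```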
  34. With an Egress Service... (MetalLB+VRF+Egress Service)

    [Figure, shown twice: test network topology (routers r1, r2, r3; testvm0; vrf1 on vlan1001 172.19.11.0/24; AS 65801; Service lbsvc-vrf1)]
    ▸ Traffic from outside to the Service goes through vrf1
    ▸ Traffic from the Pod to outside also goes through vrf1
  35. With an Egress Service... (MetalLB+VRF+Egress Service)

    [Figure: test network topology (routers r1, r2, r3; testvm0; vrf1 on vlan1001 172.19.11.0/24; AS 65801; Service lbsvc-vrf1)]
    Traffic leaving the Pod now goes through vrf1 (the third hop is 172.19.11.1, the gateway on vlan1001).
    tracepath from inside the Pod's network namespace to the IP address of testvm0:
    [root@wk3 ~]# nsenter -t $(crictl inspect $(crictl ps | awk '/hello/ {print $1}') | jq -r .info.pid) -n tracepath -n 172.18.10.90
     1?: [LOCALHOST] pmtu 1400
     1: 172.18.10.90 2.026ms asymm 2
     1: 172.18.10.90 0.693ms asymm 2
     2: 10.128.7.2 1.140ms
     3: 172.19.11.1 1.094ms
     4: 172.18.13.1 1.171ms
     5: 172.18.10.90 1.071ms reached
     Resume: pmtu 1400 hops 5 back 5
  36. With sourceIPBy: LoadBalancerIP (MetalLB+VRF+Egress Service)

    [Figure: test network topology (routers r1, r2, r3; testvm0; vrf1 on vlan1001 172.19.11.0/24; AS 65801; Service lbsvc-vrf1)]
    ▸ tcpdump on the peer routers while running curl http://testvm0
    [Screenshots: tcpdump on router r2; tcpdump on router r3]
    ▸ The source address is the Service's address
  37. With sourceIPBy: LoadBalancerIP [root@wk3 /]# tcpdump -nni any port 80

    and host 172.18.10.90 ... 14:17:55.163503 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0 14:17:55.164363 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0 14:17:55.164398 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [S], seq 988630598, win 65280, options [mss 1360,sackOK,TS val 1355935878 ecr 0,nop,wscale 7], length 0 14:17:55.164991 enp3s0 In IP13 (invalid) 14:17:55.164994 vlan1001 In IP 172.18.10.90.80 > 172.19.20.190.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0 14:17:55.165039 ovn-k8s-mp0 Out IP 172.18.10.90.80 > 10.128.7.14.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0 14:17:55.165732 ad92f172b775705 Out IP 172.18.10.90.80 > 10.128.7.14.47726: Flags [S.], seq 1746093189, ack 988630599, win 65160, options [mss 1460,sackOK,TS val 120362348 ecr 1355935878,nop,wscale 7], length 0 14:17:55.165796 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0 14:17:55.165843 ad92f172b775705 P IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1 14:17:55.166157 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0 14:17:55.166173 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 0 14:17:55.166232 ovn-k8s-mp0 In IP 10.128.7.14.47726 > 172.18.10.90.80: Flags [P.], seq 
1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1
    14:17:55.166247 vlan1001 Out IP 172.19.20.190.47726 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355935880 ecr 120362348], length 76: HTTP: GET / HTTP/1.1
    Captured with tcpdump on the node: packets leave via vlan1001 with the Service's External IP 172.19.20.190 as the source address. (MetalLB+VRF+Egress Service)
  38. With sourceIPBy: Network (MetalLB+VRF+Egress Service)

    [Figure: test network topology (routers r1, r2, r3; testvm0; vrf1 on vlan1001 172.19.11.0/24; AS 65801; Service lbsvc-vrf1)]
    ▸ tcpdump on the peer routers while running curl http://testvm0
    [Screenshots: tcpdump on router r2; tcpdump on router r3]
    ▸ The source address is the node's VLAN address inside vrf1
  39. With sourceIPBy: Network [root@wk3 /]# tcpdump -nni any port 80

    and host 172.18.10.90 ... 14:15:25.225491 ad92f172b775705 P IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [S], seq 82032646, win 65280, options [mss 1360,sackOK,TS val 1355785940 ecr 0,nop,wscale 7], length 0 14:15:25.226538 ovn-k8s-mp0 In IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [S], seq 82032646, win 65280, options [mss 1360,sackOK,TS val 1355785940 ecr 0,nop,wscale 7], length 0 14:15:25.226575 vlan1001 Out IP 172.19.11.113.43690 > 172.18.10.90.80: Flags [S], seq 82032646, win 65280, options [mss 1360,sackOK,TS val 1355785940 ecr 0,nop,wscale 7], length 0 14:15:25.227187 enp3s0 In IP13 (invalid) 14:15:25.227188 vlan1001 In IP 172.18.10.90.80 > 172.19.11.113.43690: Flags [S.], seq 964627448, ack 82032647, win 65160, options [mss 1460,sackOK,TS val 120212410 ecr 1355785940,nop,wscale 7], length 0 14:15:25.227204 ovn-k8s-mp0 Out IP 172.18.10.90.80 > 10.128.7.14.43690: Flags [S.], seq 964627448, ack 82032647, win 65160, options [mss 1460,sackOK,TS val 120212410 ecr 1355785940,nop,wscale 7], length 0 14:15:25.227574 ad92f172b775705 Out IP 172.18.10.90.80 > 10.128.7.14.43690: Flags [S.], seq 964627448, ack 82032647, win 65160, options [mss 1460,sackOK,TS val 120212410 ecr 1355785940,nop,wscale 7], length 0 14:15:25.227600 ad92f172b775705 P IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 0 14:15:25.227659 ad92f172b775705 P IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 76: HTTP: GET / HTTP/1.1 14:15:25.227811 ovn-k8s-mp0 In IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 0 14:15:25.227818 vlan1001 Out IP 172.19.11.113.43690 > 172.18.10.90.80: Flags [.], ack 1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 0 14:15:25.228004 ovn-k8s-mp0 In IP 10.128.7.14.43690 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 
1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 76: HTTP: GET / HTTP/1.1
    14:15:25.228018 vlan1001 Out IP 172.19.11.113.43690 > 172.18.10.90.80: Flags [P.], seq 1:77, ack 1, win 510, options [nop,nop,TS val 1355785942 ecr 120212410], length 76: HTTP: GET / HTTP/1.1
    14:15:25.228253 enp3s0 In IP13 (invalid)
    Captured with tcpdump on the node: packets leave via vlan1001 with the node's vlan1001 address 172.19.11.113 as the source address. (MetalLB+VRF+Egress Service)
  40. OpenShift configuration (MetalLB route import)

    [Figure: test network topology (routers r1, r2, r3 with AS 65101, 65102, 65103; testvm0; net30 172.18.30.0/24; cluster AS 65801; Service hello-lb-l3)]
    FRRConfiguration that imports all received routes:
    apiVersion: frrk8s.metallb.io/v1beta1
    kind: FRRConfiguration
    metadata:
      name: receive-all
      namespace: openshift-frr-k8s
    spec:
      bgp:
        routers:
        - asn: 65801
          neighbors:
          - address: 172.18.20.1
            asn: 65102
            toReceive:
              allowed:
                mode: all
  41. FRRConfiguration generated by MetalLB (MetalLB route import)

    Manually added FRRConfiguration:
    $ oc -n openshift-frr-k8s get frrconfiguration receive-all -o yaml | yq .spec
    bgp:
      routers:
      - asn: 65801
        neighbors:
        - address: 172.18.20.1
          asn: 65102
          disableMP: false
          dualStackAddressFamily: false
          toReceive:
            allowed:
              mode: all

    FRRConfiguration generated by MetalLB:
    $ oc -n openshift-frr-k8s get frrconfiguration metallb-wk3 -o yaml | yq .spec
    bgp:
      routers:
      - asn: 65801
        neighbors:
        - address: 172.18.20.1
          asn: 65102
          disableMP: false
          dualStackAddressFamily: false
          passwordSecret: {}
          port: 179
          toAdvertise:
            allowed:
              mode: filtered
              prefixes:
              - 172.19.20.181/32
          toReceive:
            allowed:
              mode: filtered
        prefixes:
        - 172.19.20.181/32
    nodeSelector:
      matchLabels:
        kubernetes.io/hostname: wk3
    raw: {}

    These two FRRConfigurations are merged, which allows the node to receive routes from outside.
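  42. Merging FRRConfigurations (MetalLB route import)

    ▸ Basic policy: multiple FRRConfigurations are merged so that the resulting configuration only expands (it can do more, never less)
      ・ add more neighbors
      ・ allow more prefixes
    ▸ Flow
      ・ Check that the FRRConfigurations do not contradict each other
        ・ If there is a conflict, do not merge and keep using the previous FRRConfiguration
        ・ Examples that cause an error:
          ・ different ASN settings for the same router in the same VRF
          ・ different ASN settings for the same neighbor (same address and port number)
          ・ BFD profiles with the same name but different settings
      ・ Generate the merged FRR config for each node matched by the label selector
        ・ Combine all router settings
        ・ Within each router, merge all prefixes and neighbors
        ・ For each neighbor, merge all filters
          ・ Prefer the filter that admits more routes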
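The merge rules above can be illustrated with a small model. This is a simplified sketch, not the frr-k8s implementation: routers are plain dicts for a single VRF, neighbors are keyed by address only (ports and BFD profiles are ignored), and the names `merge_routers`, `merge_receive`, and `ALL` are hypothetical.

```python
ALL = "all"  # stands for toReceive.allowed.mode: all


def merge_receive(a, b):
    """Prefer the filter that admits more routes: 'all' wins, otherwise take the union."""
    if a == ALL or b == ALL:
        return ALL
    return a | b


def merge_routers(routers):
    """Merge router dicts that belong to one VRF; raise ValueError on a conflict."""
    merged = {"asn": routers[0]["asn"], "prefixes": set(), "neighbors": {}}
    for router in routers:
        if router["asn"] != merged["asn"]:
            raise ValueError("different ASNs for the same router")
        merged["prefixes"] |= router.get("prefixes", set())
        for addr, neighbor in router["neighbors"].items():
            current = merged["neighbors"].get(addr)
            if current is None:
                merged["neighbors"][addr] = {
                    "asn": neighbor["asn"],
                    "advertise": set(neighbor.get("advertise", set())),
                    "receive": neighbor.get("receive", set()),
                }
                continue
            if current["asn"] != neighbor["asn"]:
                raise ValueError(f"different ASNs for neighbor {addr}")
            current["advertise"] |= neighbor.get("advertise", set())
            current["receive"] = merge_receive(
                current["receive"], neighbor.get("receive", set())
            )
    return merged


# The two configurations from the previous slide, reduced to this model.
receive_all = {"asn": 65801, "neighbors": {"172.18.20.1": {"asn": 65102, "receive": ALL}}}
metallb_wk3 = {
    "asn": 65801,
    "prefixes": {"172.19.20.181/32"},
    "neighbors": {
        "172.18.20.1": {"asn": 65102, "advertise": {"172.19.20.181/32"}, "receive": set()}
    },
}

merged = merge_routers([receive_all, metallb_wk3])
print(merged["neighbors"]["172.18.20.1"])
```

In this model the merged neighbor advertises 172.19.20.181/32 and receives everything, which is consistent with the prefix-lists in the generated running-config (`permit any` inbound, the /32 outbound).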
  43. frr running-config (MetalLB route import)

    [Figure: test network topology (routers r1, r2, r3; testvm0; net30 172.18.30.0/24; cluster AS 65801; Service hello-lb-l3)]
    The running-config of frr on each node can be read from the FRRNodeState custom resource:
    $ oc get frrnodestates.frrk8s.metallb.io wk3 -oyaml | yq .status.runningConfig
    Building configuration...
    ...
    !
    router bgp 65801
     no bgp ebgp-requires-policy
     no bgp default ipv4-unicast
     bgp graceful-restart preserve-fw-state
     no bgp network import-check
     neighbor 172.18.20.1 remote-as 65102
     !
     address-family ipv4 unicast
      network 172.19.20.181/32
      neighbor 172.18.20.1 activate
      neighbor 172.18.20.1 route-map 172.18.20.1-in in
      neighbor 172.18.20.1 route-map 172.18.20.1-out out
     exit-address-family
    exit
    !
    ip prefix-list 172.18.20.1-inpl-ipv4 seq 1 permit any
    ip prefix-list 172.18.20.1-allowed-ipv4 seq 1 permit 172.19.20.181/32
    !
    ...
    !
    route-map 172.18.20.1-out permit 1
     match ip address prefix-list 172.18.20.1-allowed-ipv4
    exit
    !
    ...
    !
    route-map 172.18.20.1-in permit 3
     match ip address prefix-list 172.18.20.1-inpl-ipv4
    exit
    !
    route-map 172.18.20.1-in permit 4
     match ipv6 address prefix-list 172.18.20.1-inpl-ipv4
    exit
    !
    Annotations on the slide:
    ▸ neighbor 172.18.20.1 remote-as 65102: peer configuration with the peer router r2
    ▸ network 172.19.20.181/32 and route-map 172.18.20.1-out: advertise the Service's External IP
    ▸ prefix-list 172.18.20.1-inpl-ipv4 (permit any) and route-map 172.18.20.1-in: receive routes advertised from outside
  44. Running vtysh in frr on node wk3 (MetalLB route import)

    [Figure: test network topology (routers r1, r2, r3; testvm0; net30 172.18.30.0/24; cluster AS 65801; Service hello-lb-l3)]
    Routes received from outside:
    $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show ip bgp'
    BGP table version is 9, local router ID is 172.18.20.113, vrf id 0
    Default local pref 100, local AS 65801
    Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
                  i internal, r RIB-failure, S Stale, R Removed
    Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
    Origin codes: i - IGP, e - EGP, ? - incomplete
    RPKI validation codes: V valid, I invalid, N Not found

       Network          Next Hop      Metric LocPrf Weight Path
    *> 0.0.0.0/0        172.18.20.1                      0 65102 65101 i
    *> 172.18.20.0/24   172.18.20.1        0             0 65102 i
    *> 172.18.30.0/24   172.18.20.1                      0 65102 65101 65103 i
    *> 172.19.11.0/24   172.18.20.1                      0 65102 65101 65103 i
    *> 172.19.20.181/32 0.0.0.0            0         32768 i

    Displayed 5 routes and 5 total paths

    Peering status with the peer router r2:
    $ oc -n openshift-frr-k8s exec -c frr frr-k8s-vk2jh -- vtysh -c 'show bgp summary'
    IPv4 Unicast Summary (VRF default):
    BGP router identifier 172.18.20.113, local AS number 65801 vrf-id 0
    BGP table version 9
    RIB entries 8, using 1536 bytes of memory
    Peers 1, using 725 KiB of memory

    Neighbor     V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd PfxSnt Desc
    172.18.20.1  4 65102    2257    2241      0   0    0 14:46:16            4      1 N/A

    Total number of neighbors 1
  45. Routing table on node wk3 (MetalLB route import)

    [Figure: test network topology (routers r1, r2, r3; testvm0; net30 172.18.30.0/24; cluster AS 65801; Service hello-lb-l3)]
    [core@wk3 ~]$ ip route show
    default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
    10.128.0.0/16 via 10.128.7.1 dev ovn-k8s-mp0
    10.128.7.0/24 dev ovn-k8s-mp0 proto kernel scope link src 10.128.7.2
    10.200.0.0/16 via 169.254.0.4 dev br-ex src 169.254.0.2 mtu 1400
    169.254.0.0/17 dev br-ex proto kernel scope link src 169.254.0.2
    169.254.0.1 dev br-ex src 172.18.20.113
    169.254.0.3 via 10.128.7.1 dev ovn-k8s-mp0
    172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
    172.18.30.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20
    172.19.11.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20
    The last two entries (proto bgp) are the routes received via BGP.
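Imported routes are marked `proto bgp` in the kernel routing table, so they are easy to pick out programmatically. A minimal sketch, assuming the plain-text `ip route show` format from the slide; the helper names `value_after` and `bgp_routes` are hypothetical.

```python
def value_after(fields, key):
    """Return the token following key in a split route line, or None if absent."""
    if key in fields:
        index = fields.index(key) + 1
        if index < len(fields):
            return fields[index]
    return None


def bgp_routes(ip_route_output):
    """Return (prefix, next hop) for every route installed with 'proto bgp'."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and value_after(fields, "proto") == "bgp":
            routes.append((fields[0], value_after(fields, "via")))
    return routes


# Excerpt of the routing table on wk3 from the slide.
SAMPLE = """\
default via 172.18.20.1 dev br-ex proto dhcp src 172.18.20.113 metric 48
172.18.20.0/24 dev br-ex proto kernel scope link src 172.18.20.113 metric 48
172.18.30.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20
172.19.11.0/24 nhid 890 via 172.18.20.1 dev br-ex proto bgp metric 20
"""

for prefix, next_hop in bgp_routes(SAMPLE):
    print(prefix, "via", next_hop)
```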
  46. References

    ▸ This deck uses the ShowNet Icons
      ・ https://github.com/interop-tokyo-shownet/shownet-icons
    ▸ MetalLB
      ・ https://metallb.io/
      ・ https://github.com/metallb/metallb
    ▸ FRR-k8s
      ・ https://github.com/metallb/frr-k8s
    ▸ FRRouting
      ・ https://frrouting.org/
      ・ https://github.com/FRRouting/frr
  47. References

    ▸ Split FRR - Proposal to move FRR to a stand alone component
      ・ https://github.com/metallb/metallb/blob/main/design/splitfrr-proposal.md
    ▸ blog: FRR-k8s as a BGP backend for MetalLB
      ・ https://www.redhat.com/ja/blog/frr-k8s-bgp-backend-metallb
    ▸ slide: Bringing routes to Kubernetes nodes via BGP: introducing frr-k8s
      ・ https://archive.fosdem.org/2024/schedule/event/fosdem-2024-1818-bringing-routes-to-kubernetes-nodes-via-bgp-introducing-frr-k8s/
    ▸ slide: MetalLB and FRR: a match made in heaven
      ・ https://archive.fosdem.org/2023/schedule/event/network_metallb_and_frr/
  48. MetalLB Operator and OpenShift

    ▸ Since OpenShift v4.19.14, the OVN-Kubernetes CNI plugin supports BGP, and MetalLB and OVN-Kubernetes share frr-k8s
    ▸ Configuring the Cluster Network Operator (CNO) custom resource deploys frr-k8s as a DaemonSet in the openshift-frr-k8s namespace
    ▸ If the CNO has already deployed frr-k8s, the MetalLB Operator uses it; otherwise the MetalLB Operator configures the CNO so that the CNO deploys frr-k8s
  49. MetalLB Operator and OpenShift

    ▸ When running on OpenShift, the MetalLB Operator sets the backend to frr-k8s and has frr-k8s deployed by the Cluster Network Operator
      ・ If it runs on OpenShift and the environment variable DEPLOY_FRRK8S_FROM_CNO is true, the BGP backend is set to frr-k8s-external
    https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/params/params.go#L136-L138
    https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/params/params.go#L23-L25
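The rule stated on this slide can be written down as a small decision function. This only restates the slide in Python and is not the operator's code, which is Go (see the params.go links). How the value "true" is parsed, and that the configured backend is returned unchanged in every other case, are assumptions; the function name `bgp_backend` is hypothetical.

```python
import os


def bgp_backend(configured, is_openshift, env=os.environ):
    """Pick the BGP backend following the rule described on the slide.

    configured is the backend that would be used otherwise (for example "frr-k8s").
    """
    # Assumption: the variable is compared case-insensitively against "true".
    deploy_from_cno = env.get("DEPLOY_FRRK8S_FROM_CNO", "").lower() == "true"
    if is_openshift and deploy_from_cno:
        return "frr-k8s-external"
    return configured


print(bgp_backend("frr-k8s", True, {"DEPLOY_FRRK8S_FROM_CNO": "true"}))
```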
  50. MetalLB Operator and OpenShift

    ▸ When running on OpenShift, the MetalLB Operator sets the backend to frr-k8s and has frr-k8s deployed by the Cluster Network Operator
      ・ It sets additionalRoutingCapabilities in the CNO custom resource network.operator ➔ the CNO deploys frr-k8s in the openshift-frr-k8s namespace
    https://github.com/metallb/metallb-operator/blob/4b6d32e74622818ea8ee853e6aab393ac70f0eae/pkg/openshift/openshift.go#L44-L62
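For reference, the same setting can be applied by hand with a merge patch. The sketch below only builds the patch and prints a command line. The field layout `spec.additionalRoutingCapabilities.providers: ["FRR"]` and the resource name `cluster` are taken from my reading of the OpenShift documentation rather than from this deck, so treat them as assumptions and check them against your OpenShift version; the function name `frr_capability_patch` is hypothetical.

```python
import json


def frr_capability_patch():
    """Return a JSON merge patch that asks the CNO to deploy frr-k8s.

    Assumption: additionalRoutingCapabilities takes a list of providers and "FRR" is one of them.
    """
    patch = {"spec": {"additionalRoutingCapabilities": {"providers": ["FRR"]}}}
    return json.dumps(patch)


# Print the command instead of running it, so the sketch has no side effects.
print(
    "oc patch Network.operator.openshift.io cluster --type=merge "
    f"-p '{frr_capability_patch()}'"
)
```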