Making your Kubernetes-based log collection reliable & durable with Vector

Palark
November 24, 2023

Tech talk by Maksim Nabokikh, Platform Lead @ Palark, presented at KCD (Kubernetes Community Days) Austria 2023 and OSMC (Open Source Monitoring Conference) 2023.

Vector is an Open Source high-performance solution for collecting & processing your observability data. In this talk, Maksim shares our experience using it for log collection in hundreds of Kubernetes clusters.

Find more resources for this talk:
* Blog article: Collecting logs in Kubernetes with Vector: Benefits, architecture, real cases.
* YouTube video.

P.S. Subscribe to the Palark tech blog to get our latest articles on DevOps, SRE, Kubernetes, and more!

Transcript

  1. Making your Kubernetes-based log collection reliable & durable with Vector
    VECTOR
    MAKSIM NABOKIKH
    Platform Lead

  2. DISCLAIMER
    During this talk preparation,
    no Kubernetes clusters were hurt

  3. DISCLAIMER
    During this talk preparation,
    no Kubernetes clusters were hurt
    Just kidding, in reality,
    there were ple-e-e-enty of outages

  4. ABOUT
    PALARK
    We offer all-in-one DevOps-as-a-Service and pick
    the best Open Source projects to fulfill our clients' goals
    16 70
    Years in Linux,
    DevOps & Kubernetes
    Managed
    Kubernetes clusters
    15 90
    Awesome
    engineers
    Tech posts at
    blog.palark.com

  5. PLAN
    1. LOGS IN KUBERNETES
       Let’s recall what to collect in Kubernetes
    2. WHAT IS VECTOR
       And in which way it is applicable
    3. PRACTICAL USE
       Exciting operating (Ops) experience cases

  6. LOGS
    IN KUBERNETES

  7. LOGS IN KUBERNETES: POD LOGS
    Log file location path consists of a pod name, container name, and UID
    Format and location of files depend on the CRI settings
    Max size and rotation depend on the kubelet settings
    kubernetes.io/docs/concepts/cluster-administration/logging/
    /var/log/pods
    pod-1 pod-2
    kubelet
    stdout
    stderr
    stdout
    stderr
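
As a rough illustration of the path layout described on this slide, the sketch below splits a pod log path into its parts. It assumes the common kubelet layout `/var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<restart-count>.log`; the function name and the sample path are made up for the example, and the real layout depends on the CRI and kubelet settings.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PodLogPath(NamedTuple):
    namespace: str
    pod_name: str
    pod_uid: str
    container: str
    restart_count: int


def parse_pod_log_path(path: str) -> PodLogPath:
    """Split /var/log/pods/<ns>_<pod>_<uid>/<container>/<n>.log into fields."""
    parts = PurePosixPath(path).parts
    # Expected parts: ('/', 'var', 'log', 'pods', '<ns>_<pod>_<uid>', '<container>', '<n>.log')
    if len(parts) != 7 or parts[:4] != ("/", "var", "log", "pods"):
        raise ValueError(f"unexpected pod log path: {path}")
    # Namespaces and pod names cannot contain "_", so the split is unambiguous.
    namespace, pod_name, pod_uid = parts[4].split("_")
    restart_count = int(PurePosixPath(parts[6]).stem)
    return PodLogPath(namespace, pod_name, pod_uid, parts[5], restart_count)


if __name__ == "__main__":
    # Hypothetical sample path, used only to show the result shape.
    print(parse_pod_log_path(
        "/var/log/pods/default_web-0_50fb55c5-d97e-4851-85c6-187465154db6/nginx/1.log"
    ))
```

A collector needs exactly these fields to attach pod metadata to every line it reads from the file.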

  8. LOGS IN KUBERNETES: NODE SERVICES
    Files in the /var/log directory (probably)
    Max size and rotation configured by journald
    Format can be anything…
    kubernetes.io/docs/concepts/cluster-administration/logging/
    containerd kubelet audit logs syslog

  9. LOGS IN KUBERNETES: EVENTS
    Can only be collected from the Kubernetes API
    Can be collected as either logs, metrics, or traces
    kubernetes.io/docs/concepts/cluster-administration/logging/
    apiVersion: v1
    kind: Event
    count: 1
    metadata:
      name: standard-worker-1.178264e1185b006f
      namespace: default
    reason: RegisteredNode
    firstTimestamp: '2023-09-06T19:08:47Z'
    lastTimestamp: '2023-09-06T19:08:47Z'
    involvedObject:
      apiVersion: v1
      kind: Node
      name: standard-worker-1
      uid: 50fb55c5-d97e-4851-85c6-187465154db6
    message: 'Registered Node standard-worker-1 in Controller'

  13. LOGS IN KUBERNETES
    kubernetes.io/docs/concepts/cluster-administration/logging/
    What can we collect? | Source
    Pod logs | Files
    Node services logs | Files
    Events | Kubernetes API

  15. WHAT IS
    VECTOR

  16. WHAT IS VECTOR
    A lightweight, ultra-fast tool
    for building observability pipelines
    vector.dev

  18. WHAT IS VECTOR
    An open source, efficient tool
    for building log collecting pipelines
    vector.dev

  19. WHAT IS VECTOR
    Vendor agnostic
    You do not need to rewrite Vector in Rust
    Performance by design and continuous benchmarking
    Flexible building block
    vector.dev
    An open source, efficient tool
    for building log collecting pipelines

  20. VECTOR’S ARCHITECTURE

  21. VECTOR’S ARCHITECTURE
    [Diagram] Collect → Transform → Send
    Collect: File, K8s, Socket (40 in total)
    Transform: Remap, Filter, Aggregate (9 in total)
    Send: 52 in total

  22. VECTOR’S ARCHITECTURE
    [Diagram] Collect → Transform → Send
    Collect: File, K8s, Socket (40 in total)
    Transform: Remap, Filter, Aggregate (9 in total)
    Send: 52 in total
    Vector Remap Language (VRL)

  23. VECTOR REMAP LANGUAGE

  24. VECTOR REMAP LANGUAGE
    [transforms.filter_severity]
    type = "filter"
    inputs = ["logs"]
    condition = '.severity != "info"'

  25. VECTOR REMAP LANGUAGE
    [transforms.filter_severity]
    type = "filter"
    inputs = ["logs"]
    condition = '.severity != "info"'
    [transforms.sanitize_kubernetes_labels]
    type = "remap"
    inputs = ["logs"]
    source = '''
    if exists(.pod_labels."controller-revision-hash") {
      del(.pod_labels."controller-revision-hash")
    }
    if exists(.pod_labels."pod-template-hash") {
      del(.pod_labels."pod-template-hash")
    }
    '''

  26. VECTOR REMAP LANGUAGE
    [transforms.filter_severity]
    type = "filter"
    inputs = ["logs"]
    condition = '.severity != "info"'
    [transforms.sanitize_kubernetes_labels]
    type = "remap"
    inputs = ["logs"]
    source = '''
    if exists(.pod_labels."controller-revision-hash") {
      del(.pod_labels."controller-revision-hash")
    }
    if exists(.pod_labels."pod-template-hash") {
      del(.pod_labels."pod-template-hash")
    }
    '''
    [transforms.backslash_multiline]
    type = "reduce"
    inputs = ["logs"]
    group_by = ["file", "stream"]
    merge_strategies."message" = "concat_newline"
    ends_when = '''
    matched, err = match(.message, r'[^\\]$');
    if err != null {
      false;
    } else {
      matched;
    }
    '''
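
To make the `backslash_multiline` transform on this slide easier to follow, here is a small Python sketch of the same idea: lines are collected until one does not end with a backslash, and each group is joined with newlines. It is only an approximation. The function name is made up, and the grouping by `file` and `stream` as well as any time-based flushing that Vector performs are left out.

```python
import re

# Same pattern as in the slide: the last character is anything but a backslash.
ENDS_GROUP = re.compile(r"[^\\]$")


def merge_backslash_lines(messages):
    """Join continuation lines (ending with a backslash) with the lines that follow."""
    merged, current = [], []
    for message in messages:
        current.append(message)
        if ENDS_GROUP.search(message):
            # The line does not end with a backslash: the group is complete.
            merged.append("\n".join(current))
            current = []
    if current:
        # Flush an unfinished group at the end of the input.
        merged.append("\n".join(current))
    return merged


if __name__ == "__main__":
    for event in merge_backslash_lines(["first \\", "second", "third"]):
        print(repr(event))
```

Note that an empty message never matches the pattern, so in this sketch it is treated as a continuation rather than as the end of a group.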

  27. LOG COLLECTING TOPOLOGIES

  28. LOG COLLECTING TOPOLOGIES
    [Diagram] Distributed: log-shippers → storage

  29. LOG COLLECTING TOPOLOGIES
    [Diagram] Distributed: log-shippers → storage
    [Diagram] Centralized: log-shippers → aggregator → storage

  30. LOG COLLECTING TOPOLOGIES
    [Diagram] Distributed: log-shippers → storage
    [Diagram] Centralized: log-shippers → aggregator → storage
    [Diagram] Stream: log-shippers → queue → storage

  32. LOG COLLECTING TOPOLOGIES
    [Diagram] Distributed: log-shippers → storage
    [Diagram] Centralized: log-shippers → aggregator → storage
    [Diagram] Stream: log-shippers → queue → storage

  33. VECTOR IN KUBERNETES

  34. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  35. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  36. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    volumes:
    - name: var-log
      hostPath:
        path: /var/log/
    - name: vector-data-dir
      hostPath:
        path: /mnt/vector-data
    - name: localtime
      hostPath:
        path: /etc/localtime
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  37. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    volumes:
    - name: var-log
      hostPath:
        path: /var/log/
    - name: vector-data-dir
      hostPath:
        path: /mnt/vector-data
    - name: localtime
      hostPath:
        path: /etc/localtime
    volumeMounts:
    - name: var-log
      mountPath: /var/log/
      readOnly: true
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  38. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    volumes:
    - name: var-log
      hostPath:
        path: /var/log/
    - name: vector-data-dir
      hostPath:
        path: /mnt/vector-data
    - name: localtime
      hostPath:
        path: /etc/localtime
    volumeMounts:
    - name: var-log
      mountPath: /var/log/
      readOnly: true
    terminationGracePeriodSeconds: 120
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  39. VECTOR IN KUBERNETES
    github.com/deckhouse/deckhouse/blob/main/modules/460-log-shipper/templates/daemonset.yaml
    apiVersion: apps/v1
    kind: DaemonSet
    volumes:
    - name: var-log
      hostPath:
        path: /var/log/
    - name: vector-data-dir
      hostPath:
        path: /mnt/vector-data
    - name: localtime
      hostPath:
        path: /etc/localtime
    volumeMounts:
    - name: var-log
      mountPath: /var/log/
      readOnly: true
    terminationGracePeriodSeconds: 120
    shareProcessNamespace: true
    /var/log
    /vector-data
    /etc/vector
    Vector Reloader Kube RBAC proxy
    log-shipper
    Vector – collects logs
    Reloader – validates config and reloads
    Kube RBAC proxy – protects metrics
    Node File System

  41. PRACTICAL
    USE

  42. CASE #1: NO SPACE LEFT ON THE DEVICE

  43. $ lsof -nP | grep '(deleted)'
    CASE #1: NO SPACE LEFT ON THE DEVICE

  44. $ lsof -nP | grep '(deleted)'
    vector 6331 root 25r REG 253,3 10602 72738831 /var/log/.../1.log (deleted)
    vector 6331 root 44r REG 253,3 10239 33665268 /var/log/.../1.log (deleted)
    vector 6331 6628 vector-wo root 25r REG 253,3 10602 72738831 /var/log/.../1.log (deleted)
    vector 6331 6628 vector-wo root 44r REG 253,3 10239 33665268 /var/log/.../1.log (deleted)
    vector 6331 6629 vector-wo root 25r REG 253,3 10602 72738831 /var/log/.../1.log (deleted)
    CASE #1: NO SPACE LEFT ON THE DEVICE

  46. Vector
    CASE #1: NO SPACE LEFT ON THE DEVICE

  47. Vector /var/log/pods
    CASE #1: NO SPACE LEFT ON THE DEVICE

  48. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 10Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  49. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 20Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  50. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 50Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  51. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 50Mb
    kubelet
    CASE #1: NO SPACE LEFT ON THE DEVICE

  53. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 10Mb
    kubelet
    CASE #1: NO SPACE LEFT ON THE DEVICE

  54. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 50Mb
    kubelet
    CASE #1: NO SPACE LEFT ON THE DEVICE

  55. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 50Mb
    kubelet
    Loki
    CASE #1: NO SPACE LEFT ON THE DEVICE

  56. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 50Mb
    kubelet
    Loki
    429
    CASE #1: NO SPACE LEFT ON THE DEVICE

  58. Vector /var/log/pods
    /var/log/pods/{uid}/1.log 10Mb
    kubelet
    Loki
    429
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  59. Vector /var/log/pods
    /var/log/pods/{uid}/1.log
    kubelet
    Loki
    429
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    10Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  60. Vector /var/log/pods
    /var/log/pods/{uid}/1.log
    kubelet
    Loki
    429
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    10Mb
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE

  62. Vector /var/log/pods
    /var/log/pods/{uid}/1.log
    kubelet
    Loki
    429
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    10Mb
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    /var/log/pods/{uid}/1.log (DELETED) 50Mb
    CASE #1: NO SPACE LEFT ON THE DEVICE
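
The mechanism behind this picture can be reproduced outside Vector. The sketch below (POSIX-only, with a made-up helper name) writes a temporary file, deletes it while its descriptor is still open, and shows that the data stays readable. Disk space is released only when the last descriptor is closed, which is why rotated log files that a stalled reader keeps open still fill up the disk.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Delete a file while holding it open, then read it through the descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # the directory entry is gone ("deleted" in lsof)
        size = os.fstat(fd).st_size        # but the inode and its blocks still exist
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return os.path.exists(path), size, data
    finally:
        os.close(fd)                       # only now can the kernel free the space


if __name__ == "__main__":
    print(read_after_unlink(b"log line\n"))
```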

  63. HOW TO SOLVE?
    1. Tune buffer settings
       Blocking (default) | Drop Newest
       In Memory (default) | Disk buffer
       Max events: 1000 (default) | 10000
    2. Rule of thumb
       Let logs leave the node as quickly as possible
    3. If you are brave enough
       sysctl -w fs.file-max=1000 (unsafe)
    vector.dev/docs/about/under-the-hood/architecture/buffering-model/
    CASE #1: NO SPACE LEFT ON THE DEVICE
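
The difference between the two "when full" behaviors can be sketched with a plain bounded buffer. This is a toy model, not Vector's implementation: the function name and return shape are invented for the example, and no consumer drains the buffer, which mimics a sink that keeps answering 429.

```python
from collections import deque


def offer_all(events, max_events, when_full):
    """Feed events into a bounded buffer that nobody drains.

    Returns (buffered, dropped, unread):
      - "drop_newest": events that do not fit are discarded, the reader moves on;
      - "block": the reader stalls at the first event that does not fit, so the
        remaining input stays unread (and its file stays open).
    """
    buffer = deque()
    dropped, unread = [], []
    for index, event in enumerate(events):
        if len(buffer) < max_events:
            buffer.append(event)
        elif when_full == "drop_newest":
            dropped.append(event)
        else:
            unread = list(events[index:])
            break
    return list(buffer), dropped, unread


if __name__ == "__main__":
    print(offer_all([1, 2, 3, 4, 5], max_events=3, when_full="block"))
    print(offer_all([1, 2, 3, 4, 5], max_events=3, when_full="drop_newest"))
```

Blocking preserves every line at the cost of holding files open; dropping frees the node at the cost of losing data. Which trade-off is acceptable depends on the logs in question.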

  64. CASE #2: PROMETHEUS EXPLODED

  65. uid=a uid=b
    vector_checkpoints_total{file="/var/log/pods/a/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/b/{container}/1.log"}
    Vector Prometheus
    a
    b
    CASE #2: PROMETHEUS EXPLODED

  67. uid=c uid=d
    vector_checkpoints_total{file="/var/log/pods/a/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/b/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/c/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/d/{container}/1.log"}
    Vector Prometheus
    a
    b
    c
    d
    CASE #2: PROMETHEUS EXPLODED

  69. uid=f uid=e
    vector_checkpoints_total{file="/var/log/pods/a/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/b/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/c/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/d/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/e/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/f/{container}/1.log"}
    Vector Prometheus
    a
    b
    c
    d
    e
    f
    CASE #2: PROMETHEUS EXPLODED

  71. uid=f uid=e
    vector_checkpoints_total{file="/var/log/pods/a/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/b/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/c/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/d/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/e/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/f/{container}/1.log"}
    Vector Prometheus
    a
    b
    c
    d
    e
    f
    metric_relabel_configs:
    - regex: 'file'
      action: labeldrop
    CASE #2: PROMETHEUS EXPLODED

  73. uid=f uid=e
    vector_checkpoints_total{file="/var/log/pods/a/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/b/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/c/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/d/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/e/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/f/{container}/1.log"}
    Vector Prometheus
    a
    b
    c
    d
    e
    f
    HOW TO SOLVE? expire_metrics_secs=60
    CASE #2: PROMETHEUS EXPLODED

  74. uid=f uid=e
    vector_checkpoints_total{file="/var/log/pods/c/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/d/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/e/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/f/{container}/1.log"}
    Vector Prometheus
    c
    d
    e
    f
    HOW TO SOLVE? expire_metrics_secs=60
    CASE #2: PROMETHEUS EXPLODED

  75. uid=f uid=e
    vector_checkpoints_total{file="/var/log/pods/e/{container}/1.log"}
    vector_checkpoints_total{file="/var/log/pods/f/{container}/1.log"}
    Vector Prometheus
    e
    f
    HOW TO SOLVE? expire_metrics_secs=60
    CASE #2: PROMETHEUS EXPLODED
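
The effect of the `file` label and of metric expiration can be modeled in a few lines. The sketch below is a simplified simulation, not Vector code: the class, the timestamps, and the `app` container name are invented, and expiration is modeled as "drop a series that has not been updated for `expire_secs`".

```python
class SeriesRegistry:
    """Remembers when each `file` label value was last updated."""

    def __init__(self, expire_secs=None):
        self.expire_secs = expire_secs   # None means series are kept forever
        self.last_seen = {}

    def touch(self, file_label, now):
        self.last_seen[file_label] = now

    def exposed(self, now):
        """Series that would still be exported at time `now`."""
        if self.expire_secs is None:
            return sorted(self.last_seen)
        return sorted(
            label for label, seen in self.last_seen.items()
            if now - seen <= self.expire_secs
        )


def simulate(expire_secs):
    """Pods a,b run at t=0, are replaced by c,d at t=100 and by e,f at t=200."""
    registry = SeriesRegistry(expire_secs)
    for now, uids in [(0, "ab"), (100, "cd"), (200, "ef")]:
        for uid in uids:
            registry.touch(f"/var/log/pods/{uid}/app/1.log", now)
    return registry.exposed(now=200)


if __name__ == "__main__":
    print(len(simulate(None)), "series without expiration")
    print(len(simulate(60)), "series with expire_secs=60")
```

Without expiration every pod that ever ran on the node leaves a series behind, so the count grows with pod churn rather than with the number of running pods.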

  76. HOW TO SOLVE? expire_metrics_secs=60
    [Chart: vector_component_errors_total over time, axis values 7, 3, 3; annotations: "3 errors", "4 more errors", "expiration triggered", "3 errors", "empty!"]
    This behavior makes the result of the rate PromQL function equal to zero.
    CASE #2: PROMETHEUS EXPLODED

  77. HOW TO SOLVE? expire_metrics_secs=60
    CASE #2: PROMETHEUS EXPLODED

  78. HOW TO SOLVE? expire_metrics_secs=60
    Patch for Vector
    to remove the file label
    CASE #2: PROMETHEUS EXPLODED

  79. CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  80. Vector
    Vector
    Vector
    Kubernetes
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  83. control-plane node
    memory consumption
    etcd
    memory consumption
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  86. Vector
    Vector
    Vector
    Kubernetes
    LIST /api/v1/pods?fieldSelector=spec.nodeName=$NODE_NAME
    110 pods
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  87. Vector
    Vector
    Vector
    Kubernetes
    LIST /api/v1/pods?fieldSelector=spec.nodeName=$NODE_NAME
    110 pods
    etcd
    /registry///
    ALL pods
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  88. Vector
    Vector
    Vector
    Kubernetes
    LIST /api/v1/pods?fieldSelector=spec.nodeName=$NODE_NAME
    110 pods
    etcd
    /registry///
    ALL pods
    RAM↑ RAM↑
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  90. HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  91. 1. Cache read (resourceVersion=0)
    LIST /api/v1/pods?fieldSelector=spec.nodeName=$NODE_NAME&resourceVersion=0
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  93. 1. Cache read (resourceVersion=0)
    LIST /api/v1/pods?fieldSelector=spec.nodeName=$NODE_NAME&resourceVersion=0
    use_apiserver_cache=true
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE
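
For illustration, the request from this slide can be assembled as follows. The helper name and its boolean parameter are hypothetical (the parameter merely mirrors the `use_apiserver_cache` option named above). The sketch only builds the request path; it does not contact a cluster.

```python
from urllib.parse import urlencode


def pod_list_path(node_name, use_apiserver_cache=True):
    """Build the LIST path for the pods scheduled on one node."""
    params = {"fieldSelector": f"spec.nodeName={node_name}"}
    if use_apiserver_cache:
        # resourceVersion=0 lets the API server answer from its cache
        # instead of reading the data from etcd.
        params["resourceVersion"] = "0"
    return "/api/v1/pods?" + urlencode(params)


if __name__ == "__main__":
    print(pod_list_path("standard-worker-1"))
```

`urlencode` percent-encodes the `=` inside the selector value as `%3D`, which is an equivalent spelling of the query string shown on the slide.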

  94. 1. Cache read (resourceVersion=0)
    2. Limit concurrent requests (Priority and Fairness API)
    apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
    kind: PriorityLevelConfiguration
    metadata:
      name: limit-list-custom
    spec:
      type: Limited
      limited:
        assuredConcurrencyShares: 5
        limitResponse:
          queuing:
            handSize: 4
            queueLengthLimit: 50
            queues: 16
          type: Queue
    ---
    apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
    kind: FlowSchema
    metadata:
      name: limit-list-custom
    spec:
      priorityLevelConfiguration:
        name: limit-list-custom
      distinguisherMethod:
        type: ByUser
      rules:
      - resourceRules:
        - apiGroups: [""]
          clusterScope: true
          namespaces: ["*"]
          resources: ["pods"]
          verbs: ["list", "get"]
        subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: ***
            namespace: ***
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  96. 1. Cache read (resourceVersion=0)
    2. Limit concurrent requests (Priority and Fairness API)
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  97. 1. Cache read (resourceVersion=0)
    2. Limit concurrent requests (Priority and Fairness API)
    3. Use the kubelet API instead of the Kubernetes API
    Pod metadata can be fetched by requesting the kubelet's /pods endpoint
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE
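
A minimal sketch of the third option, assuming the kubelet's /pods endpoint answers with a PodList JSON document. No HTTP request is made here. The function name and the sample payload are hand-written for the example, and authentication against the kubelet is out of scope.

```python
import json


def pod_metadata(pod_list_json):
    """Index pod metadata from a PodList document by pod UID."""
    pod_list = json.loads(pod_list_json)
    result = {}
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        result[meta["uid"]] = {
            "namespace": meta["namespace"],
            "name": meta["name"],
            "labels": meta.get("labels", {}),
        }
    return result


# Hypothetical response body, trimmed to the fields used above.
SAMPLE = json.dumps({
    "kind": "PodList",
    "apiVersion": "v1",
    "items": [
        {"metadata": {"uid": "a", "namespace": "default", "name": "web-0",
                      "labels": {"app": "web"}}},
        {"metadata": {"uid": "b", "namespace": "kube-system", "name": "dns-1"}},
    ],
})

if __name__ == "__main__":
    print(pod_metadata(SAMPLE))
```

Because each kubelet knows only about the pods on its own node, this lookup puts no load on the API server or etcd.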

  98. 1. Cache read (resourceVersion=0)
    2. Limit concurrent requests (Priority and Fairness API)
    3. Use the kubelet API instead of the Kubernetes API
    HOW TO SOLVE?
    CASE #3: KUBERNETES CONTROL PLANE OUTAGE

  99. CONCLUSION
    1. Great to build platforms
    2. Vector is awesome, seriously, deploy it today
    3. Share practical cases and learn together

  100. THANK YOU! Q&A
    MAKSIM NABOKIKH, Platform Lead
    CONTACT US: @nabokihms, [email protected]
    OPEN SOURCE TOOLS: github.com/werf, github.com/palark
    OUR BLOGS AND SOCIAL MEDIA: palark.com, twitter.com/palark_com