CNCF Toronto Shopify Meetup Slides 26-03-25

cncf-canada-meetups
March 28, 2025

Transcript

  1. Agenda
    🎤 Introduction & CNCF Announcements
    ⚡ Lightning Talks!
    Networking
    ⚡ More Talks!
    📥 Event survey & Linux certification training giveaway
  4. CubeFS Graduates!!! 🎓
    • Cloud-Native Storage: Distributed file and object storage system optimized for cloud-native applications.
    • Multi-Protocol Support: Compatible with S3, POSIX, and HDFS access protocols.
    • CNCF Graduated Project: Achieved graduated status within the Cloud Native Computing Foundation as of December 2024.
  5. Kubescape Reaches Incubation!!!
    Kubescape is an open-source Kubernetes security platform that provides comprehensive security coverage, from left to right across the entire development and deployment lifecycle. It offers hardening, posture management, and runtime security capabilities to ensure robust protection for Kubernetes environments. It saves Kubernetes users and admins precious time, effort, and resources.
  6. Dapr Agents
    A framework built on top of Dapr that combines stateful workflow coordination with advanced Agentic AI features. Dapr Agents is the best way to build systems of agents fit for enterprise use cases.
  7. Kairos and k0s
    • Kairos: An open-source project that enables the creation of immutable, bootable Linux images for edge devices, supporting various distributions and integrating with Kubernetes distributions like k3s.
    • k0s: A lightweight, single-binary Kubernetes distribution designed for simplicity and flexibility, suitable for deployment on various infrastructures including bare-metal, edge, and IoT environments.
    • Secure Edge Images: Pre-hardened, minimal operating system images tailored for running container workloads at the edge, often incorporating security features like Trusted Platform Module (TPM) support and Secure Boot to ensure system integrity.
    https://kairos.io/
  8. And a little bit about Backstage:
    • Open Source Framework: Build custom developer portals.
    • Unified Interface: Access tools, services, and docs in one place.
    • CNCF Incubation: Part of the Cloud Native Computing Foundation.
    Key Features:
    • Software Catalog: Manage all software assets centrally.
    • Software Templates: Quickly set up projects with standardized tooling.
    • TechDocs: Simplify documentation with "docs like code."
    • Plugins: To further expand Backstage’s customizability and functionality!
  9. • Stage 1: Laying the Foundation
    • Stage 2: First Contributions: Fixing Docs and Enhancing Release Workflow
    • Stage 3: Making Substantial Contributions
    • Stage 4
    • Stage 5: Becoming an Active Community Member
  10. First PR: Fixing Docs
    • Simple fix, but crucial for learning the contributing process.
    • Gained experience with PR submission and review workflow.
    • Familiarized myself with GOVERNANCE and CONTRIBUTING docs.
    • Link to PR: https://github.com/backstage/community-plugins/pull/497
  11. Tackling an Open Issue: Adding GitHub Tags
    • Goal: Link npm releases to specific Git commits for better traceability.
    • Challenge 1: Unfamiliarity with GitHub workflows; required on-the-fly learning.
    • Challenge 2: Difficulty testing workflow changes and verifying script functionality.
    • Solution: Sought help from maintainers, who suggested adding log statements for debugging.
    • Discovered valuable community resources, including the Backstage Discord and SIG (Special Interest Group) meetings, for future support.
  12. Adding Permissions to Catalog Endpoints
    • Issue: Two endpoints, /analyze-location and /validate-entity, weren’t protected by permissions.
    • Solution: Implemented permission checks to restrict access.
    • Challenges: Understanding the Permissions Framework in Backstage; relied on the Discord community and documentation for guidance.
  13. Making LDAP Processor Configuration More Flexible
    • Issue: Some LDAP servers use attribute names that differ from the hardcoded ones, breaking user-group mappings.
    • Solution: Made dnAttributeName and uuidAttributeName (important for user-group mappings) configurable to support different LDAP schemas.
    • Challenges: No local LDAP setup, so community testing was key.
  14. Repository Maintenance
    • Resolved a CVE: @backstage/backend-common relied on a vulnerable jsonpath-plus dependency via @kubernetes/client-node.
    • Removed the vulnerable dependency from ~8 plugins, proactively safeguarding the repository.
    • Implemented Knip reports to identify unused dependencies.
    • Developed a script to generate reports for all plugins.
    • Evaluating reports for potential inclusion in PR workflows.
  15. Community Engagement
    • Proactively reviewed pull requests, a significant learning experience
    • Regularly participated in community-plugins SIG meetings
    • Provided guidance and support on Discord
  16. Helping Others With Their Open Source Journey!
    • Providing thorough and constructive PR reviews.
    • Guiding contributors to good first issues.
    • Discussing implementation options and best practices.
    Enhancing Community Processes
    • Improving issue triage for faster response times.
    • Setting clear expectations for plugin maintainers.
    Community Engagement
    • Facilitating productive discussions during the Community Plugins SIG meetings.
    • Helping out by answering questions on Discord
  17. KEY TAKEAWAYS
    • Community support is invaluable: don't hesitate to ask, because open source thrives on collaboration.
    • Hands-on experimentation is key: dive into the code and experiment.
    • Growth happens outside your comfort zone: embrace challenges to learn and grow.
  30. Introduction
    Hi, I’m Edgar
    • Site Reliability Engineer at Switchboard Labs
    • SRE since 2021
    • Passed the PCA (Prometheus Certified Associate) exam in June 2024 with a score of 96%
  31. About my talk
    This is not...
    • A lecture on everything you need to memorize to pass the PCA (there’s quite a bit)
    This is...
    • A demo of the concepts and tools the PCA covers (so you know what it is you’ll be memorizing)
  32. Prometheus to the rescue!
    1. Use Prometheus to collect metrics
    2. Query the metrics with PromQL
    3. Visualize the metrics with Grafana
    4. Use Alertmanager to notify us when something goes wrong
    [marker] = "This will be on the exam"
  33. Metrics Exposition Format
    Served at http://my-website.com/metrics:
        # HELP http_requests_total Count of all HTTP requests
        # TYPE http_requests_total counter
        http_requests_total{code="200",method="get"} 36
        # HELP version Version information about this binary
        # TYPE version gauge
        version{version="v1.0.1"} 1
    Callouts on the first sample:
    • Description: the # HELP line
    • Type: the # TYPE line
    • Name: http_requests_total
    • Labels: {code="200",method="get"}
    • Value: 36
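To make the name / labels / value anatomy concrete, here is a minimal parsing sketch for the text above. It is an illustration only, not the official Prometheus parser: the regular expressions and the `parse_exposition` helper are hypothetical, and the sketch ignores optional timestamps and other corner cases of the real format.

```python
import re

# Hypothetical, simplified patterns: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (samples, metadata) parsed from exposition-format text."""
    samples = []   # list of (name, labels dict, float value)
    metadata = {}  # metric name -> {"help": ..., "type": ...}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # "# HELP <name> <text>" or "# TYPE <name> <type>"
            parts = line.split(None, 3)
            if len(parts) == 4 and parts[1] in ("HELP", "TYPE"):
                metadata.setdefault(parts[2], {})[parts[1].lower()] = parts[3]
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified sketch cannot handle
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples, metadata


EXAMPLE = '''# HELP http_requests_total Count of all HTTP requests
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 36
# HELP version Version information about this binary
# TYPE version gauge
version{version="v1.0.1"} 1
'''

samples, metadata = parse_exposition(EXAMPLE)
for name, labels, value in samples:
    print(name, labels, value, metadata[name]["type"])
```

Each printed row pairs a sample with the type declared for its metric, which is the same mapping the slide's callouts draw.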
  34. Metrics Types
    Counter, great for:
    • Number of requests served
    • Tasks completed
    • Errors
    Gauge, great for:
    • Current memory usage
    • Temperature
    • Concurrent requests
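The difference between the two types can be sketched in a few lines. The classes below are hypothetical stand-ins written for this illustration, not the API of any Prometheus client library: a counter only ever goes up, while a gauge can be set, raised, or lowered.

```python
class SketchCounter:
    """Monotonically increasing value, e.g. requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class SketchGauge:
    """Value that can go up and down, e.g. concurrent requests."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


requests_served = SketchCounter()
in_flight = SketchGauge()

for _ in range(3):
    in_flight.inc()        # request starts
    requests_served.inc()  # count it once
    in_flight.dec()        # request finishes

print(requests_served.value, in_flight.value)
```

After three requests the counter has grown by three while the gauge is back where it started, which is why counters suit totals and gauges suit current state.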
  35. Metrics Types (continued): Histograms and Summaries
    Great for:
    • Request durations
    • Response sizes
    In general: for graphing the distribution of numerical data.
    https://dyladan.me/histograms/2023/05/03/histograms-vs-summaries/
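As a rough illustration of how a histogram captures a distribution, the sketch below sorts observations into buckets and reports cumulative counts, similar in spirit to the `le` ("less than or equal") buckets a Prometheus histogram exposes alongside its sum and count. The `SketchHistogram` class and the bucket boundaries are assumptions made for this example.

```python
import bisect


class SketchHistogram:
    """Counts observations per bucket; buckets are defined by upper bounds."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot at the end plays the role of the +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, observations <= upper_bound) pairs."""
        running = 0
        result = []
        bounds = self.upper_bounds + [float("inf")]
        for bound, bucket_count in zip(bounds, self.bucket_counts):
            running += bucket_count
            result.append((bound, running))
        return result


# Hypothetical request durations in seconds.
durations = SketchHistogram([0.1, 0.5, 1.0])
for seconds in [0.05, 0.1, 0.3, 0.7, 2.0]:
    durations.observe(seconds)

for bound, seen in durations.cumulative():
    print(f"le={bound}: {seen}")
```

Because the counts are cumulative, the last bucket always equals the total number of observations.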
  36. Let’s collect the metrics
    Kubernetes service-discovery
        # /etc/prometheus/prometheus.yml
        scrape_configs:
          - job_name: 'my-app'
            kubernetes_sd_configs:
              - role: pod
                namespaces:
                  names:
                    - my-namespace
  37. Let’s collect the metrics
    We’ll use static configs
        # /etc/prometheus/prometheus.yml
        scrape_configs:
          - job_name: 'prometheus-example-app'
            static_configs:
              - targets: ['prometheus-example-app:8080']
  38. PromQL: Prometheus expressions
    • Instant vector: A set of time series containing a single sample for each time series, all sharing the same timestamp. Example: http_requests_total{code="200"}
    • Range vector: A set of time series containing a range of data points over time for each time series. Example: http_requests_total{code="200"}[5m]
    • Scalar: A simple numeric floating point value. Example: 5
  39. PromQL: Aggregation operators
    Used to aggregate the elements of a single instant vector, resulting in a new vector of fewer elements with aggregated values.
    • sum: Calculate sum over dimensions. Example: sum(prometheus_http_requests_total)
    • count: Count number of elements in the vector. Example: count(prometheus_http_requests_total)
    • topk: Largest k elements by sample value. Example: topk(5, prometheus_http_requests_total)
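What these operators compute can be mimicked on a toy instant vector. The sketch below is only an analogy in Python, not how Prometheus evaluates PromQL; the sample data, label names, and helper functions are all made up for the illustration. `sum_by` corresponds to the PromQL form `sum by (label) (...)`.

```python
import heapq

# A toy instant vector: one (labels, value) sample per time series.
vector = [
    ({"handler": "/metrics", "code": "200"}, 120.0),
    ({"handler": "/api/v1/query", "code": "200"}, 45.0),
    ({"handler": "/api/v1/query", "code": "400"}, 3.0),
]


def agg_sum(samples):
    """Like sum(v): add up every sample value."""
    return sum(value for _, value in samples)


def agg_count(samples):
    """Like count(v): number of series in the vector."""
    return len(samples)


def agg_topk(k, samples):
    """Like topk(k, v): the k samples with the largest values."""
    return heapq.nlargest(k, samples, key=lambda sample: sample[1])


def sum_by(samples, label):
    """Like sum by (label) (v): one total per distinct label value."""
    totals = {}
    for labels, value in samples:
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + value
    return totals


print(agg_sum(vector))
print(agg_count(vector))
print(agg_topk(2, vector))
print(sum_by(vector, "code"))
```

Grouping by a label is what "sum over dimensions" refers to: the labels not named in the grouping are collapsed away.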
  40. PromQL: Useful functions
    • rate(v range-vector): Calculates the per-second average rate of increase of the time series in the range vector. Example: rate(http_requests_total{code="200"}[5m])
    • irate(v range-vector): Calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Example: irate(http_requests_total{code="200"}[5m])
    • increase(v range-vector): Returns the increase in the time series in the range vector. Example: increase(http_requests_total{code="200"}[5m])
    • clamp(v instant-vector, min scalar, max scalar): Clamps the sample values of all elements in v to have a lower limit of min and an upper limit of max. Example: clamp(http_requests_total{code="200"}, 30, 100)
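The arithmetic behind these functions can be sketched for a single counter series. This is a deliberately simplified model: real Prometheus also extrapolates towards the edges of the range window, which the sketch skips, and the function names and sample data here are invented for the example.

```python
# One counter series inside a range window: (timestamp_seconds, value) pairs.
samples = [(0, 10.0), (15, 25.0), (30, 40.0), (45, 70.0)]


def increase_simple(points):
    """Total growth across the window, treating a drop as a counter reset."""
    total = 0.0
    for (_, previous), (_, current) in zip(points, points[1:]):
        # After a reset the counter restarts from zero, so count `current`.
        total += (current - previous) if current >= previous else current
    return total


def rate_simple(points):
    """Average per-second growth across the whole window."""
    elapsed = points[-1][0] - points[0][0]
    return increase_simple(points) / elapsed


def irate_simple(points):
    """Per-second growth between the last two data points only."""
    (t_prev, v_prev), (t_last, v_last) = points[-2], points[-1]
    delta = (v_last - v_prev) if v_last >= v_prev else v_last
    return delta / (t_last - t_prev)


def clamp_simple(value, lower, upper):
    """Limit a sample value to the range [lower, upper]."""
    return max(lower, min(value, upper))


print(increase_simple(samples))    # growth over 45 seconds
print(rate_simple(samples))        # averaged over the window
print(irate_simple(samples))       # last two points only
print(clamp_simple(120, 30, 100))  # capped at the upper limit
```

Because `irate` only looks at the last two points it reacts faster to spikes, while `rate` smooths over the whole window.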
  41. Time for beautiful graphs
    Let’s install Grafana...
        # data source
        http://prometheus:9090
        # scrape interval
        5s
  42. Alertmanager: Prometheus Rules
        # prometheus-rules.yml
        groups:
          - name: prometheus-example-app
            rules:
              - alert: HighHttpInternalErrorCount
                expr: increase(http_requests_total{job="prometheus-example-app", code=~"5[0-9][0-9]"}[1m]) > 20
                for: 1m
                keep_firing_for: 1m
                labels:
                  severity: page
                annotations:
                  summary: High 5xx error count
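How `for` and `keep_firing_for` shape an alert's lifecycle can be sketched as a small state machine. This is a simplified model written for illustration, not Prometheus's implementation: the `alert_states` function is hypothetical, the evaluation timestamps are invented, and exact boundary behaviour in a real server may differ.

```python
def alert_states(evaluations, for_seconds, keep_firing_seconds=0):
    """Map (timestamp, condition_is_true) evaluations to alert states.

    The alert is "pending" until the condition has held for `for_seconds`,
    then "firing". Once firing, it keeps firing for `keep_firing_seconds`
    after the condition stops being true.
    """
    states = []
    active_since = None  # when the condition first became true
    keep_since = None    # when the condition first cleared while firing
    firing = False
    for timestamp, is_true in evaluations:
        if is_true:
            keep_since = None
            if active_since is None:
                active_since = timestamp
            if timestamp - active_since >= for_seconds:
                firing = True
            states.append("firing" if firing else "pending")
            continue
        if firing and keep_firing_seconds > 0:
            if keep_since is None:
                keep_since = timestamp
            if timestamp - keep_since < keep_firing_seconds:
                states.append("firing")
                continue
        # Condition is false and any grace period is over: reset everything.
        firing = False
        active_since = None
        keep_since = None
        states.append("inactive")
    return states


# Rule evaluated every 30 seconds with `for: 1m` and `keep_firing_for: 1m`.
evaluations = [(0, True), (30, True), (60, True), (90, False),
               (120, False), (150, False), (180, False)]
states = alert_states(evaluations, for_seconds=60, keep_firing_seconds=60)
print(states)
```

The pending phase filters out short blips, and the keep-firing phase avoids flapping when the expression briefly dips below the threshold.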
  43. Alertmanager: Alertmanager routing
        # alertmanager.yml
        route:
          receiver: 'pagerduty'
          group_by: ['alertname']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 3h
        receivers:
          - name: 'pagerduty'
            pagerduty_configs:
              - routing_key_file: /etc/alertmanager/pd-routing-key.example
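The effect of `group_by: ['alertname']` can be pictured as bucketing incoming alerts by the listed labels, so that one notification covers a whole bucket. The sketch below only illustrates that grouping idea; the `group_alerts` helper and the alert data are assumptions, and timing settings such as `group_wait` are not modelled.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in `group_by`."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts from two instances of the same app.
alerts = [
    {"labels": {"alertname": "HighHttpInternalErrorCount", "instance": "app-1"}},
    {"labels": {"alertname": "HighHttpInternalErrorCount", "instance": "app-2"}},
    {"labels": {"alertname": "InstanceDown", "instance": "app-3"}},
]

grouped = group_alerts(alerts, ["alertname"])
for key, members in grouped.items():
    print(key, len(members))
```

Grouping by `alertname` means the two error-count alerts would share one notification rather than paging twice.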
  44. Exam tips: Practice exams
    Udemy - Prometheus Certified Associate Practice Exams
    • Includes 4 practice exams, 60 questions each
    • Explanations
    https://www.udemy.com/course/prometheus-certified-associate-practice-exams/