OnGres
March 23, 2023

Where should I run my Database? Databases on Kubernetes?

As of today, there are two main ways to run your database: in the cloud, consumed as a service; and self-hosted.

Self-hosting was the only option before the cloud, was replaced as the default by DBaaS, and is now making a comeback, with good reason. But would you self-host your database the way it was done before? Surely not.
Enter DBaaS-like services on Kubernetes. We will explore:
* What Kelsey Hightower thinks about the topic.
* Why you should use operators for databases on Kubernetes.
* What capabilities databases on Kubernetes provide versus what the cloud provides.
* How to decide where you should run your database.
* What the current landscape of solutions for running your database on Kubernetes looks like.

This talk also featured a short live demo to showcase how to run databases on EKS with StackGres (https://stackgres.io), an open source Postgres operator.

Transcript

  1. Databases on Kubernetes: Yay or Nay?
    Where should I run my database? Databases on Kubernetes?
    Alvaro Hernandez @ahachete
  2. `whoami`
    Alvaro Hernandez, aht.es
    • Founder & CEO, OnGres
    • 20+ years Postgres user and DBA
    • Mostly doing R&D to create new, innovative software on Postgres
    • More than 120 tech talks, most about Postgres
    • Founder and President of the NPO Fundación PostgreSQL
    • AWS Data Hero
  3. Possible options to run your DB
    • On-prem (or cloud instances)
    • DBaaS (managed service)
    • Kubernetes (cloud or on-prem)
  4. How to deploy Postgres
    apt-get install postgresql
    # yes but well...
    # will you deploy this to prod?
  5. OK, we need to tune the database
    2-8h, Postgres DBA
  6. We need to add connection pooling
    4-16h, DevOps / pgDBA
    Benchmark: pgbench, scale 2000, m4.large (2 vCPU, 8GB RAM, 1k IOPS)
  7. And High Availability!
    8-24h, DevOps / pgDBA
    • HA software (e.g. Patroni)
    • Distributed configuration
    • Entrypoint:
      ◦ DNS?
      ◦ Virtual IP?
      ◦ External discovery service (e.g. Consul)?
  8. Do you back up your data?
    4-16h, DevOps
    • Backup software (e.g. WAL-G, pgBackRest)
    • Backup storage
    • Backup lifecycle management
    • Backup testing / restoration
  9. You wouldn’t deploy Postgres without monitoring, would you?
    8-24h, DevOps / pgDBA
  10. Do you leave Postgres logs on each server?
    4-48h, DevOps
    • Configure CSV logging
    • Add a logging agent (e.g. FluentBit) to export logs
    • Add a logging collector (e.g. Fluentd) to collect logs, and write code to store them and manage their lifecycle
    • Or use a paid logs-as-a-service offering
  11. DBaaS (e.g. RDS)
    • They provide great value:
      ◦ High availability with automated failover
      ◦ Automated backups
      ◦ Monitoring
      ◦ Typically a bit of database parameter tuning
    • But be aware of what they don’t provide:
      ◦ Database support (not infra support, I mean db support!)
      ◦ Deep parameter tuning. Query tuning. DDL tuning.
      ◦ Day 2 operations like bloat removal, reindex, etc.
      ◦ ChatGPT is not managing your DB yet!
  12. Be aware of DBaaS costs vs instances
    • Good service costs money
    • Instance cost: 85%-150% more expensive:
      ◦ E.g. RDS vs EC2 is 1.85x
      ◦ Plus you need an extra instance (N+1) for high availability
      ◦ Estimate price overhead as 1.8*(N+1)/N, where N is the number of instances
    • Storage costs:
      ◦ AWS: higher cost on RDS (gp2, gp3 overpriced vs EC2)
      ◦ Pay separately for I/O ops (e.g. Aurora)
  13. Managed service == you can’t do everything you want
    • Not all Postgres extensions are available:
      ◦ RDS: 80
      ◦ E.g. StackGres: 160+, adding new ones every week
      ◦ No/few clouds support Timescale (Apache + TSL) or Citus
    • Connection pooling:
      ◦ RDS: not by default, additional cost (RDS Proxy)
      ◦ Other DBaaS: not even an option
    • Limited automation for “Day 2 operations”
  14. What Kelsey Hightower thinks
    https://twitter.com/kelseyhightower/status/1624081136073994240
  15. Meeting Kubernetes half way
    • Kelsey Hightower argues that you need to “fight” K8s to run stateful workloads.
    • Certainly, a bit. But it is doable.
    • Operators have done this already. Don’t run databases on Kubernetes “by hand”, use operators.
  16. Deploy a simple cluster with Kubernetes (w/ StackGres)
    1h, CKA
      apiVersion: stackgres.io/v1
      kind: SGCluster
      metadata:
        name: simple
      spec:
        instances: 2
        postgres:
          version: 'latest'
        pods:
          persistentVolume:
            size: '100Gi'
  17. Deploy an advanced cluster with Kubernetes (w/ StackGres)
    4-16h, CKA
    • Create YAMLs for several CRDs
    • Create Ingress if needed
    • Expose Web Console (Ingress/LB)
    • Integrate with GitOps
  18. Automating Day 2 operations
    • Kubernetes also lets you automate Day 2 operations
    • CKA is enough, mostly no Postgres expertise needed
    • E.g. Day 2 operations implemented in StackGres:
      ◦ Repack
      ◦ Vacuum
      ◦ Minor version upgrade
      ◦ Major version upgrade
      ◦ Controlled restart
      ◦ Benchmark
  19. Postgres operators for Kubernetes
    Fully Open Source
    • CloudNativePG
    • KubeDB
    • Kubegres (unmaintained?)
    • Percona
    • StackGres
    • Zalando
    • New upcoming operators…
    • …
    Proprietary/paid-for (production)
    • Crunchydata
    • EnterpriseDB
    • Fujitsu
    • VMware Tanzu
    • …
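
As a rough illustration of the numbers quoted in the slides, the Python sketch below evaluates the DBaaS price-overhead estimate from slide 12, 1.8*(N+1)/N, and adds up the self-hosting effort ranges from slides 5 to 10. The function and variable names are made up for this example, and the 1.8 markup is simply the talk's rounded RDS-vs-EC2 figure, not a general constant.

```python
from typing import Dict, Tuple

# Effort ranges in hours (low, high), copied from slides 5 to 10.
# The dictionary name and keys are hypothetical labels for this sketch.
SELF_HOSTED_EFFORT_HOURS: Dict[str, Tuple[int, int]] = {
    "tuning": (2, 8),
    "connection pooling": (4, 16),
    "high availability": (8, 24),
    "backups": (4, 16),
    "monitoring": (8, 24),
    "logging": (4, 48),
}


def dbaas_price_overhead(n_instances: int, instance_markup: float = 1.8) -> float:
    """Slide 12 estimate: markup * (N + 1) / N.

    N is the number of instances; the extra instance is the one
    added for high availability. The default markup of 1.8 is the
    figure used in the talk and is an assumption, not a price quote.
    """
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    return instance_markup * (n_instances + 1) / n_instances


def total_effort_hours(tasks: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low and high ends of each task's effort range."""
    low = sum(lo for lo, _ in tasks.values())
    high = sum(hi for _, hi in tasks.values())
    return low, high


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"N={n}: estimated overhead {dbaas_price_overhead(n):.2f}x")
    low, high = total_effort_hours(SELF_HOSTED_EFFORT_HOURS)
    print(f"Self-hosted setup effort: {low}-{high} hours")
```

With the talk's figures, N = 2 gives an estimated overhead of 2.7x, and the self-hosted tasks add up to 30-136 hours, compared with 1h (simple cluster) or 4-16h (advanced cluster) for the StackGres deployments in slides 16 and 17.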