
MySQL and Vitess (and Kubernetes) at HubSpot


In 2017-2018, to simplify deployment, allow easier automation, and have better scaling options, HubSpot MySQL databases, previously run on RDS and EC2 instances, were migrated to Vitess and Kubernetes. In this talk, we share our experience of 4+ years running Percona Server with Vitess on Kubernetes in AWS: things that worked well, others that we had to adapt, and some open problems we are still facing. We will cover sharding, the automation tooling, and our journey to multi-region. The audience will be presented with a real-world, large MySQL/Vitess/Kubernetes deployment: you will learn from what we built, and hopefully avoid some of our mistakes. Come to this talk to see how HubSpot operates 800+ MySQL replication clusters / keyspaces with Vitess and Kubernetes.

Jean-François Gagné

May 25, 2023


Transcript

  1. by Jean-François Gagné
    and Swetha Narayanaswamy
    (presented at Percona Live Denver 2023)
    MySQL and Vitess (and Kubernetes)
    at HubSpot


  2. The Speakers
    Jean-François Gagné
    System / Infrastructure Engineer
    and MySQL Expert
    Swetha Narayanaswamy
    Director, Engineering,
    Data Infrastructure


  3. Agenda
    Quick Introduction to Vitess and Kubernetes
    Presentation of HubSpot MySQL Infrastructure
    Zoom in Percona Server and MySQL at HubSpot
    Zoom in Kubernetes and Vitess at HubSpot


  4. HubSpot Organisation
    “We run thousands of MySQL databases,
    supporting 1500 engineers and major portions of the product, and we sleep at night !”
    ● What we strive for
    ○ High-class developer experience enabling self-service provisioning and management
    ○ Enable clean, safe, high-velocity development of backend microservices that depend on
    MySQL
    ● How we operate
    ○ Small autonomous teams
    ○ Split into Orchestration (resiliency), Services (developer self-service), Performance
    (scaling)
    ○ Engineers from all three teams balance the role of guiding the rest of the organization


  5. HubSpot Organisation
    “We run thousands of MySQL databases,
    supporting 1500 engineers and major portions of the product, and we sleep at night !”
    ● Things we got right
    ○ Ease and reliability of provisioning a new keyspace / database
    ○ Tools built to support development and operation
    ○ Workflows we’ve built to safely upgrade the system
    ● Things we’re working on
    ○ Automation around sharding (like self-service resharding)
    ○ Client connection primitives (No 1:1 client to database connection, breaks circuit breakers)
    ○ Improvements to our operator


  6. Vitess: middleware for MySQL
    Vitess Overview
    Vitess provides all the nuts and bolts to run MySQL at scale, including:
    ● Scalability and reliability data-plane features (like sharding, replica reads, and query killing)
    ● Data optimisation features (like query buffering and caching, and connection pooling)
    ● Operation features (like reparent, backups, resharding, and move-table)
    ● Generic vReplication feature for change-data-capture
    ● Service Discovery (vtGate) backed by a service registry (Topology Server)
    ● MySQL sidecar (vtTablet) providing both data and control plane functions
    ● UIs (vtAdmin, vtCtld and vtOrc) for administration and automation (including failover)
    ● Probably others (above not exhaustive)


  7. Vitess Overview - Terminology
    ● Shard (Keyspace-Shard): keyspace subset: MySQL primary and its replicas (a replica-set)
    ● Keyspace: a Vitess logical schema / database
    ○ The simplest keyspace (not sharded) is a single keyspace-shard
    ○ A sharded keyspace contains many keyspace-shards
    ● Reparent: the action of changing the primary for a keyspace-shard
    ○ There are two types of reparent: planned and emergency
    (also called switchover and failover in non-Vitess context)
    (omitting details about reparentShard and reparentTablet)


  8. Vitess Overview - Diagram
    (from https://vitess.io/docs/16.0/overview/architecture/)


  9. Vitess Overview - Diagram (data-plane)


  10. Kubernetes
    Kubernetes Overview
    Kubernetes is a platform for managing containerized workloads
    ● It provides basic mechanisms for the deployment, maintenance, and scaling of applications
    ● Containers: OS-level virtualization / Linux cgroup
    ● Pod: the scheduling unit of Kubernetes, contains one or more co-located containers
    (Similar to a vm, a pod is assigned an ip address)
    ● Replica-Set: stable set of pods running at any given time
    (Kubernetes guarantees a specified number of pods, with self-healing when losing one)


  11. HubSpot MySQL Infrastructure
    HubSpot MySQL Infrastructure
    ● Inventory of what we have
    ● How we got there: migration to Vitess and Kubernetes (from RDS and AWS/EC2)
    ● Hublets: our current multi-region deployment


  12. HubSpot MySQL Infrastructure - What we have - Environment
    Overview of our environment:
    ● 2 active AWS regions (others for backups): US East and Germany
    ● 2 production Kubernetes clusters, 2 development / QA clusters, 2 test clusters
    ● 2 production Vitess clusters, 2 development / QA clusters, 15+ clusters for tests
    ● Leverage EBS for storage, and snapshot for data history (we also do backups in S3)
    (snapshots are quick to use – restores need copying data – but are slow to “hydrate” in AWS)


  13. HubSpot MySQL Infrastructure - What we have - Vitess & MySQL (Percona Server)
    Overview of our Vitess & MySQL installation:
    ● Vitess version 14 (was v5, we just finished upgrading in Q1 of 2023)
    ● Percona Server 5.7 (we will eventually go to 8.0 or 8.1)
    ● 800+ Production keyspaces in US East (+75% for Germany, x2 for QA)
    ● 10+ sharded keyspaces, with up to 32 shards
    ● 1000+ primaries / keyspace-shards
    (each primary has at least two replicas, sometimes more)
    ● 3100+ mysqld (primaries and replicas)
    ● We do not persist / flush trx to disk, but we persist to a replica when committing on the primary
    ○ We run with sync_binlog = 0 and innodb_flush_log_at_trx_commit = 2
    (for replication throughput with remote storage / EBS, the MTR solution is still painful)
    ○ We use lossless semi-sync for data persistence (that is why we have min. two replicas)
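The two bullets above can be sketched as a my.cnf fragment. This is a minimal sketch, not HubSpot's actual configuration: only the two flush settings are quoted on the slide, and the semi-sync plugin and variable names are the stock MySQL / Percona Server 5.7 ones.

```ini
# Do not fsync binlog / redo on commit (throughput on remote storage / EBS);
# durability comes from a replica acknowledgement instead (lossless semi-sync)
sync_binlog                               = 0
innodb_flush_log_at_trx_commit            = 2

# Lossless semi-sync on the primary (assumed stock 5.7 names, not from the deck)
plugin-load-add                           = semisync_master.so
rpl_semi_sync_master_enabled              = 1
rpl_semi_sync_master_wait_point           = AFTER_SYNC
rpl_semi_sync_master_wait_for_slave_count = 1
```

Replicas would load semisync_slave.so and set rpl_semi_sync_slave_enabled = 1; with at least two replicas per primary (as on the slide), one replica can be lost while commits still get an acknowledgement.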


  14. HubSpot MySQL Infrastructure - What we have - Tooling
    100+ Schema Changes / Migrations per week (counting all envs; most of them with gh-ost)
    Overview of our tooling:
    ● We have our own fork of gh-ost and Vitess
    ● We use upstream Percona Server, XtraBackup, and Percona Toolkit
    ● A lot of developer tooling:
    ○ SQL Cop, Explain tool, Slow log viewer, …
    ○ Query Report based on the digest table (more about this in MySQL-dedicated section)
    (if you want to know more, Mali will talk about our tooling at 3:30pm - Evergreen D)
    (Mali’s talk title: How To Not Gatekeep The Database - Safely)
    (Mali’s talk subject: How 1500 developers do most of their work without dba involvement)


  15. HubSpot MySQL Infrastructure - How we got there - Migration from RDS and AWS/EC2
    Why did we migrate to Vitess and Kubernetes
    ● Goals
    ○ Create a self-service platform: a DB in minutes & zero manual actions
    ○ Resource efficiency: Right-size containers, and better backups (not exclusively on snaps)
    ○ Scale: Data size agnostic way to choose DB to use
    ● Results
    ○ An order of magnitude more scale vs the previous system
    ○ Eliminate manual actions for operational work and adopt K8s API driven automation
    ○ Vitess for operation (backup/restore), replication topology management (failovers)
    ● Post Migration
    ○ Running in AWS US East, replicas in different availability zones
    ○ Backup replicated to a secondary region
    ○ We have our own Kubernetes operator and adjacent automation tooling


  16. HubSpot MySQL Infrastructure - Vitess Hublets
    Our Journey to Multi-Region: An Introduction
    https://product.hubspot.com/blog/developing-an-eu-data-center
    ● In July 2021, HubSpot became multi-region by launching Hublets
    ● Legacy becomes the na1 Hublet, and the eu1 Hublet is added
    → x2 Vitess clusters, x2 Kubernetes clusters, +75% keyspaces (for Prod and QA)
    ● To provide unique ids across Hublets, we designed and built vTickets in our fork of Vitess
    ● To provide global data, we implement Cross-Hublet Replication (na1 → eu1)


  17. HubSpot MySQL Infrastructure - vTickets
    Our Journey to Multi-Region: vTickets, Hublet-Compatible MySQL Auto-Increment
    https://product.hubspot.com/blog/our-journey-to-multiregion-vtickets
    ● 75% of the keyspaces have at least one table with an auto-increment
    ● vTickets is a HubSpot extension to Vitess providing unique id generation across Hublets
    ● vTickets is based on the vSequence feature of Vitess (for unique ids across shards)
    ● Allocation of vSequences is at the vtGate level, while vTickets work at the vtTablet level
    ● vTickets is more resilient and “quicker” (less latency) than vSequences
    ● vTickets might allocate out-of-order ids at the global level (never at the Hublet or Shard level)
    (id 105 might be allocated on a shard while 110 is already allocated on another shard or Hublet)


  18. HubSpot MySQL Infrastructure - vTickets - High Level Design
    ● Next id in ZooKeeper
    ● vTickets Service
    ● Srv called by vtTablet


  19. HubSpot MySQL Infrastructure - vTickets - Full Design with Caching and Prefetching
    ● One shard shown, but there can be many
    ● One vTickets Service per Hublet (local)
    ● One Global vTicket Service
    (called by Local Services)
    ● Cache in Local Services
    (resilience to wide area network outages)
    ● Cache in vtTablet
    (reducing insert latency)
    (resilience to Local Srv. transient outages)
    ● Prefetching in vtTablet and Local Services
    (more resilience and fewer tail latencies)
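The caching and prefetching design above can be sketched in miniature (all names are hypothetical; the real vTickets services, ZooKeeper storage, and vtTablet integration are not shown): a local allocator serves ids from a cached block and prefetches the next block from the global source before running out.

```python
import threading

class IdBlockSource:
    """Stand-in for the global vTickets service (hypothetical name):
    hands out contiguous blocks of ids from a single counter."""
    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()
    def fetch_block(self, size):
        with self._lock:
            start = self._next
            self._next += size
            return start, self._next - 1   # inclusive [start, end]

class PrefetchingAllocator:
    """Local cache, as in vtTablet / the local service: serves ids from a
    cached block and prefetches the next block when the current one is half
    consumed, so callers rarely wait on the upstream service.
    (Single-threaded sketch; the real thing needs locking and expiry.)"""
    def __init__(self, source, block_size=100):
        self.source = source
        self.block_size = block_size
        self.current = iter(())
        self.remaining = 0
        self.next_block = None
    def _refill(self):
        if self.next_block is None:
            self.next_block = self.source.fetch_block(self.block_size)
        lo, hi = self.next_block
        self.next_block = None
        self.current = iter(range(lo, hi + 1))
        self.remaining = self.block_size
    def next_id(self):
        if self.remaining == 0:
            self._refill()
        # prefetch the next block when half of the current one is consumed
        if self.remaining == self.block_size // 2 and self.next_block is None:
            self.next_block = self.source.fetch_block(self.block_size)
        self.remaining -= 1
        return next(self.current)

source = IdBlockSource()
a = PrefetchingAllocator(source, block_size=4)   # e.g. one shard / Hublet
b = PrefetchingAllocator(source, block_size=4)   # another shard / Hublet
ids_a = [a.next_id() for _ in range(4)]          # [1, 2, 3, 4]
ids_b = [b.next_id() for _ in range(4)]          # [9, 10, 11, 12]
```

Because `a` prefetched block 5-8 while `b` fetched 9-12, `a.next_id()` now returns 5: ids stay ordered per allocator but are out-of-order globally, exactly the behavior described on the previous slide.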


  20. HubSpot MySQL Infrastructure - Cross Datacenter Replication
    Our Journey to Multi-Region: Cross Datacenter MySQL Data Replication
    https://product.hubspot.com/blog/mysql-data-replication
    ● Binlog Processor as a na1 replica with replication filters
    ● Cron job pushing binary logs to S3, and pulling and applying them in eu1
    ● Bug#91812: mysqlbinlog | mysql fails with GTID and rpl filters on ts default (unfixed in May 2023)


  21. Percona Server at HubSpot
    Percona Server at HubSpot
    ● We are running Percona Server 5.7
    ● Percona Server is a close fork of Oracle MySQL (MySQL is covered in the next section)
    ● Percona Server has improvements over Oracle MySQL, some of which are super useful !
    (These improvements come with a minor drawback)


  22. Percona Server at HubSpot - HubSpot is happy with Percona Server
    ● The Extended Slow Log gives good InnoDB insight into slow queries (we wish these were in P_S)
    ● XtraBackup and lock-free hot backup allow non-intrusive backups (not lock-free with MySQL)
    (even if these are plan B, plan A being cold backups, hot backups are sometimes needed)
    ● Percona Server also has bugs fixes not in MySQL (more about this in the MySQL section)
    ● We had to work around a crashing bug in STOP SLAVE SQL_THREAD with MTR enabled
    (Percona fixed Bug#96400 - still unfixed in MySQL - but regression fixed later in PS-8030)
    https://jfg-mysql.blogspot.com/2023/05/avoiding-stop-slave-crash-with-mtr-in-ps-5.7.html
    ● There were deadlock problems with Multi-Threaded Replication (MTR) and the backup lock
    (PS-3345 fixed in 5.7.28-31; but many other problems in MySQL 5.7 with MTR)
    ● We had problems with free-list waits with the backoff algorithm (easily fixable in configuration)
    https://jfg-mysql.blogspot.com/2022/11/tail-latencies-in-percona-server-because-innodb-stalls-on-empty-free-list.html
    Good job Percona for Making MySQL Better !


  23. MySQL at HubSpot
    MySQL at HubSpot
    ● HubSpot runs with the slow and general logs enabled
    ● Comments in queries for traceability and analysis (I wish there was something better)
    ● Look forward to setting long_query_time in query hints in 8.0 !
    ● A lot of things that are good to know
    ● But first…


  24. MySQL at HubSpot - Oracle is a good steward of MySQL !
    Continuously delivering new features in 8.0 is what was asked for (take advantage of new things without a major upgrade)
    ● Breaking binary in-place downgrade is a logical consequence of this
    (I wish replicating from newer sources to older replicas was supported, for upgrade rollback)
    The new LTS & Innovation Releases Model
    https://twitter.com/MySQL/status/1641581668220772353
    ● Looking for more information about this, especially 8.0 being EOL in 2025/26


  25. MySQL at HubSpot - #loveSemiSync
    There is uncertainty about the future of Semi-Synchronous Replication:
    ● I understood at FOSDEM 2023 that Oracle is not improving Semi-Sync
    ● I am disappointed: HubSpot runs Semi-Sync without major pain, and it is great for lossless HA
    (IMHO, Group Replication is too complicated, and simpler is better because more robust)
    (simpler also allows easier troubleshooting when something goes wrong)
    ● Two unfixed bugs from June 2021 (there are probably others): Bug#104012 & Bug#104015
    ● Join me in supporting Semi-Sync with #loveSemiSync (below on Twitter and LinkedIn)


  26. MySQL at HubSpot - A few things that are good to know
    Multi-threaded replication (MTR) is painful in 5.7
    ● Bug#89247: deadlocks, fixed in 5.7.31 and 8.0.21: patch by Percona, thanks !
    ● Bug#107574: deadlocks [...] when setting read_only (wrongfully closed, maybe Percona will fix in PS-8411)
    ● Because of deadlocks, we do not run with MTR always on; we have an enable/disable script
    ● Also, the replication load metric does not work with MTR (another reason to disable it)
    http://www.markleith.co.uk/2012/07/24/a-mysql-replication-load-average-with-performance-schema/
    Other things that are good to know:
    ● The SUPER_READ_ONLY feature is useful to avoid writing to replica by mistake
    ○ When having to write to a replica, it needs disabling at the global level, which is risky
    ○ And disabling SUPER_READ_ONLY deadlocks MTR
    ○ A session override would be useful: Bug#111108


  27. MySQL at HubSpot - More good to know
    More things that are good to know:
    ● Commit stalls on binary log rotation with sync_binlog = 0
    Nice finding by Vadim from Percona: https://twitter.com/jfg956/status/991149562722070528
    (with auto-commit, shows as a slow INSERT, UPDATE or DELETE)
    (in the cloud and for replication speed: sync_binlog = 0 and trx_commit = 2)
    (to be “cloud ready”, MySQL needs to work well with these settings)
    ● Care with Foreign Key: Bug#102586: FK CASCADE breaks with RBR and multiple-table DELETE
    ● Care with Generated Columns: Bug#107438: Generated Columns are NULL on replicas (in 5.7)
    (open June 2022, 8.0 not affected, not fixed in May 2023 - this is a data corruption bug !)
    ● Online ALTER had been failing with Duplicate Entry for 10 years !
    https://jfg-mysql.blogspot.com/2021/11/duplicate-entry-in-alter-table-and-in-optimize-table.html


  28. MySQL at HubSpot - Performance Schema
    Performance Schema is super useful
    ● Love the query observability provided by events_statements_summary_by_digest
    ○ ps-collect-digest: takes a copy of this table, run regularly by cron (every minute for us)
    ○ Delta-ing copies shows what happened during a time interval
    ○ We store these copies in MySQL for querying with SQL: use-case for writing to replica !
    ● Annoyances about digests and other metrics:
    ○ where … in (?) vs … in (...): 2 digests for many tuples, or just one, in the IN-clause
    ○ where a=:a0 OR a=:a1 vs … OR a=:a2 …: as many digests as there are OR variations
    ○ where b=false vs … b=true: 2 different digests (instead of true/false being a ?)
    ○ Rows examined blindspot because of index condition pushdown
    https://jfg-mysql.blogspot.com/2022/03/rows-examined-not-trustworthy-because-index-condition-pushdown.html
    ○ Another rows examined blindspot (I have not blogged about it yet, involves 0 rows sent)
    (where a in (a1,a2…an) AND b in (b1,b2…bm) scans n x m rows, but “examines” 0 !)
    ● We have plans for ps-collect-{table,file,mutex} !
    ● We miss InnoDB Metrics, and accounting features in P_S (query by host, user, cnx attributes)
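The delta-ing idea behind ps-collect-digest can be sketched as follows (the tool name is from the slide, the code is a toy model: dicts stand in for rows of events_statements_summary_by_digest, keyed by digest text, with cumulative counters as values).

```python
def delta_digests(older, newer):
    # Subtract two snapshots of the digest table to see what ran during
    # the interval (counter resets on restart/TRUNCATE are ignored here)
    out = {}
    for digest, counters in newer.items():
        prev = older.get(digest, {})
        diff = {k: v - prev.get(k, 0) for k, v in counters.items()}
        if any(diff.values()):
            out[digest] = diff
    return out

# Two one-minute-apart snapshots (made-up numbers for illustration):
t0 = {"SELECT * FROM t WHERE id = ?": {"count_star": 100, "sum_rows_examined": 100}}
t1 = {"SELECT * FROM t WHERE id = ?": {"count_star": 160, "sum_rows_examined": 160},
      "UPDATE t SET a = ? WHERE id = ?": {"count_star": 5, "sum_rows_examined": 5}}
interval = delta_digests(t0, t1)   # activity during the interval only
```

Storing each snapshot in MySQL, as the slide describes, then lets you compute such deltas over any interval with plain SQL.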


  29. MySQL at HubSpot - InnoDB Table Compression (ROW_FORMAT=COMPRESSED)
    Choice made many years ago at HubSpot: all tables are created with KEY_BLOCK_SIZE=8
    ● Probably for saving disk space (but sometimes it does not, or a smaller kbs would be better)
    ● But it hides bad design (excessive use of JSON, VARCHAR as ENUMs, …)
    ● And it violates a standard database scaling best-practice: be biased for doing things client-side
    Using KEY_BLOCK_SIZE=8 by default might be a sub-optimal optimization:
    ● It makes things more complicated, and might consume CPU on a CPU-bound workload
    ● It is useful, but should be used on a case-by-case basis with an optimized block size
    https://jfg-mysql.blogspot.com/2017/08/adventure-innodb-table-compression.html
    InnoDB Table Compression is also complicated (fully understanding it is non-trivial):
    ● It stresses the InnoDB Free List (and there are no good metrics)
    https://jfg-mysql.blogspot.com/2022/12/free-pages-consumption-by-innodb-table-compression.html
    ● It has a weird IO behavior (I need to blog about this)
    ● And it has many unsolved bugs…


  30. MySQL at HubSpot - InnoDB Table Compression Bugs
    Some bugs (list non-exhaustive):
    ● Bug#59550: Innodb_buffer_pool_pages_misc goes wrong (from 2011 !)
    ● Bug#84439: Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1
    ● Bug#84467: ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs
    ● Bug#107074: The column uncompress_ops capped at 2^31-1 in I_S.INNODB_CMP
    And Valerii Kravchuk has a full blog post about them:
    http://mysqlentomologist.blogspot.com/2018/06/on-innodb-data-compression-in-mysql.html
    ● “If you hit some problem with this feature [InnoDB Compression] you have to live with it”
    ● “[...] compression options do not get much attention from Oracle developers recently”


  31. MySQL at HubSpot - InnoDB Table Compression Bugs - not fixed by Oracle but fixed by Percona !
    Bug#107059: Bad compression stats in I_S.INNODB_CMP starting in 5.7.27 (8.0 not affected)
    ● Regression introduced in July 2019, bug opened in April 2022 (these are not used by many)
    ● Patch provided literally hours after the bug was opened (thanks huahua xu !)
    ● Still not fixed in May 2023: seriously ! – So I opened PS-8749 while preparing this talk, and…
    Thanks again Percona !
    You make MySQL better !
    https://twitter.com/jfg956/status/1655949959953547270


  32. MySQL at HubSpot - InnoDB Table Compression Bugs - Cascading Crashing Bug !
    We also hit a crashing bug with InnoDB Table Compression which can lead to cascading failure
    ● Bug#111063: Assert: !dummy_big_rec on rollback of delete or update of row_format=compressed
    ● This is exacerbated by the Vitess query killer: a killed query rolls back, which crashes MySQL
    ● Retrying this query after failover will probably lead to another kill, and another crash !
    ● Unsatisfactory fix: rebuilding the table (ALTER TABLE … FORCE) removes the crashing condition
    Even if uncommon, this bug had us rethink our usage of InnoDB Table Compression
    ● This bug only happens on very specific rows (only a few), but in many of our tables
    ● We are currently working on decompressing most/all of our tables (this might save us CPU)
    ● We had to patch gh-ost because of a bug around removing compression
    (Bug#70534: Removing table compression leaves compressed keys)
    Learning: only use InnoDB Table Compression when needed
    More learning: test your cascading failure prevention mechanism


  33. Kubernetes at HubSpot
    Kubernetes at HubSpot
    ● Historically, HubSpot was using Mesos and a home-developed scheduler (Singularity)
    ● Kubernetes is slowly replacing Singularity, and our MySQL DBaaS Platform runs on Kubernetes
    ● Running MySQL in Kubernetes is great: all the common tasks are automated
    ● But uncommon things not taken into account by automation are tedious
    (testing replicas, upgrades, or other things you would creatively do in a less constrained env.)
    ● The container world is a different world than the vm or bare-metal world
    ○ Thinking in a “vm way” in a container world generates technical debt
    ○ There are new “problems” in a container world (like CPU throttling)


  34. Kubernetes at HubSpot - A Great Success for 5 Years !
    We DO NOT agree with this !
    ● HubSpot has been successfully running
    MySQL in Kubernetes since 2018 !
    ● In Production !
    ● It has been a great success for 5 years !
    ● And it is a smooth ride !
    ● (We also run Zookeeper in Kubernetes)
    ● (We even run HBase on ARM in Kubernetes)


  35. Kubernetes at HubSpot - Getting used to the Container / Kubernetes World
    An example of “vm concept” we brought in our Kubernetes Operator:
    ● Performance class for keyspaces: x-small (for tests), small (default), medium, large, xl, 2xl…
    (control CPU and RAM allocation, and other MySQL params)
    ● From RAM, Max Cnx, and other MySQL params, we back-compute the InnoDB Buffer Pool Size
    → this is sub-optimal: BP size should be the 1st-level param and RAM should be computed from it
    ● So we designed medium “vms” / pods, not medium “MySQL Instances”: we will fix this soon
    (in new design, allocate BP size and forward-compute RAM for the MySQL container)
    (no more BP size shrinkage when increasing Max Cnx or other MySQL param)
    Kubernetes allows the storage of configuration in Custom Resources
    ● Works well with everything in a single Kubernetes Cluster
    ● But when a MySQL deployment spans 2 clusters, keeping things in sync is complicated
    ● Hublets also duplicated all this (not only do we manage QA and Prod, but now also na1 and eu1)
    ● An external / centralized asset database would probably make sense
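The two sizing directions described above can be illustrated with a small sketch; the per-connection and overhead numbers are made up for illustration, not HubSpot's values.

```python
PER_CNX_MB = 16        # assumed memory budget per connection (hypothetical)
OVERHEAD_MB = 1024     # assumed fixed mysqld overhead (hypothetical)

def bp_from_ram(ram_mb, max_cnx):
    # Old design: pod RAM is the first-level knob, buffer pool is derived;
    # raising max_cnx silently shrinks the buffer pool
    return ram_mb - max_cnx * PER_CNX_MB - OVERHEAD_MB

def ram_from_bp(bp_mb, max_cnx):
    # New design: buffer pool is the first-level knob, pod RAM is derived
    return bp_mb + max_cnx * PER_CNX_MB + OVERHEAD_MB

old_bp = bp_from_ram(16384, 100)       # 16 GiB pod, 100 connections
shrunk_bp = bp_from_ram(16384, 500)    # more connections -> smaller BP
```

In the old direction, bumping Max Cnx from 100 to 500 silently costs the buffer pool several GiB; in the new direction, the same bump grows the pod instead, which is the fix the slide announces.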


  36. Kubernetes at HubSpot - CPU Throttling
    Another new thing in the Container World is CPU Throttling
    ● Linux cgroups enforce CPU limits on containers by pausing their scheduling, 100 ms in our env.
    ● If throttled, a Primary Key lookup usually taking 1 ms might take 101 ms
    ● A MySQL container with 10 CPUs and a Max Cnx of 100 can easily get throttled
    ○ Setting Max Cnx to 10 when a container has 10 CPUs works for CPU-bound workloads
    ○ This does not work for IO-bound ones: 100 Max Cnx needed to use all IOPS
    ○ Tuning mixed workloads is very complicated (more complicated as workloads change)
    ● Maybe an opportunity for a Thread Pool, not tested yet at HubSpot
    (More about CPU Throttling: https://danluu.com/cgroup-throttling/)
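The slide's numbers can be worked through as arithmetic (assumptions: CFS bandwidth control with a 100 ms period, quota equivalent to 10 CPUs, and 100 connections that are all CPU-bound at once).

```python
period_ms = 100.0
cpus = 10.0
quota_ms = cpus * period_ms        # 1000 ms of CPU time per 100 ms period
runnable = 100                     # all connections runnable at the same time

# 100 runnable threads burn the whole quota in quota/runnable of wall time:
wall_to_exhaust_ms = quota_ms / runnable          # 10 ms
# ...then the cgroup pauses every thread until the period ends:
pause_ms = period_ms - wall_to_exhaust_ms         # 90 ms
# A 1 ms primary-key lookup landing just as throttling kicks in finishes
# only after the pause -- the same effect as the slide's "1 ms becomes 101 ms"
lookup_cpu_ms = 1.0
throttled_latency_ms = pause_ms + lookup_cpu_ms   # 91 ms instead of 1 ms
```

Depending on when the query lands within the period, the stall ranges from nothing up to nearly the full 100 ms period, hence 1 ms lookups occasionally observed at ~101 ms.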


  37. Kubernetes at HubSpot - Uncommon Operations Challenge Automation
    The good thing about Kubernetes is having everything automated
    ● But there is always something you need that deviates from or challenges automation
    ● One thing we have had for a long time: a Stop File in container Startup Scripts
    (If the file exists, pause startup – sleep infinity – and allow troubleshooting)
    ● Something we added 12 months ago: testing replicas
    (to make sure a replica is not restarted while testing, or worse reparented to)
    (These days, we do not use this much, instead we spawn snap pods from EBS snapshots)
    ● Something we did not have 2 years ago and that was a heavy lift: MySQL Upgrades


  38. Kubernetes at HubSpot - MySQL Upgrades
    Two years ago, our MySQL version management was a variable in our docker image
    ● All pods were running the same MySQL version
    ● Upgrading meant changing this variable and “deploying” the new image on all pods
    ● Hopefully, when doing a version change, things went well
    ○ If not, hopefully we could roll back by re-deploying with the older version
    ○ This does not work with 8.0, and some downgrades do not work with 5.7 either
    Since then, we built full automation for upgrades, allowing upgrading one keyspace at a time
    ● A set of upgraded pods running an upgraded docker image is spawned
    ● Reparent to an upgraded pod to switch version, reparent to un-upgraded pod for rolling-back
    ● And we needed such rollback during our 5.7 upgrades (IODKU becoming super slow)
    ● This assumes replication from vN+1 to vN works; not officially supported, but it can be tested
    ● And make sure a failover / reparent does not upgrade or downgrade ;-)
    Learning: always have a rollback plan, especially at scale !


  39. Vitess at HubSpot
    Vitess at HubSpot
    ● Vitess is a Great Middleware for running MySQL
    ○ MySQL Operations made easier by standardizing many functions
    (reparent, failover automation, backups, connection pool, …)
    ○ Vitess also provides useful data-plane features (like cnx pooling, query killing, “sharding”)
    ● Some features are missing (like Quality of Service and Smart Circuit Breakers)
    ● Upgrades are complicated
    ● And easy, transparent, efficient, resilient, and general purpose sharding is a myth !


  40. Vitess at HubSpot - General Advice / Need to Know
    Strong advice: if deploying Vitess, plan for using your own fork (even if you do not have patches yet)
    ● A very common answer to Vitess feature requests: “would you like to contribute this?”
    ● If you end up contributing something, your fork allows you to take advantage of it earlier
    ● Also, your needs might not be “acknowledged” by Vitess, so you will need to fix things yourself
    OOM of vtTablet (we hoped the upgrade from v5 to v14 would solve it; still there)
    ● Hypothesis: growing RAM usage because of unbounded buffering, exacerbated by slow consumers
    ● Happens with XtraBackup, and with large result-sets for cross-shard queries
    The Vitess data-plane is not a lean stack:
    ● Query parsing and Result-Set handling happens in vtGates and vtTablet
    ● This requires a lot of CPU (and RAM for result buffering)
    ● But one could use Vitess without vtGates (to take advantage of administration features)


  41. Vitess at HubSpot - Sharding
    Easy, transparent, efficient, resilient, and general purpose sharding is a myth !
    ● Converting an unsharded keyspace to sharded needs effort (probably an initial split)
    ○ some tables in an unsharded keyspace will probably stay unsharded
    ○ manual work was needed as MoveTables was unsatisfactory in v5: no rollback
    ● Resharding is easier-ish
    (broken in Vitess v5, we fixed it in our fork, did some resharding, no attempt in v14 yet)
    ● Devs do not have a good mental model of sharding: they write cross-shard queries by mistake
    (the query optimizer is not well understood, it is worse for a distributed query optimizer)
    Sharding abstraction and reliability / performance are “orthogonal”
    ● Batching Primary Key reads needs to be shard-aware (to avoid problems with X-shard queries)
    ● To be efficient, batch jobs need to be shard-aware (to avoid hot-spotting one shard after the other)
    ● To be resilient, applications need to be shard-aware (to limit the blast radius of a problematic shard)
    ● There are other performance / reliability problems when an application is not shard-aware
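The shard-aware batching of primary-key reads mentioned above can be sketched as follows (hash-based sharding is assumed purely for illustration; Vitess vindexes are more general): group the keys by target shard so each batch touches a single shard instead of fanning out as one cross-shard query.

```python
from collections import defaultdict

def shard_of(pk, num_shards):
    # Stand-in for a real sharding function (a Vitess vindex in practice)
    return hash(pk) % num_shards

def batch_by_shard(pks, num_shards):
    # One single-shard batch per target shard, instead of one
    # cross-shard query covering the whole key list
    batches = defaultdict(list)
    for pk in pks:
        batches[shard_of(pk, num_shards)].append(pk)
    return dict(batches)

batches = batch_by_shard([3, 7, 8, 12, 5], num_shards=4)
```

Each per-shard batch can then be sent, and retried or circuit-broken, independently, which is exactly the resiliency benefit the bullets above describe.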


  42. Vitess at HubSpot - Quality of Service and Circuit Breaking
    HubSpot is currently working on implementing Query Priority and Load Shedding
    ● When a vtTablet Connection Pool reaches a certain queue size, fast-fail low priority queries
    ● Goal: allow high priority queries to succeed (instead of timing out)
    ● We have a working POC on this, and this is being discussed in the Vitess Slack
    https://vitess.slack.com/archives/C04N16YRWDC/p1682090035972359
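The fast-fail idea above can be sketched in a few lines (hypothetical names; the real HubSpot work is a POC inside vtTablet, not this code): when the connection-pool wait-queue reaches a threshold, low-priority queries get an immediate error instead of queueing behind everything else.

```python
from collections import deque

class ShedPool:
    """Sketch of priority-based load shedding at a vtTablet-style
    connection pool (hypothetical; not the actual POC)."""
    def __init__(self, max_queue_for_low_prio):
        self.queue = deque()                 # queries waiting for a connection
        self.threshold = max_queue_for_low_prio
    def submit(self, query, high_priority=False):
        # High-priority queries always queue; low-priority ones are shed
        # as soon as the wait-queue reaches the threshold
        if not high_priority and len(self.queue) >= self.threshold:
            return "FAST_FAIL"               # immediate error to the client
        self.queue.append(query)
        return "QUEUED"

pool = ShedPool(max_queue_for_low_prio=2)
results = [pool.submit(f"q{i}") for i in range(3)]       # low priority
vip = pool.submit("critical query", high_priority=True)  # always queued
```

Here the third low-priority query is shed while the high-priority one still queues: shedding cheap work early is what leaves room for the important queries to succeed before their timeout.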
    Smart Circuit Breaking with Vitess
    ● Circuit breakers in the application allow alleviating load when something goes wrong
    ● When one shard of a sharded keyspace is impaired, our circuit breakers are tripping
    ● Making client-side circuit breakers shard-aware is not easy
    (applications send queries to a keyspace, not a shard → shard abstraction is a double-edged sword)
    ● No good solution to this yet (we disable circuit breaking on sharded keyspaces)


  43. Vitess at HubSpot - Vitess needs a better Upgrade Story !
    Aggressive release schedule, restricted support timeframe, and limited compatibility → complicated !
    ● Vitess releases 3 versions a year, with an official upgrade path only with n+1 (no LTS)
    ● The support timeframe of only 1 year leaves little time for stable operations
    ● Vitess users either have to operate “unsupported”, or have to run a continuous upgrade race
    → neither option is appealing !
    At the beginning of 2022, HubSpot was still running Vitess v5 (released in 2020)
    ● In 2022, we started a major “in a single step” upgrade to v13 (released in February 2022)
    ● On the way, we pivoted to upgrading to v14 (released in June 2022)
    ● We completed the upgrade a few weeks ago (in Q1 of 2023)
    ● Since then: v15 released in October 2022, v16 in February 2023 (and v17 scheduled June 2023)
    ● We will probably do another upgrade project in the next 12 months
    Vitess needs a better Upgrade Story !


  44. Vitess at HubSpot - HubSpot Upgrade from Vitess v5 to v14
    The foundation for our upgrade “in a single step” is the Cell concept of Vitess
    ● From our v5 Cell, we built a new v14 Cell, using MySQL repl. from v5, with its vtGates, etc…
    ● First upgrade step: a reparent to a v14 vtTablet (replica reads are still in v5)
    ● We had to patch the v5 vtGates for compatibility with v14 vtTablets: our fork was essential here !
    ● Rolling-back: reparent back to v5
    ● Rolling-forward: point the application to the v14 vtGates (we used DNS for this)
    ● Rolling-back once on the v14 vtGates: back to the v5 vtGates, and then reparent to a v5 vtTablet
    At each step of the process, we can roll-back, and we needed this
    ● A cross-shard query went from being super fast to super slow (SELECT DISTINCT )
    (regression in vtGate query optimizer: making things right made them slower)
    (solution: enabling Schema Tracking in vtGate, beta feature in v11, stable in v14)
    ● A DELETE query whose vtGate rewrite changed also went from fast to slow
    Repeated learning: always have a rollback plan, especially at scale !


  45. Conclusion
    Conclusion
    ● After 5 years, we are happy with running MySQL with Vitess in Kubernetes
    ● Automation is a never-finished project !
    ● Kubernetes is mostly handled / mastered; but our Operator is continuously improved
    ● MySQL and Vitess are good, but they could be better (neither are fully fitting our needs yet)
    ● We are committed to Vitess and we continue investing in it


  46. Learnings
    ● What is working well
    ○ 3100+ mysqld and we don’t think about them individually
    ○ Failover automation: Topology and built-in methods to reparent, promote, create new replica
    ○ Vitess Slack: for getting help, discussing new features, open bugs, and monthly meetings
    ○ Backups
    ● Areas of improvement
    ○ Upgrade cycle: a year from v5 to v14; now that we have automation, it should be faster next time
    ○ Vitess not 100% compatible with MySQL 8: Window functions and CTEs
    ● Best practices
    ○ Adhere as closely as possible to Vitess best practices (avoid unofficial workarounds)
    ○ Have a good understanding of whether you want to be sharded or unsharded, preferably before
    you create a keyspace and start using it
    ○ Upgrade: upfront testing, query replay, a duplicate cluster, then shift traffic (no in-place upgrade)

    View Slide

  47. Links - General
    ● Some historical links:
    ○ Mesos, HubSpot, and the Singularity
    https://product.hubspot.com/blog/mesos-hubspot-and-the-singularity
    ○ Vitess at HubSpot: How We Moved Hundreds of MySQL Databases into Kubernetes
    https://kccna18.sched.com/event/Gras/vitess-at-hubspot-how-we-moved-hundreds-of-mysql-databases-into-kubernetes-tom-petr-alexander-charis-hubspot
    ● Introducing HubSpot’s EU Data Center: https://www.hubspot.com/eu-data-centre
    ● Our Journey to Multi-Region: An Introduction
    https://product.hubspot.com/blog/developing-an-eu-data-center
    ● Our Journey to Multi-Region: VTickets, Hublet-Compatible MySQL Auto-Increment
    https://product.hubspot.com/blog/our-journey-to-multiregion-vtickets
    ● Our Journey to Multi-Region: Cross Datacenter MySql Data Replication
    https://product.hubspot.com/blog/mysql-data-replication
    ● HubSpot Quality of Service Discussion in Vitess Slack
    https://vitess.slack.com/archives/C04N16YRWDC/p1682090035972359

    View Slide

  48. Links - Speaker’s Blog
    ● Question about Semi-Synchronous Replication: the Answer with All the Details
    https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
    ● The consequences of sync_binlog != 1
    https://archive.fosdem.org/2020/schedule/event/sync_binlog/
    ● Duplicate Entry in ALTER TABLE (and in OPTIMIZE TABLE)
    https://jfg-mysql.blogspot.com/2021/11/duplicate-entry-in-alter-table-and-in-optimize-table.html
    ● Rows Examined not Trustworthy because of Index Condition Pushdown
    https://jfg-mysql.blogspot.com/2022/03/rows-examined-not-trustworthy-because-index-condition-pushdown.html
    ● Tail Latencies in Percona Server because of InnoDB Stalls on Empty Free List
    https://jfg-mysql.blogspot.com/2022/11/tail-latencies-in-percona-server-because-innodb-stalls-on-empty-free-list.html
    ● Avoiding a STOP SLAVE Crash with MTR in Percona Server older than 5.7.37-40
    https://jfg-mysql.blogspot.com/2023/05/avoiding-stop-slave-crash-with-mtr-in-ps-5.7.html

    View Slide

  49. Links - Percona Server and MySQL Bugs
    ● LP#1527463: Waiting for binlog lock (became PS-3345)
    ● Bug#96400: MTS STOP SLAVE takes over a minute [...] (opened Aug. 1st 2019, still not fixed)
    Percona fixed it in PS-5824 (5.7.28-31 and 8.0.17-8) but introduced a crash, fixed in PS-7147 (8.0.19-10)
    5.7 was forgotten above, fixed later in PS-8030 (reported 2022-01-11, released 2022-03-31 in 5.7.37-40)
    ● Bug#102586: Foreign Key ON DELETE CASCADE breaks with RBR and multiple-table DELETE
    ● Bug#102988: mysqlbinlog | mysql breaks with GTIDs and [...] (test case for Bug#91812)
    ● Bug#104012: FLUSH BINARY LOGS blocks when a client is waiting for semi-sync ACK
    ● Bug#104015: Losing semi-sync acks could stall the primary
    ● Bug#104626: Remove failure of Online ALTER because concurrent Duplicate entry
    ● Bug#107438: JSON generated column values are NULL on replicas
    ● Bug#107574: MTR deadlocks [...] changing read_only (wrongfully closed, Percona might fix PS-8411)
    ● Bug#111108: Provide a way to override SUPER_READ_ONLY at the session level

    View Slide

  50. Links - InnoDB Table Compression - Bugs and Posts
    ● Bug#59550: Innodb_buffer_pool_pages_misc goes wrong (from 2011 !)
    ● Bug#70534: Removing table compression leaves compressed keys
    ● Bug#84439: Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1
    ● Bug#84467: ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs
    ● Bug#107059: Bad cmp. stats in I_S.INNODB_CMP [...] (sister bug, opened later, quickly fixed: PS-8749)
    ● Bug#107074: The column uncompress_ops capped at 2^31-1 in I_S.INNODB_CMP
    ● Bug#111063: Assert: !dummy_big_rec on rollback of delete or update of row_format=compressed
    ● An Adventure in InnoDB Table Compression (for read-only tables)
    https://jfg-mysql.blogspot.com/2017/08/adventure-innodb-table-compression.html
    ● Free Page Consumption by InnoDB Table Compression (Percona backoff Algo. part #2)
    https://jfg-mysql.blogspot.com/2022/12/free-pages-consumption-by-innodb-table-compression.html
    ● And Valerii Kravchuk post: On InnoDB Data Compression in MySQL
    http://mysqlentomologist.blogspot.com/2018/06/on-innodb-data-compression-in-mysql.html

    View Slide

  51. Links - Other Good References
    ● The problem with MySQL foreign key constraints in Online Schema Changes
    https://code.openark.org/blog/mysql/the-problem-with-mysql-foreign-key-constraints-in-online-schema-changes
    ● Things that don’t work well with MySQL’s FOREIGN KEY implementation
    https://code.openark.org/blog/mysql/things-that-dont-work-well-with-mysqls-foreign-key-implementation
    ● A MySQL Replication Load Average with Performance Schema
    http://www.markleith.co.uk/2012/07/24/a-mysql-replication-load-average-with-performance-schema/
    ● The container throttling problem: https://danluu.com/cgroup-throttling/

    View Slide

  52. Links - Other HubSpot Data Infrastructure
    ● Embrace and Replace: Migrating ZooKeeper into Kubernetes
    https://product.hubspot.com/blog/zookeeper-to-kubernetes-migration
    ● Improving Reliability: Building a Vitess Balancer to Minimize MySQL Downtime
    https://product.hubspot.com/blog/improving-reliability
    ● Healing HBase Locality At Scale
    https://product.hubspot.com/blog/healing-hbase-locality-at-scale
    ● How to Get Better at Updating Your Data Infrastructure
    https://product.hubspot.com/blog/updating-data-infrastructure
    ● Our Journey to Multi-Region: Supporting Cross-Region Kafka Messaging
    https://product.hubspot.com/blog/kafka-aggregation
    ● Saving Millions on Logging: Finding Relevant Savings & Delivering Savings
    https://product.hubspot.com/blog/savings-logging-part1
    https://product.hubspot.com/blog/savings-logging-part2
    ● Launching HBase on ARM
    https://product.hubspot.com/blog/hbase-on-arm

    View Slide

  53. Thanks !

    View Slide