
Open Source Database Infrastructure with Vitess

Shlomi Noach
February 07, 2021


This session reveals four experimental Vitess developments that automate away complex database operations. With these developments Vitess is able to run its own database infrastructure, transparently to the user, and take control of risky and elaborate situations and operations.

We will briefly explain the Vitess architecture and how it supports said control, and discuss the following developments:

- Throttling: pushback for massive writes.
- Table life cycle: safe and lazy DROP TABLE operations.
- Online DDL: automating, scheduling and managing online schema migrations.
- HA, failovers and cluster healing via vitess/orchestrator (aka vtorc).

Vitess is a CNCF open source database clustering system for horizontal scaling of MySQL.

---
Presented at FOSDEM 2021 (online); see https://fosdem.org/2021/schedule/event/vitess/



Transcript

  1. Open Source
    Database Infrastructure
    with Vitess
    Shlomi Noach
    PlanetScale
    FOSDEM 2021


  2. About me
    Engineer at PlanetScale
    Author of open source projects orchestrator, gh-ost,
    freno and others
    Maintainer for Vitess
    Blog at http://openark.org
    github.com/shlomi-noach
    @ShlomiNoach


  3. PlanetScale
    Founded Feb. 2018 by co-creators of Vitess
    ~45 employees
    HQ Mountain View, remote team


  4. Vitess
    A database clustering system for horizontal scaling of
    MySQL
    ● CNCF graduated project
    ● Open source, Apache 2.0 licence
    ● Contributors from around the community


  5. Agenda
    Vitess architecture overview
    Database infrastructure; experimental and in
    development:
    - Throttling
    - Table life cycle
    - Online DDL
    - HA/failovers


  6. Vitess architecture basics
    How the Vitess architecture enables transparent
    database infrastructure operations


  7. Vitess architecture basics
    Consider a common replication cluster


  8. Vitess architecture basics
    Each MySQL server is assigned a vttablet
    - A daemon/sidecar
    - Controls the mysqld process
    - Interacts with the mysqld server
    - Typically on same host as mysqld


  9. Vitess architecture basics
    In production you have multiple clusters


  10. Vitess architecture basics
    User and application traffic is routed via
    vtgate
    - A smart, stateless proxy
    - Speaks the MySQL protocol
    - Impersonates a monolithic MySQL server
    - Relays queries to vttablets


  11. Vitess architecture basics
    A vitess deployment will run multiple
    vtgate servers for scale out


  12. Vitess architecture basics
    vtgate must transparently route queries
    to correct clusters, to relevant shards
    [Diagram: app, app; clusters: commerce shard 0, commerce shard 1, internal (unsharded); routing target marked "?"]


  13. Vitess architecture basics
    Queries route based on schema & sharding scheme
    [Diagram: app, app; clusters: commerce shard 0, commerce shard 1, internal (unsharded)]
    USE commerce;
    SELECT order_id, price
    FROM orders
    WHERE customer_id=4;
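
A minimal sketch of how a proxy could pick a shard for the query above. It assumes range-based sharding with shard names in the key-range notation that appears in the OnlineDDL output later in this deck ("-40", "40-80", "80-c0", "c0-"). The hash function and all function names are hypothetical stand-ins, not what Vitess itself uses.

```python
import hashlib

# Shard names in key-range notation; bounds apply to the first byte of the key.
SHARDS = ["-40", "40-80", "80-c0", "c0-"]


def shard_bounds(name):
    """Return the [low, high) bounds of a key-range shard name, as first-byte values."""
    low, high = name.split("-")
    return (int(low, 16) if low else 0x00, int(high, 16) if high else 0x100)


def keyspace_id(customer_id):
    """Hypothetical stand-in hash of the sharding column; NOT Vitess's own function."""
    return hashlib.md5(str(customer_id).encode()).digest()


def route(customer_id, shards=SHARDS):
    """Pick the shard whose key range contains the row's keyspace id."""
    first_byte = keyspace_id(customer_id)[0]
    for name in shards:
        low, high = shard_bounds(name)
        if low <= first_byte < high:
            return name
    raise LookupError("no shard covers this keyspace id")
```

Because the ranges partition the whole key space, a query filtered on the sharding column (here `customer_id`) maps to exactly one shard, while a query without that filter would have to be sent to all of them.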


  14. Vitess architecture basics
    topo: distributed key/value store
    - Stores the state of vitess: schemas,
    shards, sharding scheme, tablets,
    roles, etc.
    - etcd/consul/zookeeper
    - Small dataset, mostly cached by
    vtgate
    [Diagram: clusters: commerce shard 0, commerce shard 1, internal (unsharded)]


  15. Vitess architecture basics
    vtctld: control daemon
    - Runs ad hoc operations
    - API server
    - Reads/writes topo
    - Uses locks
    - Operates on tablets
    [Diagram: clusters: commerce shard 0, commerce shard 1, internal (unsharded)]


  16. Throttling
    Pushback for massive writes, to keep replication lag low.
    Based on GitHub’s freno, github.com/github/freno, a
    cooperative throttling service
    Implemented in vttablet
    https://vitess.io/docs/reference/features/tablet-throttler/


  17. Throttling
    Based on replication lag
    Vitess has an internal heartbeat mechanism, similar to
    pt-heartbeat, injecting TIMESTAMP records on the
    primary, read on replicas
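
As a rough illustration of the heartbeat idea: the primary periodically writes its current time, the row replicates, and the replica compares the newest timestamp it holds against the clock. The function name and the clamping to zero are assumptions made for this sketch.

```python
import time


def replication_lag(last_heartbeat_ts, now=None):
    """Lag in seconds: current time minus the newest heartbeat timestamp seen on the replica."""
    if now is None:
        now = time.time()
    # Clamp at zero so small clock differences never report a negative lag.
    return max(0.0, now - last_heartbeat_ts)
```

Unlike reading the replica's own lag estimate, this measures end-to-end delay: if replication stalls, the heartbeat stops advancing and the computed lag keeps growing.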


  18. Throttling
    Vitess is knowledgeable about servers in a cluster:
    - Primary
    - Replica
    - Non serving replica (OLAP)
    - Backup servers
    By default, vitess only takes into account lag on serving
    replicas. Override with:
    vttablet -throttle_tablet_types=...


  19. vttablet throttler
    The primary tablet of each shard (MySQL replication
    cluster) polls relevant replicas for lag
    Periodically consults topo for changes in replication
    topology and tablet roles
    Serves HTTP API endpoint: /throttler/check
    - Returns HTTP 200 OK when lag is good
    - Other HTTP codes to pushback writes
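
The check-then-write protocol can be sketched as below. This is a sketch under assumptions: the 429 pushback code, the one-second threshold, and every function name are illustrative; the slide only states that 200 means "go ahead" and other codes mean "hold off". `check_status` stands in for an HTTP call to `/throttler/check`.

```python
import time

HTTP_OK = 200
HTTP_TOO_MANY_REQUESTS = 429  # assumed pushback code, not taken from the slide


def check(replica_lags, threshold_seconds=1.0):
    """Throttler side: OK only while the worst lag across relevant replicas is under the threshold."""
    worst = max(replica_lags, default=0.0)
    return HTTP_OK if worst < threshold_seconds else HTTP_TOO_MANY_REQUESTS


def write_in_chunks(chunks, apply_chunk, check_status, backoff_seconds=0.25, sleep=time.sleep):
    """Client side: ask before every chunk, and back off instead of writing on pushback."""
    for chunk in chunks:
        while check_status() != HTTP_OK:
            sleep(backoff_seconds)
        apply_chunk(chunk)
```

The throttling is cooperative: nothing stops a client that skips the check, so it only helps when every heavy writer asks first.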


  20. vttablet throttler
    Implemented internally:
    - Table lifecycle
    - Online DDL
    Ideas for the future:
    - Enforce throttling for massive updates, e.g.:
    UPDATE my_table SET
    new_column=price*rate


  21. Table lifecycle
    An automated garbage collector for old tables


  22. DROP TABLE here_be_trouble;


  23. DROP TABLE alternatives
    - RENAME TABLE <table> TO _something_else for quick
    recovery in case of regret
    - Purge table data, possibly with SQL_LOG_BIN=0
    Requires throttling, best avoid concurrent purges.
    - Wait X days till table pages are evicted from buffer
    pool
    - Actually DROP
    - Potentially directly TRUNCATE on replicas
    - Or use BLACKHOLE hacks
    How do you automate/manage/track all these?


  24. Vitess table lifecycle
    A table can be in one of these states:
    - In use
    - HOLD: renamed and kept intact for X days
    - PURGE: rows actively being purged
    - EVAC: wait X days to evict pages from buffer pool
    - DROP: ready for an actual DROP TABLE
    - Gone


  25. Vitess table lifecycle
    Examples:
    - _vt_HOLD_6ace8bcef73211ea87e9f875a4d24e90_20210130093000
    Table held intact until 2021-01-30 09:30:00, then transitioned into next phase
    - _vt_PURGE_6ace8bcef73211ea87e9f875a4d24e90_20210131182000
    Table is in purging process. It will transition into next phase once it is completely
    purged
    - _vt_EVAC_6ace8bcef73211ea87e9f875a4d24e90_20210207071500
    Table remains in evac until 2021-02-07 07:15:00, then transitioned into next
    phase
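
The naming scheme in the examples above carries everything needed to decide what to do with a table. A small parser, assuming the pattern `_vt_<STATE>_<32 hex characters>_<YYYYMMDDhhmmss>` inferred from those examples (the `DROP` state name in the pattern and the function name are assumptions):

```python
import re
from datetime import datetime

# Pattern inferred from the example table names on this slide.
GC_TABLE_RE = re.compile(r"^_vt_(HOLD|PURGE|EVAC|DROP)_([0-9a-f]{32})_(\d{14})$")


def parse_gc_table_name(name):
    """Return (state, uuid, timestamp) for a lifecycle table name, or None for any other table."""
    match = GC_TABLE_RE.match(name)
    if match is None:
        return None
    state, table_uuid, stamp = match.groups()
    return state, table_uuid, datetime.strptime(stamp, "%Y%m%d%H%M%S")
```

For example, `parse_gc_table_name("orders")` returns `None`, so ordinary tables are never touched by the collector.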


  26. Purging tables
    vttablet on primary is charged with purging table data
    - Single table at a time
    - DELETE FROM <table> LIMIT 50, in iterations
    - SQL_LOG_BIN=0
    - Using tablet throttler, low priority requests
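
The purge loop can be sketched as follows. SQLite is used here only as a runnable stand-in for MySQL, so the batch delete is written with a `rowid` subquery instead of `DELETE ... LIMIT`, and `SQL_LOG_BIN=0` is not modelled. `may_write` is a hypothetical callback standing in for a throttler check.

```python
import sqlite3
import time


def purge_table(conn, table, may_write, batch_size=50, sleep=time.sleep):
    """Delete all rows in small batches, consulting the throttler before each batch.

    Returns the number of non-empty batches. `table` must be a trusted
    identifier, since it is interpolated into the statement.
    """
    batches = 0
    while True:
        while not may_write():
            sleep(0.25)  # pushed back: wait, then ask again
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return batches  # table is empty; ready for the next lifecycle state
        batches += 1
```

Small batches keep each transaction short, and checking the throttler between batches lets production traffic take priority over the purge.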


  27. Vitess table lifecycle
    With table name encoding scheme:
    - The process is stateless
    - Vitess auto-discovers relevant tables
    - Will always do the right thing
    - We do lose context to the original table


  28. Vitess table lifecycle
    Transition states controlled by:
    vttablet -table_gc_lifecycle=
    Examples:
    - “hold,purge,evac,drop” (the default)
    - “hold,drop”: keep intact for X days, then drop
    - “drop”: just drop
    https://vitess.io/docs/reference/features/table-lifecycle/
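
One way to picture how the configured lifecycle drives transitions: treat the flag value as an ordered list and move a table to whatever comes next. This is an illustrative sketch, not the vttablet implementation; the function name and the use of `None` for "in use" and "gone" are assumptions.

```python
def next_state(current, lifecycle="hold,purge,evac,drop"):
    """Return the state that follows `current` under the configured lifecycle.

    `current` is None for a table still in use. The result is None once the
    table has passed the last configured state, i.e. it is gone.
    Raises ValueError if `current` is not part of the configured lifecycle.
    """
    states = [s.strip().upper() for s in lifecycle.split(",") if s.strip()]
    if current is None:
        return states[0] if states else None
    later = states[states.index(current) + 1:]
    return later[0] if later else None
```

With the value `"hold,drop"`, a table in `HOLD` moves straight to `DROP`, skipping the purge and evac phases.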


  29. Online DDL
    Schema changes made easy


  30. ALTER TABLE here_be_trouble
    ADD COLUMN i INT NOT NULL;


  31. ALTER TABLE alternatives
    pt-online-schema-change and gh-ost, adding
    operational complexity:
    - External tools
    - Remote login
    - Discovery
    - Accounts
    - Scheduling
    - Formalize, execute
    - Throttling
    - Tracking
    - Interrupting


  32. Operational complexity
    Often outside ownership of the developers


  33. Online DDL
    Vitess' architecture can own most of the complexity:
    - External tools: executed by vttablet
    - Remote login: not required
    - Discovery: vitess knows the topology
    - Accounts: vttablet can create on your behalf
    - Scheduling: use topo to coordinate migrations
    - Formalize, execute: vttablet, on primary server
    - Throttling: using tablet throttler
    - Tracking: via vitess infrastructure
    - Interrupting: via vitess infrastructure


  34. Online DDL
    mysql> SET @@ddl_strategy='gh-ost'; -- also 'pt-osc'
    mysql> ALTER TABLE no_problem
    ADD COLUMN i INT NOT NULL;
    +--------------------------------------+
    | uuid                                 |
    +--------------------------------------+
    | 7e9cd911_4b37_11eb_a80f_f875a4d24e90 |
    +--------------------------------------+
    1 row in set (0.01 sec)


  35. Online DDL: flow
    Application issues ALTER TABLE statement
    [Diagram: app, app; clusters: commerce shard 0, commerce shard 1, internal (unsharded)]
    USE commerce;
    ALTER TABLE orders ADD
    COLUMN due_date
    TIMESTAMP NOT NULL;


  36. Online DDL: flow
    vtgate receives statement,
    but does not pass it on to
    tablets.
    Instead, it notes the
    migration request in topo
    [Diagram: app, app; clusters: commerce shard 0, commerce shard 1, internal (unsharded)]


  37. Online DDL: flow
    vtctld detects migration requests and
    ensures distribution to relevant shards
    [Diagram: clusters: commerce shard 0, commerce shard 1, internal (unsharded)]


  38. Online DDL: flow
    vttablet on primary receives migration request from vtctld
    - Persists internally
    - Schedules
    - Prepares script
    - Creates one-off credentials
    - Runs gh-ost or pt-osc
    - Uses tablet throttler
    - Tracks
    - Cleans up
    - Feeds artifact tables into the garbage collector


  39. Online DDL: track, cancel, retry
    $ vtctlclient OnlineDDL commerce show 8a797518_f25c_11ea_bab4_0242c0a8b007
    +-----------------+-------+--------------+-------------+------------+--------------------------------------+----------+---------------------+---------------------+------------------+
    | Tablet          | shard | mysql_schema | mysql_table | ddl_action | migration_uuid                       | strategy | started_timestamp   | completed_timestamp | migration_status |
    +-----------------+-------+--------------+-------------+------------+--------------------------------------+----------+---------------------+---------------------+------------------+
    | test-0000000401 | c0-   | vt_commerce  | demo        | alter      | 8a797518_f25c_11ea_bab4_0242c0a8b007 | gh-ost   | 2020-09-09 05:23:32 |                     | running          |
    | test-0000000201 | 40-80 | vt_commerce  | demo        | alter      | 8a797518_f25c_11ea_bab4_0242c0a8b007 | gh-ost   | 2020-09-09 05:23:32 | 2020-09-09 05:23:33 | complete         |
    | test-0000000301 | 80-c0 | vt_commerce  | demo        | alter      | 8a797518_f25c_11ea_bab4_0242c0a8b007 | gh-ost   | 2020-09-09 05:23:32 |                     | running          |
    | test-0000000101 | -40   | vt_commerce  | demo        | alter      | 8a797518_f25c_11ea_bab4_0242c0a8b007 | gh-ost   | 2020-09-09 05:23:32 |                     | running          |
    +-----------------+-------+--------------+-------------+------------+--------------------------------------+----------+---------------------+---------------------+------------------+
    $ vtctlclient OnlineDDL commerce cancel 2201058f_f266_11ea_bab4_0242c0a8b007
    +-----------------+--------------+
    | Tablet          | RowsAffected |
    +-----------------+--------------+
    | test-0000000401 | 1            |
    | test-0000000101 | 1            |
    | test-0000000201 | 1            |
    | test-0000000301 | 1            |
    +-----------------+--------------+
    $ vtctlclient OnlineDDL commerce retry 2201058f_f266_11ea_bab4_0242c0a8b007
    +-----------------+--------------+
    | Tablet          | RowsAffected |
    +-----------------+--------------+
    | test-0000000101 | 1            |
    | test-0000000201 | 1            |
    | test-0000000301 | 1            |
    | test-0000000401 | 1            |
    +-----------------+--------------+


  40. Online DDL: more than ALTER
    CREATE and DROP statements can also participate in online DDL logic. Both go
    through topo, are scheduled by vttablet, and can be tracked, cancelled, etc.
    In fact, DROP statements are modified to RENAME statements, e.g.:
    mysql> DROP TABLE i_hope_nobody_uses_this;
    Intercepted by vtgate and transformed into:
    RENAME TABLE i_hope_nobody_uses_this TO
    _vt_HOLD_b0d1fb34450a11ebb980f875a4d24e90_20210203094500;
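
The rewrite shown above can be sketched as a pure function. Assumptions made for illustration: the timestamp suffix marks the end of the hold period (as in the table lifecycle examples), timestamps are in UTC, the UUID is a version 1 UUID without dashes, and the hold period is a parameter supplied by the caller. None of the names below come from Vitess.

```python
import uuid
from datetime import datetime, timedelta, timezone


def drop_to_rename(table, hold_for, now=None, table_uuid=None):
    """Rewrite a DROP of `table` as a RENAME into a _vt_HOLD_ name.

    The timestamp suffix is the moment the hold period ends.
    """
    now = now or datetime.now(timezone.utc)
    table_uuid = table_uuid or uuid.uuid1().hex  # 32 hex characters, no dashes
    stamp = (now + hold_for).strftime("%Y%m%d%H%M%S")
    return f"RENAME TABLE {table} TO _vt_HOLD_{table_uuid}_{stamp}"
```

A rename is a metadata-only change, so the DROP returns quickly and the table can be renamed back if someone turns out to need it.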


  41. Online DDL
    Puts ownership back in the hands of the developers
    - Zero dependencies using gh-ost on linux_amd64
    (comes with gh-ost precompiled)
    - Auto retry in case of failover
    Future work:
    - Use vreplication instead of gh-ost/pt-osc
    - Continuously migrate while resharding while
    reparenting
    https://vitess.io/docs/user-guides/schema-changes/managed-online-schema-changes/


  42. vtorc
    Orchestrator integration


  43. MySQL replication clusters
    - Are not primitives
    - Are not identifiable
    - Only exist as meta information
    But mean everything to us!


  44. orchestrator’s approach
    - Observe
    - Accept reality
    - Assign metadata such as cluster alias
    - Detect failure, failover
    But otherwise it does not know if the cluster meets your
    product expectations


  45. Common example
    After a split brain scenario and failover
    we end up with two distinct clusters.
    Which one is the “real” production
    cluster?
    MySQL does not know
    orchestrator uses heuristics based on
    failover history and bookkeeping


  46. Vitess knows
    Vitess keeps known schemas, shards, clusters,
    server roles, all in topo
    It keeps a state


  47. The old vitess-orchestrator
    integration
    Works, until it doesn’t:
    - Conflicting operations
    - Conflicting opinions
    - Too much information needs to pass back and forth


  48. vtorc
    An orchestrator spin-off, tightly integrated within
    vitess.
    Has direct access to topo
    Is goal oriented. Its mission is to make replication
    clusters converge to vitess’ expected state
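
"Goal oriented" can be read as a reconciliation loop: compare what topo expects with what is observed, and derive the actions that close the gap. The toy planner below only illustrates that idea; it is not vtorc's logic, and the action names, role strings, and function name are all hypothetical.

```python
def plan_convergence(expected_primary, observed_roles):
    """List the actions needed to make observed tablet roles match the expected state.

    `observed_roles` maps a tablet alias to "primary" or "replica".
    """
    actions = []
    for tablet, role in sorted(observed_roles.items()):
        if tablet == expected_primary and role != "primary":
            actions.append(("promote", tablet))
        elif tablet != expected_primary and role == "primary":
            actions.append(("demote", tablet))
    return actions
```

In the split brain example from the earlier slide, two tablets both claim to be primary; because the expected state names one of them, the plan is unambiguous: demote the other.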


  49. vtorc scenarios, superseding orchestrator scenarios


  50. vtorc
    Work in progress
    Future:
    - Custom defined availability/durability rules
    (imply failover rules, semi-sync rules etc.)


  51. Database infrastructure
    Vitess becomes a database infrastructure framework in
    an attempt to reduce overall relational database
    complexity


  52. Resources
    Docs: vitess.io/docs/
    Code: github.com/vitessio/vitess
    Slack: vitess.slack.com


  53. Thank you!
    Questions?
    github.com/shlomi-noach
    @ShlomiNoach
