$30 off During Our Annual Pro Sale. View Details »

its-time-for-relational-databases-to-focus-on-developers_shlomi-noach.pdf

 its-time-for-relational-databases-to-focus-on-developers_shlomi-noach.pdf

The relational model is one of the oldest surviving models in computer science. And while it is still popular, it is gradually losing the developers’ good opinion. While relational databases have invested efforts in optimizing speed, latency and overall read and write performance, they have not adapted to the evolving change in developer’s needs.

The relational model comes with an operational complexity, and schema changes remain an operational blocker, outside the domain and ownership of the very developers who plan those changes. As a result, developers perceive relational databases unlike any other production system and seek ways to avoid schema changes.

Exactly what went wrong? And what would it take for relational databases to close the gap and regain the developer’s confidence? This session highlights the problems around making and deploying schema changes. We discuss scheduling, operational conflicts, production impact and blast radius, blue/green deployments, idempotency and more. We provide a paradigm that relational databases can follow, illustrated by Vitess, an OSS CNCF project which implements that paradigm.

Shlomi Noach

October 25, 2022
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. It’s time for relational databases to
    focus on developers
    Shlomi Noach
    Reversim Summit 2022

    View Slide

  2. The relational model

    View Slide

  3. ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128) NOT NULL,
    DROP KEY instance_tag_idx
    DROP TABLE instance_tags
    Schema changes (DDLs)

    View Slide

  4. About me
    Engineer at PlanetScale; previously at GitHub,
    Booking.com, Outbrain
    Author of open source projects orchestrator, gh-ost,
    freno and others
    Maintainer for Vitess
    Blog at http://openark.org
    github.com/shlomi-noach
    @ShlomiNoach

    View Slide

  5. Vitess
    A database clustering system for horizontal scaling of
    MySQL
    ● CNCF graduated project
    ● Open source, Apache 2.0 licence
    ● Contributors from around the community
    ● Originally developed at YouTube to mass scale their
    load

    View Slide

  6. MySQL-compatible serverless database platform, built
    for developers
    Main sponsors for Vitess

    View Slide

  7. ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128) NOT NULL,
    DROP KEY instance_tag_idx
    DROP TABLE instance_tags
    Avoiding schema changes

    View Slide

  8. Why?

    View Slide

  9. Interacting with the
    relational database

    View Slide

  10. Schema change as an operation
    The relational database does not have a concept of
    developers. It only has the concept of operations people.

    View Slide

  11. Schema change as an operation

    View Slide

  12. Schema change as a deployment
    We should apply our deployment terminology and
    practices to schema changes

    View Slide

  13. Operational time
    ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128) NOT NULL;
    [Three days later…]
    Query OK, 0 rows affected (70h 12m 56s)

    View Slide

  14. Operational overhead

    View Slide

  15. External online schema change
    tools
    ● pt-online-schema-change
    ● gh-ost
    ● pg-osc

    View Slide

  16. app

    View Slide

  17. app

    View Slide

  18. app
    Row copy

    View Slide

  19. app
    Row copy

    View Slide

  20. app
    Row copy
    Apply changes

    View Slide

  21. app

    View Slide

  22. app

    View Slide

  23. app

    View Slide

  24. Operational conflicts

    View Slide

  25. View Slide

  26. Blast radius

    View Slide

  27. user

    View Slide

  28. user
    10%
    deployment

    View Slide

  29. user
    20%
    deployment

    View Slide

  30. user
    50%
    deployment

    View Slide

  31. user
    100%
    deployment

    View Slide

  32. app

    View Slide

  33. app

    View Slide

  34. Blue-green deployments

    View Slide

  35. user

    View Slide

  36. user

    View Slide

  37. user

    View Slide

  38. user

    View Slide

  39. user
    ?
    ?
    ?
    ?

    View Slide

  40. Undeployment (revert)

    View Slide

  41. app

    View Slide

  42. app

    View Slide

  43. app

    View Slide

  44. Apply changes
    app

    View Slide

  45. Apply changes
    app
    sync

    View Slide

  46. Apply changes
    sync
    app

    View Slide

  47. Apply changes
    app

    View Slide

  48. Apply changes
    app

    View Slide

  49. Undeploy via
    blue-green deployments

    View Slide

  50. Undeploy as first class citizen

    View Slide

  51. Redeployment

    View Slide

  52. CREATE TABLE IF NOT EXISTS node_health (
    hostname varchar(128)
    CHARACTER SET ascii NOT NULL,
    token varchar(128) NOT NULL,
    last_seen_active timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (hostname, token)
    ) ENGINE=InnoDB DEFAULT CHARSET=ascii
    “Base schema”

    View Slide

  53. ALTER TABLE node_health
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (hostname, token)
    ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128)
    CHARACTER SET utf8 NOT NULL
    ALTER TABLE node_health
    MODIFY last_seen_active timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP
    “patches”

    View Slide

  54. if flags2&BinlogThroughGTID != 0 {
    dataSize, pos, ok := readUint32(data, pos)
    if !ok {
    return logFile, logPos, position, readPacketErr
    }
    if gtid := string(data[pos : pos+int(dataSize)]); gtid != "" {
    position, err = DecodePosition(gtid)
    if err != nil {
    return logFile, logPos, position, err
    }
    }
    }
    return logFile, logPos, position, nil
    Source code

    View Slide

  55. View Slide

  56. ALTER TABLE node_health
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (hostname, token)
    ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128)
    CHARACTER SET utf8 NOT NULL
    ALTER TABLE node_health
    MODIFY last_seen_active timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP
    Human calculated diff

    View Slide

  57. CREATE TABLE IF NOT EXISTS node_health (
    hostname varchar(128)
    CHARACTER SET ascii NOT NULL,
    token varchar(128) NOT NULL,
    last_seen_active timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (hostname, token)
    ) ENGINE=InnoDB DEFAULT CHARSET=ascii
    Declarative schema

    View Slide

  58. CREATE TABLE IF NOT EXISTS node_health (
    hostname varchar(128)
    CHARACTER SET ascii NOT NULL,
    token varchar(128) NOT NULL,
    last_seen_active timestamp NOT NULL
    DEFAULT CURRENT_TIMESTAMP,
    extra_info varchar(128)
    CHARACTER SET utf8 NOT NULL
    PRIMARY KEY (hostname, token)
    ) ENGINE=InnoDB DEFAULT CHARSET=ascii
    Declarative schema

    View Slide

  59. ALTER TABLE node_health
    ADD COLUMN extra_info varchar(128)
    CHARACTER SET utf8 NOT NULL
    Imperative deployments
    via DDL statements

    View Slide

  60. Idempotent deployments

    View Slide

  61. More to discuss
    Asynchronous and scheduled deployments, job
    management, load shedding and throttling,
    discoverability, multi change deployments, multi target
    deployments, artifacts and garbage collection

    View Slide

  62. How did we let this happen?

    View Slide

  63. What’s next?

    View Slide

  64. Thank you!
    @ShlomiNoach
    github.com/shlomi-noach
    Images source: https://www.pexels.com/

    View Slide