$30 off During Our Annual Pro Sale. View Details »

Automating schema migration flow with GitHub Actions, skeema & gh-ost

Shlomi Noach
February 01, 2020

Automating schema migration flow with GitHub Actions, skeema & gh-ost

Schema migration is more than running an ALTER TABLE. It is about designing, reviewing, approving, queuing, scheduling, executing, auditing, controlling and versioning the changes.

At GitHub we run multiple migrations per day, and much of this flow used to be manual, taking a significant toll from the databases team. In this session we illustrate how we automated away migration using free and open source solutions, and based on trusted development flow.

We highlight the use of the skeema tool, with GitHub Actions, git flow and gh-ost.

Shlomi Noach

February 01, 2020
Tweet

More Decks by Shlomi Noach

Other Decks in Technology

Transcript

  1. Automating Schema
    Migrations
    with GitHub Actions, skeema & gh-ost
    Shlomi Noach
    GitHub
    FOSDEM 2020

    View Slide

  2. About me
    @github/database-infrastructure
    Author of orchestrator, gh-ost, freno, ccql
    and others.
    Blog at http://openark.org

    github.com/shlomi-noach

    @ShlomiNoach

    View Slide

  3. GitHub

    Built for developers
    Busy and growing
    40M+ developers

    3M organizations

    44M repositories created in the past year
    Actions, Packages, 

    Security Advisories & Updates, 

    Code Navigation & Search,

    Notifications, Sponsors,

    Mobile, …

    View Slide

  4. Incentive
    With MySQL as the backend, new features imply schema
    changes: new tables, new columns, index changes, iterative
    schema changes, experiments.

    View Slide

  5. What’s in a migration?
    More than CREATE, ALTER or DROP TABLE

    View Slide

  6. What’s in a migration?
    Designing, coding, local testing, review process, queueing,
    scheduling, executing, controlling, auditing, versioning…

    View Slide

  7. Ownership
    Developer

    Developer

    Peer

    DBA

    DBA

    DBA

    DBA

    DBA

    DBA/SRE

    DBA

    DBA

    DBA

    Developer

    Developer
    Design code + schema change

    Publish

    Review

    Review

    Formalize statement/command

    Locate

    Schedule

    Run

    Audit/monitor/control

    Cut-over/complete migration

    Cleanup

    Notify

    Deploy

    Merge

    View Slide

  8. Ownership, our previous state
    Developer

    Developer

    Peer

    DBA

    DBA

    DBA

    DBA

    DBA

    gh-ost/chatops

    DBA

    DBA

    DBA

    DBA

    DBA
    Design code + schema change

    Publish

    Review

    Review

    Formalize statement/command

    Locate

    Schedule

    Run

    Audit/monitor/control

    Cut-over/complete migration

    Cleanup

    Notify

    Deploy

    Merge

    View Slide

  9. A complex flow
    Multiple domains (code, MySQL, production, communication).
    Multiple environments (dev, production).
    Multiple owners (devs, DBAs, SREs).

    View Slide

  10. A combinatory solution
    Loosely coupled, independent components.
    Each solves an aspect of the problem.
    Orchestrated to create an automated flow.

    View Slide

  11. Code
    A schema change should be presented as code.
    Coupled with application code.
    Versioned.

    View Slide

  12. Code
    At GitHub, we use git.
    We also happen to author GitHub for code hosting, versioning
    and management.

    View Slide

  13. Pull Request
    The change
    Review
    CI
    Discussion

    View Slide

  14. Code:

    New PR

    View Slide

  15. Code: 

    New PR

    View Slide

  16. What’s the migration?
    From code to SQL statement

    View Slide

  17. skeema
    https://www.skeema.io/
    https://github.com/skeema/skeema
    Open source
    Developed by Evan Elias

    View Slide

  18. schema/

    .skeema

    my_schema1/

    .skeema

    some_table.sql

    another_table.sql

    my_schema2/

    .skeema

    foo.sql

    bar.sql

    View Slide

  19. $ cat .skeema


    [skeema-diff-ci]

    host=127.0.0.1

    port=3306

    user=root

    View Slide

  20. $ cat .my_schema1/some_table.sql


    CREATE TABLE `some_table` (

    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

    `hostname` varchar(128) NOT NULL,

    `time_created` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,

    `time_updated` datetime DEFAULT NULL,

    `random_hash` char(40) CHARACTER SET ascii DEFAULT NULL,

    PRIMARY KEY (`id`)

    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    View Slide

  21. $ skeema push env-name

    # Connects to MySQL server, applies filesystem schema onto
    database
    $ skeema pull env-name

    # Imports schema definition from MySQL onto filesystem

    $ skeema diff env-name

    # Prints schema migrations changing the state of MySQL server to
    filesystem


    View Slide

  22. skeema
    Where?

    View Slide

  23. GitHub Actions
    Kick off workflows with GitHub events like push, issue creation, or
    a new release.
    An action runs in a container on GitHub’s infrastructure (default).
    Action has repository’s context and can operate on the repository.

    View Slide

  24. GitHub Actions
    Run skeema from within Action.
    Fetch skeema as part of Action flow.

    View Slide

  25. GitHub Action: skeema-diff, simplified
    skeema-diff:

    runs-on: ubuntu-latest

    steps:

    - uses: actions/checkout@v2

    with:

    ref: master

    - name: push master schema to MySQL

    env:

    MYSQL_PWD: root

    run: |

    skeema push skeema-diff-ci

    - uses: actions/checkout@v2

    - name: skeema diff

    skeema diff skeema-diff-ci --allow-unsafe

    View Slide

  26. GitHub Action: skeema-diff
    - uses: actions/checkout@v2

    with:

    ref: ${{ github.event.pull_request.base.sha }}

    - name: push master schema to MySQL

    …

    - uses: actions/checkout@v2

    with:

    ref: ${{ github.event.pull_request.head.sha }}

    - name: skeema diff

    /tmp/skeema-ci/skeema push skeema-diff-ci --allow-unsafe --ddl-
    wrapper='echo "\n-- skeema:ddl:begin\n"{DDL}";\n-- skeema:ddl:end"' | sed
    -e 's/^USE /-- skeema:ddl:use /g' | sed -n '/^-- skeema:ddl:use /p;/^--
    skeema:ddl:begin/,/^-- skeema:ddl:end/p' | tee /tmp/skeema-ci/skeema-
    diff.sql

    View Slide

  27. Action/

    skeema

    View Slide

  28. Action/

    skeema

    View Slide

  29. gh-ost
    GitHub’s online schema migration tool.
    Low-impact (nearly no-impact) in production.
    Auditable, configurable, controllable.
    Open source
    https://github.com/github/gh-ost/

    https://github.blog/2016-08-01-gh-ost-github-s-online-migration-tool-for-mysql/

    https://speakerdeck.com/shlominoach/githubs-online-schema-migrations-for-mysql

    View Slide

  30. skeefree
    Name coined by Tom Krouper
    A service to orchestrate the flow: PR, Actions/skeema, gh-ost.
    Developed internally at GitHub

    View Slide

  31. skeefree
    Probes and detects schema change PRs
    Analyzes skeema changes
    Initiates and follows up on review/approval status
    Schedules the migration
    Runs the migration (gh-ost/direct)
    Follows up and reports on PR

    View Slide

  32. Flow: 

    New PR

    View Slide

  33. Flow: 

    New PR

    View Slide

  34. Flow: 

    CI build

    View Slide

  35. Flow: 

    skeema

    analysis

    View Slide

  36. Flow: 

    review

    & label

    View Slide

  37. Flow: 

    skeefree
    analysis

    View Slide

  38. Flow: 

    Review request

    View Slide

  39. Flow: 

    migration

    execution

    View Slide

  40. Flow: 

    deploy

    & merge

    View Slide

  41. Ownership: skeefree
    Developer

    Developer

    Peer

    DBA

    skeema/CI

    skeefree

    skeefree

    skeefree

    gh-ost/chatops

    skeefree

    GC

    skeefree

    Developer

    Developer
    Design code + schema change

    Publish

    Review

    Review

    Formalize statement/command

    Locate

    Schedule

    Run

    Audit/monitor/control

    Cut-over/complete migration

    Cleanup

    Notify

    Deploy

    Merge

    View Slide

  42. Impact
    Database team work reduced to minutes per week.
    Developers have visibility into status. Get notified on their PR.
    Better time utilization; migrations start executing as soon as
    possible, not based on a human availability.

    View Slide

  43. skeefree
    Uses internal services in GitHub’s infrastructure
    • Inventory service
    • MySQL discovery service
    • Chatops integration
    • Internal libraries (e.g. logging)

    View Slide

  44. Open Source
    skeefree is coupled with GitHub’s infrastructure:
    - Inventory service

    - MySQL discovery

    - Chat/chatops
    We nonetheless hope that the community finds it useful and are
    releasing it in partial state.
    Release to be announced.

    View Slide

  45. Questions?
    github.com/shlomi-noach
    @ShlomiNoach
    Thank you!

    View Slide