Citus 10 Open Source & Columnar Storage for Postgres | contributing today | Claire Giordano & Nils Dijk

Citus 10 is out! A spectacular new release from our Citus open source team. Citus 10 gives you columnar storage for Postgres and Citus on a single node. Plus, we’ve open sourced the shard rebalancer. Come see a demo & learn how the Citus extension gives you Postgres at any scale, from a single node to a distributed cluster, and how easy it is to give Citus a try.

Citus Data

March 17, 2021

Transcript

  1. What is Citus?

    Extension to Postgres (not a fork!): • Distributed tables • Reference tables • & more, as of Citus 10
    Simplicity & flexibility of using PostgreSQL, at scale: • Add nodes • Rebalance
    Multi-purpose: • Scale transactional workloads • Scale analytical workloads • Mixed workloads too
  2. Why

  3. Why Citus, Reason #1: Postgres limited to single node

    Capacity / execution time issues: • Working set does not fit in memory • Reaching limits of network-attached storage (IOPS) / CPU • Analytical query takes too long • Data transformations are single-threaded (e.g. insert..select) • Autovacuum cannot keep up with transactional workload • …
  4. Why Citus, Reason #2: Because Postgres includes:

    • Joins • Functions • Constraints • Indexes: B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • …
  5. A Citus cluster consists of multiple Postgres nodes with the Citus extension.

    COORDINATOR NODE: CREATE EXTENSION citus; SELECT citus_add_node(…); SELECT citus_add_node(…); SELECT citus_add_node(…);
    WORKER NODES (W1, W2, W3, … Wn): CREATE EXTENSION citus; CREATE EXTENSION citus; CREATE EXTENSION citus;
  6. How Citus distributes tables across the database cluster

    APPLICATION: CREATE TABLE campaigns (…); SELECT create_distributed_table('campaigns', 'company_id');
    COORDINATOR NODE: METADATA
    WORKER NODES (W1, W2, W3, … Wn): CREATE TABLE campaigns_101, CREATE TABLE campaigns_102, CREATE TABLE campaigns_103, CREATE TABLE campaigns_104, CREATE TABLE campaigns_105, CREATE TABLE campaigns_106
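
One way to picture the slide above: each row of `campaigns` is routed to exactly one shard based on a hash of its distribution column, `company_id`. The Python sketch below is a toy model and not how Citus is implemented. The shard names mirror the slide; the use of CRC32 and modulo placement are assumptions made for illustration only (Citus uses its own hash function and assigns hash ranges to shards).

```python
import zlib

# Shard names mirror the slide; treating them as exactly six shards is an
# assumption of this sketch.
SHARDS = [f"campaigns_{n}" for n in range(101, 107)]

def shard_for(company_id: int) -> str:
    """Pick a shard from a hash of the distribution column value."""
    # CRC32 is a stand-in for the database's hash function (illustrative only).
    h = zlib.crc32(str(company_id).encode("utf-8"))
    return SHARDS[h % len(SHARDS)]

def distribute(rows):
    """Group rows (dicts with a 'company_id' key) by their target shard."""
    placement = {name: [] for name in SHARDS}
    for row in rows:
        placement[shard_for(row["company_id"])].append(row)
    return placement

# Hypothetical sample rows: several campaigns spread over a few companies.
rows = [{"company_id": c, "campaign_id": i}
        for i, c in enumerate([1, 2, 3, 1, 2, 7])]
placement = distribute(rows)
```

The property worth noticing is that all rows with the same `company_id` land in the same shard, so work scoped to one company touches a single shard.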
  7. How Citus distributes queries across the database cluster

    APPLICATION: SELECT campaign_id, avg(spend) AS avg_campaign_spend FROM campaigns GROUP BY campaign_id;
    COORDINATOR NODE: METADATA
    WORKER NODES (W1, W2, W3, … Wn): SELECT company_id, sum(spend), count(spend) … FROM campaigns_101 … | SELECT company_id, sum(spend), count(spend) … FROM campaigns_102 … | SELECT company_id, sum(spend), count(spend) … FROM campaigns_103 …
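
The slide above shows the workers returning `sum(spend)` and `count(spend)` rather than an average. A plausible reading is that an average cannot be merged from per-shard averages, but it can be rebuilt from per-shard sums and counts. The Python sketch below illustrates that idea only; the function names and sample data are hypothetical, and it assumes no NULL `spend` values.

```python
from collections import defaultdict

def worker_partial(shard_rows):
    """Per-shard step: sum(spend) and count(spend) for each campaign_id."""
    partial = defaultdict(lambda: [0.0, 0])
    for campaign_id, spend in shard_rows:
        partial[campaign_id][0] += spend
        partial[campaign_id][1] += 1
    return {k: tuple(v) for k, v in partial.items()}

def coordinator_merge(partials):
    """Merge step: add up partial sums and counts, then divide once."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for campaign_id, (s, c) in partial.items():
            total[campaign_id][0] += s
            total[campaign_id][1] += c
    return {k: s / c for k, (s, c) in total.items()}

# Hypothetical data: three shards holding (campaign_id, spend) rows.
shards = [
    [(1, 10.0), (2, 30.0)],
    [(1, 20.0)],
    [(2, 50.0), (2, 10.0)],
]
avg_campaign_spend = coordinator_merge(worker_partial(r) for r in shards)
```

Here campaign 1 averages (10 + 20) / 2 and campaign 2 averages (30 + 50 + 10) / 3, regardless of how the rows are split across shards.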
  8. CREATE TABLE users(id bigserial primary key, name text);

    SELECT create_distributed_table('users', 'id'); SELECT count(*) FROM users; easy
  9. CREATE TABLE events(ts timestamptz, i int, n numeric, s text);

    CREATE TABLE events_columnar(ts timestamptz, i int, n numeric, s text) USING columnar;
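
To illustrate the general difference between the two tables above, the Python sketch below stores the same rows once row by row and once column by column. It is a conceptual model only and says nothing about the actual on-disk format of `USING columnar`; the sample values and the helper `to_columnar` are hypothetical.

```python
# Row-oriented: each record keeps all of its column values together.
rows = [
    {"ts": "2021-01-01", "i": 1, "n": 1.5, "s": "a"},
    {"ts": "2021-01-02", "i": 2, "n": 2.5, "s": "b"},
    {"ts": "2021-01-03", "i": 3, "n": 3.5, "s": "c"},
]

def to_columnar(records):
    """Column-oriented: one list per column, in row order."""
    return {col: [r[col] for r in records] for col in records[0]}

columns = to_columnar(rows)

# An aggregate over one column only needs that column's values.
total_i = sum(columns["i"])
```

With the column-oriented layout, a query such as a sum over `i` reads one list instead of every full row, and values of the same type sit next to each other, which tends to help compression.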
  10. Citus Columnar && Range Partitioning in Postgres

    CREATE TABLE events(ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); CREATE TABLE events_2021_jan PARTITION OF events FOR VALUES FROM ('2021-01-01') TO ('2021-02-01'); CREATE TABLE events_2021_feb PARTITION OF events FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
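
The partition bounds on the slide above follow the Postgres rule that `FROM` is inclusive and `TO` is exclusive, so February 1 belongs to `events_2021_feb`, not `events_2021_jan`. The Python sketch below mimics that routing for the two partitions shown; it is a simplified model that ignores time zones, and `partition_for` is a hypothetical helper name.

```python
from datetime import datetime

# Bounds copied from the slide: lower bound inclusive, upper bound exclusive.
PARTITIONS = [
    ("events_2021_jan", datetime(2021, 1, 1), datetime(2021, 2, 1)),
    ("events_2021_feb", datetime(2021, 2, 1), datetime(2021, 3, 1)),
]

def partition_for(ts: datetime) -> str:
    """Return the name of the partition whose range contains ts."""
    for name, lower, upper in PARTITIONS:
        if lower <= ts < upper:
            return name
    # With no matching (or default) partition, the insert is rejected.
    raise ValueError(f"no partition of events accepts ts={ts!r}")
```

For example, `partition_for(datetime(2021, 1, 31, 23, 59))` selects the January partition, while a timestamp in March raises an error because no partition covers it.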
  11. "Distributed PostgreSQL is a game changer."

    Min Wei, Principal Engineer at Microsoft. aka.ms/blog-petabyte-scale-analytics
  12. Questions?

    Citus repo on GitHub: aka.ms/citus
    Citus Public Slack for open source Q&A: slack.citusdata.com
    Citus Docs: docs.citusdata.com
    Definitive Citus 10 blog post by Marco: aka.ms/citus10
    Download Citus open source: citusdata.com/download/
  13. If you need to scale Postgres, learn more about Citus 10

    • As of Citus 10, now includes columnar compression • We’ve open sourced the shard rebalancer too • & Citus on a single node