Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Citus Hyperscale Overview | CI&T Data Week | Le...

Citus Hyperscale Overview | CI&T Data Week | Leandro Santana

In this talk, Leandro explains how to scale Postgres on Azure by using the Citus extension. Citus is a powerful extension to transform PostgreSQL into a distributed database. He covers the shard rebalancer, columnar storage, and much more.

Citus Data

July 23, 2021
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. PostgreSQL is more popular than ever PostgreSQL at core of

    many digital transformations Open source Proven resilience & stability Rich feature set enterprise-ready • Zero data loss • Rich indexing and data types, including geospatial • Low TCO. No Lock-in • Innovate faster with broad ecosystem
  2. Single Server Fully-managed, single-node PostgreSQL Hyperscale (Citus) High-performance Postgres for

    scale out Enterprise-ready, fully managed community PostgreSQL with built-in HA and multi-layered security
  3. Monitor Azure Database for PostgreSQL with Azure Monitor Azure Database

    for PostgreSQL tracks performance metrics and logs telemetry data Collect, analyze, and take action on telemetry data gathered from your database using Azure Monitor Better understand your apps with deep insights into app and database behavior, view alerts, and build remediation plans Azure Monitor Visualize Analyze Respond Integrate Insights
  4. Migrating to Azure Database for PostgreSQL Opportunity DBAs spending time

    maintaining their databases, not focused on their apps On premises PostgreSQL VM PostgreSQL Other Managed Services Azure Database Migration Services Azure Database for PostgreSQL Solution Azure Database for PostgreSQL takes care of patching, high availability, and performance management Provides added benefits of comprehensive, intelligent security and compliance, including threat alerts
  5. Migration from Oracle to Azure Database for PostgreSQL Single instance

    of Oracle Opportunity Oracle license cost is expensive and additional features (such as partitioning and high availability) come at added cost Solution Migrating to Azure Database for PostgreSQL removes need for licenses as the customer can take advantage of open source PostgreSQL Azure Database for PostgreSQL Azure Database Migration Services
  6. What is Citus? • Distributed tables • Reference tables •

    & more, as of Citus 10 Extension to Postgres (not a fork!) • Add nodes • Rebalance Simplicity & flexibility of using PostgreSQL, at scale • Scale transactional workloads • Scale analytical workloads • Mixed workloads too Multi-purpose:
  7. Why

  8. • Joins • Functions • Constraints • Indexes: B-tree, GIN,

    BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • … Citus
  9. Citus Coordinator Choosing the distribution column for Multi-Tenant scenarios SELECT

    create_distributed_table('companies', 'id’); SELECT create_distributed_table('campaigns', 'company_id’); SELECT create_distributed_table('ads', 'company_id’); SELECT create_distributed_table('clicks', 'company_id’); SELECT create_distributed_table('impressions', 'company_id');
  10. Citus Coordinator Choosing the distribution column for Real-Time scenarios SELECT

    create_distributed_table('github_users', 'user_id’); SELECT create_distributed_table('github_events', 'user_id');
  11. Rest of the tables can be reference tables or local

    table SELECT create_reference_table( 'table_name')
  12. APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR

    NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster
  13. APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns

    company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … Query across distribution columns.
  14. Transactions routed to a single node BEGIN; UPDATE SET WHERE

    COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES
  15. Node A Node B 1 2 4 5 3 6

    Node C Node A Node B Node C 1 2 4 5 3 6
  16. Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a

    game changer." aka.ms/blog-petabyte-scale-analytics
  17. CREATE TABLE events_columnar( ts timestamptz, i int, n numeric, s

    text) USING columnar; It’s as simple as “USING columnar;”
  18. Citus Columnar && Range Partitioning in Postgres CREATE TABLE events(

    ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); CREATE TABLE events_2021_jan PARTITION OF events FOR VALUES FROM ('2021-01-01') TO ('2021-02-01'); CREATE TABLE events_2021_feb PARTITION OF events FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');