Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Multi-tenant SaaS apps made simple on Azure Cos...

Multi-tenant SaaS apps made simple on Azure Cosmos DB for PostgreSQL | Citus Con 2023 | Adam Wolk & Alicja Kucjarcyzk

Cloud computing has given rise to a massive new market of SaaS applications built by ISV's and enterprises, however architecting these multi-tenant SaaS apps to optimize for underlying infrastructure cost—whilst still providing guarantees of isolation and performance—is hard.

In this session, Adam and Alicja will showcase a number of unique features in Azure Cosmos DB for PostgreSQL, a managed service powered by the open source Citus database extension. These Azure Cosmos DB for PostgreSQL features enable developers to build and scale multi-tenant SaaS apps easily with a variety of tenant management features built natively into the database.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. Multi-tenant SaaS apps made simple on Azure Cosmos DB for

    PostgreSQL Adam Wołk, Senior Program Manager Alicja Kucharczyk, Senior Program Manager @ Citus Con: An Event for Postgres 2023
  2. Multi-tenant SaaS • Same software • Same infrastructure • Multiple

    businesses/customers Customer value: Ease of operation Operator margin: Efficient use of hardware DNS
  3. Row based vs Schema Based vs Database Sharding • 100k

    tenants per 16vCPU node • Adjust schema • Modify application queries • 4-5k tenants per 16vCPU node • No schema changes • No query changes • 100 tenants per 16vCPU node • No schema changes • No query changes Best hardware utilization Easy lift & shift Highest operational cost
  4. Terminology Coordinator – Stores Metadata. Node which application connects to.

    Worker / Data nodes – Nodes which store data in form of shards. Sharding – Process of dividing data among nodes. Shards – A partition of the data containing a subset of rows.
  5. Pick a distribution key • Good (high) cardinality • Filters

    reads to a single node • Fans out writes to multiple nodes Bad vs Good country vs unique_carrier dest or origin airport vs unique_carrier timestamp vs device_id
  6. Distributed Tables Definition: • Tables that are sharded. Classification: •

    Large tables (>10GB) – shard on same key (may require addition of shard key) • All tables are be co-located • Enables localized and fast joins on workers • Ex: transactions, events etc SELECT create_distributed_table(table_name, column_name);
  7. Reference Tables Definition: • Replicated to all the nodes (extra

    latency) Classification: • Small tables < 10GB • Efficient joins with distributed tables • Cannot have sharding dimension • Ex: countries, categories SELECT create_reference_table(table_name);
  8. Local tables • Plain Postgres tables on the coordinator node.

    • Admin Tables that don’t interact with main tables • Separate micro-service that doesn’t need sharding
  9. Monitor – upcoming in Citus 11.3 • Query counts per

    tenant • CPU utilization per tenant 10 80 80 50 50 70 30 1k 2K 100k 1k 99k 0.5k 0.5k
  10. Rebalance { "tasks": [ { "LSN": { "lag": null, "source":

    "0/731ACC0", "target": null }, "size": { "source": "14 MB", "target": "5440 kB" }, "hosts": { "source": "localhost:9702", "target": "localhost:9700" }, "phase": "Copying Data", "state": "running", "command": "SELECT pg_catalog.citus_move_shard_placement(102020,2,3,'block_writes')", "message": "", "retried": 0, "task_id": 10 } ], "task_state_counts": { "done": 9, "blocked": 1, "running": 1 } } select * from citus_rebalance_start(); select jsonb_pretty(details) from citus_rebalance_status();