From resilience to ultra-resilience of data for modern applications

A deep dive into how distributed PostgreSQL is architected to meet the demands of modern cloud-native applications, and a look at how real-life customers are using YugabyteDB to power a range of business-critical applications.

AMEY BANARSE

July 22, 2024

Transcript

  1. Amey Banarse, VP of Solutions Engineering, YugabyteDB

    FROM RESILIENCE TO ULTRA-RESILIENCE OF DATA FOR MODERN APPLICATIONS
  2. © 2024 – All Rights Reserved

    Run your business-critical applications with seamless scalability, built-in resilience, and flexible geo-distribution, using PostgreSQL-compatible & Cassandra-inspired APIs, while enjoying cost efficiency without compromising on performance.
  3. PostgreSQL has become the default database API

    ◦ Powerful RDBMS capabilities: matches Oracle features
    ◦ Robust and mature: hardened over 30 years
    ◦ Fully open source: permissive license, large community
    ◦ Cloud providers adopting: managed services on all clouds
    “Most popular database” of 2022
    “DBMS of the year” over multiple years: 2017, 2018, 2020
  4. Not all “PostgreSQL Compatibility” is created equal

    Capability                                                          Wire-Protocol  Syntax  Feature  Runtime
    Compatible with PG client drivers                                   ✓              ✓       ✓        ✓
    Parses PG syntax properly (but execution may be different)          ✘              ✓       ✓        ✓
    Supports equivalent features (but with different syntax & runtime)  ✘              ✘       ✓        ✓
    Appears and behaves just like PG to applications                    ✘              ✘       ✘        ✓
  5. Innovative architecture combines best of databases

    Pluggable Query Layer: YCQL API (Cassandra Compatible), YSQL API (PostgreSQL Compatible), Other APIs (future)
    Distributed, Transactional Storage Layer: Automatic Sharding, Load Balancing, Distributed Transactions, Raft Consensus
    Deploy Anywhere: On-Premises Datacenters
  6. YugabyteDB Voyager to simplify the Migration Journey

    Cloud, On-Premise; RHEL / CentOS / Ubuntu; MacOS / Docker
    YugabyteDB: OSS, Managed, Anywhere
    PostgreSQL 9.x 11.x; Oracle 11g, 12c-19c; MySQL 8.x; Amazon Aurora; Amazon RDS; Google Cloud SQL; Azure SQL for PG
  7. Resilience

    The ability of a system to readily respond to or recover from change, disruption, or a crisis
  8. Modern applications demand ultra-resilience

    Customers expect always-on apps
    Nations run on digital infrastructure
    Brand reputation requires uptime
  9. Resilience to ultra-resilience: what changed?

    Cloud Native = More Failures: Commodity servers fail, network interruptions are common
    Bigger Scale = More Failures: More apps as everything is digital and more headless services
    Viral Success = More Failures: Unexpected successes can overwhelm systems
  10. Major cloud outages aren’t uncommon

    Per quarter outages in Asia Pacific
    “Outages costing companies more than $1 million has increased from 11% to 15% since 2019.”
    https://foundershield.com/blog/real-world-statistics-on-managing-cloud-outage-risks/
  11. Different failure modes require different elements of resilience

    • Infrastructure failures → In-region resilience
    • Region and data center outage → Multi-region BCDR
    • User, app or operator error → Data protection
    • Upgrades / patching downtime → Zero-downtime operations
    • Intermittent or partial failures → Grey failures
    • Massive or unexpected spikes → Peak and freak events
  12. From resilience to ultra-resilience… for no downtime, no limits

    In-region resilience
    Multi-region BCDR
    Zero-downtime operations
    Data protection
    Peak and freak events
    Grey failures
  13. Let’s dive into real-world examples of ultra-resilience architectures
  14. Business Objective: Get Paramount+ Closer to Their End Users

    With the anticipated expansion through globalization and release of new services and content, Paramount+ needed a database platform that could perform and scale to support peak demands to provide the best user experience.
    • Multi-Region/Cloud Deployment
      ◦ High availability and resilience
      ◦ Performance at peak scale
    • Compliance with local laws
      ◦ Conform to GDPR regulations
      ◦ Conform to local security laws
  15. Peak Event: Super Bowl 2024

    ◦ Use Case
      ◦ Media live streaming platform
      ◦ User registrations and entitlement lookup
    ◦ Peak
      ◦ CBS Sports’ presentation of Super Bowl LVIII was the most-watched telecast in history, with 123.4 million viewers across platforms
    ◦ Challenges
      ◦ Massively scaling user entitlements lookup
      ◦ Resilience
      ◦ Low latency for users around the world
  16. What matters

    ✓ Prepared for unexpected bursts
    ✓ Built for expected peaks
    ✓ Surviving DDoS attacks
    ✓ Flexible expansion, anywhere
    ✓ Multitenancy
    ✓ No performance compromise
  17. Retailer Weathered a Regional Cloud Outage

    Top 5 Global Retailer
    ◦ Use Case:
      ◦ Product catalog for a global top 5 retailer
      ◦ Over 1.6 billion products
    ◦ Freak events:
      ◦ Snowstorm in Texas took out a cloud region
    ◦ Key Challenges
      ◦ High availability: Keeping the product catalog up during peak holiday season in spite of the cloud outage
      ◦ Sustaining high throughput of 250k+ tps
  18. Yugabyte © 2023 – All Rights Reserved

    Demo: Fail over an entire region and show the impact on the application.
  19. We are going Global - US, EU & APJ

    • Single YB cluster providing strong consistency across multiple regions
    • Scalable and highly available operational data tier
    • Business continuity, able to withstand a region failure with RPO=0
    • Geo-partitioning, data locality & compliance
  20. Changes, disruptions and crises take many shapes

    • Infrastructure failures
    • Region and data center outages
    • User, app or operator errors
    • Downtime from upgrades / patching
    • Intermittent or partial failures
    • Massive or unexpected spikes
  21. Changes, disruptions and crises take many shapes

    • Infrastructure failures
    • Region and data center outages
    • User, app or operator errors
    • Downtime from upgrades / patching
    • Intermittent or partial failures
    • Massive or unexpected spikes
    TRADITIONAL RESILIENCE: Only these 2 failure types are addressed
  22. What is multi-region resilience?

    ✓ Protection against region / DC outages to ensure business continuity
    ✓ E.g., power grid failures, natural disasters
    ✓ Nations are increasingly mandating multi-region resilience through regulatory compliance
  23. Multi-region failures: what can go wrong, and what you want

    What can go wrong…
    • Entire region or data center failure: low probability, but we see it happen regularly
    • Failures that last a while
    • Complex process to “heal” once the region / DC is back online
    What you want…
    • Ability to trade off between steady-state performance (latency) and potential data loss (RPO)
    • Very quick recovery (low RTO)
    • Ability to run DR drills - planned switchover
  24. Thank You

    Join us on Slack: www.yugabyte.com/slack
    Star us on GitHub: github.com/yugabyte/yugabyte-db
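
Slide 19 says a single cluster can withstand a region failure with RPO=0, and slide 5 names Raft consensus as part of the storage layer. A minimal sketch of why those two statements fit together, assuming plain majority-quorum replication: a write is committed only once a majority of replicas hold it, so losing a minority of replicas loses no committed data. The function names and the replica placements below are hypothetical illustrations, not YugabyteDB configuration or APIs, and the deck does not state the actual replication factor used.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest majority of replica_count voting replicas."""
    return replica_count // 2 + 1


def survives_region_loss(replicas_per_region: dict, failed_region: str) -> bool:
    """True if a majority of replicas remains after failed_region goes down."""
    total = sum(replicas_per_region.values())
    remaining = total - replicas_per_region.get(failed_region, 0)
    return remaining >= quorum_size(total)


# Hypothetical placement: one replica in each region named on slide 19.
balanced = {"US": 1, "EU": 1, "APJ": 1}
# Hypothetical skewed placement: two of the three replicas share a region.
skewed = {"US": 2, "EU": 1}

if __name__ == "__main__":
    for region in balanced:
        print(region, survives_region_loss(balanced, region))
    print("US (skewed)", survives_region_loss(skewed, "US"))
```

With one replica per region, any single region can fail and a majority still holds every committed write. In the skewed placement, losing the region that holds two replicas leaves no majority, which is why replica placement across regions matters as much as the replica count.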