Intro to Stateful Services or How to get 1 million RPS from a single node

Anton Moldovan

November 30, 2023

Transcript

  1. Intro to Stateful Services
    by Anton Moldovan
    (@antyadev)

  2. https://nbomber.com

  3. https://github.com/stereodb

  4. AGENDA
    - Intro to Stateful Services
    - Why stateless is slow and less reliable
    - Tools for building stateful services

  5. DISCLAIMER
    Stateful services have limited applicability but are worth looking into.
    They usually fit high-load systems with a very predictable
    dataset and memory footprint.

  6. PART I
    intro to sportsbook domain
    and
    how we came to stateful

  7. Real Madrid vs Chelsea, 2 : 1
    [Diagram: match events (red card, score changed, odds changed);
    delivery via PUSH and PULL]

  8. Real Madrid vs Chelsea, 2 : 1
    - fairly large payloads: 30 KB compressed (1.5 MB uncompressed)
    - update rate: 2K RPS (per tenant)
    - user query rate: 3-4K RPS (per tenant)
    - live data is very dynamic: caching it makes little sense
    - data should be queryable: a simple KV store is not enough
    - we need secondary indexes
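
The last two requirements can be illustrated with a minimal sketch of a secondary index kept next to a primary key-value table. Everything here is an assumption made for illustration: the match fields, the `Upsert` and `FindByTournament` helpers, and the sample data are hypothetical and are not taken from the talk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Primary storage: match id -> (tournament, score). All names are hypothetical.
var matches = new Dictionary<int, (string Tournament, string Score)>();

// Secondary index: tournament -> ids of the matches in that tournament.
var byTournament = new Dictionary<string, HashSet<int>>();

// Upsert keeps the secondary index in step with the primary table.
void Upsert(int id, string tournament, string score)
{
    // If the indexed attribute changed, drop the stale index entry first.
    if (matches.TryGetValue(id, out var old) && old.Tournament != tournament)
        byTournament[old.Tournament].Remove(id);

    matches[id] = (tournament, score);

    if (!byTournament.TryGetValue(tournament, out var ids))
        byTournament[tournament] = ids = new HashSet<int>();
    ids.Add(id);
}

// Query by a non-key attribute without scanning every record.
List<(int Id, string Score)> FindByTournament(string tournament) =>
    byTournament.TryGetValue(tournament, out var ids)
        ? ids.OrderBy(id => id).Select(id => (Id: id, Score: matches[id].Score)).ToList()
        : new List<(int Id, string Score)>();

Upsert(1, "Champions League", "2:1");
Upsert(2, "Champions League", "0:0");
Upsert(3, "Premier League", "1:1");
Upsert(1, "Champions League", "3:1"); // score changed

var live = FindByTournament("Champions League");
Console.WriteLine(string.Join(", ", live.Select(m => $"{m.Id}={m.Score}")));
// prints "1=3:1, 2=0:0"
```

A plain key-value cache only answers the lookup by id; the index is what makes the "all matches of a tournament" query cheap.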

  9. Real Madrid vs Chelsea, 2 : 1
    At peak, to handle the heavy load of 1 tenant, we have:
    5-10 nodes, 0.5-2 CPU, 6 GB RAM

  10. 20 KB payload, concurrent reads and writes
    Redis, single node: 4 vCPU, 8 GB RAM
    redis_write: 4K RPS, p99 = 842 ms
    redis_read: 7K RPS, p99 = 1597 ms

  11. API API API
    DB
    classical stateless architecture

  12. API
    Cache
    DB
    but Cache is not queryable

  13. API + DB
    Stateful Service

  14. Stateful Service (state) | Stateful Service (state) | Stateful Service (state)
    How do we keep state synchronized between nodes?

  15. state
    state
    state

  16. DB -> CDC (Debezium) -> state, state, state
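
One way to read this slide: every node consumes the same ordered stream of change events and applies it to its local in-memory copy. The sketch below assumes a heavily simplified event shape (offset, operation, id, payload). It is hypothetical and is not Debezium's actual event format; the `Apply` helper and the offset-based de-duplication are also illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Local in-memory state of one node: record id -> payload.
var state = new Dictionary<int, string>();
long lastAppliedOffset = -1;

// Applies one change event to the local state.
// Offset, Op, and the payload shape are hypothetical simplifications.
// Returns false for events that were already applied (at-least-once delivery).
bool Apply((long Offset, string Op, int Id, string Payload) change)
{
    if (change.Offset <= lastAppliedOffset) return false; // duplicate, skip it

    if (change.Op == "delete") state.Remove(change.Id);
    else state[change.Id] = change.Payload;                // insert or update

    lastAppliedOffset = change.Offset;
    return true;
}

Apply((0, "upsert", 1, "Real Madrid vs Chelsea 2:1"));
Apply((1, "upsert", 1, "Real Madrid vs Chelsea 3:1"));
var redelivered = Apply((1, "upsert", 1, "Real Madrid vs Chelsea 3:1")); // ignored
Apply((2, "upsert", 2, "to be removed"));
Apply((3, "delete", 2, ""));

Console.WriteLine($"{state.Count} record(s), match 1 = {state[1]}");
```

Because every node applies the same events in the same order, the nodes converge to the same state without talking to each other.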

  17. How do we handle the case when
    the data is larger than RAM?
    10 GB 30 GB
    state

  18. Solution 1: use an in-memory DB that supports data larger than RAM
    10 GB
    20 GB
    state

  19. Solution 2: partition by tenant
    UA | PL | US

  20. Solution 3: use range-based sharding
    shard A: users (1-500)
    shard B: users (501-1000)
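
A minimal sketch of routing a request to the shard that owns a user id, using the two ranges from the slide. The `RouteUser` helper is hypothetical, and the sketch assumes the ranges are inclusive and do not overlap.

```csharp
using System;

// Inclusive user-id ranges taken from the slide; the shard names are labels only.
var shards = new (int From, int To, string Name)[]
{
    (1, 500, "shard A"),
    (501, 1000, "shard B"),
};

// Returns the name of the shard that owns the given user id.
string RouteUser(int userId)
{
    foreach (var shard in shards)
        if (userId >= shard.From && userId <= shard.To)
            return shard.Name;

    throw new ArgumentOutOfRangeException(nameof(userId), "no shard owns this id");
}

Console.WriteLine(RouteUser(42));   // prints "shard A"
Console.WriteLine(RouteUser(501));  // prints "shard B"
```

With only a few ranges a linear scan is enough; with many shards, a sorted list of range boundaries and a binary search would be the usual choice.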

  21. PART II
    why stateless is slow

  22. API
    Cache
    DB
    Basic request handling:
    1) Get record by ID (Network cost)
    2) Deserialize compressed record payload (CPU cost)
    3) Filter, enrich record, build projection
    4) Serialize response payload (CPU cost)
    5) Send the response to the user
    CPU cost for serialization

  23. API + DB (Stateful Service)
    Basic request handling:
    1) Query record by ID or build projection
    2) Serialize response payload (CPU cost)
    3) Send the response to the user
    CPU cost for serialization

  24. Overreads
    API, Cache, DB, mobile client
    1) the API gets the full record by ID
    2) the mobile client receives only a small projection

  25. Latency Numbers: changes over the years
    Operation                              2010      2020
    Compress 1 KB with Zippy               2 μs      2 μs
    Read 1 MB sequentially from RAM        30 μs     3 μs
    Read 1 MB sequentially from SSD        494 μs    49 μs
    Read 1 MB sequentially from disk       3 ms      825 μs
    Round trip within same datacenter      500 μs    500 μs
    Send packet CA -> Netherlands -> CA    150 ms    150 ms
    Source: https://colin-scott.github.io/personal_website/research/interactive_latency.html

  26. Network Latency
    API + DB (Stateful Service) vs API, Cache, DB
    Stateless flow on a cache miss:
    1) try to get the record by ID from the Cache
    2) get the record from the DB
    3) deserialize the record
    4) serialize the record and insert it into the Cache

  27. Object hit rate vs Transactional hit rate

  28. [Diagram: API fetching records A, B, and C]
    To fulfill our transactional flow, we
    need to fetch records A, B, and C.
    Records A and B will not impact our latency.
    Overall latency = latency of record C (the slowest fetch)
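
The arithmetic behind this slide can be sketched as follows. The latency numbers are hypothetical, and the sketch assumes that A and B are cache hits, that C is a cache miss, and that the three records are fetched in parallel.

```csharp
using System;
using System.Linq;

// Hypothetical per-record fetch latencies in milliseconds:
// A and B are cache hits, C is a cache miss that goes to the DB.
var fetchLatencyMs = new (string Record, double Ms)[]
{
    ("A", 0.5),
    ("B", 0.5),
    ("C", 20.0),
};

// Object hit rate: the fraction of single-record requests served from the cache
// (here, anything faster than 1 ms counts as a cache hit).
double objectHitRate = fetchLatencyMs.Count(f => f.Ms < 1.0) / (double)fetchLatencyMs.Length;

// With parallel fetches, the transaction waits for the slowest record.
double transactionLatencyMs = fetchLatencyMs.Max(f => f.Ms);

Console.WriteLine($"object hit rate: {objectHitRate:P0}");
Console.WriteLine($"transaction latency: {transactionLatencyMs} ms");
```

Two out of three objects are hits, yet the transaction is exactly as slow as if nothing had been cached. This is the gap between object hit rate and transactional hit rate.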

  29. Most existing cache eviction algorithms focus on maximizing
    object hit rate, or the fraction of single object requests served
    from cache. However, this approach fails to capture the
    inter-object dependencies within transactions.

  30. async / await

  31. public void SimpleMethod()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = Add(i, i);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Add(int a, int b) => a + b;

  32. public async Task SimpleMethodAsync()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddAsync(i, i);
        }
    }

    // returns an already completed task, so the await never suspends
    private Task<int> AddAsync(int a, int b)
    {
        return Task.FromResult(a + b);
    }

  33. public async Task SimpleMethodAsyncYield()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddAsync(i, i);
        }
    }

    // Task.Yield() forces the method to suspend and resume via a continuation
    private async Task<int> AddAsync(int a, int b)
    {
        await Task.Yield();
        return a + b;
    }

  34. public async Task SimpleMethodAsyncYield()
    {
        var k = 0;
        for (int i = 0; i < Iterations; i++)
        {
            k = await AddAsync(i, i);
        }
    }

    // additionally schedules the addition itself on the thread pool
    private async Task<int> AddAsync(int a, int b)
    {
        await Task.Yield();
        return await Task.Run(() => a + b);
    }
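
To get a rough feel for the difference between these variants, the sketch below times three of them with `Stopwatch` in a single console program. It is a quick illustration and not the benchmark from the talk: the iteration count and the helper names are assumptions, and a single unwarmed run like this is noisy. For real measurements, a benchmarking harness such as BenchmarkDotNet is the better tool.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

const int Iterations = 100_000; // hypothetical value, chosen to keep the run short

// Plain synchronous call, kept out of line as on the slide.
[MethodImpl(MethodImplOptions.NoInlining)]
int Add(int a, int b) => a + b;

// Already completed task: the await continues synchronously.
Task<int> AddCompleted(int a, int b) => Task.FromResult(a + b);

// Forced suspension: every call resumes through a scheduled continuation.
async Task<int> AddYield(int a, int b)
{
    await Task.Yield();
    return a + b;
}

var k = 0;
var sw = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++) k = Add(i, i);
Console.WriteLine($"sync:            {sw.Elapsed.TotalMilliseconds:F1} ms");

sw.Restart();
for (var i = 0; i < Iterations; i++) k = await AddCompleted(i, i);
Console.WriteLine($"Task.FromResult: {sw.Elapsed.TotalMilliseconds:F1} ms");

sw.Restart();
for (var i = 0; i < Iterations; i++) k = await AddYield(i, i);
Console.WriteLine($"Task.Yield:      {sw.Elapsed.TotalMilliseconds:F1} ms");

Console.WriteLine($"last result: {k}");
```

The absolute numbers depend on the machine and runtime, so none are given here; only the relative ordering of the three variants is of interest.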

  35. PART III
    why stateless is less reliable

  36. API
    Cache
    DB
    API + DB
    Stateful Service
    We have a higher probability of failure
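
A back-of-the-envelope sketch of why more components on the request path mean a higher probability of failure. The 99.9% availability figures are hypothetical, and the model is a deliberate simplification: it assumes independent failures and that a request fails if any component on its path is down (no fallback).

```csharp
using System;
using System.Linq;

// Hypothetical availability of each component on the request path.
var stateless = new[] { 0.999, 0.999, 0.999 }; // API, Cache, DB
var stateful = new[] { 0.999 };                // API + DB in one process

// A request succeeds only if every component in the chain is up,
// so the availabilities multiply (assuming independent failures).
double Availability(double[] components) =>
    components.Aggregate(1.0, (acc, a) => acc * a);

Console.WriteLine($"stateless: {Availability(stateless):P3}");
Console.WriteLine($"stateful:  {Availability(stateful):P3}");
```

Under these assumptions the three-component path succeeds with probability 0.999^3 ≈ 0.997, so its failure rate is roughly three times that of the single-component path.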

  37. API, Cache, DB
    For the Cache call and again for the DB call:
    circuit breaker, retry, fallback, timeout, bulkhead isolation
    API + DB (Stateful Service)

  38. At least 4 out of 15 major outages in the
    last decade at Amazon Web Services
    were caused by metastable failures.

  39. - Metastable failures occur in open systems with an uncontrolled source of
    load where a trigger causes the system to enter a bad state that persists
    even when the trigger is removed.
    - Paradoxically, the root cause of these failures is often features that
    improve the efficiency or reliability of the system.
    - The characteristic of a metastable failure is that the sustaining effect keeps
    the system in the metastable failure state even after the trigger is
    removed.

  40. API
    Cache
    DB
    What about cache invalidation
    and data consistency?
    API + DB
    Stateful Service

  41. API
    Cache
    DB
    What about predictable scale-out?
    Will your RPS increase if you add an
    additional API or Cache node?
    API + DB
    Stateful Service

  42. PART IV
    tools for building stateful services

  43. distributed log with sync replication

  44. In-process memory DB
    SQL OLAP

  45. StereoDB Benchmarks
    Pure KV workload benchmark (in-process only, without
    persistence).
    In this benchmark, we run concurrently:
    3 million random reads and 100K random writes in 892 ms.

  46. Transactions
    StereoDB transactions allow the execution of a group of commands in a single step.
    StereoDB provides Read-Only and Read-Write transactions.
    ● Read-Only transactions only allow you to read data. They are multithreaded.
    ● Read-Write transactions allow you to read and write data. They run in a
    single-threaded fashion.
    What to expect from transactions in StereoDB:
    ● they are blazingly fast and cheap to execute.
    ● they guarantee atomic and consistent updates (you can update several
    tables, including secondary indexes, in one transaction, and no other concurrent
    transaction will read your data partially; the transaction cannot be observed to be
    in progress by another database client).
    ● they don't support rollback, since supporting rollbacks would have a significant
    impact on the simplicity and performance of StereoDB.

  47. // defines a Book type that implements IEntity<TId>
    public record Book : IEntity<int>
    {
        public int Id { get; init; }
        public string Title { get; init; }
        public int Quantity { get; init; }
    }

    public class BooksSchema
    {
        public ITable<int, Book> Table { get; init; }
    }

  48. // Schema is assumed to be the root schema type (not shown on the slides)
    // that exposes the Books and Orders tables used via ctx.Schema below
    var db = StereoDb.Create(new Schema());

    // WriteTransaction: it's a read-write transaction:
    // we can query and mutate data
    db.WriteTransaction(ctx =>
    {
        var books = ctx.UseTable(ctx.Schema.Books.Table);
        foreach (var id in Enumerable.Range(0, 10))
        {
            var book = new Book {Id = id, Title = $"book_{id}", Quantity = 1};
            books.Set(book);
        }
    });

  49. // ReadTransaction: it's a read-only transaction:
    // we can query multiple tables at once
    var bookId = 42;
    var result = db.ReadTransaction(ctx =>
    {
        var books = ctx.UseTable(ctx.Schema.Books.Table);
        // look up the requested book; returns null when it is not found
        return books.TryGet(bookId, out var book)
            ? book
            : null;
    });

  50. var result = db.ReadTransaction(ctx =>
    {
        var books = ctx.UseTable(ctx.Schema.Books.Table);
        var bookIdIndex = ctx.Schema.Orders.BookIdIndex;
        var quantityIndex = ctx.Schema.Orders.QuantityRangeIndex;

        // example of RangeScanIndex
        var booksRange = quantityIndex.SelectRange(0, 5).ToArray();

        // example of ValueIndex
        if (books.TryGet(1, out var book))
        {
            var orders = bookIdIndex.Find(book.Id).ToArray();
            return (book, orders);
        }
        return (null, null);
    });

  52. let result = db.ExecSql "SELECT Id, Quantity FROM Books WHERE Id <= 3"
    db.ExecSql "UPDATE Books SET Quantity = 222 WHERE Id = 8"
    db.ExecSql "DELETE FROM Books WHERE Id = 7"

  53. LINKS
    Fast key-value stores: An idea whose time has come and gone
    https://dl.acm.org/doi/pdf/10.1145/3317550.3321434
    Maximizing Transactional Cache Hit Rate
    https://www.usenix.org/system/files/osdi23-cheng.pdf
    Metastable Failures in Distributed Systems
    https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s11-bronson.pdf
    Faster: A Concurrent Key-Value Store with In-Place Updates
    https://www.microsoft.com/en-us/research/uploads/prod/2018/03/faster-sigmod18.pdf
    StereoDB
    https://github.com/StereoDB/StereoDB

  54. THANKS
    always benchmark
    https://twitter.com/antyadev
