Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Billion Records from SQL to Cassandra, lessons ...

Billion Records from SQL to Cassandra, lessons learned

30min talk at the #CassandraSummit London 2014 with @BriceDutheil and @DoanDuyHai

At Libon, we process a lot of data. A lot means billions of messages/calls and billions of backed up contacts data. All the messaging backup is done in Cassandra with read & write latency below 10ms. Since we are happy with the level of performance achieved with Cassandra, we decided to migrate the contacts backup as well into Cassandra. During this session, we'll see how to tackle a migration project, from the SQL and Hibernate couple to Cassandra and Achilles. We'll highlight some mistakes to avoid and some tricks to make the migration smoother and more resilient.

Presentation video : https://youtu.be/U6oa3Tsdtp4

Brice Dutheil

December 04, 2014
Tweet

More Decks by Brice Dutheil

Other Decks in Programming

Transcript

  1. #CassandraSummit @doanduyhai @BriceDutheil Who are we ? Brice Dutheil Mockito

    Java Track Lead @ Devoxx France Independant contractor @ Libon (Orange-Vallée) DuyHai Doan Achilles Cassandra Technical Advocate Former Java Developer @ Libon 2
  2. #CassandraSummit @doanduyhai @BriceDutheil Agenda •  Libon context •  Migration strategy

    •  Business code migration •  Data Modeling •  Take Away 3
  3. #CassandraSummit @doanduyhai @BriceDutheil What is Libon ? •  Messaging app

    •  VOIP (out) •  Custom voicemail & greetings •  SMS/chat/file transfer •  Contacts matching 5
  4. #CassandraSummit @doanduyhai @BriceDutheil Project Context •  Application grew over the

    years •  Already using Cassandra to handle events •  messaging / file sharing / SMS / notifications •  Cassandra R/W latencies ≈ 0,4 ms •  server response time under 10 ms 11
  5. #CassandraSummit @doanduyhai @BriceDutheil Project Context •  About contacts … • 

    stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts 14
  6. #CassandraSummit @doanduyhai @BriceDutheil Project Context •  About contacts … • 

    stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts •  with millions users ‛ billions of contacts to handle 15
  7. #CassandraSummit @doanduyhai @BriceDutheil Project Context •  About contacts … • 

    stored as relational model in RDBMS (Oracle) •  1 user ≈ 300 contacts •  with millions users ‛ billions of contacts to handle •  query latency unpredictable 16
  8. #CassandraSummit @doanduyhai @BriceDutheil Fixing the problem •  Tune the RDBMS

    •  indices •  partitioning •  less joins, simplified relational model 21
  9. #CassandraSummit @doanduyhai @BriceDutheil Fixing the problem •  Tune the RDBMS

    •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased 22
  10. #CassandraSummit @doanduyhai @BriceDutheil Fixing the problem •  Tune the RDBMS

    •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased That worked 23
  11. #CassandraSummit @doanduyhai @BriceDutheil Fixing the problem •  Tune the RDBMS

    •  indices •  partitioning •  less joins, simplified relational model •  hardware capacity increased That worked but … 24
  12. #CassandraSummit @doanduyhai @BriceDutheil Next Challenges •  High Availability (DB failure,

    site failure …) •  Predictable performance at scale •  Going to multi data-centers 28
  13. #CassandraSummit @doanduyhai @BriceDutheil Going for Cassandra •  Denormalize (if possible

    …) •  Know your business ‛ know your queries •  Linear scaling out 31
  14. #CassandraSummit @doanduyhai @BriceDutheil Going for Cassandra •  Denormalize (if possible

    …) •  Know your business ‛ know your queries •  Linear scaling out •  Consistent performance 32
  15. #CassandraSummit @doanduyhai @BriceDutheil Objectives •  No downtime •  No concurrency

    corner-cases •  Safe rollback possible •  Replay-ability & resume-ability 37
  16. #CassandraSummit @doanduyhai @BriceDutheil Strategy •  3 phases •  Write contacts

    to both data stores •  Old contacts migration •  Switch to Cassandra … •  … and deprecate SQL 41
  17. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 1 Back end server ·

    · · SQL SQL SQL C* C* C* C* C* Write contactUUID 42 contactId … contactUUID 129363 123e4567- e89b-12d3… 834849 contacId(long) + contactUUID
  18. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 SQL SQL SQL C*

    C* C* C* C* For each batch of users SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL •  On live production, migrate old contacts 44 Old contacts created before phase 1
  19. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 SQL SQL SQL C*

    C* C* C* C* For each batch of users SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL Logged batches of INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week •  On live production, migrate old contacts 45 Old contacts created before phase 1
  20. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 •  During data migration

    … •  … concurrent writes from the migration batch … 48
  21. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 •  During data migration

    … •  … concurrent writes from the migration batch … •  … and updates from production for the same contact 49
  22. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 contact_uuid name (now -1

    week) … name (now) … Johny … Johnny … Insert from batch (to the past) Update from production 50
  23. #CassandraSummit @doanduyhai @BriceDutheil Migration Phase 2 contact_uuid name (now -1

    week) … name (now) … Johny … Johnny … Future reads pick the most up-to-date value 51
  24. #CassandraSummit @doanduyhai @BriceDutheil Code Inventory •  Written for RDBMS • 

    Lots of joins (no surprise) •  Designed around transactions 57
  25. #CassandraSummit @doanduyhai @BriceDutheil Code Inventory •  Written for RDBMS • 

    Lots of joins (no surprise) •  Designed around transactions •  Spring @Transactional everywhere 58
  26. #CassandraSummit @doanduyhai @BriceDutheil Code Inventory cont. •  Entities go through

    Services & Repositories 59 Repositories Services ContactEntity
  27. #CassandraSummit @doanduyhai @BriceDutheil Code Inventory cont. •  Hibernate is auto-magic

    •  lazy loading •  1st level cache •  N+1 select 61 Repositories Services ContactEntity
  28. #CassandraSummit @doanduyhai @BriceDutheil Which options ? •  Throw existing code

    … •  … and re-design from scratch for Cassandra 62
  29. #CassandraSummit @doanduyhai @BriceDutheil Which options ? •  Throw existing code

    … •  … and re-design from scratch for Cassandra No way ! 63
  30. #CassandraSummit @doanduyhai @BriceDutheil Code Quality •  Existing business code has…

    •  … ≈ 3500 unit tests •  and ≈600+ integration tests 65
  31. #CassandraSummit @doanduyhai @BriceDutheil Code Quality "The code coverage is one

    of your most valuable technical asset" Libon – since beginning 68
  32. #CassandraSummit @doanduyhai @BriceDutheil Refactoring Strategy •  Use CQRS •  ContactReadRepository

    •  ContactWriteRepository •  ContactUpdateRepository •  ContactDeleteRepository 72
  33. #CassandraSummit @doanduyhai @BriceDutheil Refactoring Strategy •  ContactWriteRepository •  write to

    all denormalized tables •  using CQL logged batches •  use TTLs 74
  34. #CassandraSummit @doanduyhai @BriceDutheil Outcome •  5 months of 2 men

    work •  Many iterations to fix bugs (thanks to IT) 78
  35. #CassandraSummit @doanduyhai @BriceDutheil Outcome •  5 months of 2 men

    work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling 79
  36. #CassandraSummit @doanduyhai @BriceDutheil Outcome •  5 months of 2 men

    work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling ‛ data model & code validation 81
  37. #CassandraSummit @doanduyhai @BriceDutheil Outcome •  5 months of 2 men

    work •  Many iterations to fix bugs (thanks to IT) •  Lots of performance benchmarks using Gatling ‛ data model & code validation •  … we are almost there for production 82
  38. #CassandraSummit @doanduyhai @BriceDutheil Denormalization, the good •  Support fast reads

    •  1 read ≈ 1 SELECT •  Worthy because mostly read, few updates 84
  39. #CassandraSummit @doanduyhai @BriceDutheil Denormalization, the bad •  Updating mutable data

    can be nightmare •  Data model bound by existing client-facing API •  Update paths very error-prone without tests 85
  40. #CassandraSummit @doanduyhai @BriceDutheil Data model in detail Contacts_by_id Contacts_by_identifiers Contacts_in_profiles

    Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user 86
  41. #CassandraSummit @doanduyhai @BriceDutheil Data model in detail Contacts_by_id Contacts_by_identifiers Contacts_in_profiles

    Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user 87 user_id always component of partition key
  42. #CassandraSummit @doanduyhai @BriceDutheil Scalable design 88 n1 n2 n3 n4

    n5 n6 n7 n8 A B C D E F G H user_id1 user_id2 user_id3 user_id4 user_id5
  43. #CassandraSummit @doanduyhai @BriceDutheil Scalable design 89 n1 n2 n3 n4

    n5 n6 n7 n8 A B C D E F G H user_id1 user_id2 user_id3 user_id4 user_id5
  44. #CassandraSummit @doanduyhai @BriceDutheil Bloom filters in action 90 •  For

    some tables, partition key = (user_id, contact_id) ‛ fast look-up, leverages Bloom filters ‛ touches 1 SSTable most of the time
  45. #CassandraSummit @doanduyhai @BriceDutheil Data model in detail Contacts_by_id Contacts_by_identifiers Contacts_in_profiles

    Contacts_by_modification_date Contacts_by_firstname_lastname Contacts_linked_user 91 Wide partition Bucketed
  46. #CassandraSummit @doanduyhai @BriceDutheil A "queue" story 93 •  contacts_by_modification_date • 

    queue-like pattern ‛ buckets to the rescue user_id:2014-12 date35 date12 … … date47 … … … … user_id:2014-11 date11 date12 … … date34 … … … …
  47. #CassandraSummit @doanduyhai @BriceDutheil Data model summary •  7 tables for

    denormalization •  Normalize some tables because rare access 95
  48. #CassandraSummit @doanduyhai @BriceDutheil Data model summary •  7 tables for

    denormalization •  Normalize some tables because rare access •  Read-before write in most update scenarios 96
  49. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  In SQL, auto-generated

    long using sequence •  In Cassandra, auto-generated timeuuid 97
  50. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  How to store

    both types ? •  As text ? ‛ easy solution … 99
  51. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  How to store

    both types ? •  As text ? ‛ easy solution … •  … but waste of space ! •  because encoded as UTF-8 or ASCII in Cassandra 100
  52. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  Long ‛ 8

    bytes •  Long as text(UTF-8: 1 byte) ‛ "digits count" bytes 101
  53. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  UUID ‛ 16

    bytes •  32 hex chars + 4 hyphens = 36 chars •  UUID as text(UTF-8: 1 byte) ‛ 36 bytes •  Bytes overhead = 36 – 16 = 20 bytes 102
  54. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  20 bytes wasted

    per contact uuid •  × 7 denormalizations = 140 bytes per contact uuid 104
  55. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  20 bytes wasted

    per contact uuid •  × 7 denormalizations = 140 bytes per contact uuid •  × 109 contacts = 140 GB wasted 105 not even counting replication factor …
  56. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  ‛ just save

    contact id as byte[ ] •  Achilles @TypeTransformer for automatic conversion (see later) 107
  57. #CassandraSummit @doanduyhai @BriceDutheil Notes on contact_id •  ‛ just save

    contact id as byte[ ] •  Achilles @TypeTransformer for automatic conversion (see later) •  Use blobAsBigInt( ) or blobAsUUID( ) to view data 108
  58. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Dirty checking, why is it

    important ? •  1 contact ≈ 8 mutable fields 111
  59. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Dirty checking, why is it

    important ? •  1 contact ≈ 8 mutable fields •  × 7 denormalizations = 56 update combinations … 112
  60. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Dirty checking, why is it

    important ? •  1 contact ≈ 8 mutable fields •  × 7 denormalizations = 56 update combinations … •  and not even counting multiple fields updates … 113
  61. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Are you going to manually

    generate 56+ prepared statements for all possible updates ? 114
  62. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Are you going to manually

    generate 56+ prepared statements for all possible updates ? •  Or just use dynamic plain string statements and get some perf penalty ? 115
  63. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Dirty check in action //No

    read-before-write ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId); proxy.setFirstName(…); proxy.setLastName(…); //type-safe updates proxy.setAddress(…); manager.update(proxy); 116
  64. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Dynamic statements generation UPDATE contacts

    SET firstname=?, lastname=?,address=? WHERE contact_id=? 118 prepared statements are cached, of course
  65. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Simple INSERT prepared statement INSERT

    INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?); 120
  66. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Runtime values binding •  some

    columns are optional preparedStatement.bind(49374,’John DOE’,33, null, null, …, null); 121
  67. #CassandraSummit @doanduyhai @BriceDutheil Achilles Inserting null 㲇 creating tombstones ×

    7 denormalizations × billions of contacts created 125 not even counting replication factor …
  68. #CassandraSummit @doanduyhai @BriceDutheil Achilles @Entity(table = "contacts_by_id ») @Strategy(insert =

    InsertStrategy.NOT_NULL_FIELDS) public class ContactById { } 126 •  Simple annotation
  69. #CassandraSummit @doanduyhai @BriceDutheil Achilles •  Runtime dynamic INSERT statement INSERT

    INTO contacts(contact_id, name, age, address,) VALUES(:contact_id, :name, :age, :address); 127 prepared statements are cached, of course
  70. #CassandraSummit @doanduyhai @BriceDutheil Achilles @PartitionKey @Column(name = "contact_id") @TypeTransformer(valueCodecClass =

    ContactIdToBytes.class) private ContactId contactId; 128 •  Remember the contactId ⁶ byte[ ] conversion ? BYOC ‛ Bring Your Own Codec
  71. #CassandraSummit @doanduyhai @BriceDutheil Achilles public interface Codec<FROM, TO> { Class<FROM>

    sourceType(); Class<TO> targetType(); TO encode(FROM fromJava) FROM decode(TO fromCassandra); } 129
  72. #CassandraSummit @doanduyhai @BriceDutheil Achilles 130 2014-12-01 14:25:20,554 Bound statement :

    [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM] 2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...] 2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM] 2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead] •  Dynamic logging in action
  73. #CassandraSummit @doanduyhai @BriceDutheil Achilles 131 •  Dynamic logging •  runtime

    activation •  no need to recompile/re-deploy •  save us hours of debugging •  TRACE log level ‛ query tracing
  74. #CassandraSummit @doanduyhai @BriceDutheil Conditions for success •  Data modeling is

    crucial •  Double-run strategy & timestamp trick FTW 134
  75. #CassandraSummit @doanduyhai @BriceDutheil Conditions for success •  Data modeling is

    crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky 135
  76. #CassandraSummit @doanduyhai @BriceDutheil Conditions for success •  Data modeling is

    crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky •  Benchmark ! 136
  77. #CassandraSummit @doanduyhai @BriceDutheil Conditions for success •  Data modeling is

    crucial •  Double-run strategy & timestamp trick FTW •  Data type conversion can be tricky •  Benchmark ! •  Mindset shifts for the team 137