

Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch

A talk I gave at Berlin Buzzwords 2014.

Boaz Leskes

May 26, 2014

Transcript

  1. A document

    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }
  2. A type = docs with similar data/structure

    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }

    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": { "name": "Abel Coronado", "screen_name": "abxda" }
    }
  3. An index = a collection of types

    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": { "name": "Abel Coronado", "screen_name": "abxda" }
    }

    {
      "id": 19726002,
      "name": "Abel Coronado",
      "screen_name": "abxda",
      "location": "Aguascalientes",
      "followers_count": 871,
      "friends_count": 1794,
      "listed_count": 38
    }
  4. Sharding (diagram: an index split into shards 1 to 4, with copies 1 to 4, spread over four nodes)
  5. Sharding (diagram: the same four shards and four copies on four nodes)
  6. Sharding (diagram: four more nodes are added, eight in total)
  7. Sharding (diagram: the same eight nodes, with shard 2 in a new position)
  8. Sharding - multiple indices (diagram: shards 1 and 2 and copies 1 and 2 of more than one index, spread over eight nodes)
  9. Search (slides 9 to 13 show the same diagram: shards 1 and 2 and copies 1 and 2 of two indices, spread over four nodes; the request can go to any node)

    # curl 'localhost:9200/index1,index2/_search?q=something'
  14. easy to get, easy to index

    # curl -XPUT localhost:9200/tweets/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }'
  15. no problem, just use more shards (diagram: one index split into twelve shard boxes)
  16. no problem, just use more shards (same diagram, with a search against the whole index)

    # curl 'localhost:9200/index/_search?q=something'
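As a rough illustration of what "more shards" means for individual documents: Elasticsearch places each document on a primary shard by hashing its routing value (the document id by default) and taking the result modulo the number of primary shards. The sketch below mimics that idea only. `pick_shard` is a hypothetical helper, `zlib.crc32` is a stand-in and not the hash Elasticsearch uses, and the document ids are made up.

```python
import zlib
from collections import Counter


def pick_shard(routing: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number in [0, number_of_shards)."""
    # Stand-in hash (assumption): Elasticsearch uses its own hash internally.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


# Spread 1,000 made-up document ids over 12 shards.
doc_ids = [str(n) for n in range(1000)]
per_shard = Counter(pick_shard(doc_id, 12) for doc_id in doc_ids)
print(sorted(per_shard.items()))

# With a different shard count the same id can map to a different shard,
# so the shard count cannot simply be changed under existing data.
moved = sum(pick_shard(d, 12) != pick_shard(d, 13) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} ids would map elsewhere with 13 shards")
```

A search without routing has to ask every shard of the index, which is why the number of shards also determines how many places each query touches.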
  17. Reminds me of a tile at my aunt’s house: "Today is the tomorrow we were all afraid of yesterday…"
  18. Cluster scales with time (diagram: shards of monthly indices, March to June, accumulating as time passes)
  19. Scopes searches (diagram: monthly shards from March to May; the search below only targets the May index)

    # curl 'localhost:9200/may/_search?q=something'
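A minimal sketch of what scoping looks like from the client side: the index names in the URL decide which shards take part in the search. `search_url` is a hypothetical helper, and the index name `april` is assumed to follow the same naming as `may` on the slide.

```python
from urllib.parse import urlencode


def search_url(indices, query, host="localhost:9200"):
    """Build a URI-search URL that only touches the given indices."""
    index_part = ",".join(indices)
    return f"http://{host}/{index_part}/_search?{urlencode({'q': query})}"


# Only the May index (and therefore only its shards) is searched.
print(search_url(["may"], "something"))
# Widening the scope is a matter of listing more indices.
print(search_url(["april", "may"], "something"))
```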
  20. one little tweak…

    # curl -XPUT localhost:9200/tweets_201401/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }'
  21. index templates

    curl -XPUT localhost:9200/_template/twitter -d '{
      "template": "twitter_*",
      "settings": { "number_of_shards": 4, "number_of_replicas": 1 }
    }'
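The two ideas so far, a monthly index name and a template keyed on a name pattern, can be sketched together. This is an illustration under assumptions, not how Elasticsearch implements templates: `monthly_index` and `settings_for_new_index` are hypothetical helpers, the `twitter_` prefix is taken from the template pattern, and Python's `fnmatch` only approximates the wildcard matching.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Timestamp layout used by the "created_at" field in the tweets above.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Local stand-in for the template registered on the slide.
TEMPLATES = {
    "twitter": {
        "template": "twitter_*",
        "settings": {"number_of_shards": 4, "number_of_replicas": 1},
    }
}


def monthly_index(created_at: str, prefix: str = "twitter_") -> str:
    """Name of the monthly index a document belongs to, e.g. twitter_201401."""
    created = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return f"{prefix}{created:%Y%m}"


def settings_for_new_index(index_name: str, templates: dict = TEMPLATES) -> dict:
    """Collect the settings of every template whose pattern matches the name."""
    merged = {}
    for name in sorted(templates):
        template = templates[name]
        # Simplification: fnmatch also understands ? and [...], not only *.
        if fnmatchcase(index_name, template["template"]):
            merged.update(template["settings"])
    return merged


index_name = monthly_index("Fri Jan 24 11:15:24 +0000 2014")
print(index_name, settings_for_new_index(index_name))
# A name with another prefix, such as tweets_201401, does not match twitter_*.
print(settings_for_new_index("tweets_201401"))
```

The point of the template is that nobody has to create next month's index by hand: the first document written to a new matching name picks up the shard and replica settings.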
  22. older data

    # elasticsearch.yml
    node.disk: spinning_disks

    curl -XPUT 'localhost:9200/twitter_2012*/_settings' -d '{
      "index.routing.allocation.include.disk": "spinning_disks",
      "index.routing.allocation.exclude.disk": "ssd"
    }'
  23. older data

    curl -XPOST localhost:9200/twitter_201404/_optimize
    curl -XPOST localhost:9200/twitter_201304/_close
    curl -XDELETE localhost:9200/twitter_201204/

    pro tip: https://github.com/elasticsearch/curator
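The three commands suggest an age-based policy for monthly indices. The sketch below picks one action per index. The thresholds (optimize after one month, close after twelve, delete after twenty-four) are assumptions read off the example index names, not values stated in the talk, and `plan_action` is a hypothetical helper rather than part of curator.

```python
from datetime import date


def months_between(index_month: str, today: date) -> int:
    """Whole months from an index suffix such as '201404' up to today's month."""
    year, month = int(index_month[:4]), int(index_month[4:6])
    return (today.year - year) * 12 + (today.month - month)


def plan_action(index_name: str, today: date, prefix: str = "twitter_"):
    """Return 'delete', 'close', 'optimize' or None for one monthly index."""
    age = months_between(index_name[len(prefix):], today)
    if age >= 24:  # assumed retention limit
        return "delete"
    if age >= 12:  # assumed: keep on disk, but free its resources
        return "close"
    if age >= 1:   # no longer written to, so its segments can be merged
        return "optimize"
    return None    # the current month is still being written to


today = date(2014, 5, 26)
for name in ["twitter_201405", "twitter_201404", "twitter_201304", "twitter_201204"]:
    print(name, plan_action(name, today))
```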
  24. aliases

    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add": { "index": "twitter_201311", "alias": "last_2_months" } },
        { "remove": { "index": "twitter_201309", "alias": "last_2_months" } }
      ]
    }'
  25. Implications

    • Use indices to manage data as it scales over time
    • Use aliases to efficiently point your searches at the relevant shards
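To make the alias point concrete, here is a sketch of how the `last_2_months` alias could be moved forward each month. `shift_month` and `rotate_alias` are hypothetical helpers, and the body mirrors the add/remove pair from the aliases slide.

```python
import json


def shift_month(yyyymm: str, delta: int) -> str:
    """Move a 'YYYYMM' string forward or backward by delta months."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:6])
    total = year * 12 + (month - 1) + delta
    return f"{total // 12:04d}{total % 12 + 1:02d}"


def rotate_alias(new_month, alias="last_2_months", window=2, prefix="twitter_"):
    """Build an _aliases body: add the new month, drop the month leaving the window."""
    expired = shift_month(new_month, -window)
    return {
        "actions": [
            {"add": {"index": prefix + new_month, "alias": alias}},
            {"remove": {"index": prefix + expired, "alias": alias}},
        ]
    }


# Adds twitter_201311 and removes twitter_201309, as on the aliases slide.
print(json.dumps(rotate_alias("201311"), indent=2))
```

Searches keep using the alias name, so clients never need to know which monthly indices currently sit behind it.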
  26. Solution 1 - index per user (diagram: users 1 to 12, each with their own index of two shards: user 1 shard 1, user 1 shard 2, … user 12 shard 1, user 12 shard 2)
  27. Solution 1 - index per user (same diagram, now shown on a node)
  28. Solution 1 - index per user (same diagram; the node is marked "Overloaded")
  29. Solution 2 - all users in one index (diagram: one index whose shards each hold documents from users 1 to 12)
  30. Solution 2 - all users in one index (same diagram, captioned "limited horizon")
  31. Solution 12 - both (diagram: three indices of four shards each; two are shared by groups of users, one by users 1, 2, 4 and 5 and one by users 7, 8, 9 and 0, and the third is dedicated to user 6)
  32. Why is one index per user convenient?

    # curl -XGET localhost:9200/user_1/_search -d '{
      "query": { "match": { "body": "all the things" } }
    }'
  33. What do we want?

    • Have the simplicity of one user per index
    • Have the scalability of solution 12
  34. Aliases to the rescue

    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add": {
            "index": "users_group_1",
            "alias": "user_1",
            "filter": { "term": { "user": "user_1" } }
        } }
      ]
    }'

    # curl -XGET localhost:9200/user_1/_search -d '{ … }'
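With many users, the filtered aliases would normally be generated rather than typed. A minimal sketch, assuming a mapping from user name to the shared index that holds that user's documents: `user_alias_action` and `aliases_body` are hypothetical helpers, and the index name `users_group_3` is made up for the example.

```python
import json


def user_alias_action(user: str, group_index: str) -> dict:
    """One 'add' action: an alias named after the user, filtered to their documents."""
    return {
        "add": {
            "index": group_index,
            "alias": user,
            "filter": {"term": {"user": user}},
        }
    }


def aliases_body(assignments: dict) -> dict:
    """assignments maps a user name to the shared index that holds its data."""
    return {
        "actions": [
            user_alias_action(user, index)
            for user, index in sorted(assignments.items())
        ]
    }


body = aliases_body({
    "user_1": "users_group_1",
    "user_2": "users_group_1",
    "user_6": "users_group_3",  # made-up index name for a user with their own index
})
print(json.dumps(body, indent=2))
```

The application keeps searching `user_1` as if it were a private index, while the alias decides which shared index and which filter are actually used.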