is strictly prohibited
About Elasticsearch Inc.
• Founded in 2012 by the people behind the Elasticsearch and Apache Lucene projects
  http://www.elasticsearch.com
  Headquarters: Amsterdam and Los Altos, CA
• We provide
  – Training (public & onsite)
  – Development support
  – Production support subscription (SLA)
file descriptors
“Make sure to increase the number of open files descriptors on the machine (or for the user running elasticsearch). Setting it to 32k or even 64k is recommended.”
Source: setup and configuration guide
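On Linux and other Unix-like systems, the open-files limit can be read with Python's standard `resource` module. The sketch below is a minimal check that assumes the 32k/64k figures from the quote as thresholds; `meets_fd_recommendation` and `current_fd_limits` are hypothetical helpers, not part of Elasticsearch. It reports the limits of the process running the script, so it is only meaningful when run as the same user and from the same shell that starts elasticsearch.

```python
import resource

# Thresholds taken from the recommendation quoted above.
MIN_RECOMMENDED = 32 * 1024   # "32k"
PREFERRED = 64 * 1024         # "or even 64k"


def meets_fd_recommendation(soft_limit: int, minimum: int = MIN_RECOMMENDED) -> bool:
    """Return True if the soft open-files limit is at least `minimum`.

    RLIM_INFINITY means "no limit", which trivially satisfies the check.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= minimum


def current_fd_limits() -> tuple:
    """Soft and hard RLIMIT_NOFILE limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_fd_limits()
    print(f"open files: soft={soft} hard={hard}")
    if not meets_fd_recommendation(soft):
        print(f"soft limit is below {MIN_RECOMMENDED}; consider raising it to {PREFERRED}")
```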
main concepts
• node: a running elasticsearch instance (typically a JVM process)
• cluster: a group of nodes sharing the same set of indices
• index: a set of documents of possibly different types, stored in one or more shards
• shard: a Lucene index, allocated to one of the nodes
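The four concepts nest inside each other: a cluster groups nodes and indices, and each index is split into shards that live on nodes. A minimal sketch of that containment with Python dataclasses; all class and field names are illustrative, and the round-robin `allocate` function is a hypothetical simplification, not Elasticsearch's real shard allocator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Shard:
    """A Lucene index holding part of an elasticsearch index."""
    index: str
    number: int                  # shard number within its index: 0, 1, 2, ...
    node: Optional[str] = None   # node the shard is allocated to


@dataclass
class Index:
    """A named set of documents split across one or more shards."""
    name: str
    shards: list = field(default_factory=list)


@dataclass
class Cluster:
    """A group of nodes (running instances) sharing the same set of indices."""
    name: str
    nodes: list
    indices: dict = field(default_factory=dict)

    def create_index(self, name: str, number_of_shards: int) -> Index:
        index = Index(name, [Shard(name, n) for n in range(number_of_shards)])
        allocate(index, self.nodes)
        self.indices[name] = index
        return index


def allocate(index: Index, nodes: list) -> None:
    """Toy round-robin placement: shard n goes to nodes[n % len(nodes)]."""
    for shard in index.shards:
        shard.node = nodes[shard.number % len(nodes)]
```

With two nodes and five shards, `Cluster("my_cluster", ["node1", "node2"]).create_index("docs", 5)` places shards 0, 2 and 4 on `node1` and shards 1 and 3 on `node2`.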
master node
• elected when nodes form a cluster
• coordinates the work of the other nodes through the cluster state
• the only node that can update the cluster state
• publishes the cluster state to the other nodes
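A toy model of the master's role: only the master may produce a new cluster state version, and it pushes that state to every node. The election rule used here (the master-eligible node with the lowest id wins) is an assumption for illustration, not a description of the real discovery protocol, and `ClusterState`, `form_cluster` and `update_and_publish` are hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class ClusterState:
    version: int = 0
    master: Optional[str] = None
    settings: tuple = ()          # placeholder for the rest of the state


@dataclass
class Node:
    node_id: str
    master_eligible: bool = True
    state: ClusterState = field(default_factory=ClusterState)


def elect_master(nodes: list) -> str:
    """Assumed rule: the master-eligible node with the lowest id wins."""
    eligible = sorted(n.node_id for n in nodes if n.master_eligible)
    if not eligible:
        raise RuntimeError("no master-eligible nodes")
    return eligible[0]


def form_cluster(nodes: list) -> Node:
    """Elect a master and give every node the initial state naming it."""
    master_id = elect_master(nodes)
    initial = ClusterState(version=1, master=master_id)
    for node in nodes:
        node.state = initial
    return next(n for n in nodes if n.node_id == master_id)


def update_and_publish(actor: Node, nodes: list, **changes) -> ClusterState:
    """Only the master may change the state; the new version goes to all nodes."""
    if actor.state.master != actor.node_id:
        raise PermissionError(f"{actor.node_id} is not the master")
    new_state = replace(actor.state, version=actor.state.version + 1, **changes)
    for node in nodes:            # publish, including to the master itself
        node.state = new_state
    return new_state
```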
cluster state
• nodes: list of nodes in the cluster, their addresses and attributes, and the current master
• index metadata: settings, mappings and aliases
• shard routing table: where the shards can be found
• index templates
• cluster settings: persistent and transient
cluster state - persistent
• nodes: list of nodes in the cluster, their addresses and attributes, and the current master (not persisted)
• index metadata: settings, mappings and aliases (persisted)
• shard routing table: where the shards can be found (not persisted)
• index templates (persisted)
• cluster settings: persistent settings are persisted, transient settings are not
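The parts of the cluster state listed above can be pictured as one nested structure. Below is a sketch using plain dicts; every field name and value is hypothetical, and the split made by `persistent_part` rests on an assumption stated in its docstring: metadata (index metadata, templates, persistent settings) is written to disk, while the node list, routing table and transient settings are rebuilt or discarded after a full cluster restart.

```python
import json


def example_cluster_state() -> dict:
    """Hypothetical cluster state with one index spread over two nodes."""
    return {
        "nodes": {
            "node1": {"address": "10.0.0.1:9300", "attributes": {}},
            "node2": {"address": "10.0.0.2:9300", "attributes": {}},
        },
        "master": "node1",
        "metadata": {
            "indices": {
                "docs": {
                    "settings": {"number_of_shards": 5},
                    "mappings": {"doc": {"properties": {"date": {"type": "date"}}}},
                    "aliases": ["current"],
                }
            },
            "templates": {},
        },
        "routing_table": {"docs": {0: "node1", 1: "node2", 2: "node1", 3: "node2", 4: "node1"}},
        "cluster_settings": {
            "persistent": {"discovery.zen.minimum_master_nodes": 2},
            "transient": {"cluster.routing.allocation.enable": "none"},
        },
    }


def persistent_part(state: dict) -> str:
    """Serialize only what is assumed to survive a full cluster restart."""
    return json.dumps({
        "indices": state["metadata"]["indices"],
        "templates": state["metadata"]["templates"],
        "persistent_settings": state["cluster_settings"]["persistent"],
    }, sort_keys=True)
```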
data directory
• the “data” directory in the elasticsearch home by default
• path.data in config/elasticsearch.yml
• --path.data=… on the command line
• set automatically by the deb and rpm packages
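The ways of setting the data directory can be read as an order of precedence. This sketch assumes that the command line wins over `config/elasticsearch.yml`, which wins over the `data` default under the elasticsearch home. `resolve_data_path` is hypothetical, and the YAML file is represented by an already-parsed dict with flattened keys to keep the example dependency-free.

```python
import os
from typing import Optional


def resolve_data_path(es_home: str, config: dict, argv: list) -> str:
    """Pick the data directory: --path.data=... > path.data in config > <home>/data."""
    cli_value: Optional[str] = None
    for arg in argv:
        if arg.startswith("--path.data="):
            cli_value = arg.split("=", 1)[1]   # the last occurrence wins
    if cli_value:
        return cli_value
    if config.get("path.data"):
        return config["path.data"]
    return os.path.join(es_home, "data")
```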
multiple nodes per data dir
• each node uses <data_dir>/<cluster_name>/nodes/NNN, where NNN = 0, 1, 2, ...
• node.max_local_storage_nodes limits how many nodes can share one data dir (default 50)
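A sketch of how a starting node could pick its directory: it takes the lowest ordinal not already used by another node on the same machine, up to `node.max_local_storage_nodes`. A real node guards its directory with a lock on disk; here an in-memory `in_use` set stands in for those locks, and `node_data_dir` is a hypothetical helper.

```python
import os


def node_data_dir(data_dir: str, cluster_name: str, in_use: set,
                  max_local_storage_nodes: int = 50) -> str:
    """Return <data_dir>/<cluster_name>/nodes/<N> for the lowest free ordinal N.

    `in_use` stands in for the lock each running node holds on its directory.
    """
    for ordinal in range(max_local_storage_nodes):
        if ordinal not in in_use:
            in_use.add(ordinal)
            return os.path.join(data_dir, cluster_name, "nodes", str(ordinal))
    raise RuntimeError(
        f"all {max_local_storage_nodes} node directories are in use "
        "(node.max_local_storage_nodes)")
```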
transaction log
• the transaction log stores every operation (create/update/delete)
  – fsync-ed every 5 sec (configurable)
  – replayed on node restart
• Lucene segments are fsync-ed when the transaction log is full (every 30 min, 200mb or 500 operations)
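A toy transaction log that follows the bullets above: operations are appended as JSON lines, fsync-ed once the sync interval has passed, checked against the "full" thresholds, and replayed on restart. `ToyTranslog` and `replay` are hypothetical; the defaults copy the numbers on the slide, time is passed in explicitly so the behaviour is deterministic, and the real transaction log uses its own format.

```python
import json
import os


class ToyTranslog:
    """Hypothetical append-only operation log (JSON lines), for illustration only."""

    def __init__(self, path: str, sync_interval: float = 5.0,
                 flush_period: float = 30 * 60, flush_size: int = 200 * 1024 * 1024,
                 flush_ops: int = 500):
        self.path = path
        self.sync_interval = sync_interval   # seconds between fsyncs
        self.flush_period = flush_period     # "every 30 min"
        self.flush_size = flush_size         # "200mb"
        self.flush_ops = flush_ops           # "500 operations"
        self.file = open(path, "a", encoding="utf-8")
        self.ops = 0
        self.bytes = 0
        self.last_sync = 0.0
        self.last_flush = 0.0

    def append(self, op: dict, now: float) -> None:
        """Record one create/update/delete; fsync if the sync interval has elapsed."""
        line = json.dumps(op) + "\n"
        self.file.write(line)
        self.ops += 1
        self.bytes += len(line.encode("utf-8"))
        if now - self.last_sync >= self.sync_interval:
            self.sync(now)

    def sync(self, now: float) -> None:
        """Push buffered writes to the OS, then force them to disk."""
        self.file.flush()
        os.fsync(self.file.fileno())
        self.last_sync = now

    def needs_flush(self, now: float) -> bool:
        """The log counts as full once it is too old, too large, or holds too many ops."""
        return (now - self.last_flush >= self.flush_period
                or self.bytes >= self.flush_size
                or self.ops >= self.flush_ops)

    def flush(self, now: float) -> None:
        """Stand-in for committing Lucene segments; afterwards the log starts empty."""
        self.file.close()
        self.file = open(self.path, "w", encoding="utf-8")   # truncates the log
        self.ops = 0
        self.bytes = 0
        self.last_flush = now

    def close(self) -> None:
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()


def replay(path: str) -> list:
    """On restart, re-apply every operation still in the log, in order."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```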
inverted index
• Document 1: { “text”: “Elasticsearch is an open source, distributed search engine.”, “date”: “2014-07-01” }
• Document 2: { “text”: “Elasticsearch is a search server based on Lucene.”, “date”: “2014-07-02” }
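An inverted index maps each term to the documents that contain it. The sketch below builds one for the `text` field of the two documents above. The analyzer is a simplification assumed for illustration (lowercase, split on non-alphanumeric characters, drop a small stopword list) rather than Lucene's standard analyzer, and `build_inverted_index` is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed stopword subset; enough to reproduce the tokens used on these slides.
STOPWORDS = {"a", "an", "and", "is", "of", "on", "the"}

DOCS = {
    1: {"text": "Elasticsearch is an open source, distributed search engine.", "date": "2014-07-01"},
    2: {"text": "Elasticsearch is a search server based on Lucene.", "date": "2014-07-02"},
}


def analyze(text: str) -> list:
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(docs: dict, field: str) -> dict:
    """Map each term of `field` to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}
```

For these two documents, `elasticsearch` and `search` point to both documents, while terms such as `engine` or `lucene` point to only one, so a query for `lucene` can go straight to document 2.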
analysis
• “Elasticsearch is an open source, distributed search engine.” could be translated into tokens:
  – elasticsearch
  – open
  – source
  – distributed
  – search
  – engine
• “Elasticsearch is a search server based on Lucene.” could be translated into tokens:
  – elasticsearch
  – search
  – server
  – based
  – lucene
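Analysis can be viewed as a pipeline: a tokenizer splits the text, then token filters transform or drop tokens. The sketch composes a regex tokenizer with a lowercase filter and a stop filter, and reproduces the token lists above. The stopword set and the names `tokenizer`, `lowercase`, `stop` and `analyzer` are assumptions for illustration; the regex is only a rough stand-in for Lucene's standard tokenizer.

```python
import re
from typing import Callable, Iterable, Iterator


def tokenizer(text: str) -> Iterator[str]:
    """Split on anything that is not a letter or a digit."""
    return (m.group(0) for m in re.finditer(r"[A-Za-z0-9]+", text))


def lowercase(tokens: Iterable[str]) -> Iterator[str]:
    """Token filter: lowercase every token."""
    return (t.lower() for t in tokens)


def stop(words: set) -> Callable[[Iterable[str]], Iterator[str]]:
    """Build a token filter that removes the given stopwords."""
    def apply(tokens: Iterable[str]) -> Iterator[str]:
        return (t for t in tokens if t not in words)
    return apply


def analyzer(*filters) -> Callable[[str], list]:
    """Chain the tokenizer with token filters, applied in order."""
    def run(text: str) -> list:
        tokens: Iterable[str] = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return list(tokens)
    return run


my_analyzer = analyzer(lowercase, stop({"a", "an", "is", "on", "the"}))
```

Order matters: because `lowercase` runs before `stop`, the stopword set only needs lowercase entries.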
field data
• “uninverted” inverted index
• maps documents -> tokens
• can be built from the inverted index on demand
• can be stored with the index as doc values
• built per segment
• used by sorting, aggregations, scripts, etc.
field data - text
document | tokens
1        | distributed, elasticsearch, engine, open, search, source
2        | based, elasticsearch, lucene, search, server
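Uninverting walks the inverted index term by term and appends each term to the documents in its postings list; visiting terms in sorted order leaves every document's tokens sorted, which reproduces the field data table above. The inverted index for the two example documents is hard-coded here, and `uninvert` and `terms_agg` are hypothetical names; `terms_agg` only illustrates how an aggregation can count terms per document from field data.

```python
from collections import defaultdict

# Inverted index for the "text" field of the two example documents (term -> doc ids).
INVERTED = {
    "based": [2], "distributed": [1], "elasticsearch": [1, 2], "engine": [1],
    "lucene": [2], "open": [1], "search": [1, 2], "server": [2], "source": [1],
}


def uninvert(inverted: dict) -> dict:
    """Turn term -> docs into doc -> terms (field data)."""
    field_data = defaultdict(list)
    for term in sorted(inverted):          # sorted terms give sorted per-doc lists
        for doc_id in inverted[term]:
            field_data[doc_id].append(term)
    return dict(field_data)


def terms_agg(field_data: dict, doc_ids: list) -> dict:
    """Count matching documents per term, reading tokens from field data."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        for term in field_data.get(doc_id, []):
            counts[term] += 1
    return dict(counts)
```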
stored fields
• _source: JSON source of the entire document
• _parent: id of the parent document
• _routing
• _ttl
• _uid
• any other field marked as “stored”
QUERY phase - shard level
[diagram: a shard's engine containing Segment 1, Segment 2, Segment 3, Segment 4 … Segment N]
• each shard searches all of its segments, one after another
QUERY phase - shard level
[diagram: Segment 1 … Segment N of a shard's engine, with the collected hits (segment, Lucene doc id, [sort key]): seg1, 2, [2014-07-02] and seg1, 1, [2014-07-01]]
• all segments are searched and the top 10 documents are collected for each shard
• for each document, its internal Lucene id and sort key are stored
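The shard-level query phase can be sketched as a bounded min-heap: segments are visited one after another, and only the best `size` hits (here: newest `date` first) are kept, each as segment, internal doc id and sort key. The sketch assumes every document matches the query and that dates are ISO strings, which sort chronologically; `Hit` and `shard_query` are hypothetical names.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    segment: str
    doc: int        # internal Lucene doc id, local to the segment
    sort_key: str   # the "date" field; ISO strings sort chronologically


def shard_query(segments: dict, size: int = 10) -> list:
    """Search every segment in turn and keep the `size` hits with the newest date."""
    heap = []                                   # min-heap of (date, segment, doc)
    for seg_name, docs in segments.items():     # one segment after another
        for doc, date in docs:
            entry = (date, seg_name, doc)
            if len(heap) < size:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:               # better than the worst kept hit
                heapq.heapreplace(heap, entry)
    return [Hit(seg, doc, date) for date, seg, doc in sorted(heap, reverse=True)]
```

With the two example documents in `seg1`, the result is document 2 (2014-07-02) followed by document 1 (2014-07-01), matching the hits on the slide.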
QUERY phase - node level
[diagram: the Search Action on the requesting node; Shards 0-4 spread across Node 1 and Node 2]
• the top 10 ids and sort keys from each shard are sent to the requesting node
• the requesting node re-sorts them and finds the global top 10
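On the requesting node, the per-shard lists are merged and re-sorted to find the global top hits. Doc ids are only unique within a shard, so each hit carries its shard number. A minimal sketch, assuming each shard returns `(doc, sort_key)` pairs and that a larger sort key ranks higher; `ShardHit` and `global_top` are hypothetical names.

```python
import heapq
from typing import NamedTuple


class ShardHit(NamedTuple):
    shard: int
    doc: int        # doc id, only unique within its shard
    sort_key: str


def global_top(per_shard: dict, size: int = 10) -> list:
    """Merge every shard's top hits and keep the global top `size` by sort key."""
    candidates = (
        ShardHit(shard, doc, key)
        for shard, hits in per_shard.items()
        for doc, key in hits
    )
    return heapq.nlargest(size, candidates, key=lambda h: h.sort_key)
```

Only ids and sort keys travel in this phase, so the merge stays cheap even when many shards reply.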
FETCH phase - node level
[diagram: the Search Action on the requesting node; Shards 0-4 spread across Node 1 and Node 2]
• the global top 10 documents are requested
• only the shards that hold these top 10 documents are contacted
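The fetch phase groups the global top hits by shard, asks only those shards for the stored `_source`, and returns the documents in the global order. In this sketch, `shard_sources` is a hypothetical in-memory stand-in for each shard's stored fields, and `fetch_phase` is an illustrative name, not an Elasticsearch API.

```python
from collections import defaultdict


def fetch_phase(top_hits: list, shard_sources: dict) -> tuple:
    """Fetch _source for the global top hits, contacting only the shards that own one.

    top_hits: (shard, doc) pairs in global order.
    shard_sources: hypothetical per-shard stored _source, keyed by doc id.
    Returns the documents in global order and the set of shards contacted.
    """
    by_shard = defaultdict(list)
    for shard, doc in top_hits:
        by_shard[shard].append(doc)
    fetched = {}
    for shard, docs in by_shard.items():      # one request per involved shard
        for doc in docs:
            fetched[(shard, doc)] = shard_sources[shard][doc]
    return [fetched[(shard, doc)] for shard, doc in top_hits], set(by_shard)
```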