Founder of Vietnamese Elasticsearch Community • Author of Vietnamese Elasticsearch Analysis Plugin • Technical Consultant at Sentifi AG • Co-Founder at Krom • Follow me @duydo
A node is a single server that is part of a cluster. • An index is a collection of shards (~ a database). • A shard is a collection of documents. • A type is a category/partition of an index (~ a table in a database). • A document is a JSON object (~ a record in a database).
"must": [{"term": {"gender": "female"}}],                 # 1: default boost 1
"should": [
  {"term": {"country": {"value": "VN", "boost": 3}}},     # 2: the most important
  {"term": {"country": {"value": "US", "boost": 2}}}      # 3: more important than #1 but not as important as #2
]
} } }
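As a rough illustration, the boosted clauses above could be assembled into a complete request body in Python. The slide shows only the inner clauses, so the outer `query`/`bool` wrapper is an assumption here, and `boosted_term` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def boosted_term(field, value, boost=1.0):
    """Build one term clause (hypothetical helper, not a client-library call)."""
    return {"term": {field: {"value": value, "boost": boost}}}


query = {
    "query": {
        "bool": {
            # Required clause: every hit must match it; default boost of 1.
            "must": [boosted_term("gender", "female")],
            # Optional clauses: a match raises the score, weighted by its boost.
            "should": [
                boosted_term("country", "VN", boost=3),
                boosted_term("country", "US", boost=2),
            ],
        }
    }
}

# The resulting JSON is what would be sent as the body of a search request.
print(json.dumps(query, indent=2))
```

Building the clauses through one helper keeps the `value`/`boost` shape consistent, so changing a weight means touching a single argument.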
haystack? • What is the average length of the needles? • What is the median length of the needles, broken down by manufacturer? • How many needles are added to the haystacks each month? • What are the most popular needle manufacturers? • ...
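Questions like these map onto aggregations. A minimal sketch of a request body covering the average length, the median length per manufacturer, and the most popular manufacturers might look like the following. The field names `length` and `manufacturer` are hypothetical, chosen only to match the needle example.

```python
import json

# Field names ("length", "manufacturer") are assumptions for illustration.
aggs_request = {
    "size": 0,  # return aggregation results only, no individual hits
    "aggs": {
        # Average needle length across all matching documents.
        "avg_length": {"avg": {"field": "length"}},
        # One bucket per manufacturer; buckets are ordered by document
        # count by default, which also answers "most popular manufacturers".
        "by_manufacturer": {
            "terms": {"field": "manufacturer"},
            "aggs": {
                # The 50th percentile is the median length within each bucket.
                "median_length": {
                    "percentiles": {"field": "length", "percents": [50]}
                }
            },
        },
    },
}

print(json.dumps(aggs_request, indent=2))
```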
size 5-10MB. • Partition your time-series data by time period (monthly, weekly, daily). • Use aliases for your indices. • Turn off refresh and replicas while indexing; turn them back on once it's done. • Use multiple shards for parallel indexing. • Use multiple replicas for parallel reading.
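A few of these tips can be sketched as request bodies built in Python. The index prefix, the alias name, and the `logs-YYYY.MM` naming scheme are assumptions; the restore values (`1s` refresh, one replica) are the Elasticsearch defaults and may differ from what your cluster should use.

```python
import json
from datetime import date


def monthly_index(prefix, day):
    """Name of the monthly index for `day` (naming scheme is an assumption)."""
    return f"{prefix}-{day:%Y.%m}"


# Body for PUT /<index>/_settings before a large bulk load:
# disable refresh and drop replicas so indexing does less work.
BEFORE_BULK = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Body to restore afterwards (Elasticsearch defaults; adjust as needed).
AFTER_BULK = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def alias_action(index, alias):
    """Body for POST /_aliases that points `alias` at `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


index = monthly_index("logs", date(2015, 3, 14))
print(index)
print(json.dumps(BEFORE_BULK))
print(json.dumps(alias_action(index, "logs-current")))
print(json.dumps(AFTER_BULK))
```

Querying through the alias rather than the dated index name means a new monthly index can be swapped in without changing client code.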