A grab bag of tips to help improve your queries in Elasticsearch. Not everything will apply to your data or architecture, so feel free to skim and selectively steal tips :)
Top-level filter is slow(er)
{ "query": { … }, "filter": { … } }
Don't use this unless you need it (it is only useful with facets, because it filters the hits without changing the facet counts).
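A minimal sketch of the usual alternative, assuming the Elasticsearch 1.x query DSL these slides target: move the filter inside a `filtered` query so that it narrows the set of documents the query has to score. The helper name `as_filtered_query` and the sample field names are hypothetical.

```python
import json


def as_filtered_query(query, filter_):
    """Wrap a query and a filter in a `filtered` query (Elasticsearch 1.x DSL)."""
    return {"query": {"filtered": {"query": query, "filter": filter_}}}


# Hypothetical field names, for illustration only.
body = as_filtered_query(
    {"match": {"title": "elasticsearch"}},
    {"term": {"status": "published"}},
)
print(json.dumps(body, indent=2))
```

The resulting body has no top-level `filter` key; the filter lives next to the query it restricts.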
Avoid deep pagination
{ "query": { … }, "from": 10000000, "size": 10 }
Builds a PriorityQueue 10,000,010 entries large (for each shard in your index), just to return 10 results.
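The arithmetic behind that number, as a small sketch: each shard has to collect `from + size` candidates before the top `size` can be picked. The function name is hypothetical, and the shard count of 5 is only an assumed example value.

```python
def deep_page_cost(from_, size, shards=1):
    """Return (entries queued per shard, entries queued across all shards)."""
    per_shard = from_ + size          # every shard must rank this many hits
    return per_shard, per_shard * shards


# The request from the slide, on an assumed 5-shard index.
per_shard, total = deep_page_cost(10000000, 10, shards=5)
print(per_shard)  # 10000010
print(total)      # 50000050
```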
Common Terms
Very cool query, makes stop-words obsolete.
See this presentation: https://speakerdeck.com/polyfractal/common-terms-query
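A rough sketch of what a common terms request body looks like, built as a Python dict. The `common` query and its `cutoff_frequency` parameter are part of the query DSL of this era; the field name `body`, the sample text and the cutoff value are assumptions chosen for illustration.

```python
import json


def common_terms_query(field, text, cutoff_frequency=0.001):
    """Build a `common` terms query body for one field."""
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    # Terms more frequent than this are treated as "common"
                    # and only evaluated for documents that already match
                    # the rarer terms.
                    "cutoff_frequency": cutoff_frequency,
                }
            }
        }
    }


body = common_terms_query("body", "the quick brown fox")
print(json.dumps(body, indent=2))
```

Because the frequent terms are handled at query time, there is no need to strip them out with a stop-word list at index time.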
FOR ALL THAT IS HOLY
Do not EVER use these in a search script:
_source.my_field
_fields.my_field
These access the disk and are sloooooow. You will destroy your performance.
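For contrast, a sketch of the in-memory alternative: reading the field through `doc['my_field'].value`, which goes through field data held in memory instead of loading the stored document from disk. This alternative is not spelled out on the slide; the `function_score` wrapper (1.x DSL), the `match_all` query and the multiplier are assumptions made for illustration.

```python
def script_score_query(field, factor=2):
    """Build a function_score query whose script reads the field via doc[...]."""
    # doc['field'].value reads from in-memory field data, not from _source.
    script = "doc['%s'].value * %d" % (field, factor)
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {"script": script},
            }
        }
    }


body = script_score_query("my_field")
print(body["query"]["function_score"]["script_score"]["script"])
# doc['my_field'].value * 2
```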
Use common sense
In general, scripting is slower than queries. Don't go crazy.
If you end up with a 10-page script, bake some of that logic into your index.
Codecs
Control the data structures for:
• terms
• positions
• frequencies
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html
Direct Codec
When you have more memory than Google
• Data stored as uncompressed arrays in memory
• About as fast as you can go
(2m Wikipedia dataset == 8gb used heap)
Bloom-pulsing Codec
Fast execution for rare terms
• Bloom filter fails fast if the term is not present
• Pulsing inlines the postings to avoid an extra disk seek
• Good for rare terms, ID numbers, etc.
(_uid uses this internally for fast get-by-id scenarios)
Memory Codec
Super fast execution for rare terms
• Encodes your term dictionary as an in-memory FST
• Crazy fast lookups
• Really great for "primary keys"
• Compresses certain "sequential" data very well ("00001", "00002", "00003", etc.)
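A hedged sketch of how a codec is picked per field, assuming the `postings_format` mapping setting of the Elasticsearch 0.90/1.x releases that the codec documentation linked earlier describes. The type name `doc`, the field name `product_id` and the helper name are hypothetical; the set of format names is what those releases documented, so check it against the version you run.

```python
import json

# Postings format names as documented for Elasticsearch 0.90/1.x (assumption:
# verify against your version before relying on this list).
KNOWN_POSTINGS_FORMATS = {
    "default", "direct", "memory", "pulsing", "bloom_default", "bloom_pulsing",
}


def keyword_field(postings_format):
    """Mapping for a not_analyzed string field using the given postings format."""
    if postings_format not in KNOWN_POSTINGS_FORMATS:
        raise ValueError("unknown postings format: %r" % postings_format)
    return {
        "type": "string",
        "index": "not_analyzed",
        "postings_format": postings_format,
    }


# A "primary key" style field stored with the in-memory FST codec.
index_body = {
    "mappings": {"doc": {"properties": {"product_id": keyword_field("memory")}}}
}
print(json.dumps(index_body, indent=2))
```

Swapping `"memory"` for `"bloom_pulsing"` would select the bloom filter plus pulsing combination for the same field.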
Boosted Synonyms
This is actually kinda slow
We want to boost synonyms, such that:
vegetable => potato^3, tomato^2, carrot^1
https://gist.github.com/polyfractal/10276706
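One straightforward way to express that rule at query time is to expand the word into a `bool` query with one boosted `term` clause per synonym. This is only an illustrative sketch and not necessarily the approach taken in the linked gist; the field name `ingredients` and the helper name are hypothetical.

```python
import json


def boosted_synonyms_query(field, synonyms):
    """Expand (term, boost) pairs into a bool/should query of boosted terms."""
    should = [
        {"term": {field: {"value": term, "boost": boost}}}
        for term, boost in synonyms
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# vegetable => potato^3, tomato^2, carrot^1
body = boosted_synonyms_query(
    "ingredients", [("potato", 3), ("tomato", 2), ("carrot", 1)]
)
print(json.dumps(body, indent=2))
```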