to the database: curl -XPUT • _search endpoint • Simple Query Format • Fields attribute: specifying which field to search within. • Filtered Queries • Mappings • Example Queries • What more can be done?
server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. • Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. • Elasticsearch can be used to search all kinds of documents. It provides scalable, near real-time search and supports multitenancy.
can be divided into shards and each shard can have zero or more replicas. • Each node hosts one or more shards, and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically. • It uses Lucene and tries to make all of its features available through the JSON and Java API. It supports faceting and percolating, which can be useful for notifying when new documents match registered queries. • Another feature is called 'Gateway' and handles the long-term persistence of the index, i.e. an index can be recovered from the Gateway in case of a server crash. • Elasticsearch supports real-time GET requests, which makes it suitable as a NoSQL solution, but it lacks distributed transactions.
pattern: <index>/<type>/_search where index and type are both optional. In other words, in order to search for our movies we can make POST requests to either of the following URLs: • http://localhost:9200/_search - Search across all indexes and all types. • http://localhost:9200/movies/_search - Search across all types in the movies index. • http://localhost:9200/movies/movie/_search - Search explicitly for documents of type movie within the movies index.
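The URL pattern above can be sketched as a small helper. This is only an illustration: the function name `search_url` and its parameters are hypothetical, and rejecting a type without an index is a simplification made for this sketch, not a statement about the server's behavior.

```python
def search_url(index=None, doc_type=None, host="http://localhost:9200"):
    """Build a _search URL following the <index>/<type>/_search pattern.

    Both index and doc_type are optional. For simplicity this sketch
    only accepts a type when an index is also given.
    """
    if doc_type is not None and index is None:
        raise ValueError("this sketch expects an index whenever a type is given")
    parts = [host.rstrip("/")]
    if index is not None:
        parts.append(index)
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


print(search_url())                   # all indexes, all types
print(search_url("movies"))           # all types in the movies index
print(search_url("movies", "movie"))  # only type movie in the movies index
```

The three calls print the three URLs listed above, in the same order.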
entire indexed database. Let's try a search for the word "kill" which is present in the title of two of our movies: curl -XPOST "http://localhost:9200/_search" -d' { "query": { "query_string": { "query": "kill" } } }'
searches for "ford" in a particular category or column ("title") using the attribute "fields": curl -XPOST "http://localhost:9200/_search" -d' { "query": { "query_string": { "query": "ford", "fields": ["title"] } } }'
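The two request bodies seen so far differ only in the optional "fields" attribute. A minimal sketch of building them programmatically, assuming nothing beyond Python's standard library (the helper name `query_string_body` is made up for this example, and no request is actually sent):

```python
import json


def query_string_body(text, fields=None):
    """Return a request body for a query_string search.

    When fields is given, the search is restricted to those fields;
    otherwise the "fields" attribute is left out entirely.
    """
    query_string = {"query": text}
    if fields:
        query_string["fields"] = list(fields)
    return {"query": {"query_string": query_string}}


# Body for the unrestricted search for "kill".
print(json.dumps(query_string_body("kill")))

# Body for the search for "ford" restricted to the title field.
print(json.dumps(query_string_body("ford", ["title"])))
```

The printed JSON can be passed as the -d argument of the curl commands shown above.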
executed it filters the result of the query using a filter. For this simple case, where a certain field should match a specific value, a term filter will work well. curl -XPOST "http://localhost:9200/_search" -d' { "query": { "filtered": { "query": { "query_string": { "query": "drama" } }, "filter": { "term": { "year": 1962 } } } } }'
to apply the filter, i.e., we want movies matching a certain criterion. Solution 1: Replace the query string query in the filtered query with a "match_all" query, which is a query that simply matches everything. curl -XPOST "http://localhost:9200/_search" -d' { "query": { "filtered": { "query": { "match_all": { } }, "filter": { "term": { "year": 1962 } } } } }'
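The two filtered requests above share one shape: an inner query plus a term filter. The sketch below builds either variant and falls back to "match_all" when no search text is supplied. The helper name `filtered_body` is an assumption for illustration, and the body format simply mirrors the requests shown in this deck.

```python
import json


def filtered_body(term_filter, text=None):
    """Return a filtered-query body.

    term_filter: dict such as {"year": 1962}, used as the term filter.
    text: optional query string; if omitted, match_all is used so that
          only the filter decides which documents are returned.
    """
    if text is None:
        inner_query = {"match_all": {}}
    else:
        inner_query = {"query_string": {"query": text}}
    return {
        "query": {
            "filtered": {
                "query": inner_query,
                "filter": {"term": term_filter},
            }
        }
    }


# Query for "drama", filtered to movies from 1962.
print(json.dumps(filtered_body({"year": 1962}, "drama")))

# Filter only: every movie from 1962.
print(json.dumps(filtered_body({"year": 1962})))
```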
{ "query": { "constant_score": { "filter": { "term": { "director": "Francis Ford Coppola" } } } } }' What is the problem? The query returns zero total hits, even though we have indexed two movies with "Francis Ford Coppola" as director.
While Elasticsearch has a JSON object with that data, which it returns to us in search results in the form of the _source property, that's not what it has in its index. • When we index a document with Elasticsearch, it does two things: 1. Stores the original data untouched for later retrieval in the form of _source 2. Indexes each JSON property into one or more fields in a Lucene index. • During indexing it processes each field according to how the field is mapped. If it isn't mapped, default mappings depending on the field's type (string, number, etc.) are used. • As we haven't supplied any mappings for our index, Elasticsearch uses the default mappings for strings for the director field. This means that in the index the director field's value isn't "Francis Ford Coppola". Instead it's something more like ["francis", "ford", "coppola"], because the default analyzer splits the text into words and lowercases them. • This can be verified by modifying our filter to instead match "francis" (or "ford" or "coppola"): We get two hits!
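Why the term filter finds nothing can be illustrated with a toy model of analysis. This is a deliberately simplified stand-in, not Lucene's actual analyzer: it assumes the default string analysis splits text into words and lowercases them, and it models a term filter as an exact lookup of the unanalyzed filter value among the indexed terms. Both function names are hypothetical.

```python
import re


def analyze(text):
    """Toy analyzer: split into word tokens and lowercase each one."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_filter_matches(indexed_terms, term):
    """A term filter compares the given value as-is against indexed terms."""
    return term in indexed_terms


indexed = analyze("Francis Ford Coppola")
print(indexed)                                               # ['francis', 'ford', 'coppola']
print(term_filter_matches(indexed, "Francis Ford Coppola"))  # False
print(term_filter_matches(indexed, "francis"))               # True
```

The full name never appears as a single term in the index, so filtering on it yields no hits, while a single analyzed token does match.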
a number of ways to add mappings to Elasticsearch: through a configuration file, as part of an HTTP request that creates an index, and by calling the _mapping endpoint. Therefore, we add a mapping for the "director" field instructing Elasticsearch not to analyze (tokenize etc.) the field at all when indexing it, like this: curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d' { "movie": { "properties": { "director": { "type": "string", "index": "not_analyzed" } } } }'
already is a mapping for the field: REQUEST FAILED ERROR. 2. In many cases it's not possible to modify existing mappings. Workaround: create a new index with the desired mappings and re-index all of the data into the new index. 3. Even if we could add it, we would have limited our ability to search in the director field. That is, while a search for the exact value in the field would match, we wouldn't be able to search for single words in the field.
map the field multiple times for indexing. Provided that one of the ways we map it matches the existing mapping in both name and settings, this will work fine and we won't have to create a new index. curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d' { "movie": { "properties": { "director": { "type": "multi_field", "fields": { "director": {"type": "string"}, "original": {"type" : "string", "index" : "not_analyzed"} } } } } }'
it sees a property named "director" in a movie document that is about to be indexed in the movies index, it should index it multiple times: once into a field with the same name, "director", and once into a field named "director.original". The latter field should not be analyzed, which preserves the original value and allows us to filter by the exact director name.
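The effect of the multi-field mapping can be sketched with a toy index. This is only a model under stated assumptions: the analyzed field is approximated by splitting into lowercase words, the not-analyzed field stores the value unchanged, and the names `index_director` and `term_filter` are invented for this example.

```python
import re


def index_director(value):
    """Index one director value twice, as the multi_field mapping describes."""
    return {
        # Analyzed copy: individual lowercase words (toy approximation).
        "director": [token.lower() for token in re.findall(r"\w+", value)],
        # Not-analyzed copy: the original value as a single term.
        "director.original": [value],
    }


def term_filter(fields, field, term):
    """Exact, unanalyzed lookup of term among the terms stored for field."""
    return term in fields.get(field, [])


fields = index_director("Francis Ford Coppola")
print(term_filter(fields, "director", "Francis Ford Coppola"))           # False
print(term_filter(fields, "director.original", "Francis Ford Coppola"))  # True
print(term_filter(fields, "director", "ford"))                           # True
```

Single-word lookups still work against "director", while the exact name matches only on "director.original".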
can re-index one or both of the movies directed by Francis Ford Coppola (copy from the list of initial indexing requests) and try the search request that filtered by director again. Only this time we don't filter on the "director" field (which is indexed the same way as before) but on the "director.original" field: curl -XPOST "http://localhost:9200/_search" -d' { "query": { "constant_score": { "filter": { "term": { "director.original": "Francis Ford Coppola" } } } } }'
We can create search requests where we specify how many hits we want and use highlighting. • Get spelling suggestions and much more. • Also, the query DSL contains many interesting queries and filters that we can use. • A whole range of facets that we can use to extract statistics from our data or build navigation. • We can go far beyond the simple mapping example we've seen here to accomplish wonderful and interesting things. • Performance optimizations and considerations. • Functionality to find similar content.