{ title: "One", content: "The ruby is a pink to blood-red colored gemstone." }
• Article.find(2).to_json
{ title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }
• Article.find(3).to_json
{ title: "Three", content: "Ruby is a song by English rock band." }

3 Wednesday, February 6, 13
is what it is"
T1 = "what is it"
T2 = "it is a banana"

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

A term search for the terms "what", "is" and "it":
{0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
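The lookup above can be sketched in a few lines of Ruby. This is only an illustration, not the implementation of any search engine. The names `TEXTS`, `build_index` and `term_search` are made up for the example, and the first text is assumed to read "it is what it is", since the slide is cut off at that point and this wording matches the postings shown.

```ruby
require 'set'

# Assumption: the first text is truncated in the slide; "it is what it is"
# is used here because it matches the postings listed above.
TEXTS = [
  "it is what it is",
  "what is it",
  "it is a banana"
]

# Build the inverted index: term => set of ids of the texts containing it.
def build_index(texts)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  texts.each_with_index do |text, id|
    text.split.each { |term| index[term] << id }
  end
  index
end

# A term search intersects the posting sets of all requested terms.
def term_search(index, *terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&)
end

index  = build_index(TEXTS)
result = term_search(index, "what", "is", "it")
puts result.to_a.inspect
# prints [0, 1]
```

Because every term maps directly to a set of document ids, a multi-term search never has to scan the texts themselves; it only intersects small sets.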
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # Split content by words into "tokens"
    content.split(/\W/).
    # Downcase every word
    map { |word| word.downcase }.
    # Reject stop words, digits and whitespace
    reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  def store document_id, tokens
    tokens.each do |token|
      ((INDEX[token] ||= []) << document_id).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # Print each matching document; an unknown token yields no results
    (INDEX[token] || []).each { |document| puts " * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

  extend self
end
How does search work? Indexing documents

language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."

Indexed document article1 with tokens:
["ruby", "language", "java", "also", "language"]
Indexed document article2 with tokens:
["ruby", "song"]
Indexed document article3 with tokens:
["ruby", "stone"]
Indexed document article4 with tokens:
["java", "language"]
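The indexing step can also be tried in isolation. The following is a stand-alone sketch, not the SimpleSearch module itself: `STOP_WORDS`, `tokenize` and `documents` are names chosen for this example, and only the three documents whose text appears in full on the slide are used.

```ruby
# Stop words copied from the list shown in the slides.
STOP_WORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

# Hypothetical stand-alone version of the analysis step:
# split on non-word characters, downcase, drop stop words, digits and blanks.
def tokenize(content)
  content.split(/\W/)
         .map(&:downcase)
         .reject { |word| STOP_WORDS.include?(word) || word =~ /^\d+/ || word.empty? }
end

documents = {
  "article2" => "Ruby is a song.",
  "article3" => "Ruby is a stone.",
  "article4" => "Java is a language."
}

# Inverted index: token => list of ids of the documents containing it.
index = {}
documents.each do |id, content|
  tokenize(content).each { |token| ((index[token] ||= []) << id).uniq! }
end

index.each { |token, ids| puts "#{token}: #{ids.inspect}" }

# Looking up a token is a single hash access.
puts index.fetch("ruby", []).inspect
```

Stop words such as "is" and "a" never reach the index, which is why a search for them would return nothing in this sketch.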
"ruby": [1,2,3],
"language": [1,4]
}

+ Relevance Scoring
• How many matching terms does this document contain?
• How frequently does each term appear in all your documents?
• ... other complicated algorithms.
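The first two questions correspond roughly to term frequency and inverse document frequency. The sketch below uses a plain TF-IDF score to make that concrete; it is a simplification chosen for illustration, not the formula a real search engine applies. The names `DOCS`, `tf`, `idf` and `score` are hypothetical, and the token lists are taken from the indexing output shown earlier in the slides.

```ruby
# Hypothetical token lists per document id, based on the indexing output
# shown earlier in the slides.
DOCS = {
  1 => %w(ruby language java also language),
  2 => %w(ruby song),
  3 => %w(ruby stone),
  4 => %w(java language)
}

# Term frequency: share of a document's tokens that equal the term.
def tf(term, tokens)
  tokens.count(term).to_f / tokens.size
end

# Inverse document frequency: terms found in fewer documents weigh more.
def idf(term, docs)
  containing = docs.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(docs.size.to_f / containing)
end

# Score of one document for a query: sum of tf * idf over the query terms.
def score(query_terms, doc_tokens, docs)
  query_terms.sum { |term| tf(term, doc_tokens) * idf(term, docs) }
end

query   = %w(ruby language)
ranking = DOCS.map { |id, tokens| [id, score(query, tokens, DOCS)] }
              .sort_by { |_, value| -value }

ranking.each { |id, value| puts format("doc %d: %.3f", id, value) }
```

In this toy corpus "ruby" appears in three of four documents, so it contributes less to the score than the rarer "language".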
[Diagram: cluster nodes; surviving labels: "3", "Node 4", "Master"]

The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node. The master node is responsible for maintaining the global cluster state and for reacting when nodes join or leave the cluster by reassigning shards.
ElasticSearch Query DSL

With Relevance, Without Cache
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...

Filters (With Cache, Without Relevance)
- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...
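To show how the two halves combine, here is a sketch of a request body that wraps a scored `fuzzy` query in a `filtered` query with a cacheable `term` filter, as the DSL stood at the time of these slides. The field names `title` and `published` are assumptions made for the example; only the `filtered`, `fuzzy` and `term` keywords come from the lists above.

```ruby
require 'json'

# Hypothetical field names ("title", "published"), for illustration only.
search = {
  query: {
    filtered: {
      # Scored part: contributes to relevance.
      query:  { fuzzy: { title: "ruby" } },
      # Unscored part: only includes or excludes documents.
      filter: { term: { published: true } }
    }
  }
}

body = JSON.pretty_generate(search)
puts body
```

The generated JSON is what would be sent as the body of a search request; the split keeps the yes-or-no condition out of the relevance calculation.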
ElasticSearch Analyzer

"title": { "type": "string", "analyzer": "trigrams" } } } }'

curl -XPUT 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

C u p e r t i n o
C u p
u p e
p e r
. . .
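What a trigram analyzer produces for a single word can be sketched in plain Ruby. This only illustrates the idea of overlapping three-character tokens; the helper name `ngrams` is made up for the example and is not part of ElasticSearch.

```ruby
# Split a word into overlapping n-grams (trigrams when n is 3).
def ngrams(word, n = 3)
  word.chars.each_cons(n).map(&:join)
end

trigrams = ngrams("cupertino")
puts trigrams.inspect
# prints ["cup", "upe", "per", "ert", "rti", "tin", "ino"]

# A partial search term can match because it is one of the indexed tokens.
puts trigrams.include?("per")
# prints true
```

Indexing these fragments instead of the whole word is what lets a query for part of a word find the document.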