text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Apache Lucene is an open source project available for free download. http://lucene.apache.org/core/
a law student, I went on a few job interviews. At one, the interviewer’s first comment was “It’s so unusual that I see a résumé without any typos.” “Are you serious?” I said. She said, “Yes, probably 90% of the résumés I get have typos. And that includes the ones we get from the top schools.” I got the job. Probably there were better-qualified candidates, but they damaged their chances with sloppy résumés. The irony is that those people, who most needed to hear the interviewer’s feedback, weren’t in the room. Because they never got an interview. … ...
Analyzer
Analyzer: Elasticsearch comes with pre-built analyzers, and you can also create your own. https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html
Document Analysis
1. Character Filter
2. Tokenizer
3. Token Filter
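The three analysis stages named above can be sketched in a few lines of Python. This is only an illustration of the order in which the stages run, not Elasticsearch's implementation: the function names, the "&" to "and" mapping, and the regex tokenizer are all assumptions made for the example.

```python
import re

def map_ampersand(text):
    # Character filter (hypothetical): rewrites raw characters before tokenizing.
    return text.replace("&", "and")

def word_tokenizer(text):
    # Tokenizer (simplified assumption): split the text into runs of word characters.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    # Token filter: transforms the token stream produced by the tokenizer.
    return [token.lower() for token in tokens]

def analyze(text, char_filters, tokenizer, token_filters):
    # 1. character filters, 2. tokenizer, 3. token filters, in that order.
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze("Spice & Universe!", [map_ampersand], word_tokenizer, [lowercase])
print(tokens)  # ['spice', 'and', 'universe']
```

Swapping any one stage for another function changes the tokens that end up in the index, which is the idea behind building a custom analyzer.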
houses documents (think RDBMS "table");
‒ Index a document: insert into an Index
‒ Document: a JSON object (hash map)
Stuff a search engine can do
Indexing
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'
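For readers who prefer to script the same call, here is a minimal Python sketch that builds the equivalent PUT request with the standard library. The helper name `build_index_request` is hypothetical, and the request is only constructed, not sent. The `Content-Type` header is an addition that the curl command does not set; newer Elasticsearch versions expect it, and the `/twitter/tweet/1` URL layout follows the slide as-is.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_type, doc_id, document):
    # Hypothetical helper: mirrors the curl command, PUT <base>/<index>/<type>/<id>.
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: added for newer Elasticsearch versions; not in the curl command.
        headers={"Content-Type": "application/json"},
    )

request = build_index_request(
    "http://localhost:9200",
    "twitter",
    "tweet",
    1,
    {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch",
    },
)
print(request.get_method(), request.full_url)
# Sending it requires a running node: urllib.request.urlopen(request)
```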
token      document_id  frequency
He         1            1
who        1            1
controls   1            1
the        1            1
spice      1            1
universe   1            1
A          2            1
mad        2            1
man        2            1
sees       2            1
what       2            1
he         2            1

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
token       document_id  frequency
He          1            1
who         1            1
controls    1            1
the         1,3          2
spice       1            1
universe    1,3          2
A           2            1
mad         2,3          2
man         2,3          2
sees        2            1
what        2            1
he          2            1
What        3            1
if          3            1
a           3            1
controlled  3            1

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
# document id 3
{"text": "What if a mad man controlled the universe?"}
Lower case token filter

token       document_id  frequency
he          1,2          2
who         1            1
controls    1            1
the         1,3          2
spice       1            1
universe    1,3          2
a           2,3          2
mad         2,3          2
man         2,3          2
sees        2            1
what        2,3          2
if          3            1
controlled  3            1

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
# document id 3
{"text": "What if a mad man controlled the universe?"}
+ Stemmer

token      document_id  frequency
he         1,2          2
who        1            1
control    1,3          2
the        1,3          2
spice      1            1
univers    1,3          2
a          2,3          2
mad        2,3          2
man        2,3          2
see        2            1
what       2,3          2
if         3            1

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
# document id 3
{"text": "What if a mad man controlled the universe?"}
- Stopwords

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
# document id 3
{"text": "What if a mad man controlled the universe?"}

token      document_id  frequency
he         1,2          2
who        1            1
control    1,3          2
the        1,3          2
spice      1            1
univers    1,3          2
a          2,3          2
mad        2,3          2
man        2,3          2
see        2            1
what       2,3          2
if         3            1
token      document_id  frequency
he         1,2          2
who        1            1
control    1,3          2
spice      1            1
univers    1,3          2
mad        2,3          2
man        2,3          2
see        2            1
what       2,3          2

# document id 1
{"text": "He who controls the spice, controls the universe."}
# document id 2
{"text": "A mad man sees what he sees."}
# document id 3
{"text": "What if a mad man controlled the universe?"}
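The progression in the tables above (raw tokens, then lowercasing, stemming, and stopword removal) can be reproduced with a small Python sketch. It is a toy model under stated assumptions: the `STEMS` lookup table stands in for a real stemmer, and the `STOPWORDS` set contains only the three tokens that disappear in the last table. Neither reflects Elasticsearch's actual stemmer or stopword list.

```python
import re
from collections import defaultdict

DOCS = {
    1: "He who controls the spice, controls the universe.",
    2: "A mad man sees what he sees.",
    3: "What if a mad man controlled the universe?",
}

# Assumption: a lookup table standing in for a real stemming algorithm.
STEMS = {
    "controls": "control",
    "controlled": "control",
    "universe": "univers",
    "sees": "see",
}
# Assumption: only the tokens dropped in the final table above.
STOPWORDS = {"the", "a", "if"}

def analyze(text):
    tokens = re.findall(r"\w+", text)                     # tokenizer
    tokens = [token.lower() for token in tokens]          # lower case token filter
    tokens = [STEMS.get(token, token) for token in tokens]  # stemmer
    return [token for token in tokens if token not in STOPWORDS]  # stopwords

def build_inverted_index(docs):
    # Map each token to the sorted list of document ids that contain it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            postings[token].add(doc_id)
    return {token: sorted(doc_ids) for token, doc_ids in postings.items()}

index = build_inverted_index(DOCS)
for token, doc_ids in index.items():
    # The frequency column in the tables is the number of documents per token.
    print(token, doc_ids, len(doc_ids))
```

A lookup for a query term then becomes a dictionary access on the analyzed term, for example `index["control"]`, which is why the query text has to pass through the same analysis steps as the documents.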
Searching and Ranking
• Similar to SQL
• Find exact values
• Ranges
• Group by
• Match
• Match Phrase
• Relevancy and boosting
• More Like This
• Multifield Search
• Pipeline Aggregations
• Geolocation
• Proximity Matching
Searching and Ranking
main factors of a document’s score:
• TF (term frequency): the more often a token appears in a doc, the more important it is
• IDF (inverse document frequency): the more documents contain the term, the less important it is
• Field length: shorter docs are more likely to be relevant than longer docs
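To make the three factors concrete, here is a simplified single-term scoring sketch in Python. The formulas (square root of the term frequency, `1 + ln(numDocs / (docFreq + 1))` for IDF, and `1 / sqrt(numTerms)` for the field-length norm) follow the classic TF/IDF similarity as described in the Elasticsearch guide. Multiplying them together like this is an illustrative simplification, not the full scoring function, and newer versions default to a different similarity (BM25). The function names are assumptions.

```python
import math
import re

DOCS = {
    1: "He who controls the spice, controls the universe.",
    2: "A mad man sees what he sees.",
    3: "What if a mad man controlled the universe?",
}

def tokenize(text):
    # Simplified analysis: lowercase word tokens, no stemming or stopwords.
    return re.findall(r"\w+", text.lower())

def score(term, doc_id, docs):
    tokens = tokenize(docs[doc_id])
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    doc_freq = sum(1 for text in docs.values() if term in tokenize(text))
    tf = math.sqrt(freq)                               # more occurrences, higher score
    idf = 1.0 + math.log(len(docs) / (doc_freq + 1))   # rarer term, higher score
    norm = 1.0 / math.sqrt(len(tokens))                # shorter field, higher score
    return tf * idf * norm

for doc_id in DOCS:
    print(doc_id, round(score("mad", doc_id, DOCS), 3))
```

With these three documents, "sees" outscores "mad" in document 2 (higher TF, and it occurs in fewer documents), "spice" outscores "universe" in document 1 (it occurs in fewer documents), and "mad" scores slightly higher in document 2 than in document 3 because document 2 is shorter.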
Stuff a search engine can do
Would you like to know more?
Guide - https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
Discuss Forum - https://discuss.elastic.co/
Private or Public Training - https://training.elastic.co/
Subscriptions - https://www.elastic.co/subscriptions