- Often empty of meaning
- Used frequently
- “the quick and brown fox jumped over a ledge” (33%)
- or, it, be, a, and, to

Friday, November 22, 13

Other general problems:
- manually maintain stop-list
- language dependent
- domain dependent
- makes query scoring tricky

Overview
- identify “important” terms in query
- find documents with “important” terms
- score those matching docs with entire query
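
The three steps in the overview can be sketched in plain Python. This is a toy in-memory illustration, not how Elasticsearch implements the query: the function name `common_terms_search`, the whitespace tokenizer, the default `cutoff`, and the term-overlap score are all assumptions made for the example. It also ignores the case where every query term is common.

```python
from collections import Counter


def common_terms_search(query, docs, cutoff=0.5):
    """Toy two-phase search: match on rare terms, score with all terms."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    terms = query.lower().split()
    # Step 1: "important" terms appear in at most `cutoff` of the documents.
    important = [t for t in terms if df[t] / len(docs) <= cutoff]
    results = []
    for doc_id, tokens in enumerate(tokenized):
        present = set(tokens)
        # Step 2: keep only documents containing at least one important term.
        if not any(t in present for t in important):
            continue
        # Step 3: score with the entire query (distinct query terms present).
        score = sum(1 for t in set(terms) if t in present)
        results.append((doc_id, score))
    # Highest score first; ties broken by document position.
    return sorted(results, key=lambda r: (-r[1], r[0]))


docs = ["the quick brown fox", "the lazy dog", "the fox and the dog", "the end"]
print(common_terms_search("the quick fox", docs))  # [(0, 3), (2, 2)]
```

In this made-up corpus "the" occurs in every document, so it never selects documents on its own, but it still contributes to the score of documents that matched on "quick" or "fox".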

The Query

    {
      "common": {
        "body": {
          "query": "the quick and brown fox jumped over the ledge",
          "cutoff_frequency": 0.001
        }
      }
    }
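
For reference, the Elasticsearch documentation describes `cutoff_frequency` as relative to the total number of documents when it lies in the range [0, 1), and as an absolute document count when it is 1 or greater. A minimal sketch of that rule follows. The helper name `is_high_frequency` and the document counts are hypothetical, and the exact boundary handling inside Lucene is not reproduced here.

```python
def is_high_frequency(doc_freq, total_docs, cutoff_frequency):
    """Classify a term as high-frequency (assumes total_docs > 0)."""
    if cutoff_frequency >= 1:
        # Absolute cutoff: compare against a raw document count.
        return doc_freq > cutoff_frequency
    # Relative cutoff: compare against a fraction of all documents.
    return doc_freq / total_docs > cutoff_frequency


total = 1_000_000  # made-up index size
print(is_high_frequency(250_000, total, 0.001))  # True
print(is_high_frequency(800, total, 0.001))      # False
print(is_high_frequency(150, total, 100))        # True
```

With a cutoff of 0.001 and one million documents, a term has to appear in more than roughly 1,000 documents before it is treated as common.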

Controlling Leniency
- use “or” for low-freq terms

    {
      "common": {
        "body": {
          "query": "the quick and brown fox jumped over the ledge",
          "cutoff_frequency": 0.001,
          "low_freq_operator": "or",
          "high_freq_operator": "or",
          "minimum_should_match": {
            "low_freq": "60%",
            "high_freq": "20%"
          }
        }
      }
    }

Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited

Controlling Leniency
- how many clauses should match

    "minimum_should_match": {
      "low_freq": "60%",
      "high_freq": "20%"
    }
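
To make the `minimum_should_match` percentages concrete: the Elasticsearch documentation states that a positive percentage is applied to the number of optional clauses and rounded down. The sketch below assumes that rule. The helper `required_matches` is hypothetical, and the split of the example query into low- and high-frequency terms is an assumption, since the real split depends on the index.

```python
def required_matches(num_clauses, percentage):
    """Convert a positive percentage such as "60%" into a clause count."""
    percent = int(percentage.rstrip("%"))
    if percent < 0:
        # Negative percentages have different semantics and are not covered.
        raise ValueError("only positive percentages are handled in this sketch")
    # Integer arithmetic rounds down, matching the documented behaviour.
    return num_clauses * percent // 100


# Hypothetical classification of the example query's terms.
low_freq_terms = ["quick", "brown", "fox", "jumped", "ledge"]
high_freq_terms = ["the", "and", "over"]

print(required_matches(len(low_freq_terms), "60%"))   # 3
print(required_matches(len(high_freq_terms), "20%"))  # 0
```

Under these assumptions a document must contain at least three of the five rare terms, while 20% of three common terms rounds down to zero, so the common terms would only influence the score.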

All high-frequency terms

    {
      "common": {
        "body": {
          "query": "to be or not to be",
          "cutoff_frequency": 0.001
        }
      }
    }

Controlling Leniency
- how many clauses should match

    "cutoff_frequency": 0.001,
    "minimum_should_match": {
      "low_freq": "60%"
    }

Adaptive Stop-lists
- Common Terms uses your index for frequency
- Adapts to your domain
- No manual stop-list creation/maintenance
- Adapts to language, etc.
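
A small sketch of what "adapts to your domain" means in practice: the same term can be common in one corpus and rare in another, so a frequency-based cutoff classifies it differently without any hand-written list. The helper `high_frequency_terms`, the 0.5 cutoff, and both corpora are invented for illustration.

```python
from collections import Counter


def high_frequency_terms(docs, cutoff):
    """Return terms that occur in more than `cutoff` of the documents."""
    # Count each term once per document (document frequency).
    df = Counter(t for d in docs for t in set(d.lower().split()))
    return {t for t, n in df.items() if n / len(docs) > cutoff}


# Two made-up corpora from different domains.
medical = [
    "patient reports mild fever",
    "patient denies chest pain",
    "patient shows normal pulse",
]
support = [
    "printer shows error code",
    "user reports slow login",
    "patient user waits for reset",
]

print("patient" in high_frequency_terms(medical, 0.5))  # True
print("patient" in high_frequency_terms(support, 0.5))  # False
```

In the medical corpus "patient" behaves like a stop word, while in the support corpus it remains a discriminating term.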

Limitations
- Frequencies are per-index, not per-type
- No good way to pick cutoff frequency
- Takes data to “warm” the query
- Some advanced behavior missing (fuzzy, etc.)