Building a Search Engine with Python and Elasticsearch

https://us.pycon.org/2018/schedule/presentation/53/

One of the most common actions that we take when visiting any website is search. A common service that powers search for many sites is Elasticsearch - but what makes it so powerful? What can you do with Elasticsearch that you can’t with a regular database?

This tutorial starts with an introduction to Elasticsearch architecture, including what makes it great for search and not so great for other use cases. We will then build an application together with a search engine powered by Elasticsearch. We will also discuss how to optimize search queries and scale as the volume of data increases.

Julie Qiu

May 09, 2018

Transcript

  1. Building a Search Engine with Python + Elasticsearch

    Julie Qiu @jqiu25, Jim Grandpre @jimtla. #PyCon2018

  5. Problem Sets

    Build comfort with the ES docs and pyES. Have a starting point for Python + ES projects. Use ES & Python to make search work.
  6. Agenda

    15m Introduction to Elasticsearch & Indexing
    45m Problem Set: Indexing
    10m Break
    5m Introduction to Searching
    45m Problem Set: Searching
    5m Break
    5m Introduction to Analysis
    45m Problem Set: Analysis
  7. Problem Set Advice

    Work in groups and help each other learn. Take your time. Read the docs.
  8. Spoilers

    Many questions on the problem set include "spoilers." Try to figure things out on your own, but if you feel stuck, don't hesitate to read them.
  9. Distributed

    Elasticsearch is typically run in a cluster. Instances can be added and removed at any time. Data is split across instances, and queries are executed across the cluster.
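As a rough illustration of how data can be split across instances, the sketch below assigns each document to a shard by hashing its ID and taking the remainder modulo the shard count. Elasticsearch routes documents in a broadly similar hash-and-modulo fashion, but everything concrete here is an assumption made for the example: the hash function (`zlib.crc32`), the shard count, and the helper names `pick_shard` and `distribute`.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # assumed shard count, chosen only for this example


def pick_shard(doc_id, num_shards=NUM_SHARDS):
    """Map a document ID to a shard number using a stable hash.

    zlib.crc32 is a stand-in; it is not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards=NUM_SHARDS):
    """Group document IDs by the shard they would be stored on."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_shards)].append(doc_id)
    return dict(shards)


if __name__ == "__main__":
    # Hypothetical document IDs.
    ids = ["product-%d" % n for n in range(10)]
    for shard, members in sorted(distribute(ids).items()):
        print(shard, members)
```

Because the hash is stable, the same ID always lands on the same shard, which is what lets a lookup by ID go straight to one shard while a search fans out to all of them.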
  10. Search Optimized

    Data is indexed to allow for fast searching. A query language supports complex searches. Text analysis support is built in.
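To show why indexing makes searching fast, here is a minimal sketch of an inverted index, the data structure that full-text search engines such as Elasticsearch are built around. It is a toy version in plain Python, not how Elasticsearch is implemented; the sample documents and the helper names `build_inverted_index` and `search` are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical product names keyed by document ID.
docs = {1: "Gold necklace", 2: "Silver necklace", 3: "Gold ring"}
index = build_inverted_index(docs)

if __name__ == "__main__":
    print(search(index, "necklace"))
    print(search(index, "gold necklace"))
```

A lookup touches only the entries for the query terms, so the cost depends on how many documents match rather than on how many documents exist.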
  11. Structure

    Database > Index (~ Table) > Document (~ Row)

  12. Structure

    Database > Index (~ Table) > Document (~ Row), with a Document Type (Schema) describing the documents
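To make the analogy concrete, the sketch below writes two documents as Python dicts and wraps each in the action format that the `elasticsearch.helpers.bulk` helper accepts in the 6.x Python client. The index name `products_index` and doc type `products` come from the example search slide; the `description` field, the sample values, and the helper name `to_bulk_action` are assumptions. Nothing here contacts a cluster.

```python
def to_bulk_action(doc_id, source,
                   index="products_index", doc_type="products"):
    """Wrap one document (~ row) in a bulk action for an index (~ table).

    "_type" reflects the doc types of Elasticsearch 6.x; later versions
    drop doc types, so treat that key as version-specific.
    """
    return {
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
        "_source": source,
    }


# Hypothetical documents: each one is a JSON-style object.
documents = [
    {"name": "Gold necklace", "description": "A simple gold chain"},
    {"name": "Silver ring", "description": "A plain silver band"},
]

actions = [to_bulk_action(i, doc) for i, doc in enumerate(documents, start=1)]

if __name__ == "__main__":
    for action in actions:
        print(action)
```

With a running cluster, a list like `actions` is the kind of input you would hand to the bulk helper together with a client instance.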
  13. Elasticsearch DSL

    Python library that helps with writing and running Elasticsearch queries.
    elasticsearch-dsl.readthedocs.io/en/latest/
  14. Example Search: Comparison

    GET request to localhost:9200/products_index/products

        { "query": { "match": { "name": "necklace" } } }

    The same search in Python:

        from elasticsearch import Elasticsearch
        from elasticsearch_dsl import Search

        client = Elasticsearch()
        s = Search(using=client, index="products_index", doc_type="products")
        s.query("match", name="necklace").execute()
  17. Problem Set: Search

    bit.ly/pycon-es-lesson2

    Continuing from Part 1 (also on GitHub repo):

        git commit -am "session1 work"
        git fetch
        git checkout session2
        source venv/bin/activate
        python searchapp/index_products.py
  18. Tokenizers

    Break a string into components.

    Input: "Walking The: Dog"
    Standard: ["Walking", "The", "Dog"]
    Whitespace: ["Walking", "The:", "Dog"]
    Edge N-grams: ["W", "Wa", "Wal", "Walk", …]
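The three tokenizers can be approximated in plain Python to see where their outputs differ. This is a sketch under simplifying assumptions: the real standard tokenizer follows Unicode text segmentation rules rather than the `\w+` regex used here, and the function names and the `min_gram`/`max_gram` defaults are chosen for the example only.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, so punctuation stays attached to tokens."""
    return text.split()


def standard_like_tokenize(text):
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=1, max_gram=4):
    """Return prefixes of the token, from min_gram to max_gram characters."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    text = "Walking The: Dog"
    print(standard_like_tokenize(text))
    print(whitespace_tokenize(text))
    print(edge_ngrams("Walking"))
```

Edge n-grams are what make prefix matching such as search-as-you-type possible: a query for "Wal" can match a document that contains "Walking".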
  19. Analyzing your Index

    Analyzers are configured in the mapping of your index. Custom analyzers are created in the settings of your index.
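The split between settings and mapping can be sketched as an index-creation body written as a Python dict. The layout follows the Elasticsearch 6.x format, where the mapping is nested under a doc type; the analyzer name `autocomplete`, the tokenizer name `autocomplete_tokenizer`, and the gram sizes are assumptions, not values from the tutorial.

```python
import json

index_body = {
    # Custom analyzers are created in the settings of the index.
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {  # hypothetical name
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "autocomplete": {  # hypothetical name
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    # Analyzers are then attached to fields in the mapping.
    "mappings": {
        "products": {  # doc type level, specific to Elasticsearch 6.x
            "properties": {
                "name": {"type": "text", "analyzer": "autocomplete"}
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
```

With a running cluster and the 6.x Python client, a body like this would typically be passed to `client.indices.create(index="products_index", body=index_body)`.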
  20. Analyzing your Index

    Analyzers are applied both to the fields of your document and to the queries against those fields.
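Why the same analysis has to run on both sides can be seen with a toy analyzer in plain Python. The sketch below lowercases and tokenizes field values at index time and does the same to the query before comparing terms; the `analyze`, `index_field`, and `match` helpers and the sample documents are assumptions for illustration, not Elasticsearch code.

```python
import re


def analyze(text):
    """Toy analyzer: split into runs of word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def index_field(docs):
    """Analyze each document's text once, at index time."""
    return {doc_id: set(analyze(text)) for doc_id, text in docs.items()}


def match(indexed, query):
    """Return IDs of documents sharing at least one analyzed query term."""
    query_terms = set(analyze(query))
    return {doc_id for doc_id, terms in indexed.items() if terms & query_terms}


# Hypothetical documents keyed by ID.
docs = {1: "Walking The: Dog", 2: "Feeding the cat"}
indexed = index_field(docs)

if __name__ == "__main__":
    print(match(indexed, "DOG"))        # query is analyzed to "dog" first
    print("DOG" in indexed[1])          # the raw query string is not a stored term
```

If the query were compared without analysis, "DOG" would miss the stored term "dog"; running the same analyzer on both sides is what lines the two up.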