Searching in Neos with Elasticsearch - InspiringCon 2015 in Kolbermoor

Content search is a core feature of almost every website, especially larger ones. This talk will explain the general approach of proactively indexing content, also highlighting the supported indexing systems like ElasticSearch. It will especially focus on indexing custom data, and show how this can be integrated with data from external systems like Magento. Furthermore, we’ll highlight some features which can be built upon a flexible indexing system, such as tagging or categorization of content. You’ll see that search combined with custom content types solves lots of use cases where custom home-grown solutions had to be implemented beforehand.

Sebastian Kurfürst

March 28, 2015

Transcript

  1. Searching in Neos

    Sebastian Kurfürst, @skurfuerst
  2. Tree of Nodes

    Diagram: neostypo3org (Page); features (Page); main (ContentCollection); … (Headline); … (Text); roadmap (Page); unsere-codesprints (Page). Nodes are labeled with their language variants (de, en).
  3. Find all articles written by Sebastian. Display the first three locations tagged with ConferenceLocation. What are the newest pages in a certain category?
  4. 1. Set up ElasticSearch

         # ElasticSearch 1.4.4 - config/elasticsearch.yml
         script.disable_dynamic: sandbox
         script.groovy.sandbox.class_whitelist: java.util.LinkedHashMap
         script.groovy.sandbox.receiver_whitelist: java.util.Iterator, java.lang.Object, java.util.Map, java.util.Map$Entry
         script.groovy.sandbox.enabled: true
         cluster.name: [PUT_YOUR_CUSTOM_NAME_HERE]
         network.host: 127.0.0.1
         index.number_of_shards: 1
         index.number_of_replicas: 0
  5. 1. Node References

         # build up relation in NodeTypes.yaml
         'Sandstorm.News:Article':
           superTypes: ['TYPO3.Neos:Document']
           ...
           properties:
             tags:
               type: references
               ui:
                 label: 'Tags'
                 inspector:
                   editorOptions:
                     # allow only references to tags
                     nodeTypes: ['Sandstorm.News:Tag']
  6. 2. Query in TypoScript

         # replace main content area by a custom TypoScript object
         prototype(PrimaryContent).newsTag {
           condition = ${q(node).is('[instanceof Sandstorm.News:Tag]')}
           type = 'Sandstorm.News:Tag'
         }
  7. 2. Query in TypoScript

         # inherits from Template by default
         prototype(Sandstorm.News:Tag) {
           latestArticlesTaggedWithTag = ${...}
         }
  8. 2. Query in TypoScript

         latestArticlesTaggedWithTag = ${Search.query(site)   # search underneath this site
           .nodeType('Sandstorm.News:Article')                # filter by node type
           .exactMatch('tags', node)                          # where tag == current tag
           .limit(3)                                          # first 3 results
           .sortDesc('publishDate')                           # and sort by publishing date desc
           .execute()}
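To make the semantics of the query on slide 8 concrete, here is a rough in-memory equivalent in Python. It is only a sketch: the node dictionaries, the tag identifiers and the helper name `latest_articles_tagged_with` are invented for illustration, and the real query runs inside ElasticSearch rather than over a Python list. It assumes that an exact match on a `references` property succeeds when the tag is one of the referenced nodes.

```python
from datetime import date

# Hypothetical stand-ins for indexed nodes; only the property names
# (tags, publishDate) and the node type names come from the slides.
nodes = [
    {"nodeType": "Sandstorm.News:Article", "title": "A1", "tags": ["neos"], "publishDate": date(2015, 1, 10)},
    {"nodeType": "Sandstorm.News:Article", "title": "A2", "tags": ["neos", "flow"], "publishDate": date(2015, 3, 20)},
    {"nodeType": "Sandstorm.News:Article", "title": "A3", "tags": ["flow"], "publishDate": date(2015, 3, 25)},
    {"nodeType": "Sandstorm.News:Article", "title": "A4", "tags": ["neos"], "publishDate": date(2015, 2, 14)},
    {"nodeType": "Sandstorm.News:Article", "title": "A5", "tags": ["neos"], "publishDate": date(2014, 12, 1)},
    {"nodeType": "Sandstorm.News:Tag", "title": "neos", "tags": []},
]


def latest_articles_tagged_with(nodes, tag, limit=3):
    """Filter by node type and tag, sort by publish date descending, keep the first `limit`."""
    matching = [
        n for n in nodes
        if n["nodeType"] == "Sandstorm.News:Article" and tag in n["tags"]
    ]
    matching.sort(key=lambda n: n["publishDate"], reverse=True)
    return matching[:limit]


latest = latest_articles_tagged_with(nodes, "neos")
print([n["title"] for n in latest])
```

In the TypoScript version the order of `.limit(3)` and `.sortDesc('publishDate')` does not matter, because the chain only assembles a request; in the Python sketch the sort has to happen before the slice.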
  9. Normalized Data in a relational DB: A, T1, T2, T3

    Denormalized Data in an index: A T1, A T2, A T3
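One way to read slide 9: in a relational database the article A and its tags T1 to T3 are stored once each and linked through a join table, while the index stores flattened entries that each carry a copy of the data they need. The sketch below illustrates that reading in Python; the table contents and the function name `denormalize` are made up for the example and are not taken from the talk.

```python
# Normalized: each entity is stored once, relations live in a join table.
articles = {"A": {"title": "Article A"}}
tags = {
    "T1": {"name": "Tag 1"},
    "T2": {"name": "Tag 2"},
    "T3": {"name": "Tag 3"},
}
article_tags = [("A", "T1"), ("A", "T2"), ("A", "T3")]


def denormalize(articles, tags, article_tags):
    """Flatten the join into self-contained entries (A T1, A T2, A T3)."""
    entries = []
    for article_id, tag_id in article_tags:
        entries.append({
            "article": article_id,
            "title": articles[article_id]["title"],  # copied into every entry
            "tag": tag_id,
            "tag_name": tags[tag_id]["name"],
        })
    return entries


index_entries = denormalize(articles, tags, article_tags)
for entry in index_entries:
    print(entry)
```

The price of the flattened form is redundancy: the article title is repeated in every entry, so a change to the article means re-indexing all of them.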
  10. Indexing Pipeline: "Hey, InspiringCon 2015" -> Tokenization -> Hey | InspiringCon | 2015 -> Token Filtering -> hey | inspiringcon | 2015

    Search Pipeline: "InspiringCon 2015" -> Tokenization -> InspiringCon | 2015 -> Token Filtering -> inspiringcon | 2015
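The two pipelines on slide 10 can be imitated with a few lines of Python. This is a deliberately simplified sketch: it assumes a tokenizer that splits on non-word characters and a single lowercase token filter, which is far less than a real ElasticSearch analyzer does. The function names are illustrative only.

```python
import re


def tokenize(text):
    """Tokenization: split the text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filtering: normalize every token to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the same steps for indexing and for searching."""
    return lowercase_filter(tokenize(text))


indexed_terms = analyze("Hey, InspiringCon 2015")  # indexing pipeline
query_terms = analyze("InspiringCon 2015")         # search pipeline

# The document matches because every query term is among the indexed terms.
matches = all(term in indexed_terms for term in query_terms)
print(indexed_terms, query_terms, matches)
```

The point of running the same analysis on both sides is that the query terms end up in the same normalized form as the indexed terms, so "InspiringCon" finds text that was stored as "inspiringcon".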
  11. Adding .log() to a query writes it to Data/Logs/ElasticSearch.log:

          ${Search.query(site)
            .fulltext('Alice')
            .log()
            .execute()}

          15-03-23 07:20:50 1820 DEBUG Query Log (): {"query":{"filtered":{"query":{"bool":{"must":[{"match_all":[]},{"query_string":{"query":"Alice"}}]}},"filter":{"bool":{"must":[{"term":{"__parentPath":"\/sites\/neosdemotypo3org"}},{"terms":{"__workspace":["live"]}}],"should":[],"must_not":[{"term":{"_hidden":true}},{"range":{"_hiddenBeforeDateTime":{"gt":"now"}}},{"range":{"_hiddenAfterDateTime":{"lt":"now"}}}]}}}},"fields":["__path"],"highlight":{"fields":{"__fulltext*":{"fragment_size":150,"no_match_size":150,"number_of_fragments":2}}}} -- execution time: 10.998010635376 ms -- Total Results: 28
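The logged request on slide 11 is easier to read when laid out as a nested structure. The Python sketch below rebuilds the same shape and pretty-prints it. The helper `build_fulltext_request` is hypothetical and is not how the adaptor builds its queries; it only mirrors the logged JSON. One deliberate difference: the log shows `"match_all":[]`, which looks like an empty PHP array serialized as a list, and the sketch uses an empty object instead.

```python
import json


def build_fulltext_request(site_path, search_word, workspaces=("live",)):
    """Hypothetical helper mirroring the structure of the logged request."""
    return {
        "query": {
            "filtered": {
                # Scored part: match everything, narrowed by the search word.
                "query": {
                    "bool": {
                        "must": [
                            {"match_all": {}},
                            {"query_string": {"query": search_word}},
                        ]
                    }
                },
                # Unscored part: restrict to the site and workspace,
                # and exclude hidden or time-restricted nodes.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"__parentPath": site_path}},
                            {"terms": {"__workspace": list(workspaces)}},
                        ],
                        "should": [],
                        "must_not": [
                            {"term": {"_hidden": True}},
                            {"range": {"_hiddenBeforeDateTime": {"gt": "now"}}},
                            {"range": {"_hiddenAfterDateTime": {"lt": "now"}}},
                        ],
                    }
                },
            }
        },
        "fields": ["__path"],
        "highlight": {
            "fields": {
                "__fulltext*": {
                    "fragment_size": 150,
                    "no_match_size": 150,
                    "number_of_fragments": 2,
                }
            }
        },
    }


request = build_fulltext_request("/sites/neosdemotypo3org", "Alice")
print(json.dumps(request, indent=2))
```

Read this way, `Search.query(site)` accounts for the `__parentPath` and `__workspace` filters plus the visibility exclusions, and `.fulltext('Alice')` adds the `query_string` clause and the highlight settings.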
  12. 1. ElasticSearch Schema

          # Settings.yaml (defaults)
          TYPO3:
            TYPO3CR:
              Search:
                defaultConfigurationPerType:
                  string:
                    elasticSearchMapping:
                      type: string
                      include_in_all: false
                  boolean:
                    elasticSearchMapping:
                      type: boolean
                  date:
                    elasticSearchMapping:
                      type: date
                      format: 'date_time_no_millis'
                      include_in_all: false

          # NodeTypes.yaml (overrides)
          'TYPO3.Neos:Node': &node
            properties:
              '__identifier':
                search:
                  elasticSearchMapping:
                    type: string
                    index: not_analyzed
                    include_in_all: false
                  indexing: '${node.identifier}'
  13. 3. Fulltext Searching

    Diagram: We at InspiringCon (Article) is the Fulltext Root. The content below it, main (ContentCollection) with … (Headline) and … (Text), is collected into it ("collect all content").
  14. 3. Fulltext Searching

          # predefined in Neos
          'TYPO3.Neos:Document':
            search:
              fulltext:
                isRoot: true

          'TYPO3.Neos.NodeTypes:Text':
            properties:
              'text':
                search:
                  fulltextExtractor: '${Indexing.extractHtmlTags(value)}'

          'Sandstorm.News:Article':
            properties:
              'title':
                search:
                  fulltextExtractor: '${Indexing.extractInto("h1", value)}'
  15. ElasticSearch too big for your project? Use SimpleSearch!

          composer require --prefer-source flowpack/simplesearch-contentrepositoryadaptor @dev