Neo4j Theory and Practice

GraphConnect 2013, London

Tareq Abedrabbo

November 19, 2013

Transcript

  1. About me • CTO/Principal Consultant at OpenCredo • Working with Neo4j for (almost) 3 years on a number of different projects • Co-author of Neo4j in Action (Manning)
  2. “If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”
  3. • What is Neo4j? • Approaching graph-based applications • Design • Implementation • Test • Use cases • Lessons learnt
  4. Domain-Centric • Well-defined data model • Data changes through user interactions • Flexible but predictable data structure(s) • Recommendation engines, social networks, etc… • Top-down design
  5. Data-Centric • Complex connected data that typically models real-world networks • Integrated from a variety of different sources • Data can be unpredictable • Telco networks, utility networks, etc… • Bottom-up design
  6. • Search and pattern-matching • Find a recommendation based on behaviour • Graph algorithms • Shortest path, disconnected components • Optimisation • Maximise oil flow while minimising water
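The two graph algorithms named on this slide, shortest path and disconnected components, can be sketched outside the database in a few lines. This is a minimal illustration, not how Neo4j implements them: it assumes an unweighted, undirected graph held as an adjacency dictionary, and the `NETWORK` data and both function names are made up for the example.

```python
from collections import deque

# Hypothetical sample data: two disconnected groups of nodes.
NETWORK = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the 'previous' links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None

def components(graph):
    """Return the connected components as a list of node sets."""
    seen = set()
    result = []
    for node in graph:
        if node in seen:
            continue
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph.get(current, ()))
        seen |= component
        result.append(component)
    return result

print(shortest_path(NETWORK, "a", "c"))  # ['a', 'b', 'c']
print(shortest_path(NETWORK, "a", "e"))  # None: the nodes are in different components
print(len(components(NETWORK)))          # 2
```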
  7. • Start from an initial population of candidate solutions (individuals or phenotypes), ideally random • Attribute a score to each solution using a fitness function • The only place with specific business knowledge • Apply genetic operators to create a new generation • Cross-breeding to retain best characteristics from each parent • Mutation to maintain diversity and to avoid converging to a local optimum too quickly • Stop when you want!
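The steps on this slide map onto a short loop. The sketch below is only one possible reading of them: the bit-string genome, the toy `count_ones` fitness function, the parameter defaults, keeping the top half as parents and carrying the best individual over unchanged are all assumptions made for the example, not details from the talk.

```python
import random

def count_ones(genome):
    """Toy fitness function; in a real application this is where the business knowledge lives."""
    return sum(genome)

def evolve(fitness, genome_length, population_size=20, generations=50,
           mutation_rate=0.05, seed=0):
    """Evolve bit-string genomes and return the fittest one from the last generation."""
    rng = random.Random(seed)
    # Start from a random initial population.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):  # "stop when you want": here, a fixed budget
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]
        # Keep the current best unchanged so the best score never decreases.
        next_generation = [ranked[0][:]]
        while len(next_generation) < population_size:
            mother, father = rng.sample(parents, 2)
            # Cross-breeding: take the head of one parent and the tail of the other.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability to keep diversity.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)

best = evolve(count_ones, genome_length=16)
print(best, count_ones(best))
```

Only `count_ones` would change for a different problem; the loop itself stays generic.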
  8. • Don’t follow “best practices” blindly • For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j • For data-centric applications, you should stay as close as possible to the graph model • In any case, don’t try to hide the graph!
  9. • Expressive • Readable • Maintainable • Performant • Cypher + the web console is the quickest way to experiment and to prototype solutions
  10. • Graph algorithms are typically complex • Knowledge of the domain can simplify queries and traversals • Make Cypher queries as specific as possible • Take “shortcuts” when you know the domain
  11. • Break down problems into small queries. Return graph resources (or ids) to chain queries. • Robustness principle: “Be conservative in what you do, be liberal in what you accept from others” • Use assertions as preconditions • Assertions document intent • Fail fast if data doesn’t match
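A rough sketch of chaining small queries through ids, with assertions as preconditions, might look like the following. The in-memory `NODES` dictionary stands in for the graph, and both function names are hypothetical; in a real application each function would wrap one small Cypher query.

```python
# Hypothetical stand-in for graph data: node id -> properties.
NODES = {
    1: {"type": "Customer", "name": "Ann"},
    2: {"type": "Account", "owner": 1},
    3: {"type": "Account", "owner": 1},
}

def find_customer_id(name):
    """First small query: resolve a customer name to exactly one node id."""
    matches = [node_id for node_id, props in NODES.items()
               if props["type"] == "Customer" and props["name"] == name]
    # Fail fast if the data does not match what the caller expects.
    assert len(matches) == 1, (
        f"expected exactly one customer named {name!r}, found {len(matches)}")
    return matches[0]

def find_account_ids(customer_id):
    """Second small query: takes the id returned by the first one."""
    # Precondition that also documents intent: the id must point at a customer.
    assert NODES.get(customer_id, {}).get("type") == "Customer", (
        f"node {customer_id} is not a customer")
    return sorted(node_id for node_id, props in NODES.items()
                  if props["type"] == "Account" and props["owner"] == customer_id)

print(find_account_ids(find_customer_id("Ann")))  # [2, 3]
```

Note that Python drops `assert` statements when run with `-O`, so a production version would probably raise explicit exceptions instead.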
  12. • Create small data sets to capture the initial use cases • Write simple unit tests using these data sets to support design and implementation • These tests tend to become less useful when requirements are better understood • Throw them away!
  13. • A realistic data set • Should capture the complexity of the real data • Should be sufficiently large • Ideally based on production data • Write functional and integration tests against this data set
  14. • Graph data is inherently flexible and evolving • Queries need to be correct and sufficiently performant • The performance of existing queries can degrade as the underlying model changes • Assertions on timeouts should be part of the test suite to detect loops and poor performance • JUnit’s @Test(timeout=5) • Spring’s @Timeout(value=5)
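The idea behind a timeout assertion can be shown without any test framework. The sketch below uses Python's standard `concurrent.futures` module; `run_with_timeout` and the two stand-in queries are hypothetical names, and the time limits are arbitrary. When using the JUnit 4 annotation instead, keep in mind that its `timeout` value is in milliseconds.

```python
import concurrent.futures
import time

def run_with_timeout(query, timeout_seconds):
    """Run query() in a worker thread and raise if no result arrives in time."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(query)
        # Raises concurrent.futures.TimeoutError when the limit is exceeded.
        return future.result(timeout=timeout_seconds)
    finally:
        # Do not block on the worker; note that it is NOT cancelled and
        # keeps running in the background until the query returns.
        executor.shutdown(wait=False)

def fast_query():
    """Stand-in for a query that behaves well."""
    return ["node-1", "node-2"]

def slow_query():
    """Stand-in for a query whose performance has degraded."""
    time.sleep(0.5)
    return []

assert run_with_timeout(fast_query, timeout_seconds=1.0) == ["node-1", "node-2"]

try:
    run_with_timeout(slow_query, timeout_seconds=0.05)
except concurrent.futures.TimeoutError:
    print("slow_query exceeded its time budget")
```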