Neo4j Theory and Practice

GraphConnect 2013, London

Tareq Abedrabbo

November 19, 2013

Transcript

  1. About me • CTO/Principal Consultant at OpenCredo • Working with Neo4j for (almost) 3 years on a number of different projects • Co-author of Neo4j in Action (Manning)
  2. “If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”
  3. • What is Neo4j? • Approaching graph-based applications • Design • Implementation • Test • Use cases • Lessons learnt
  4. Domain-Centric • Well-defined data model • Data changes through user interactions • Flexible but predictable data structure(s) • Recommendation engines, social networks, etc… • Top-down design
  5. Data-Centric • Complex connected data that typically models real-world networks • Integrated from a variety of different sources • Data can be unpredictable • Telco networks, utility networks, etc… • Bottom-up design
  6. • Search and pattern-matching • Find a recommendation based on behaviour • Graph algorithms • Shortest path, disconnected components • Optimisation • Maximise oil flow while minimising water
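
The two graph algorithms named on this slide can be sketched in a few lines. The example below is a minimal illustration in plain Python, not Neo4j code: the adjacency mapping `GRAPH` and the function names are hypothetical, and the graph is treated as undirected and unweighted.

```python
from collections import deque

# Hypothetical toy graph as an undirected adjacency mapping.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "x": {"y"},
    "y": {"x"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def connected_components(graph):
    """Group nodes into components; more than one means the graph is disconnected."""
    seen = set()
    components = []
    for root in sorted(graph):
        if root in seen:
            continue
        component = set()
        stack = [root]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

Here `shortest_path(GRAPH, "a", "x")` returns `None` because the two nodes sit in different components, which is exactly what `connected_components` reports.
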
  7. • Start from an initial population of candidate solutions (individuals or phenotypes), ideally random • Attribute a score to each solution using a fitness function • The only place with specific business knowledge • Apply genetic operators to create a new generation • Cross-breeding to retain the best characteristics from each parent • Mutation to maintain diversity and to avoid converging to a local optimum too quickly • Stop when you want!
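
The steps on this slide can be sketched as a small genetic algorithm. This is a toy under stated assumptions, not the optimisation used in the talk: individuals are bit lists, the fitness function simply counts 1 bits, and the names `TARGET_LENGTH`, `crossover`, `mutate` and `evolve` are all hypothetical.

```python
import random

TARGET_LENGTH = 20  # hypothetical problem size: number of bits per individual


def fitness(individual):
    """The only place with problem-specific knowledge: here, count the 1 bits."""
    return sum(individual)


def crossover(mother, father, rng):
    """Single-point cross-breeding: prefix from one parent, suffix from the other."""
    point = rng.randint(1, len(mother) - 1)
    return mother[:point] + father[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit with a small probability to maintain diversity."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(generations=60, population_size=30, seed=42):
    rng = random.Random(seed)
    # Initial population of random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(TARGET_LENGTH)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == TARGET_LENGTH:
            break  # "stop when you want": here, once a perfect score appears
        # Keep the better half unchanged and breed the rest from it.
        parents = population[:population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next. Swapping in a different `fitness` function is the only change needed to target another problem.
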
  8. • Don’t follow “best practices” blindly • For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j • For data-centric applications, you should stay as close as possible to the graph model • In any case, don’t try to hide the graph!
  9. • Expressive • Readable • Maintainable • Performant • Cypher + the web console is the quickest way to experiment and to prototype solutions
  10. • Graph algorithms are typically complex • Knowledge of the domain can simplify queries and traversals • Make Cypher queries as specific as possible • Take “shortcuts” when you know the domain
  11. • Break down problems into small queries. Return graph resources (or ids) to chain queries. • Robustness principle: “Be conservative in what you do, be liberal in what you accept from others” • Use assertions as preconditions • Assertions document intent • Fail fast if data doesn’t match
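
Chaining small queries through ids, with assertions as preconditions, might look like the sketch below. It uses a hypothetical in-memory stand-in for a graph store (`NODES`, `RELATIONSHIPS`) rather than a real Neo4j connection, and the customer/account domain is invented for illustration.

```python
# Hypothetical in-memory stand-in for a graph store.
NODES = {
    1: {"label": "Customer", "name": "alice"},
    2: {"label": "Account", "number": "ACC-1"},
    3: {"label": "Account", "number": "ACC-2"},
    4: {"label": "Customer", "name": "bob"},
}
RELATIONSHIPS = [(1, "OWNS", 2), (1, "OWNS", 3)]


def find_customer(name):
    """Small query 1: return the id of exactly one customer."""
    ids = [node_id for node_id, props in NODES.items()
           if props["label"] == "Customer" and props.get("name") == name]
    # Precondition for the next step: fail fast if the data does not match.
    assert len(ids) == 1, f"expected exactly one customer named {name!r}, found {len(ids)}"
    return ids[0]


def accounts_of(customer_id):
    """Small query 2: take the id returned by query 1 and return account ids."""
    # The assertion documents intent: this query only makes sense for customers.
    assert NODES[customer_id]["label"] == "Customer", "start node must be a Customer"
    return sorted(end for start, rel_type, end in RELATIONSHIPS
                  if start == customer_id and rel_type == "OWNS")


def account_numbers(name):
    """Chain the two queries by passing ids between them."""
    return [NODES[account_id]["number"]
            for account_id in accounts_of(find_customer(name))]
```

Each function stays small enough to test on its own, and a bad input stops at the first assertion instead of producing a misleading empty result further down the chain. Python strips `assert` statements when run with `-O`, so code that must always check may prefer explicit exceptions.
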
  12. • Create small data sets to capture the initial use cases • Write simple unit tests using these datasets to support design and implementation • These tests tend to become less useful when requirements are better understood • Throw them away!
  13. • A realistic data set • Should capture the complexity of the real data • Should be sufficiently large • Ideally based on production data • Write functional and integration tests against this dataset
  14. • Graph data is inherently flexible and evolving • Queries need to be correct and sufficiently performant • Existing queries’ performance can degrade as the underlying model changes • Assertions on timeouts should be part of the test suite to detect loops and poor performance • JUnit’s @Test(timeout=5) • Spring’s @Timeout(value=5)
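
The same idea can be sketched outside JUnit. The helper below is a hypothetical Python analogue, not part of any test framework: `assert_completes_within`, `reachable` and `slow_query` are illustrative names, and the time budget is given in seconds (JUnit 4's `timeout` attribute, by contrast, is expressed in milliseconds).

```python
import threading
import time


def assert_completes_within(seconds, func, *args):
    """Run func in a daemon thread and fail if it has not finished in time."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as error:  # re-raised in the calling thread below
            outcome["error"] = error

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        raise AssertionError(f"{func.__name__} did not finish within {seconds}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Hypothetical cyclic data: a traversal without a visited set would never finish.
CYCLIC_GRAPH = {"a": ["b"], "b": ["c"], "c": ["a"]}


def reachable(graph, start):
    """Traversal that tracks visited nodes, so it terminates on cycles."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def slow_query():
    """Stand-in for a query whose performance has degraded."""
    time.sleep(0.5)
    return "done"
```

A test would call `assert_completes_within(1.0, reachable, CYCLIC_GRAPH, "a")` and expect it to pass, while `assert_completes_within(0.05, slow_query)` raises `AssertionError`. The worker thread is not killed on timeout, so a truly runaway query would still need a separate process or a server-side limit.
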