Neo4j Theory and Practice

Tareq Abedrabbo, Graph Connect, 19/11/2013

About me
• CTO/Principal Consultant at OpenCredo
• Working with Neo4j for (almost) 3 years on a number of different projects
• Co-author of Neo4j in Action (Manning)

What is this talk about?

It’s for developers designing and building applications with Neo4j

It’s not a collection of war stories, but I will refer to real-world examples

It is about sharing thoughts and lessons learnt in a
useful way

“If I'm to believe Twitter, half of the earth's population
are importing Wikipedia into Neo4j, for very obscure reasons.”

Agenda

• What is Neo4j?
• Approaching graph-based applications
• Design
• Implementation
• Test
• Use cases
• Lessons learnt

What really is Neo4j?

A graph model

A query engine

A database

Neo4j is a solid foundation on which to build graph-based applications

How should I approach graph-based applications?

Is there a useful way to categorise graph-based applications?

Domain-centric applications

Data-centric applications

Domain-Centric
• Well-defined data model
• Data changes through user interactions
• Flexible but predictable data structure(s)
• Recommendation engines, social networks, etc…
• Top-down design

Data-Centric
• Complex connected data that typically models real-world networks
• Integrated from a variety of different sources
• Data can be unpredictable
• Telco networks, utility networks, etc…
• Bottom-up design

Typically, applications fall somewhere between these two types

How can I use the information available in my graph?

• Search and pattern-matching
  • Find a recommendation based on behaviour
• Graph algorithms
  • Shortest path, disconnected components
• Optimisation
  • Maximise oil flow while minimising water
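The two graph algorithms named above can be illustrated without a database. This is a minimal sketch in plain Python, assuming a small undirected graph held as adjacency sets; the `GRAPH` data and the function names are hypothetical and are not Neo4j APIs.

```python
from collections import deque

# Hypothetical toy graph: undirected adjacency sets keyed by node name.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"f"},
    "f": {"e"},
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Sorting makes the result deterministic when several paths tie.
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def components(graph):
    """Group the nodes into connected components."""
    seen, result = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(component)
    return result
```

A result of `None` from `shortest_path`, or more than one entry from `components`, indicates that parts of the graph are disconnected from each other.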

Graphs are naturally data-driven

Use case 1: Network Impact Analysis

Requirement: Identify the impact of failing components

Requirement: Identify interesting patterns, such as single points of failure

The labelled property graph is a natural fit for the model

Additional “dimensions” can be added to capture abstract concepts: network
redundancy, load-balancing

Cypher queries are a natural way to deliver the different requirements
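The two requirements of this use case can be sketched in plain Python. This is an illustration under stated assumptions, not the queries used on the project: the device names, the `LINKS` list, and the rule that a customer is "impacted" when it can no longer reach the gateway are all hypothetical.

```python
# Hypothetical network: undirected links between devices.
LINKS = [
    ("gateway", "switch1"),
    ("gateway", "switch2"),
    ("switch1", "router"),
    ("switch2", "router"),
    ("router", "customerA"),
    ("switch1", "customerB"),
]
CUSTOMERS = {"customerA", "customerB"}


def reachable(links, source, failed):
    """Return every node reachable from source once failed nodes are removed."""
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link is unusable if either end has failed
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        if node in seen or node in failed:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, ()))
    return seen


def impact(links, failed_node):
    """Customers cut off from the gateway when failed_node fails."""
    return CUSTOMERS - reachable(links, "gateway", {failed_node})


def single_points_of_failure(links):
    """Devices whose individual failure disconnects at least one customer."""
    devices = {node for link in links for node in link} - CUSTOMERS - {"gateway"}
    return sorted(device for device in devices if impact(links, device))
```

In this toy network `switch2` is redundant, because `router` can still be reached through `switch1`, while `router` and `switch1` each cut off a customer when they fail.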

Use case 2: Oil flow optimisation

Requirement: Identify candidate configurations to maximise flow

Requirement: Identify the most practical and valuable adjustments to the
network

Simply connected graph with complex components

Interlude: Genetic Algorithms

• Start from an initial population of candidate solutions (individuals or phenotypes), ideally random
• Attribute a score to each solution using a fitness function
  • The only place with specific business knowledge
• Apply genetic operators to create a new generation
  • Cross-breeding to retain the best characteristics from each parent
  • Mutation to maintain diversity and to avoid converging to a local optimum too quickly
• Stop when you want!
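The steps above can be sketched as a small genetic algorithm. This is a minimal illustration, not the implementation used for the oil flow project: the bit-string genome, the stand-in `fitness` function, the mutation rate, the fixed generation count, and the choice to carry the best individual over unchanged (elitism) are all assumptions.

```python
import random

GENOME_LENGTH = 20  # hypothetical: one bit per adjustable setting


def fitness(individual):
    """Stand-in score: the number of bits set.

    A real fitness function is where the business knowledge would live.
    """
    return sum(individual)


def crossover(rng, mother, father):
    """Single-point cross-breeding: head of one parent, tail of the other."""
    point = rng.randrange(1, GENOME_LENGTH)
    return mother[:point] + father[point:]


def mutate(rng, individual, rate=0.05):
    """Flip each bit with a small probability to maintain diversity."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(generations=30, population_size=20, seed=42):
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [
        [rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
        for _ in range(population_size)
    ]
    best_scores = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_scores.append(fitness(population[0]))
        parents = population[: population_size // 2]  # keep the fitter half
        children = [population[0]]  # elitism (an assumption of this sketch)
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, mother, father)))
        population = children
    return max(population, key=fitness), best_scores
```

Calling `evolve()` returns the best individual found and the best score recorded at the start of each generation. Because the best individual is always carried over, those scores never decrease.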

Is this even a use case for Neo4j?

Persist and share calculated solutions

Inspect intermediary steps

Use Cypher queries to interrogate solutions

Lessons learnt

Understand your domain

• Don’t follow “best practices” blindly
• For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j
• For data-centric applications, you should stay as close as possible to the graph model
• In any case, don’t try to hide the graph!

Use Cypher!

• Expressive
• Readable
• Maintainable
• Performant
• Cypher + the web console is the quickest way to experiment and to prototype solutions

Manage complexity with domain knowledge

• Graph algorithms are typically complex
• Knowledge of the domain can simplify queries and traversals
• Make Cypher queries as specific as possible
• Take “shortcuts” when you know the domain

Write robust and flexible code

• Break down problems into small queries. Return graph resources (or ids) to chain queries.
• Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”
• Use assertions as preconditions
  • Assertions document intent
  • Fail fast if data doesn’t match
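The idea of chaining small queries through ids, with assertions as preconditions, can be sketched as follows. The in-memory `NODES` and `RELATIONSHIPS` structures are hypothetical stand-ins for a graph store, and both functions are illustrative rather than part of any Neo4j API.

```python
# Hypothetical in-memory stand-in for a graph store.
NODES = {
    1: {"label": "Customer", "name": "alice"},
    2: {"label": "Router", "name": "r1"},
    3: {"label": "Switch", "name": "s1"},
}
RELATIONSHIPS = [(1, "CONNECTED_TO", 2), (2, "CONNECTED_TO", 3)]


def find_ids_by_label(label):
    """First small query: return only ids so the result can be chained."""
    return sorted(
        node_id for node_id, props in NODES.items() if props["label"] == label
    )


def neighbours_of(node_ids, rel_type):
    """Second small query: follow one relationship type from the given ids."""
    # Preconditions document intent and fail fast if the data doesn't match.
    assert node_ids, "expected at least one starting node id"
    unknown = [node_id for node_id in node_ids if node_id not in NODES]
    assert not unknown, f"unknown node ids: {unknown}"
    return sorted(
        {
            end
            for start, kind, end in RELATIONSHIPS
            if start in node_ids and kind == rel_type
        }
    )


# Chain the queries: each step consumes the ids produced by the previous one.
customers = find_ids_by_label("Customer")
routers = neighbours_of(customers, "CONNECTED_TO")
switches = neighbours_of(routers, "CONNECTED_TO")
```

Each step stays small enough to test on its own, and a failed precondition points directly at the step that received unexpected data.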

Start with a representative dataset

• Create small datasets to capture the initial use cases
• Write simple unit tests using these datasets to support design and implementation
• These tests tend to become less useful once requirements are better understood
  • Throw them away!

Move to a realistic dataset as soon as possible

• A realistic dataset
  • Should capture the complexity of the real data
  • Should be sufficiently large
  • Ideally based on production data
• Write functional and integration tests against this dataset

Test non-functional aspects

• Graph data is inherently flexible and evolving
• Queries need to be correct and sufficiently performant
• The performance of existing queries can degrade as the underlying model changes
• Assertions on timeouts should be part of the test suite to detect loops and poor performance
  • JUnit’s @Test(timeout=5), with the timeout in milliseconds
  • Spring’s @Timed(millis=5)
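The same idea of asserting on a time limit can be sketched outside JUnit or Spring. This is a hedged illustration in plain Python: `assert_completes_within`, `fast_query` and `slow_query` are hypothetical names, and the sleep stands in for a query that loops or performs poorly.

```python
import threading
import time


def assert_completes_within(seconds, function, *args):
    """Run function in a daemon thread and fail if it is still running
    after the given number of seconds."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = function(*args)
        except Exception as error:  # surfaced in the calling thread below
            outcome["error"] = error

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(seconds)
    assert not worker.is_alive(), f"query exceeded {seconds}s"
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def fast_query():
    """Stand-in for a query that returns promptly."""
    return sum(range(100))


def slow_query():
    """Stand-in for a query that has degraded or is stuck in a loop."""
    time.sleep(0.5)
    return "done"
```

The helper cannot stop a runaway thread; it only reports that the limit was exceeded, which is enough for a test suite to flag the regression.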

Links
• Twitter: @tareq_abedrabbo
• Blog: http://www.terminalstate.net
• OpenCredo: http://www.opencredo.com

Thank you!
