Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Large scale graph processing with apache giraph
Search
André Kelpe
May 23, 2012
Programming
2
5.4k
Large scale graph processing with apache giraph
André Kelpe
May 23, 2012
Tweet
Share
More Decks by André Kelpe
See All by André Kelpe
Cascading 3 and beyond
fs111
0
160
The Cascading (big) data application framework
fs111
1
230
SELECT ALL THE THINGS - Cascading Lingual, ANSI SQL for Apache Hadoop
fs111
0
190
A whirlwind tour through Lingual: ANSI SQL for Apache Hadoop
fs111
1
180
Tor for everyone!
fs111
0
170
Other Decks in Programming
See All in Programming
3 Effective Rules for Using Signals in Angular
manfredsteyer
PRO
0
100
Quine, Polyglot, 良いコード
qnighy
4
650
Snowflake x dbtで作るセキュアでアジャイルなデータ基盤
tsoshiro
2
520
Better Code Design in PHP
afilina
PRO
0
130
macOS でできる リアルタイム動画像処理
biacco42
9
2.4k
subpath importsで始めるモック生活
10tera
0
310
Macとオーディオ再生 2024/11/02
yusukeito
0
370
GitHub Actionsのキャッシュと手を挙げることの大切さとそれに必要なこと
satoshi256kbyte
5
430
みんなでプロポーザルを書いてみた
yuriko1211
0
280
型付き API リクエストを実現するいくつかの手法とその選択 / Typed API Request
euxn23
8
2.2k
Compose 1.7のTextFieldはPOBox Plusで日本語変換できない
tomoya0x00
0
190
Pinia Colada が実現するスマートな非同期処理
naokihaba
4
230
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
136
6.6k
Building an army of robots
kneath
302
43k
Building Better People: How to give real-time feedback that sticks.
wjessup
364
19k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
329
21k
Facilitating Awesome Meetings
lara
50
6.1k
Automating Front-end Workflow
addyosmani
1366
200k
KATA
mclloyd
29
14k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.3k
Fontdeck: Realign not Redesign
paulrobertlloyd
82
5.2k
Fashionably flexible responsive web design (full day workshop)
malarkey
405
65k
[RailsConf 2023] Rails as a piece of cake
palkan
52
4.9k
Documentation Writing (for coders)
carmenintech
65
4.4k
Transcript
Large scale graph processing with apache giraph André Kelpe @fs111
http://kel.pe
graphs 101
vertices and edges
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
graphs are everywhere road network, the www, social graphs etc.
graphs can be huge
google knows!
Pregel
Pregel by google Describes graph processing approach based on BSP
(Bulk Synchronous Parallel)
pro-tip: search for „pregel_paper.pdf“ on github ;-)
Properties of Pregel batch-oriented, scalable, fault tolerant processing of graphs
It is not a graph database It is a processing
framework
BSP vertex centric processing in so called supersteps
BSP vertices send messages to each other
BSP synchronization points between supersteps
execution of superstep S Each vertex processes messages generated in
S-1 and send messages to be processed in S+1 and determines to halt.
None
apache giraph
giraph Loose implementation of Pregel ideas on top of Hadoop
M/R coming from yahoo
apache giraph http://incubator.apache.org/giraph/
giraph avoid overhead of classic M/R process but reuse existing
infrastructure
giraph simple map jobs in master worker setup. coordination via
zookeeper. messaging via own RPC protocol. in memory processing. custom input and output formats.
current status version 0.1 released compatible with a multitude of
hadoop versions (we use CDH3 at work) still lots of things to do, join the fun!
the APIs the APIs
Vertex-API /** *@param <I> vertex id * @param <V> vertex
data * @param <E> edge data * @param <M> message data */ class BasicVertex<I extends WritableComparable, V extends Writable, E extends Writable, M extends Writable> void compute(Iterator<M> msgIterator); void sendMsg(I id, M msg); void voteToHalt();
Shortest path example https://cwiki.apache.org/confl uence/display/GIRAPH/Shorte st+Paths+Example
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
private boolean isSource() { return (getVertexId().get() == getContext().getConfiguration().getLong(SOURCE_ID, SOURCE_ID_DEFAULT)); }
@Override public void compute(Iterator<DoubleWritable> msgIterator) { if (getSuperstep() == 0) { setVertexValue(new DoubleWritable(Double.MAX_VALUE)); } double minDist = isSource() ? 0d : Double.MAX_VALUE; while (msgIterator.hasNext()) { minDist = Math.min(minDist, msgIterator.next().get()); } if (minDist < getVertexValue().get()) { setVertexValue(new DoubleWritable(minDist)); for (Edge<LongWritable, FloatWritable> edge : getOutEdgeMap().values()) { sendMsg(edge.getDestVertexId(), new DoubleWritable(minDist + edge.getEdgeValue().get())); } } voteToHalt(); }
GiraphJob job = new GiraphJob(getConf(), getClass().getName()); job.setVertexClass(SimpleShortestPathVertex.class); job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class); job.setVertexOutputFormatClass( SimpleShortestPathsVertexOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(„/foo/bar/baz“)); FileOutputFormat.setOutputPath(job, new Path(„/foo/bar/quux“)); job.getConfiguration().setLong(SimpleShortestPathsVertex.SOURCE_ID, Long.parseLong(argArray[2])); job.setWorkerConfiguration(minWorkers, maxWorkers), 100.0f); GiraphJob
see also http://incubator.apache.org/giraph/ https://cwiki.apache.org/confluence/displ ay/GIRAPH/Shortest+Paths+Example http://googleresearch.blogspot.com/2009/ 06/large-scale-graph-computing-at- google.html
Thanks! Questions?