Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Large scale graph processing with apache giraph
Search
André Kelpe
May 23, 2012
Programming
2
5.5k
Large scale graph processing with apache giraph
André Kelpe
May 23, 2012
Tweet
Share
More Decks by André Kelpe
See All by André Kelpe
Cascading 3 and beyond
fs111
0
180
The Cascading (big) data application framework
fs111
1
240
SELECT ALL THE THINGS - Cascading Lingual, ANSI SQL for Apache Hadoop
fs111
0
210
A whirlwind tour through Lingual: ANSI SQL for Apache Hadoop
fs111
1
190
Tor for everyone!
fs111
0
200
Other Decks in Programming
See All in Programming
Spinner 軸ズレ現象を調べたらレンダリング深淵に飲まれた #レバテックMeetup
bengo4com
0
140
認証・認可の基本を学ぼう前編
kouyuume
0
260
生成AI時代を勝ち抜くエンジニア組織マネジメント
coconala_engineer
0
470
AIの誤りが許されない業務システムにおいて“信頼されるAI” を目指す / building-trusted-ai-systems
yuya4
6
3.9k
ローカルLLMを⽤いてコード補完を⾏う VSCode拡張機能を作ってみた
nearme_tech
PRO
0
140
LLMで複雑な検索条件アセットから脱却する!! 生成的検索インタフェースの設計論
po3rin
4
950
Go コードベースの構成と AI コンテキスト定義
andpad
0
130
Kotlin Multiplatform Meetup - Compose Multiplatform 외부 의존성 아키텍처 설계부터 운영까지
wisemuji
0
110
エディターってAIで操作できるんだぜ
kis9a
0
750
GISエンジニアから見たLINKSデータ
nokonoko1203
0
180
AI前提で考えるiOSアプリのモダナイズ設計
yuukiw00w
0
180
メルカリのリーダビリティチームが取り組む、AI時代のスケーラブルな品質文化
cloverrose
2
340
Featured
See All Featured
AI: The stuff that nobody shows you
jnunemaker
PRO
1
8
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
190
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.3k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
0
88
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
130
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
51
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
115
91k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
34
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.7k
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
Site-Speed That Sticks
csswizardry
13
1k
Everyday Curiosity
cassininazir
0
110
Transcript
Large scale graph processing with apache giraph André Kelpe @fs111
http://kel.pe
graphs 101
vertices and edges
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
graphs are everywhere road network, the www, social graphs etc.
graphs can be huge
google knows!
Pregel
Pregel by google Describes graph processing approach based on BSP
(Bulk Synchronous Parallel)
pro-tip: search for „pregel_paper.pdf“ on github ;-)
Properties of Pregel batch-oriented, scalable, fault tolerant processing of graphs
It is not a graph database It is a processing
framework
BSP vertex centric processing in so called supersteps
BSP vertices send messages to each other
BSP synchronization points between supersteps
execution of superstep S Each vertex processes messages generated in
S-1 and send messages to be processed in S+1 and determines to halt.
None
apache giraph
giraph Loose implementation of Pregel ideas on top of Hadoop
M/R coming from yahoo
apache giraph http://incubator.apache.org/giraph/
giraph avoid overhead of classic M/R process but reuse existing
infrastructure
giraph simple map jobs in master worker setup. coordination via
zookeeper. messaging via own RPC protocol. in memory processing. custom input and output formats.
current status version 0.1 released compatible with a multitude of
hadoop versions (we use CDH3 at work) still lots of things to do, join the fun!
the APIs the APIs
Vertex-API /** *@param <I> vertex id * @param <V> vertex
data * @param <E> edge data * @param <M> message data */ class BasicVertex<I extends WritableComparable, V extends Writable, E extends Writable, M extends Writable> void compute(Iterator<M> msgIterator); void sendMsg(I id, M msg); void voteToHalt();
Shortest path example https://cwiki.apache.org/confl uence/display/GIRAPH/Shorte st+Paths+Example
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
private boolean isSource() { return (getVertexId().get() == getContext().getConfiguration().getLong(SOURCE_ID, SOURCE_ID_DEFAULT)); }
@Override public void compute(Iterator<DoubleWritable> msgIterator) { if (getSuperstep() == 0) { setVertexValue(new DoubleWritable(Double.MAX_VALUE)); } double minDist = isSource() ? 0d : Double.MAX_VALUE; while (msgIterator.hasNext()) { minDist = Math.min(minDist, msgIterator.next().get()); } if (minDist < getVertexValue().get()) { setVertexValue(new DoubleWritable(minDist)); for (Edge<LongWritable, FloatWritable> edge : getOutEdgeMap().values()) { sendMsg(edge.getDestVertexId(), new DoubleWritable(minDist + edge.getEdgeValue().get())); } } voteToHalt(); }
GiraphJob job = new GiraphJob(getConf(), getClass().getName()); job.setVertexClass(SimpleShortestPathVertex.class); job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class); job.setVertexOutputFormatClass( SimpleShortestPathsVertexOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(„/foo/bar/baz“)); FileOutputFormat.setOutputPath(job, new Path(„/foo/bar/quux“)); job.getConfiguration().setLong(SimpleShortestPathsVertex.SOURCE_ID, Long.parseLong(argArray[2])); job.setWorkerConfiguration(minWorkers, maxWorkers), 100.0f); GiraphJob
see also http://incubator.apache.org/giraph/ https://cwiki.apache.org/confluence/displ ay/GIRAPH/Shortest+Paths+Example http://googleresearch.blogspot.com/2009/ 06/large-scale-graph-computing-at- google.html
Thanks! Questions?