Pishen Tsai
December 05, 2015
Deploy your own Spark cluster in 4 minutes using sbt.
Transcript
Deploy your own Spark cluster in 4 minutes using sbt
Pishen Tsai @ KKBOX
KKBOX / spark-deployer
• sbt plugin.
• Used in production at KKBOX.
• 100% Scala.
https://github.com/KKBOX/spark-deployer
write the code → compile & assembly → create cluster → submit job → destroy cluster
Solutions
• spark-ec2
• Amazon EMR (Elastic MapReduce)
• spark-deployer
spark-ec2: http://spark.apache.org/docs/latest/ec2-scripts.html
Amazon EMR: https://aws.amazon.com/elasticmapreduce/details/spark
spark-ec2
• write the code
• compile & assembly: sbt
• create cluster: spark-ec2
• submit job: scp & ssh
• destroy cluster: spark-ec2
spark-ec2's commands
$ sbt assembly
$ spark-ec2 -k awskey -i ~/.ssh/awskey.pem -r us-west-2 -z us-west-2a --vpc-id=vpc-a28d24c7 --subnet-id=subnet-4eb27b39 -s 2 -t c4.xlarge -m m4.large --spark-version=1.5.2 --copy-aws-credentials launch my-spark-cluster
$ scp -i ~/.ssh/awskey.pem target/scala-2.10/my_job-assembly-0.1.jar root@<copy-master-ip-by-yourself>:~/job.jar
$ ssh -i ~/.ssh/awskey.pem root@<master-ip> './spark/bin/spark-submit --class mypackage.Main --master spark://<master-ip>:7077 --executor-memory 6G job.jar arg0'
$ spark-ec2 -r us-west-2 destroy my-spark-cluster
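The master IP has to be pasted into the ssh step twice: once as the host and once inside the spark:// URL. A minimal sketch of building the remote command from a single value is shown here. The function name spark_submit_cmd and the IP 10.0.0.1 are made up for illustration; the spark-submit path and flags are the ones on the slide.

```shell
# Sketch only: spark_submit_cmd is a hypothetical helper, not part of spark-ec2.
# It prints the remote spark-submit command from the slide for a given master IP.
# Usage: spark_submit_cmd <master-ip> <main-class> <executor-memory> <jar> [job args...]
spark_submit_cmd() {
  master_ip="$1"
  class="$2"
  memory="$3"
  jar="$4"
  shift 4
  cmd="./spark/bin/spark-submit --class $class --master spark://$master_ip:7077 --executor-memory $memory $jar"
  # Append the job's own arguments (assumed to contain no spaces or quotes).
  for a in "$@"; do
    cmd="$cmd $a"
  done
  printf '%s\n' "$cmd"
}
```

For example, spark_submit_cmd 10.0.0.1 mypackage.Main 6G job.jar arg0 prints the command that the slide passes to ssh, with 10.0.0.1 in place of <master-ip>.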
spark-ec2 + make
• write the code
• compile & assembly: sbt
• create cluster: spark-ec2
• submit job: scp & ssh
• destroy cluster: spark-ec2
• make wraps all of the steps
spark-ec2's bad parts
• Need to install sbt and spark-ec2.
• Need to design and maintain Makefiles.
• Slow startup time (~20 mins).
emr
• write the code
• compile & assembly: sbt
• create cluster, submit job, destroy cluster: emr
emr's commands
$ sbt assembly
$ aws emr create-cluster --name my-spark-cluster --release-label emr-4.2.0 --instance-type m3.xlarge --instance-count 2 --applications Name=Spark --ec2-attributes KeyName=awskey --use-default-roles
$ aws emr put --cluster-id j-2AXXXXXXGAPLF --key-pair-file ~/.ssh/mykey.pem --src target/scala-2.10/my_job-assembly-0.1.jar --dest /home/hadoop/job.jar
$ aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=Spark,Name=my-emr,ActionOnFailure=CONTINUE,Args=[--executor-memory,13G,--class,mypackage.Main,/home/hadoop/job.jar,arg0]
$ aws emr terminate-clusters --cluster-id j-2AXX
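The --steps value is a single comma-separated string, which is easy to mistype by hand. A small sketch that assembles it from its parts might look like this. The function name emr_spark_step is hypothetical; the field names and argument order are copied from the slide's command.

```shell
# Sketch only: emr_spark_step is a hypothetical helper, not an AWS CLI feature.
# It prints a --steps value in the same shorthand form the slide uses.
# Usage: emr_spark_step <step-name> <executor-memory> <main-class> <jar> [job args...]
emr_spark_step() {
  name="$1"
  memory="$2"
  class="$3"
  jar="$4"
  shift 4
  args="--executor-memory,$memory,--class,$class,$jar"
  # Job arguments are appended comma-separated (assumed to contain no commas or spaces).
  for a in "$@"; do
    args="$args,$a"
  done
  printf 'Type=Spark,Name=%s,ActionOnFailure=CONTINUE,Args=[%s]\n' "$name" "$args"
}
```

Calling emr_spark_step my-emr 13G mypackage.Main /home/hadoop/job.jar arg0 prints the same string that follows --steps in the add-steps command above.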
emr + make
• write the code
• compile & assembly: sbt
• create cluster, submit job, destroy cluster: emr
• make wraps all of the steps
emr's bad parts
• Need to install sbt and emr.
• Need to design and maintain Makefiles.
• Spark's version is old.
• Restricted machine type.
Since sbt is itself a powerful build tool, why don't we let it handle all the dirty work for us?
spark-deployer
• write the code
• compile & assembly, create cluster, submit job, destroy cluster: sbt
spark-deployer's commands
$ sbt "sparkCreateCluster 2"
$ sbt "sparkSubmitJob arg0"
$ sbt "sparkDestroyCluster"
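For unattended runs, the three commands above could be chained so that the cluster is destroyed even when the job fails. This is only a sketch: run_spark_job and the SBT override are hypothetical names, and it assumes that none of the three tasks waits for interactive input.

```shell
# Sketch only: run_spark_job is a hypothetical wrapper, not part of spark-deployer.
# SBT can be overridden for a dry run, e.g. SBT=echo prints the tasks instead of running them.
SBT="${SBT:-sbt}"

# Usage: run_spark_job <number-of-workers> [job args...]
run_spark_job() {
  workers="$1"
  shift
  status=0
  # Create the cluster, then submit the job; remember the first failure.
  "$SBT" "sparkCreateCluster $workers" && "$SBT" "sparkSubmitJob $*" || status=$?
  # Always try to destroy the cluster so that no machines are left running.
  "$SBT" "sparkDestroyCluster" || status=$?
  return "$status"
}
```

With SBT=echo, run_spark_job 2 arg0 prints the three task strings in order without touching AWS.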
spark-deployer's good parts
• Need to install only sbt.
• No Makefile.
• Easy to use.
• Lets you focus on your code.
• Fast and parallel startup (~4 mins).
• Dynamic scale out.
• Flexible design.
How to use it?
Prerequisites
• java
• sbt
• export AWS_ACCESS_KEY_ID=...
  export AWS_SECRET_ACCESS_KEY=...
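A quick way to confirm these prerequisites before the first run could look like the sketch below. The function name missing_prereqs is hypothetical; it only checks that java and sbt are on the PATH and that the two environment variables are non-empty, not that the credentials are valid.

```shell
# Sketch only: missing_prereqs is a hypothetical helper, not part of spark-deployer.
# It prints one line per missing prerequisite and prints nothing when all are present.
missing_prereqs() {
  # java and sbt must be found on the PATH.
  for tool in java sbt; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  # Both AWS variables must be set to a non-empty value.
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] || echo "AWS_ACCESS_KEY_ID"
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || echo "AWS_SECRET_ACCESS_KEY"
  return 0
}
```

Running missing_prereqs in the shell that will launch sbt lists anything that still has to be installed or exported.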
sbt installation: http://www.scala-sbt.org/0.13/tutorial/Manual-Installation.html#Unix
Demo
Give it a try, and share!
• Report issues.
• Join our gitter channel.
• Send pull requests.
KKBOX / spark-deployer
https://github.com/KKBOX/spark-deployer
Thank you
Pishen Tsai @ KKBOX
KKBOX / spark-deployer