PyConJP 2015: Dask: A Lightweight Parallel Computing Framework (Lightning talks)
Sinhrks
October 10, 2015
Transcript
Dask: A Lightweight Parallel Computing Framework
About me
• Data Analyst
• OSS activities:
• PyData Development Team (pandas committer)
• Blaze Development Team (Dask committer)
• GitHub: https://github.com/sinhrks
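Dask
• A lightweight parallel and distributed framework (mainly parallelism within a single node)
• Provides data structures that expose a subset of the NumPy, PyToolz, and pandas APIs
Submodule: base package
dask.array: NumPy ndarray
dask.bag: PyToolz (operations on list, set, dict)
dask.dataframe: pandas DataFrame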
DataFrame
• pandas.DataFrame: labeled two-dimensional data
• Dask.DataFrame: processes a pandas.DataFrame by splitting it into pieces
[Figure labels: pandas.DataFrame, Dask.DataFrame]
Dask DataFrame
import numpy as np
import pandas as pd
# Create a pandas.DataFrame with 10 rows and 3 columns.
df = pd.DataFrame({'X': np.arange(10), 'Y': np.arange(10, 20), 'Z': np.arange(20, 30)}, index=list('abcdefghij'))
df
import dask.dataframe as dd
# Split the data internally into 2 pieces to create a Dask.DataFrame.
ddf = dd.from_pandas(df, 2)
ddf
dd.DataFrame<from_pandas-…, divisions=('a', 'f', 'j')>
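Computation with Dask
ddf + 1
dd.DataFrame<elemwise-…, divisions=('a', 'f', 'j')>
Adds 1 to every element of ddf. The actual computation has not been executed yet.
(ddf + 1).compute()
Executes the computation.
[Figure labels: df, ddf, compute, ddf]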
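Blocked Algorithm (addition)
(ddf + 1).compute()
[Figure: the pandas.DataFrame before processing is converted to a Dask.DataFrame, the computation is run on each split piece of data, and the results are joined (Concat) into the pandas.DataFrame after processing]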
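The split, compute, and concatenate flow can be imitated with plain pandas. This is only a conceptual sketch of the idea, not Dask's internal implementation; the two-block split is chosen to mirror divisions=('a', 'f', 'j').

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(10),
                   'Y': np.arange(10, 20),
                   'Z': np.arange(20, 30)},
                  index=list('abcdefghij'))

# Split into two row blocks: 'a'..'e' and 'f'..'j'.
blocks = [df.iloc[:5], df.iloc[5:]]

# Run the computation on each block independently.
processed = [block + 1 for block in blocks]

# Concatenate the partial results back into one pandas.DataFrame.
result = pd.concat(processed)
print(result)
```

Because each block is processed independently, the per-block step is the part that can run in parallel.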
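Blocked Algorithm (sum)
ddf.sum().compute()
[Figure: Sum is applied to each piece, the partial results are joined with Concat, and Sum is applied again. Labels: sum (first pass), concat, sum (second pass)]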
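The two-pass sum can be sketched with plain pandas in the same way. Again, this is a conceptual illustration under the same two-block assumption, not Dask's own code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(10),
                   'Y': np.arange(10, 20),
                   'Z': np.arange(20, 30)},
                  index=list('abcdefghij'))
blocks = [df.iloc[:5], df.iloc[5:]]

# First pass: sum every column inside each block.
partial_sums = [block.sum() for block in blocks]

# Concat: place the partial sums side by side, one column per block.
combined = pd.concat(partial_sums, axis=1)

# Second pass: sum across the blocks to get the overall totals.
total = combined.sum(axis=1)
print(total)
```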
Blocked Algorithm (sum)
ddf.sum().visualize()
[Figure: task graph. Labels: Dask.DataFrame before processing, sum (first pass), concat, sum (second pass)]
Blocked Algorithm (mean)
ddf.mean().visualize()
[Figure: task graph. Labels: sum, count, mean = sum / count]
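The mean cannot be obtained by averaging per-block means in general, which is why the graph tracks sum and count separately. A plain-pandas sketch of that decomposition, under the same two-block assumption as before:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(10),
                   'Y': np.arange(10, 20),
                   'Z': np.arange(20, 30)},
                  index=list('abcdefghij'))
blocks = [df.iloc[:5], df.iloc[5:]]

# Per-block sums and counts, then combined across blocks.
sums = pd.concat([block.sum() for block in blocks], axis=1).sum(axis=1)
counts = pd.concat([block.count() for block in blocks], axis=1).sum(axis=1)

# mean = sum / count
mean = sums / counts
print(mean)
```

This stays correct even when the blocks have different sizes, since the counts weight each block properly.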
Blocked Algorithm (summary statistics)
ddf.describe().visualize()
[Figure: task graph. Label: result]
Features of Dask DataFrame
• Arithmetic / comparison operations
• Statistics
• Label-based data selection
• Grouping / aggregation
• Concatenation / joining (merge, join, concat…)
Performance comparison
• AWS EC2: c4.2xlarge (vCPU: 8, memory: 15 GiB)
n = 100000000
# Create a pandas.DataFrame with 100 million rows and 2 columns.
df = pd.DataFrame({'a': np.random.randint(1, 100, n), 'b': np.random.randn(n)})
df
# Split the data internally into 5 pieces to create a Dask.DataFrame.
ddf = dd.from_pandas(df, 5)
ddf
dd.DataFrame<from_pandas-…, divisions=(0, 20000000, 40000000, 60000000, 80000000, 99999999)>
Performance comparison
pandas:
%timeit df.describe()
1 loops, best of 3: 25.3 s per loop
Dask:
%timeit ddf.describe().compute()
1 loops, best of 3: 3.87 s per loop
Results
• Note: percentile uses an approximation algorithm, so the results may differ in some cases
pandas: df.describe()
Dask: ddf.describe().compute()
Results
ddf.describe().visualize()
Summary
• Dask: a lightweight parallel and distributed framework
• Provides a subset of the NumPy, PyToolz, and pandas APIs
• Users can run parallel computations using the NumPy / PyToolz / pandas APIs