Rayon (Rust Belt Rust)
nikomatsakis
October 28, 2016
A talk about Rayon from the Rust Belt Rust conference
Transcript
Slide 1: Rayon: Data Parallelism for Fun and Profit
Nicholas Matsakis (nmatsakis on IRC)
Slide 2: Want to make parallelization easy

Sequential:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        paths.iter()
             .map(|path| Image::load(path))
             .collect()
    }

Parallel:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        paths.par_iter()
             .map(|path| Image::load(path))
             .collect()
    }

Reading the chain top to bottom: for each path… …load an image… …create and return a vector.
Slide 3: Want to make parallelization safe

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        let mut pngs = 0;
        paths.par_iter()
             .map(|path| {
                 if path.ends_with("png") {
                     pngs += 1; // Data-race
                 }
                 Image::load(path)
             })
             .collect()
    }

Will not compile.
Slide 4: http://blog.faraday.io/saved-by-the-compiler-parallelizing-a-loop-with-rust-and-rayon/
Slide 5 (diagram):
• Parallel Iterators: basically all safe
• join(): safe interface, unsafe impl
• threadpool: unsafe
Slide 6:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        paths.iter()
             .map(|path| Image::load(path))
             .collect()
    }
Slide 7:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        paths.par_iter()
             .map(|path| Image::load(path))
             .collect()
    }
Slide 8: Not quite that simple… (but almost!)
1. No mutating shared state (except for atomics, locks).
2. Some combinators are inherently sequential.
3. Some things aren’t implemented yet.
Slide 9:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        let mut pngs = 0;
        paths.par_iter()
             .map(|path| {
                 if path.ends_with("png") {
                     pngs += 1; // Data-race
                 }
                 Image::load(path)
             })
             .collect()
    }

Will not compile.
Slide 10: `c` not shared between iterations!

Sequential:

    fn increment_all(counts: &mut [u32]) {
        for c in counts.iter_mut() {
            *c += 1;
        }
    }

Parallel:

    fn increment_all(counts: &mut [u32]) {
        counts.par_iter_mut()
              .for_each(|c| *c += 1);
    }
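The reason slide 10 is safe can be sketched without Rayon: each task receives a disjoint `&mut` piece of the slice, so nothing is shared between tasks. The following is a minimal std-only sketch, not Rayon's implementation; the `chunk_len` parameter and the one-thread-per-chunk strategy are assumptions made for illustration.

```rust
use std::thread;

/// Std-only stand-in for `par_iter_mut().for_each(...)`: split the slice into
/// disjoint mutable chunks and give each chunk to its own scoped thread.
/// `chunk_len` is a hypothetical tuning knob, not part of the slide.
fn increment_all(counts: &mut [u32], chunk_len: usize) {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        for chunk in counts.chunks_mut(chunk_len) {
            // `chunk` is an exclusive borrow of one region; no other thread can see it.
            s.spawn(move || {
                for c in chunk {
                    *c += 1;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Calling `increment_all(&mut v, 2)` on a five-element vector would use three threads, each owning its own chunk.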
Slide 11:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        // Count the PNGs in a separate pass instead of mutating shared state.
        let pngs: usize = paths.par_iter()
                               .filter(|p| p.ends_with("png"))
                               .map(|_| 1)
                               .sum();
        paths.par_iter()
             .map(|p| Image::load(p))
             .collect()
    }
Slide 12: But beware: atomics introduce nondeterminism!

    use std::sync::atomic::{AtomicUsize, Ordering};

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        let pngs = AtomicUsize::new(0);
        paths.par_iter()
             .map(|path| {
                 if path.ends_with("png") {
                     pngs.fetch_add(1, Ordering::SeqCst);
                 }
                 Image::load(path)
             })
             .collect()
    }
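To make the nondeterminism warning on slide 12 concrete, here is a small std-only sketch (scoped threads stand in for Rayon's workers, and file names are plain `&str` values; both are assumptions). The final count is always the same, but the value each task observes from `fetch_add` depends on thread scheduling.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Returns the final PNG count plus, per input, the counter value that task
/// observed (`None` for non-PNG names). One thread per name is only for
/// illustration.
fn count_pngs(names: &[&str]) -> (usize, Vec<Option<usize>>) {
    let pngs = AtomicUsize::new(0);
    let counter = &pngs; // shared reference, so `move` closures copy it
    let observed = thread::scope(|s| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| {
                s.spawn(move || {
                    if name.ends_with(".png") {
                        // Returns the previous value; which task sees 0, 1, 2, ...
                        // varies from run to run.
                        Some(counter.fetch_add(1, Ordering::SeqCst))
                    } else {
                        None
                    }
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<_>>()
    });
    (pngs.load(Ordering::SeqCst), observed)
}
```

Code that only reads the total after all tasks finish is unaffected; code that acts on the intermediate values is where the nondeterminism becomes visible.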
Slide 13:

    fn dot_product(vec1: &[i32], vec2: &[i32]) -> i32 {
        vec1.iter()
            .zip(vec2)
            .map(|(e1, e2)| e1 * e2)
            .fold(0, |a, b| a + b) // aka .sum()
    }

Figure: sequential fold over the element-wise products (6, 2, …), with a running sum (6, 8, …).
    vec1: 3 2 1 12 0 4 5 1 2 1 3
    vec2: 2 1 0 1 3 4 0 3 6 7 8
    sum:  82
Slide 14:

    fn dot_product(vec1: &[i32], vec2: &[i32]) -> i32 {
        vec1.par_iter()
            .zip(vec2)
            .map(|(e1, e2)| e1 * e2)
            .reduce(|| 0, |a, b| a + b) // aka .sum()
    }

Figure: parallel reduction tree over the same vectors.
    vec1: 3 2 1 12 0 4 5 1 2 1 3
    vec2: 2 1 0 1 3 4 0 3 6 7 8
    partial sums: 20 + 19 = 39, and 43
    sum: 39 + 43 = 82
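The difference between `fold` and `reduce` comes down to combining independently computed partial results, which works because addition is associative. A minimal std-only sketch of that idea follows; the chunking scheme and the `chunk_len` parameter are assumptions, and Rayon chooses its own split points.

```rust
use std::thread;

/// Computes one partial dot product per chunk on its own scoped thread,
/// then combines the partial sums. Returns (partial sums, total).
fn dot_product_chunked(vec1: &[i32], vec2: &[i32], chunk_len: usize) -> (Vec<i32>, i32) {
    assert_eq!(vec1.len(), vec2.len(), "vectors must have equal length");
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    let partials: Vec<i32> = thread::scope(|s| {
        // Spawn everything first so the chunks can run concurrently...
        let handles: Vec<_> = vec1
            .chunks(chunk_len)
            .zip(vec2.chunks(chunk_len))
            .map(|(c1, c2)| {
                s.spawn(move || c1.iter().zip(c2).map(|(e1, e2)| e1 * e2).sum::<i32>())
            })
            .collect();
        // ...then collect the partial sums in chunk order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    let total: i32 = partials.iter().sum();
    (partials, total)
}
```

With the vectors from the slide and `chunk_len = 4`, the partial sums are 20, 19 and 43, which combine to 82 regardless of the order in which the chunks finish.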
Slide 15: Parallel iterators
Mostly like normal iterators, but:
• closures cannot mutate shared state
• some operations are different
For the most part, Rust protects you from surprises.
Slide 16 (diagram): Parallel Iterators, join(), threadpool
Slide 17: The primitive: join()

    rayon::join(|| do_something(…),
                || do_something_else(…));

Meaning: maybe execute two closures in parallel.
Idea:
- add `join` wherever parallelism is possible
- let the library decide when it is profitable
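The "add `join` wherever parallelism is possible" idea leads naturally to divide-and-conquer code. The sketch below is std-only and deliberately naive: its hypothetical `join` always spawns a thread, whereas Rayon's `join` only maybe runs the closures in parallel. Because this stand-in cannot decide profitability, the example needs a manual cutoff (`SEQUENTIAL_CUTOFF`, an assumed value).

```rust
use std::thread;

/// Hypothetical stand-in for `rayon::join`: runs `a` on a new scoped thread
/// and `b` on the current thread, then returns both results.
fn join<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        let handle = s.spawn(a);
        let rb = b();
        let ra = handle.join().expect("closure `a` panicked");
        (ra, rb)
    })
}

/// Recursive sum: split in half, `join` the halves, add the results.
fn parallel_sum(data: &[u64]) -> u64 {
    // Assumed cutoff; below it, spawning a thread costs more than it saves.
    const SEQUENTIAL_CUTOFF: usize = 1024;
    if data.len() <= SEQUENTIAL_CUTOFF {
        return data.iter().sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    let (a, b) = join(|| parallel_sum(left), || parallel_sum(right));
    a + b
}
```

With a library that decides when parallel execution is profitable, as the slide describes, the hand-written cutoff matters much less.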
Slide 18:

    fn load_images(paths: &[PathBuf]) -> Vec<Image> {
        paths.par_iter()
             .map(|path| Image::load(path))
             .collect()
    }

Diagram labels: Image::load(paths[0]), Image::load(paths[1])
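Slide 19: Work stealing
Cilk: http://supertech.lcs.mit.edu/cilk/

Figure: the range (0..22) is split recursively across two threads, each with its own queue.
    Thread A: (0..22) is split into (0..15) and (15..22); (0..15) is split into (0..1) and (1..15); the pending halves wait in A's queue.
    Thread B: (15..22) is "stolen" and split into (15..18) and (18..22); (15..18) is split into (15..16) and (16..18).
    (18..22) is also marked "stolen".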
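The queue discipline behind the figure can be modeled in a few lines. This is a single-threaded sketch of the general work-stealing deque idea, not Rayon's actual data structure: the owner pushes and pops at one end, while a thief takes from the other end and so gets the oldest, usually largest, task. `WorkerQueue` and `replay_slide_trace` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Single-threaded model of one worker's task queue.
struct WorkerQueue {
    tasks: VecDeque<Range<usize>>,
}

impl WorkerQueue {
    fn new() -> Self {
        WorkerQueue { tasks: VecDeque::new() }
    }

    /// The owner queues the half it is not working on right now.
    fn push(&mut self, task: Range<usize>) {
        self.tasks.push_back(task);
    }

    /// The owner takes back its most recently queued task (LIFO).
    fn pop(&mut self) -> Option<Range<usize>> {
        self.tasks.pop_back()
    }

    /// A thief takes the oldest queued task from the opposite end (FIFO).
    fn steal(&mut self) -> Option<Range<usize>> {
        self.tasks.pop_front()
    }
}

/// Replays the start of the slide's trace for Thread A's queue and returns
/// (the task Thread B steals, the task Thread A pops next).
fn replay_slide_trace() -> (Range<usize>, Range<usize>) {
    let mut thread_a = WorkerQueue::new();
    // A splits (0..22) into (0..15) and (15..22) and queues the second half.
    thread_a.push(15..22);
    // A splits (0..15) into (0..1) and (1..15) and queues the second half.
    thread_a.push(1..15);
    let stolen_by_b = thread_a.steal().expect("queue is non-empty");
    let next_for_a = thread_a.pop().expect("queue is non-empty");
    (stolen_by_b, next_for_a)
}
```

Stealing the oldest task hands the thief a large piece of work, which keeps the number of steals low.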
Slide 20: (no text in transcript)
Slide 21 (diagram): Parallel Iterators, join(), threadpool
Rayon:
• Parallelize for fun and profit
• Variety of APIs available
• Future directions:
    • more iterators
    • integrate SIMD, array ops
    • integrate persistent trees
    • factor out threadpool
Slide 22 (diagram): Parallel Iterators, join(), scope(), threadpool
Slide 23:

    rayon::scope(|s| {
        …
        s.spawn(move |s| {
            // task t1
        });
        s.spawn(move |s| {
            // task t2
        });
        …
    });

Diagram labels: the scope `s`, task `t1`, task `t2`
Slide 24:

    rayon::scope(|s| {
        …
        s.spawn(move |s| {
            // task t1
            s.spawn(move |s| {
                // task t2
                …
            });
            …
        });
        …
    });

Diagram labels: the scope, task t1, task t2
Slide 25:

    let ok: &[u32] = &[…];
    rayon::scope(|scope| {
        …
        let not_ok: &[u32] = &[…];
        …
        scope.spawn(move |scope| {
            // which variables can t1 use?
        });
    });

Diagram labels: the scope, task t1, "`not_ok` is freed here"
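The question on slide 25 has the same answer in the standard library's scoped threads, which makes for a self-contained sketch. Here `std::thread::scope` stands in for `rayon::scope` (an assumption for the sake of a std-only example): data that lives outside the scope may be borrowed by tasks, while borrowing a local of the scope body is rejected by the compiler.

```rust
use std::thread;

/// Sums the two halves of `ok` in two scoped tasks.
/// `ok` outlives the scope, so both tasks may borrow from it.
fn sum_halves(ok: &[u32]) -> (u32, u32) {
    let (left, right) = ok.split_at(ok.len() / 2);
    thread::scope(|scope| {
        // A local declared here is dropped when this closure returns, possibly
        // before the tasks finish, so borrowing it from a task does not compile:
        //
        //     let not_ok = vec![1, 2, 3];
        //     scope.spawn(|| not_ok.iter().sum::<u32>()); // error: does not live long enough
        //
        let t1 = scope.spawn(|| left.iter().sum::<u32>());
        let t2 = scope.spawn(|| right.iter().sum::<u32>());
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Moving `not_ok` into the task with a `move` closure would be accepted, since the task would then own the data instead of borrowing it.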
Slide 26:

    fn join<A, B>(a: A, b: B)
    where
        A: FnOnce() + Send,
        B: FnOnce() + Send,
    {
        rayon::scope(|scope| {
            scope.spawn(move |_| a());
            scope.spawn(move |_| b());
        });
    }

(Real join avoids heap allocation)
Slide 27:

    struct Tree<T> {
        value: T,
        children: Vec<Tree<T>>,
    }

    impl<T> Tree<T> {
        fn process_all(&mut self) {
            process_value(&mut self.value);
            for child in &mut self.children {
                child.process_all();
            }
        }
    }
Slide 28:

    impl<T> Tree<T> {
        fn process_all(&mut self)
        where
            T: Send,
        {
            rayon::scope(|scope| {
                for child in &mut self.children {
                    scope.spawn(move |_| child.process_all());
                }
                process_value(&mut self.value);
            });
        }
    }
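Slide 29:

    impl<T> Tree<T> {
        fn process_all(&mut self)
        where
            T: Send,
        {
            rayon::scope(|scope| {
                let children = &mut self.children;
                // Spawn the children from inside a task, so this thread can
                // start on its own value right away.
                scope.spawn(move |scope| {
                    // `children` is already a `&mut Vec`, so iterate it directly.
                    for child in children {
                        scope.spawn(move |_| child.process_all());
                    }
                });
                process_value(&mut self.value);
            });
        }
    }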
Slide 30:

    impl<T: Send> Tree<T> {
        fn process_all(&mut self) {
            rayon::scope(|s| self.process_in(s));
        }

        // Reuse one scope for the whole tree instead of opening a scope per node.
        fn process_in<'s>(&'s mut self, scope: &Scope<'s>) {
            let children = &mut self.children;
            scope.spawn(move |scope| {
                // `children` is already a `&mut Vec`, so iterate it directly.
                for child in children {
                    scope.spawn(move |scope| child.process_in(scope));
                }
            });
            process_value(&mut self.value);
        }
    }
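The single-scope shape of slide 30 can be tried out with the standard library alone. In this sketch `std::thread::scope` stands in for `rayon::scope`, the tree holds plain `u32` values, and doubling the value is a hypothetical stand-in for `process_value`. Note that it starts one OS thread per node, which a real thread pool would avoid.

```rust
use std::thread::{self, Scope};

struct Tree {
    value: u32,
    children: Vec<Tree>,
}

impl Tree {
    /// Opens one scope for the whole tree; returns once every node is processed.
    fn process_all(&mut self) {
        thread::scope(|scope| self.process_in(scope));
    }

    /// Spawns one task per child into the shared scope, then processes this node.
    fn process_in<'scope, 'env>(&'scope mut self, scope: &'scope Scope<'scope, 'env>) {
        for child in &mut self.children {
            // Each child is a disjoint `&mut` borrow, so the tasks cannot race.
            scope.spawn(move || child.process_in(scope));
        }
        // Stand-in for `process_value`: `value` is a different field than
        // `children`, so it can be mutated while the child tasks run.
        self.value *= 2;
    }

    /// Sum of all values in the tree, used to check the result.
    fn total(&self) -> u32 {
        self.value + self.children.iter().map(Tree::total).sum::<u32>()
    }
}
```

Because every task borrows from the same scope, the borrow checker guarantees that the tree outlives all of them, which is the property the `'s` lifetime on the slide expresses.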