=⇒ no parallelism CPython Global Interpreter Lock =⇒ no parallelism Jython synchronized =⇒ slow single-threaded, no scaling JRuby No synchronization =⇒ unsafe with multiple threads Nashorn No synchronization =⇒ unsafe with multiple threads 3 / 33
append.rb 100000 JRuby, on the JVM with concurrent threads: jruby append.rb 64324 # If you are not lucky ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users << at org/jruby/RubyArray.java:1256 block at append.rb:8 zsh: exit 1 5 / 33
truffleruby append.rb 77148 # If you are not lucky append.rb:8:in ’<<’: 1338 (RubyTruffleError) ArrayIndexOutOfBoundsException IntegerArrayMirror.set from append.rb:8:in ’block (2 levels) in <main>’ zsh: exit 1 6 / 33
Enables a programming style that does not require so many upfront decisions (e.g.: choosing a collection implementation) Use them for both single-threaded and multi-threaded workloads The collections should be efficient The collections should scale when used concurrently 9 / 33
accessed concurrently Expensive to track exactly, so we make an over-approximation: track all objects which can be accessed concurrently, based on reachability 12 / 33
sharing the same way Shared collections use a write barrier when adding an element to the collection shared_array[3] = Object.new shared_hash["foo"] = "bar" Collections can change their representation when shared 15 / 33
is better 0.9 1.0 1.1 1.2 1.3 1.4 Bounce List Mandelbrot NBody Permute Queens Sieve Storage Towers DeltaBlue Json Richards Benchmark TruffleRuby TruffleRuby with Concurrent Collections No difference because these benchmarks do not use shared collections. Benchmarks from Cross-Language Compiler Benchmarking: Are We Fast Yet? S. Marr, B. Daloze, H. Mössenböck, DLS’16. 16 / 33
int[] long[] Object[] double[] storage strategies concurrent strategies store int store double store long store Object store Object store Object SharedDynamicStorage empty int[] long[] Object[] double[] SharedFixedStorage int[] internal storage change: <<, delete, etc storage transition on sharing 21 / 33
change =⇒ Array size and type of the elements fits the storage If so, the Array can be accessed without any synchronization, in parallel and without any overhead (except the write barrier) 22 / 33
storage? $array = [1, 2, 3] # SharedFixedStorage # All of these migrate to SharedDynamicStorage $array[1] = Object.new $array << 4 $array.delete_at(1) We use a Guest-Language Safepoint to migrate to SharedDynamicStorage 23 / 33
scalability when writing on different parts of the Array, an exclusive lock or a read-write lock is not enough. We use a Layout Lock: read, writes and layout changes Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications. N. Cohen, A. Tal, E. Petrank, PPoPP’17 24 / 33
thread has its own section of the Array With 6 different synchronization mechanisms: SharedFixedStorage: no synchronization ReentrantLock, Synchronized: from Java StampedLock: a read-write lock LayoutLock: scalable reads and writes and layout changes LightweightLayoutLock: an improvement over LayoutLock 26 / 33
8 12 16 202224 28 32 36 40 44 Threads Throughput LightweightLayoutLock LayoutLock ReentrantLock StampedLock Synchronized Appends are considered layout changes and use the exclusive lock. Throughput in millions of appends per second. 29 / 33
28 32 12 4 8 12 16 202224 28 32 36 40 44 Threads Scalability relative to Local SharedFixedStorage Local Parallelized by distributing 64 groups of rows between threads dynamically using a global queue. 31 / 33
12 16 202224 28 32 36 40 44 Threads Throughput LightweightLayoutLock LayoutLock Local ReentrantLock StampedLock 80% lookups, 10% puts, 10% removes over a range of 65536 keys Throughput in millions of operations per second. 32 / 33
and yet as efficient as unsynchronized collections We can make Array and Hash scale up to 44 cores linearly with SharedFixedStorage and the Lightweight Layout Lock We enable parallel programming with the existing built-in collections, not requiring upfront decisions by the programmer (e.g.: choosing a collection implementation based on concurrency or usage) 33 / 33