hot Memory wall Memory buses are not fast enough to handle these increase clock speeds ILP wall (Instruction level parallelism) Instruction pipeline really means digging deeper power hole 17 Dr. David Patterson, Berkeley
together, they mean. Computers will stop getting faster Furthermore, if an engineer optimizes one wall he aggravates the other two. Solution - Multi core processors 18
to complete particular program on hardware Throughput - Number of operations per second Utilisation - Utilising best use of multicore systems Speedup - (Specific to parallel programming, your algorithm runs faster on parallel hardware) Power Consumption (No one cares) 19
Preemptive - If the active process or task or thread can be temporarily suspended to execute a more important process or task or thread Non-preemptive - If the active process or task or thread cannot be suspended i.e. runs to completion 20
program (preemptive) Operating system does it for you Commonly used Cons Context switching / Scheduling overhead Deadlocks / Race conditions Synchronization / Locking issues 26
// multiple readers can enter this section // if not locked for writing, and not writers waiting // to lock for writing. readWriteLock.readLock().unlock(); readWriteLock.writeLock().lock(); // only one writer can enter this section, // and only if no threads are currently reading. readWriteLock.writeLock().unlock();
static final int MAX_AVAILABLE = 100; private final Semaphore available = new Semaphore(MAX_AVAILABLE, true); public Object getItem() throws InterruptedException { available.acquire(); return getNextAvailableItem(); } public void putItem(Object x) { if (markAsUnused(x)) available.release(); } ...
CountDownLatch(3); Waiter waiter = new Waiter(latch); Decrementer decrementer = new Decrementer(latch); new Thread(waiter) .start(); new Thread(decrementer).start(); Thread.sleep(4000);
progressing through some algorithm. It is a barrier that all threads must wait at, until all threads reach it, before any of the threads can continue. 43
ExchangerRunnable exchangerRunnable1 = new ExchangerRunnable(exchanger, "A"); ExchangerRunnable exchangerRunnable2 = new ExchangerRunnable(exchanger, "B"); new Thread(exchangerRunnable1).start(); new Thread(exchangerRunnable2).start();
new AtomicInteger(123); int expectedValue = 123; int newValue = 234; atomicInteger.compareAndSet(expectedValue, newValue); System.out.println(atomicInteger.getAndAdd(10)); System.out.println(atomicInteger.addAndGet(10));
new AtomicReference(); String initialReference = "the initially referenced string"; AtomicReference atomicReference = new AtomicReference(initialReference);
new int[10]; ints[5] = 123; AtomicIntegerArray array = new AtomicIntegerArray(ints); int value = array.get(5); array.set(5, 999); boolean swapped = array.compareAndSet(5, 999, 123); int newValue = array.addAndGet(5, 3); int oldValue = array.getAndAdd(5, 3); int newValue = array.incrementAndGet(5) int oldValue = array.getAndIncrement(5); int newValue = array.decrementAndGet(5); int oldValue = array.getAndDecrement(5);
the task, it can be passed to a thread pool ThreadPool reuses threads Internally it uses BlockingQueue for threads ExecutorService is a thread pool implementation 66
Similar to ExecutorService but one difference Makes it easy for tasks To split their work up into smaller tasks Tasks are then submitted to the ForkJoinPool too. Uses work stealing algorithm Next level of ExecutorService 69
Result solve(Problem problem) { if (problem is small) directly solve problem else { split problem into independent parts fork new subtasks to solve each part join all subtasks compose result from subresults } }
by programmers Provides very low level APIs to handle concurrency Fine grained concurrency with more power java.util.concurrent is robust. Still there are chances of bugs. 72
program processing using multiple processors. The speedup limited by the time needed for the sequential fraction of the program If N is the number of processors, s is the time spent by a processor on serial part of a program, and p is the time spent by a processor on a parallel part of a program, then the maximum possible speedup is given by: 1 / (s+p/N) Synchronization & communication overhead 75
other threads have not concurrently made changes to memory that it accessed in the past. This final operation, in which the changes of a transaction are validated and, if validation is successful, made permanent, is called a commit…” 77
coordinating multiple concurrent modifications to a shared set of storage locations. Similar to garbage collection has displaced the need for manual memory management. Characterized as providing the same kind of systematic simplification of another error-prone programming practice - manual lock management 78
72, {1, 2, 3}, “cat”, [4, 5, 7] (def sarah {:name "Sarah" :age 25 :wears-glasses? false}) An identify is an entity that has state State is a value at a point in time Value is something that doesn’t change e.g. Set of my favourite foods. I will prefer different foods in future, that will be a different set 80
state OOP, complects identity and state No way to obtain state independent of identity without copying There is no way to associate the identity’s state with a different value other than in-place memory mutation. Objects not treated as values Clojure says, OO doesn’t have to be in this way 81
old values An identity’s state at any point in time is an immutable value Values can always be observed and new values calculated from old values without coordination Values will never change “in hand” Values can be safely shared between threads 83
Defines mutable data structures which are concurrency aware Models Identity by way of references to immutable data Dereferencing a reference type gives you its (immutable) value Changes to an identity’s state are controlled / coordinated by the system 84
in concert with Clojure’s persistent data structures to separate identify from state. Allowing accessing mutable variables from multiple threads without dangers of deadlock or race conditions In all cases the program will see stable views of the values in the world 85
updates pure Refs Synchronised, Coordinated updates pure Agents Asynchronous, Independent updates any Vars Thread-local updates any Clojure Reference Containers
reference type. concurrency aware. Agents work in concert with persistent data structures to maintain the separation of identity and state. Changes to an agent’s state are independent of changes to other agents’ states, and that all such changes are made away from the thread of execution that schedules them. I/O and other side-effecting functions may be safely used in conjunction with agents. Agents are STM-aware, so that they may be safely used in the context of retrying transactions 93
can be retrieved with deref. Actor encapsulates state and provides no direct means to accept it. Agent does not encapsulates behaviour. Function is provided by sender Actor’s can encapsulate behaviour Agent’s error reporting is more primitive. Actor’s error detection and recovery is sophisticated Agents provide no support for distribution Actors can be remote Composing agents cannot deadlock Composing actors can deadlock
is perfectly safe to use agents to coordinate I/O and perform other blocking operations. Use their own thread pool send For fixed thread pool. Never used for actions that might perform I/O or other blocking operations send-off for growing thread pool Ideal for guarding a contested resource. Blocking IO. 96
to multiple values concurrently No possibility of the Involved refs ever being in an observable inconsistent state Race conditions among the involved refs. Deadlocks. No manual use of locks, monitors, or other low-level synchronization primitive 97
multiple states Operations are atomic, consistent and isolated from other transactions Manual transaction control (dosync) Operations alter, ref-set, commute, ensure 98
No thread needs to wait to access a resource Smaller scope that needs synchronizing - modifying disjoint parts of a system of a data structure Cons Aborting transactions Places limitations on the behaviour of transactions - they cannot perform any operation that cannot be undone, including most I/O 102
that is shared with potentially blocking agent actions. This pooling of resources can make futures more efficient than creating native threads as needed. Using future is much more concise than setting up and starting a native thread. Clojure futures (the value returned by future) are instances of java.util.concurrent.Future, which can make it easier to interoperate with Java APIs that expect them. Dereferencing a future blocks until value is available 103
of code, Evaluating only upon demand, when it is dereferenced Only evaluate their body of code once Caches the return value. 105 (def d (delay (println "Running...") :done!)) #'user/d (deref d) Running... :done! @d :done! (realized? d) true
or future They block when you dereference them if they don’t have value until they do You don’t immediately give them value, but provide them one by calling deliver 106 (def result (promise)) (future (println "The result is: " @result)) (Thread/sleep 2000) (deliver result 42) "The result is: 42”
to a single value Real world needs names that refer to different values at different points in time With vars you can override the value Concurrency aware Limitation We can override their value only within the scope of a particular function call, and nowhere else. 107
asynchronous I/O to yield to the next waiting process.) Slow network operations event loops are preferred You can serve more number of connections You don’t have to spend a process/thread for a connection libuv abstracts best of platforms in Node.js libuv internally uses thread pool. (e.g. File IO) 111
requests, but if you have long running request, it falls on its face. Pros Avoid polling. CPU bound vs IO bound Scales well vs spawning many threads No deadlocks Cons You block the even loop, all goes bad Program flow is “spaghettish” Callback hell Hard to debug, you lose the stack 114
programming Coroutines allow asynchronous code can be paused and resumed functions which cooperatively multitasks with other function Program puts coroutine on hold for async and control goes to event loop 116
construct Event loop is a scheduler for coroutine Coroutines are best of the both the worlds, They can wait as callbacks do. They yield to each other so less chance of race conditions Coroutines are extremely lightweight and never preempted Coroutines wait on socket and if something happens on socket they become runnable. Compared to multi-threaded, co-routines can run sequence of operations in a single function. Exception handling works too Limitation Coroutines execute on single thread only 117
model treats “Actors” as the universal primitives of concurrent digital computation Actors communicate with messages In response to a message that it receives, Actor can Make local decisions Create more Actors Send more messages Determine how to respond to the next message received 120
for specifying and verifying the concurrent aspects of variety of different systems Processes - No threads. No shared memory. Fixed number of processes. Channels - Communication is synchronous (Unlike Actor model) Influences on design Go, Limbo 123
Scheduler Process Process Process Process Process Process Process Process Process Process Process Process Process Process Process Process OS Thread OS Thread OS Process BEAM OS Thread OS Thread
GC No stop-the world garbage collection (like Ruby/Java) Much easier for process scheduler to schedule Very little overhead Cheap to create thousands of Elixir processes Concurrency is not difficult in Erlang / Elixir and in built (not conventional) Isolated crashes. Process crash doesn’t affect other processes. 132
Postal service) Processes are actors Every actor has its address Actors communicate via messages asynchronously “Do not communicate by sharing memory; instead, share memory by communicating” 133
following 3 things Create more actors Send messages to other actors whose addressees it knows Define how it reacts to a message Actors may receive messages in random order 134
loop do receive do {:greet, name} -> IO.puts("Hello #{name}") {:praise, name} -> IO.puts("#{name}, you're amazing") {:celebrate, name, age} -> IO.puts("Here's to another #{age} years, #{name}") end loop end end pid = spawn(&Talker.loop/0) send(pid, {:greet, "Huey"}) send(pid, {:praise, "Dewey"}) send(pid, {:celebrate, "Louie", 16}) # Outputs Hello Huey Dewey, you're amazing Here's to another 16 years, Louie
model Send & Receive may block (synchronous) Only receive blocks Messages are delivered when they are sent No guarantee of delivery of messages Synchronous Send message and forget Works on one machine Work on multiple machines (Distributed by default) Lacks fault tolerance Fault tolerance
No shared state (avoid locks, easier to scale) Easier to maintain code. Declarative Cons When shared state is required, doesn’t fit well Handling of very big messages, or a lot of messages Messaging is essentially a copy of data 137
terms of hierarchies of applications. Agents Background process that maintain state State can be accessed at different places within a process or node, or across multiple nodes. Good at dealing with very-specific background activities, 138
VM Can be connected to there Nodes Remote execution Supervisors Has one purpose. It manages one or more worker processes. Uses process-linking and monitoring facilities of Erlang VM Heart of Reliability 139
goroutines Lightweight processes Runs concurrently and parallel Communicate using channels Channels is something you create and pass around as object Read and write operations on channels block e.g. send from process A will not complete unless process B receives 142
multiple channels we use select statement select allows you to listen for multiple channels There is default case and timeout case. e.g. for network IO if nothing happens, a timeout will happen. timeout case is important (in a highly concurrent system with network calls) 145
c2 := make(chan string) go func() { time.Sleep(time.Second * 1) c1 <- "one" }() go func() { time.Sleep(time.Second * 2) c2 <- "two" }() for i := 0; i < 2; i++ { select { case msg1 := <-c1: fmt.Println("received", msg1) case msg2 := <-c2: fmt.Println("received", msg2) } } $ time go run select.go received one received two real 0m2.245s Output
are scheduled by user space runtime in Golang start with small stack and can be efficiently grown can be multiplexed on operating system thread has very well defined scheduling order for IO, scheduler will preempt it, it will make resume goroutine once IO is complete. Java/Ruby lack the concept of goroutine or lightweight thread or select (important for CSP). Without concept of cheap threads to communicate between channels. It won’t be possible to have go like concurrency. In Java/Ruby, it would be heavy weight thread, Not as performant. 147
that uses cooperative multitasking instead of preemptive multitasking. A running fiber must explicitly "yield" to allow another fiber to run, which makes their implementation much easier than kernel or user threads. A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well-designed multithreaded application.However, using fibers can make it easier to port applications that were designed to schedule their own threads. 149
1 Fiber 2 Blocking IO (red) CPU Time (orange) Co-operative: 60ms Blocking IO (red) CPU Time (orange) 10 quanta * 10 ms = 100 ms Cooperative - Handling execution rights between one and another, saving local state
much easier to understand and implement No need for locks (cooperative scheduling) Scales vertically (add more CPU power) Cons Single thread: Harder to paralleize/scale horizontally (Use more cases, add more nodes) Constrained to have all components work together symbiotically Fibers are more of a code organization pattern than a way of doing concurrent tasks. 151
(aka GIL - Global Interpreter Lock) What happens with GVL? With GVL, only one thread executes a time Thread must request a lock If lock is available, it is acquired If not, the thread blocks and waits for the lock to become available Ruby’s runtime guarantees thread safety. But it makes no guarantees about your code. 153
of GVL You can still write performant concurrent (as good as Java, Node.js) in a Ruby app if it does only heavy IO Multithreaded CPU-bound requests GVL is still issue. Ruby is fast enough for IO (network) heavy applications (In most cases) 154
harder to corrupt data) Avoids race conditions C extensions It makes C extensions development easier Most C libraries are not thread safe Parts of Ruby’s implementation aren’t thread safe (Hash for instance) 155
If parent dies before children have exited, children can become zombie processes All threads die when the process dies (no chance of zombies) More expensive for forked processes to switch context since OS needs to save and reload everything Threads have considerably less overhead since they share address space and memory Forked processes are given a new virtual memory space (process isolation) Threads share the same memory, so need to control and deal with concurrent memory issues Requires inter-process communication Can "communicate" via queues and shared memory Slower to create and destroy Faster to create and destroy Easier to code and debug Can be significantly more complex to code and debug Ruby - Multiprocess vs. Multithreading
didn't not have actor model Ruby didn’t have STM Ruby didn’t have better concurrency abstractions. Ruby has concurrent-ruby gem now concurrent-ruby gem provides concurrency aware abstractions (Inspired from other languages) 161
that provides simple asynchronous behavior to a class. Loosely based on Erlang's gen_server. Future - An asynchronous operation that produces a value. Promise - Similar to Futures, with more features. ScheduledTask - Like a Future scheduled for a specific future time. TimerTask - A Thread that periodically wakes up to perform work at regular intervals. 165
A thread-safe subclass of Ruby's standard Array. Hash - A thread-safe subclass of Ruby's standard Hash. Map - A hash-like object that should have much better performance characteristic Tuple - A fixed size array with volatile (synchronized, thread safe) getters/setters. 166
shared, mutable, asynchronous, independent, state. Based on Clojure's Agent. Atom - A way to manage shared, mutable, synchronous, independent state. Based on Clojure's Atom. AtomicBoolean - A boolean value that can be updated atomically. AtomicFixnum - A numeric value that can be updated atomically. AtomicReference - An object reference that may be updated atomically. Exchanger - A synchronization point at which threads can pair and swap elements within pairs. Based on Java's Exchanger. Java-inspired ThreadPools and Other Executors 168
synchronization object that allows one thread to wait on multiple other threads. CyclicBarrier - A synchronization aid that allows a set of threads to all wait for each other to reach a common barrier point. Event - Old school kernel-style event. ReadWriteLock - A lock that supports multiple readers but only one writer. ReentrantReadWriteLock - A read/write lock with reentrant and upgrade features. Semaphore - A counting-based locking mechanism that uses permits. 169
where concurrent actors exchange messages. New Future Framework Channel: Communicating Sequential Processes (CSP) - Functionally equivalent to Go channels with additional inspiration from Clojure core.async. 170