Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Concurrent & Multicore OCaml: A deep dive

Concurrent & Multicore OCaml: A deep dive

Talk was given at Facebook, Menlo Park.

Avatar for KC Sivaramakrishnan

KC Sivaramakrishnan

January 28, 2016
Tweet

More Decks by KC Sivaramakrishnan

Other Decks in Programming

Transcript

  1. Concurrent & Multicore OCaml: A deep dive KC Sivaramakrishnan1 &

    Stephen Dolan1 Leo White2, Jeremy Yallop1,3, Armaël Guéneau4, Anil Madhavapeddy1,3 1 2 3 4
  2. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations
  3. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations Concurrency ∩ Parallelism ➔ Scalable Concurrency
  4. Concurrency ≠ Parallelism • Concurrency • Programming technique • Overlapped

    execution of processes • Parallelism • (Extreme) Performance hack • Simultaneous execution of computations Concurrency ∩ Parallelism ➔ Scalable Concurrency (Fibers) (Domains)
  5. Schedulers • Multiplexing fibers over domain(s) • Bake scheduler into

    the runtime system (GHC) • Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling
  6. Schedulers • Multiplexing fibers over domain(s) • Bake scheduler into

    the runtime system (GHC) • Allow programmers to describe schedulers! • Parallel search —> LIFO work-stealing • Web-server —> FIFO runqueue • Data parallel —> Gang scheduling • Algebraic Effects and Handlers
  7. • Programming and reasoning about computational effects in a pure

    setting. • Cf. Monads Algebraic effects & handlers
  8. • Programming and reasoning about computational effects in a pure

    setting. • Cf. Monads • Eff — http://www.eff-lang.org/ Algebraic effects & handlers
  9. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1
  10. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1
  11. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4
  12. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  13. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  14. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) let r = try f () with effect (Foo i) k -> continue k (i + 1)
  15. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1)
  16. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1) val r : int = 5
  17. Algebraic Effects: Example exception Foo of int let f ()

    = 1 + (raise (Foo 3)) let r = try f () with Foo i -> i + 1 val r : int = 4 effect Foo : int -> int let f () = 1 + (perform (Foo 3)) 4 let r = try f () with effect (Foo i) k -> continue k (i + 1) val r : int = 5 fiber — lightweight stack
  18. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS Implementation
  19. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. Implementation
  20. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers Implementation
  21. • Fibers: Heap allocated, dynamically resized stacks • ~10s of

    bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers Implementation handle / continue handler sp call chain reference
  22. Implementation handle / continue handle / continue sp handler call

    chain reference • Fibers: Heap allocated, dynamically resized stacks • ~10s of bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers
  23. perform sp handle / continue Implementation handler call chain reference

    • Fibers: Heap allocated, dynamically resized stacks • ~10s of bytes • No unnecessary closure allocation costs unlike CPS • One-shot delimited continuations • Simplifies reasoning about resources - sockets, locks, etc. • Handlers —> Linked-list of fibers
  24. Native-code fibers — Vanilla OCaml start program C call OCaml

    callback C call OCaml callback C OCaml C OCaml C OCaml
  25. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C
  26. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C
  27. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C 1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks
  28. C system stack Native-code fibers — Effects OCaml heap OCaml

    start program C call handle OCaml callback C call C C 1. Stack overflow checks for OCaml functions • Simple static analysis eliminates many checks 2. FFI calls are more expensive due to stack switching • Specialise for calls which {allocate / pass arguments on stack / do neither}
  29. Performance : Vanilla OCaml 0 0.25 0.5 0.75 1 almabench

    alt-ergo-parameter_smallest_divisor alt-ergo-carte_autorisee_3 alt-ergo-parameter_relabel alt-ergo-OBF__ggjj_2 alt-ergo-parameter_def alt-ergo-parameter_def alt-ergo-OBF__yyll_1 alt-ergo-bbvv_351 alt-ergo-controler_carte_13 alt-ergo-div2_sub alt-ergo-parameter_def alt-ergo-ccgg_2055 alt-ergo-parameter_def alt-ergo-parameter_inverse_in_place alt-ergo-ccgg_1759 alt-ergo-parameter_def alt-ergo-induction_step alt-ergo-ccgg_1618 alt-ergo-ccgg_219 alt-ergo-advance_automaton_25 alt-ergo-nsec_sum_higher_than_1s alt-ergo-fill_assert_39_Alt-Ergo bdd chameneos-async chameneos-lwt cohttp-lwt core_micro cpdf-merge cpdf-reformat cpdf-squeeze cpdf-transform frama-c-deflate frama-c-idct js_of_ocaml jsontrip-sample kb kb-no-exc lexifi-g2pp menhir-fancy menhir-sql menhir-standard minilight numal-durand-kerner-aberth numal-fft numal-k-means numal-levinson-durbin numal-lu-decomposition numal-naive-multilayer numal-qr-decomposition numal-rnd_access numal-simple_access patdiff sauvola-contrast sequence sequence-cps setrip setrip-smallbuf thread-ring-async-pipe thread-ring-lwt-mvar thread-ring-lwt-stream thread-sleep-async thread-sleep-lwt valet-async valet-lwt ydump-sample 4.02.2+effects 4.02.2+vanilla Normalised time (lower is better)
  30. Performance : Vanilla OCaml 0 0.25 0.5 0.75 1 almabench

    alt-ergo-parameter_smallest_divisor alt-ergo-carte_autorisee_3 alt-ergo-parameter_relabel alt-ergo-OBF__ggjj_2 alt-ergo-parameter_def alt-ergo-parameter_def alt-ergo-OBF__yyll_1 alt-ergo-bbvv_351 alt-ergo-controler_carte_13 alt-ergo-div2_sub alt-ergo-parameter_def alt-ergo-ccgg_2055 alt-ergo-parameter_def alt-ergo-parameter_inverse_in_place alt-ergo-ccgg_1759 alt-ergo-parameter_def alt-ergo-induction_step alt-ergo-ccgg_1618 alt-ergo-ccgg_219 alt-ergo-advance_automaton_25 alt-ergo-nsec_sum_higher_than_1s alt-ergo-fill_assert_39_Alt-Ergo bdd chameneos-async chameneos-lwt cohttp-lwt core_micro cpdf-merge cpdf-reformat cpdf-squeeze cpdf-transform frama-c-deflate frama-c-idct js_of_ocaml jsontrip-sample kb kb-no-exc lexifi-g2pp menhir-fancy menhir-sql menhir-standard minilight numal-durand-kerner-aberth numal-fft numal-k-means numal-levinson-durbin numal-lu-decomposition numal-naive-multilayer numal-qr-decomposition numal-rnd_access numal-simple_access patdiff sauvola-contrast sequence sequence-cps setrip setrip-smallbuf thread-ring-async-pipe thread-ring-lwt-mvar thread-ring-lwt-stream thread-sleep-async thread-sleep-lwt valet-async valet-lwt ydump-sample 4.02.2+effects 4.02.2+vanilla Normalised time (lower is better) 4.02.2+effects ~5.4% slower
  31. Performance : Chameneos-Redux Time (S) 0 0.45 0.9 1.35 1.8

    Iterations (X100,000) 1 2 3 4 5 6 7 8 9 10 Lwt Concurrency Monad GHC Fibers
  32. Generator from Iterator1 [1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml let rec iter f =

    function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r type 'a t = | Leaf | Node of 'a t * 'a * 'a t
  33. Generator from Iterator1 [1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml (* val to_gen : 'a

    t -> (unit -> 'a option) *) let to_gen (type a) (t : a t) = let module M = struct effect Next : a -> unit end in let open M in let step = ref (fun () -> assert false) in let first_step () = try iter (fun x -> perform (Next x)) t; None with effect (Next v) k -> step := continue k; Some v in step := first_step; fun () -> !step () let rec iter f = function | Leaf -> () | Node (l, x, r) -> iter f l; f x; iter f r type 'a t = | Leaf | Node of 'a t * 'a * 'a t
  34. Performance : Generator Time (S) 0 1 2 3 4

    Binary tree depth 15 16 17 18 19 20 21 22 23 24 25 Iterator Fiber Generator H/W Generator
  35. Javascript backend • js_of_ocaml • OCaml bytecode —> Javascript •

    js_of_ocaml compiler pass • Whole-program selective CPS transformation
  36. Javascript backend • js_of_ocaml • OCaml bytecode —> Javascript •

    js_of_ocaml compiler pass • Whole-program selective CPS transformation • Work-in-progress! • Runs “hello-effects-world”!