Concurrency in Python

This talk is mainly about multithreading and multiprocessing in Python, *in Python's flavor*.

It was also presented at Taipei.py [1].

[1] http://www.meetup.com/Taipei-py/events/220452029/

Mosky Liu

March 26, 2015

Transcript

  1. MULTITHREADING • GIL (global interpreter lock) • Only one thread runs Python bytecode at any given time. • It can still speed up IO-bound problems.
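The IO-bound claim can be checked with a small sketch. It assumes `time.sleep` is a fair stand-in for a blocking I/O call (it releases the GIL while waiting); the function names and the timings are illustrative, not taken from the talk.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep stands in for a blocking I/O call; the GIL is released while it waits.
    time.sleep(seconds)


def run_sequential(n, seconds):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential_time = run_sequential(4, 0.2)
threaded_time = run_threaded(4, 0.2)
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The waits overlap in the threaded version, so it should finish in roughly the time of one call rather than four.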
  2. MULTIPROCESSING • It uses fork. • Processes can run at the same time. • It uses more memory. • Note the initial cost.
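A minimal sketch of the multiprocessing side, assuming a POSIX system where the "fork" start method is available. The `worker` function and the result queue are hypothetical names used only for illustration.

```python
import multiprocessing
import os

# Request "fork" explicitly to match the slide; this start method is POSIX-only.
ctx = multiprocessing.get_context("fork")
results = ctx.Queue()


def worker(n, out):
    # Runs in a child process, so it reports a PID different from the parent's.
    out.put((n * n, os.getpid()))


procs = [ctx.Process(target=worker, args=(n, results)) for n in range(4)]
for p in procs:
    p.start()

# Read the results before joining so a child never blocks on a full pipe.
collected = [results.get() for _ in procs]
for p in procs:
    p.join()

squares = sorted(value for value, _ in collected)
worker_pids = {pid for _, pid in collected}
print(squares)
print(os.getpid() in worker_pids)
```

Each child is a full copy of the parent process, which is where the extra memory and the startup cost come from.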
  4. IS IT HARD? • Avoid shared resources, e.g., variables, shared memory, files, connections, … • Understand Python’s flavor. • Then it will be easy.
  7. SHARED RESOURCE • Race condition: T1: RW; T2: RW; T1+T2: RRWW • Use a lock → thread-safe: T1+T2: (RW) (RW) • But locks hurt performance and can cause deadlock. • That is the hard part.
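The (RW) (RW) ordering can be sketched with a shared counter. The `Counter` class is a hypothetical example, not code from the talk; the lock makes each read-modify-write one indivisible step.

```python
import threading


class Counter:
    """A shared resource: every increment is a read followed by a write."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # With the lock held, R and W of one thread cannot interleave with another's.
        with self._lock:
            self.value += 1


counter = Counter()


def add_many(times):
    for _ in range(times):
        counter.increment()


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Every increment now pays for acquiring the lock, which is the performance cost the slide mentions.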
  10. PRODUCER-CONSUMER PATTERN • A queue • Producers → a queue • A queue → consumers • Python has the built-in Queue module (named queue in Python 3) for it.
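A minimal sketch of the pattern using the Python 3 `queue` module. The `STOP` sentinel used to shut the consumers down is an assumption of this example, not something the slides prescribe.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()
STOP = object()  # hypothetical sentinel that tells a consumer to exit


def producer(start, count):
    for n in range(start, start + count):
        tasks.put(n)


def consumer():
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put(item * 2)


producers = [threading.Thread(target=producer, args=(i * 5, 5)) for i in range(2)]
consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in producers + consumers:
    t.start()

for t in producers:
    t.join()
for _ in consumers:
    tasks.put(STOP)  # one sentinel per consumer
for t in consumers:
    t.join()

doubled = sorted(results.get() for _ in range(results.qsize()))
print(doubled)
```

The threads share nothing except the two queues, which do their own locking internally.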
  11. WHY .TASK_DONE? • It’s for .join. • When the counter reaches zero, it notifies the threads that are waiting. • It’s implemented with threading.Condition.
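To make the counter mechanism concrete, here is a simplified sketch of how such a counter could be built on `threading.Condition`. `TaskCounter` is a hypothetical class written for illustration; it is not the standard library's queue implementation.

```python
import threading


class TaskCounter:
    """Simplified stand-in for the unfinished-task counter behind .join()."""

    def __init__(self):
        self._cond = threading.Condition()
        self._unfinished = 0

    def add(self):
        with self._cond:
            self._unfinished += 1

    def task_done(self):
        with self._cond:
            if self._unfinished <= 0:
                raise ValueError("task_done() called too many times")
            self._unfinished -= 1
            if self._unfinished == 0:
                # Wake every thread that is blocked in join().
                self._cond.notify_all()

    def join(self):
        with self._cond:
            while self._unfinished:
                self._cond.wait()


counter = TaskCounter()
finished = []


def work(n):
    finished.append(n)
    counter.task_done()


for _ in range(3):
    counter.add()
threads = [threading.Thread(target=work, args=(n,)) for n in range(3)]
for t in threads:
    t.start()

counter.join()  # returns only after three task_done() calls
print(sorted(finished))
for t in threads:
    t.join()
```

Without the `task_done()` calls the counter would never reach zero and `join()` would block forever.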
  13. THE THREADING MODULE • Lock: primitive lock, .acquire / .release • RLock: the owner can re-enter • Semaphore: blocks when the counter reaches zero
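A short sketch of the difference between these primitives. The function names and the limit of two slots are arbitrary choices for this example.

```python
import threading
import time

# RLock: the owning thread may acquire it again without deadlocking.
rlock = threading.RLock()


def inner():
    with rlock:  # re-entered by the same thread; a plain Lock would block here
        return "re-entered"


def outer():
    with rlock:
        return inner()


reentry_result = outer()

# Semaphore: at most two threads are inside the guarded section at once.
slots = threading.Semaphore(2)
stats_lock = threading.Lock()
stats = {"active": 0, "max": 0}


def limited_task():
    with slots:
        with stats_lock:
            stats["active"] += 1
            stats["max"] = max(stats["max"], stats["active"])
        time.sleep(0.05)
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=limited_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(reentry_result, stats["max"])
```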
  15. Condition: .wait for .notify / .notify_all • Event: .wait for .set; a simplified Condition • with lock: …
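A minimal sketch of `Event` together with the `with lock:` form. The waiter names are made up for the example.

```python
import threading

ready = threading.Event()
log = []
log_lock = threading.Lock()


def waiter(name):
    ready.wait()  # blocks until another thread calls ready.set()
    with log_lock:  # acquires on entry and releases on exit, even on error
        log.append(name)


threads = [threading.Thread(target=waiter, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()

nobody_ran_early = (log == [])  # every waiter is still blocked on the event
ready.set()
for t in threads:
    t.join()

print(nobody_ran_early, sorted(log))
```

An `Event` carries only a boolean, so it fits one-shot signals like this; a `Condition` is the tool when waiters need to re-check arbitrary state.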
  18. DAEMONIC THREAD • It’s not that kind of “daemon”. • It will just be killed when Python shuts down. • Immediately. • Other threads keep running until they return.
  21. SO, HOW TO STOP? • Set it as a daemon and let Python clean it up. • Let it return.
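Both options can be sketched side by side. The worker functions and the `stop_requested` event are hypothetical names; the talk does not show this code.

```python
import threading
import time

stop_requested = threading.Event()
ticks = []


def polite_worker():
    # Option "let it return": check a flag between units of work.
    while not stop_requested.is_set():
        ticks.append(time.monotonic())
        time.sleep(0.01)


def background_worker():
    # Option "daemon": never returns; it is dropped when the interpreter shuts down.
    while True:
        time.sleep(0.01)


daemon = threading.Thread(target=background_worker, daemon=True)
polite = threading.Thread(target=polite_worker)
daemon.start()
polite.start()

time.sleep(0.05)
stop_requested.set()
polite.join(timeout=1)

print(polite.is_alive(), daemon.is_alive())
```

The daemon gets no chance to clean up at shutdown, so the flag approach is the safer choice for threads that hold files or connections.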
  22. BROADCAST SIGNAL TO SUB-THREAD • Set a global flag when the signal arrives. • Let each thread read it before each task. • No, you can’t kill a non-daemonic thread. • You just can’t. • It’s Python.
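A sketch of the flag approach, assuming Python 3.8+ for `signal.raise_signal`, which is used here only to simulate Ctrl-C. The handler and worker names are illustrative, and the fallback branch exists because `signal.signal` may only be called from the main thread.

```python
import signal
import threading
import time

stop_flag = threading.Event()  # the "global flag"
done_tasks = []


def _handle_signal(signum, frame):
    # Python runs signal handlers in the main thread; this one only flips the flag.
    stop_flag.set()


def worker(tasks):
    for task in tasks:
        if stop_flag.is_set():  # read the flag before each task
            return
        time.sleep(0.01)
        done_tasks.append(task)


in_main_thread = threading.current_thread() is threading.main_thread()
if in_main_thread:
    previous = signal.signal(signal.SIGINT, _handle_signal)

threads = [threading.Thread(target=worker, args=(range(1000),)) for _ in range(2)]
for t in threads:
    t.start()
time.sleep(0.05)

if in_main_thread:
    signal.raise_signal(signal.SIGINT)  # simulate Ctrl-C
else:
    _handle_signal(signal.SIGINT, None)  # fallback when not in the main thread

for t in threads:
    t.join()
if in_main_thread:
    signal.signal(signal.SIGINT, previous)  # restore the previous handler

print(len(done_tasks))
```

Each worker finishes the task it is on and then returns, so nothing is interrupted halfway.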
  26. BROADCAST SIGNAL TO SUB-PROCESS • Just broadcast the signal to sub-processes. • Start by registering a signal handler: signal(SIGINT, _handle_to_term_signal)
  27. Work out the process context if needed: pid = getpid(); pgid = getpgid(0); proc_is_parent = (pid == pgid) • Turn off the handler: signal(signum, SIG_IGN) • Broadcast: killpg(pgid, signum)
  31. JUST THREAD IT OUT • Or process it out. • Let the main thread exit earlier. (Looks faster!) • Let the main thread keep dispatching tasks. • “Async” • And fix some stupid behavior. (I meant atexit with multiprocessing.Pool.)
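A minimal sketch of "threading it out": the main thread hands slow work to threads and moves on at once. `send_report` is a hypothetical task, with `time.sleep` standing in for network I/O.

```python
import threading
import time

sent = []


def send_report(name):
    # Hypothetical slow task; time.sleep stands in for network I/O.
    time.sleep(0.1)
    sent.append(name)


start = time.perf_counter()
workers = []
for name in ["a", "b", "c"]:
    t = threading.Thread(target=send_report, args=(name,))
    t.start()  # returns immediately, so the main thread keeps dispatching
    workers.append(t)
dispatch_time = time.perf_counter() - start

# Non-daemonic threads keep running even if the main thread finishes here.
# The joins below exist only so this demo can inspect the result.
for t in workers:
    t.join()

print(f"dispatched in {dispatch_time:.3f}s, sent: {sorted(sent)}")
```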
  35. COLLECT RESULT SMARTER • Put results into a thread-safe queue. • Use a thread per instance. • Learn to “let it go”.
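One way to collect results through a queue is sketched here. The `run` wrapper and the `(task_id, status, value)` tuple layout are assumptions of this example; reporting failures through the same queue is an addition for illustration.

```python
import queue
import threading

results = queue.Queue()


def run(task_id, func, arg):
    # Report success and failure through the same thread-safe queue.
    try:
        results.put((task_id, "ok", func(arg)))
    except Exception as exc:
        results.put((task_id, "error", repr(exc)))


jobs = [(0, int, "42"), (1, int, "not a number")]
threads = [threading.Thread(target=run, args=job) for job in jobs]
for t in threads:
    t.start()

# One get() per job; each call blocks until some worker has reported.
collected = {}
for _ in jobs:
    task_id, status, value = results.get()
    collected[task_id] = (status, value)

for t in threads:
    t.join()
print(collected)
```

The main thread never touches the workers' internals; it only reads what they chose to report.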
  36. MONITOR THEM • No one is a master at first. • Don’t guess. • Just use a function to print logs.
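Such a logging function could be as small as the sketch below. The line format, the `log_lines` list, and the worker names are arbitrary choices for this example.

```python
import sys
import threading
import time

_START = time.monotonic()
_print_lock = threading.Lock()
log_lines = []


def log(message):
    # Prefix each line with elapsed time and thread name so interleaving is visible.
    elapsed = time.monotonic() - _START
    line = f"{elapsed:7.3f}s [{threading.current_thread().name}] {message}"
    with _print_lock:  # keep lines from different threads from mixing
        log_lines.append(line)
        print(line, file=sys.stderr)


def worker(n):
    log(f"start task {n}")
    time.sleep(0.01)
    log(f"finish task {n}")


threads = [
    threading.Thread(target=worker, args=(n,), name=f"worker-{n}") for n in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```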
  37. BENCHMARK THEM • No one is a master at first. • Don’t guess. • Just prove it.
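A measurement harness can be sketched with `timeit`. The CPU-bound `count_down` workload and the loop size are arbitrary; the printed numbers depend on the machine and the Python build, so none are claimed here.

```python
import threading
import timeit


def count_down(n):
    # A pure-Python, CPU-bound loop.
    while n > 0:
        n -= 1


def sequential():
    count_down(200_000)
    count_down(200_000)


def threaded():
    threads = [threading.Thread(target=count_down, args=(200_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Take the best of three runs to reduce noise from other processes.
seq_time = min(timeit.repeat(sequential, number=1, repeat=3))
thr_time = min(timeit.repeat(threaded, number=1, repeat=3))
print(f"sequential: {seq_time:.4f}s, threaded: {thr_time:.4f}s")
```

Swapping in the real workload for `count_down` gives the comparison that actually matters for a given program.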
  38. CONCLUSION • Avoid shared resources, or just use the producer-consumer pattern. • Signals only go to the main thread. • Just thread it out. • Collect your results smarter. • Monitor and benchmark your code.