Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

Slides from my talk at https://2020.ploneconf.org/
#ploneconf2020 #plone #ploneconf

Thomas Schorr

January 13, 2021

Transcript

  1. Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

    Thomas Schorr, Plone Conference 2020
    Outline: WSGI • Why Rust? • Project Status • Performance • Demo • Next steps
  2. PEP-3333: Python Web Server Gateway Interface

        def application(environ, start_response):
            """Simplest possible WSGI application"""
            status = '200 OK'
            response_headers = [('Content-type', 'text/plain')]
            start_response(status, response_headers)
            return [b'Hello World!\n']
  11. The Server Side

    • The server invokes the application callable once for each HTTP request it receives
    • Many possibilities for handling requests
        • Single threaded server
        • Spawn a thread for each incoming request
        • 1:1 threading, 1:n threading
        • maintain a pool of worker threads
        • multiprocessing
        • ...
    • The WSGI server can give hints through the environ dictionary
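
The server-side contract above can be sketched in a few lines: a toy "server" builds an environ dictionary, invokes the application callable once, and collects the response. This is only an illustration built on the standard library's wsgiref helpers, not how Pyruvate or any production server works; `call_once` is a hypothetical helper name. The `wsgi.multithread` and `wsgi.multiprocess` keys are the PEP 3333 hints about the server's worker model.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Report the worker-model hints the server put into environ."""
    body = "multithread={} multiprocess={}\n".format(
        environ["wsgi.multithread"], environ["wsgi.multiprocess"]
    ).encode("ascii")
    start_response("200 OK", [("Content-type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(app, multithread=True, multiprocess=False):
    """Hypothetical helper: invoke the WSGI callable for one fake request."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, ...
    environ["wsgi.multithread"] = multithread
    environ["wsgi.multiprocess"] = multiprocess
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # PEP 3333: the server must call close() on the result if it exists.
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], captured["headers"], body


status, headers, body = call_once(application)
print(status, body)
```

An application that is not thread safe could inspect `environ["wsgi.multithread"]` in the same way and refuse to run under a threaded server.
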
  17. The Application Side

    • often needs to connect to components that outlive the single request
        • databases, caches
    • connection might not be thread safe
    • connection/setup might be expensive
    • all of the above is true for Zope
    • recipe for disaster: choose a WSGI server with an inappropriate worker model
  18. Consequence: Limited Choice of WSGI servers suitable for Zope/Plone

    • waitress (the default) with very good overall performance
    • bjoern: fast, non-blocking, single threaded
    • ...
  19. More options please

    Wishlist:
    • multithreaded, 1:1 threading, worker pool
    • PasteDeploy entry point
    • handle the Zope/Plone use case
    • non-blocking
    • File wrapper supporting sendfile
    • competitive performance

    Non-goals:
    • Python 2
    • ASGI (not yet at least)
    • Windows
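
The "file wrapper" item refers to the optional `wsgi.file_wrapper` key from PEP 3333: the server may offer a callable that wraps a file-like object, which lets it recognise the wrapper later and use a faster transmission path such as sendfile. A minimal application-side sketch, using `wsgiref.util.FileWrapper` as a stand-in for whatever wrapper a real server provides (`file_app` is a hypothetical name):

```python
import io
from wsgiref.util import FileWrapper, setup_testing_defaults

BLOCK_SIZE = 4096


def file_app(environ, start_response):
    """Serve a file-like object, using the server's file wrapper if offered."""
    payload = io.BytesIO(b"x" * 10_000)   # stands in for an open file
    start_response("200 OK", [("Content-type", "application/octet-stream")])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server can detect its own wrapper and pick a faster path.
        return wrapper(payload, BLOCK_SIZE)
    # Portable fallback from PEP 3333: iterate in fixed-size blocks.
    return iter(lambda: payload.read(BLOCK_SIZE), b"")


def ignore_start_response(status, headers, exc_info=None):
    """Placeholder start_response for the demonstration."""


environ = {}
setup_testing_defaults(environ)
environ["wsgi.file_wrapper"] = FileWrapper   # what a server would provide
result = file_app(environ, ignore_start_response)
body = b"".join(result)
print(type(result).__name__, len(body))
```

An in-memory `BytesIO` has no file descriptor, so a real server could not use sendfile for it; the sketch only shows the lookup and the fallback.
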
  20. Why Rust?

    Naive expectations:
    • Faster than Python
    • Easier to use than C
  21. Performance

    Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages. https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf
  22. Memory Management through Ownership

    • feature unique to Rust
    • a set of rules that the compiler checks at compile time (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
        • Each value in Rust has a variable that’s called its owner.
        • There can be only one owner at a time.
        • When the owner goes out of scope, the value will be dropped.
    • Drop is a trait; there’s a default implementation that you can override
    • You can still control where (stack or heap) your data is stored.
  27. How is that relevant?

    Example: interfacing with Python
    • Python memory management: reference counting + garbage collection
    • association: increasing an object’s refcount using Py_INCREF
    • should match with corresponding Py_DECREF invocations
    • garbage collection when object refcount goes to 0
    • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
    • 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
    • 1 Py_INCREF in rust-cpython (4 Py_DECREF)
    • very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
    • still possible to create more references than needed
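
The effect of the paired increments and decrements can be observed from the Python side with `sys.getrefcount`. The sketch below is CPython-specific and only illustrates the bookkeeping; in a C extension the same steps would be explicit Py_INCREF and Py_DECREF calls that the author has to keep balanced by hand.

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)    # absolute value depends on the interpreter

holder = [obj]                     # the list takes a new reference (an INCREF)
after_incref = sys.getrefcount(obj)

holder.clear()                     # the list releases it again (a DECREF)
after_decref = sys.getrefcount(obj)

print(after_incref - baseline, after_decref - baseline)
```

Only the differences are meaningful: storing the object in the list raises the count by one, and clearing the list brings it back to the baseline. A forgotten decrement would leave the count permanently too high (a leak), while an extra one would free the object while it is still in use.
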
  28. Other Rust features

    • strict typing will find many problems at compile time
    • Pattern matching
    • very good documentation, helpful compiler messages
  32. What is Pyruvate from a user perspective

    • a package available from PyPI:

        pip install pyruvate

    • an importable Python module:

        import pyruvate

        def application(environ, start_response):
            """WSGI application"""
            ...

        pyruvate.serve(application, '0.0.0.0:7878', 3)
  33. Using Pyruvate with Zope/Plone

    with plone.recipe.zope2instance:
    • buildout.cfg

        [instance]
        recipe = plone.recipe.zope2instance
        http-address = 127.0.0.1:8080
        eggs =
            Plone
            pyruvate
        wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in

    • pyruvate.ini.in template

        [server:main]
        use = egg:pyruvate#main
        socket = %(http_address)s
        workers = 2
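
To make the `[server:main]` section less magical: PasteDeploy resolves `use = egg:pyruvate#main` to an entry point and passes the remaining options as string keyword arguments. The sketch below shows what a callable following PasteDeploy's `paste.server_runner` convention could look like. It is an assumption for illustration, not Pyruvate's actual code: `make_paste_runner`, the default of 2 workers, and the `fake_serve` stand-in are hypothetical, and the `serve(app, address, workers)` shape is taken from the usage example on the previous slide.

```python
def make_paste_runner(serve):
    """Build a paste.server_runner-style callable around a serve() function."""

    def runner(wsgi_app, global_conf, **local_conf):
        # Option values from an ini file always arrive as strings.
        socket_addr = local_conf["socket"]
        workers = int(local_conf.get("workers", 2))   # assumed default
        serve(wsgi_app, socket_addr, workers)
        return 0

    return runner


calls = []


def fake_serve(app, addr, workers):
    """Stand-in for a real server's serve(); records what it was given."""
    calls.append((app, addr, workers))


runner = make_paste_runner(fake_serve)
exit_code = runner("app", {}, socket="127.0.0.1:8080", workers="2")
print(calls)
```

The point of the sketch is the type conversion: `workers = 2` in the ini file reaches the runner as the string `"2"` and has to be converted before it is handed to the server.
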
  40. Pyruvate project structure

    • initially created with cargo new --lib
    • Rust sources in src folder
    • Cargo.toml pulls Rust dependencies
    • setup.py
        • uses setuptools_rust to build a RustExtension
        • defines PasteDeploy entry point
    • pyproject.toml to specify build system requirements (PEP 518)
    • tests folder containing (currently only) Python tests (unit tests in Rust modules)
    • __init__.py in pyruvate folder
        • PasteDeploy entry point
        • FileWrapper import
  45. GitLab Pipeline

    • Two stages: test + build
    • Linting: rustfmt, clippy
    • cargo test
    • coverage report using kcov, uploaded to https://codecov.io
    • Python integration tests with tox
    • build wheels
  48. Binary packages

    • manylinux2010 wheels for Python 3.6-3.9
    • switched from manylinux1 after stable Rust (1.47.0) stopped supporting the old ABI ("ELF file OS ABI invalid" error when loading Rust shared libraries)
    • manylinux2010 needs recent pip and setuptools versions
        • pip >= 19.0; otherwise pip prefers the sdist over the wheel (and there’s no Rust to build it)
        • setuptools >= 42.0.0 (when using zc.buildout)
    • wanted: macOS
  49. Features

    • rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython)
    • Nonblocking IO using mio (https://github.com/tokio-rs/mio)
        • Nonblocking read
        • blocking or nonblocking write
    • Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading
    • PasteDeploy entry point
    • integrates with Python logging
        • asynchronous logging -> no need to hold the GIL when creating the log message
        • logging configuration in wsgi.ini
    • TCP or Unix Domain sockets
    • supports systemd socket activation
  50. Performance

    • number of requests/amount of data transferred per unit of time
    • Testing and eventually improving it

    Image: Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0
  57. Approach

    • Static code analysis + refactoring
        • reminder: pyruvate started as a Hello Rust project
        • memory allocations are expensive
    • How to induce socket blocking?
        • limiting socket buffer sizes of a Vagrant box
        • Docker?
    • Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
        • .to_lowercase() is much more expensive than .to_ascii_uppercase()
    • load testing with siege and ab
  58. Performance: Design considerations

    • Python Global Interpreter Lock: Python code can only run when holding the GIL
        • Multiple worker threads need to acquire the GIL in turn
        • acquire GIL only for application execution
        • drop GIL when doing IO
        • more than one possible way to do this
    • IO event polling
        • abstraction: mio Poll instance
        • accepted connections are registered for read events with a Poll instance in the main thread
        • completely read requests + connection are passed to the worker pool
        • iterate over WSGI response chunks (needs GIL)
        • blocking write: loop until response is completely written
        • non-blocking write:
            • write until EAGAIN
            • register connection for write events with per worker Poll instance
            • drop GIL, stash response
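
The non-blocking write path (write until EAGAIN, stash the rest, register for write events) can be sketched in Python with the standard `selectors` module in place of mio. This is a single-threaded illustration under stated assumptions, not Pyruvate's Rust implementation: `try_write` and `transfer` are hypothetical names, a local socket pair plays both client and server, and there is no GIL handling because everything runs in one thread.

```python
import selectors
import socket


def try_write(sock, pending):
    """Send until done or until the kernel reports EAGAIN; return the rest."""
    while pending:
        try:
            sent = sock.send(pending)
        except BlockingIOError:          # EAGAIN / EWOULDBLOCK
            break
        pending = pending[sent:]
    return pending


def transfer(payload):
    """Push payload through a socket pair without ever blocking on write."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    # Small send buffer so the write stalls early (the kernel may adjust it).
    writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)

    poll = selectors.DefaultSelector()
    poll.register(reader, selectors.EVENT_READ)
    received = bytearray()
    stalls = 0

    pending = try_write(writer, memoryview(payload))
    if pending:                          # stash the rest, wait for writability
        stalls += 1
        poll.register(writer, selectors.EVENT_WRITE)

    while len(received) < len(payload):
        for key, _events in poll.select(timeout=5):
            if key.fileobj is reader:
                received += reader.recv(65536)
            else:
                pending = try_write(writer, pending)
                if pending:
                    stalls += 1
                else:
                    poll.unregister(writer)

    poll.close()
    writer.close()
    reader.close()
    return bytes(received), stalls


data, stalls = transfer(b"r" * 1_000_000)
print(len(data), "bytes received after", stalls, "stalled writes")
```

Between a stalled write and the next write event the worker is free to do other work, which is what makes it possible to drop the GIL and serve other connections in the meantime.
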
  59. Performance: current status

    • Lenovo X390 and Vagrant (2 CPU, 2 G RAM, 8K write buffer size limit)
    • faster than waitress on a Hello world WSGI application
    • faster than waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
    • but slower on /Plone
    • more performance testing needed
  60. Release 1.0

    • Planned for end of this year
    • Reuse connections (keep-alive + chunked transport)
        • Branch on GitLab, needs some work
    • macOS support wanted
    • optimize pipeline
        • use a kcov binary package
    • async logging: thread ID
    • More testing + bugfixing
  61. Thanks for your attention

    • Thomas Schorr
    • [email protected]
    • https://gitlab.com/tschorr/pyruvate
    • https://pypi.org/project/pyruvate