Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

Slides from my talk at https://2020.ploneconf.org/
#ploneconf2020 #plone #ploneconf

Thomas Schorr

January 13, 2021

Transcript

  1. Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

    Thomas Schorr, Plone Conference 2020
    Outline: WSGI • Why Rust? • Project Status • Performance • Demo • Next steps
  2. PEP-3333: Python Web Server Gateway Interface

    def application(environ, start_response):
        """Simplest possible WSGI application"""
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        return [b'Hello World!\n']
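To make the calling convention concrete, here is a minimal sketch of what a server does with such a callable for a single request. The helper name `call_once` is made up for this illustration; `setup_testing_defaults` comes from the standard library and only fills in placeholder values for the required environ keys.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Simplest possible WSGI application"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello World!\n']


def call_once(app, path="/"):
    """Invoke a WSGI app once, roughly the way a server does per request.

    `call_once` is a hypothetical helper, not part of any WSGI server.
    """
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # adds the required CGI/WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # A real server would buffer these until the first body chunk.
        captured["status"] = status
        captured["headers"] = headers

    # The return value is an iterable of byte strings.
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

A real server additionally parses the HTTP request into `environ` and writes the status line, headers and body chunks to the socket.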
  11. The Server Side

    • The server invokes the application callable once for each HTTP request it receives
    • Many possibilities for handling requests
      • Single threaded server
      • Spawn a thread for each incoming request
      • 1:1 threading, 1:n threading
      • maintain a pool of worker threads
      • multiprocessing
      • ...
    • The WSGI server can give hints through the environ dictionary
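The hints mentioned in the last bullet are the `wsgi.multithread` and `wsgi.multiprocess` keys defined by PEP 3333. As a small sketch (the function name `describe_worker_model` is invented for this example), an application could inspect them like this:

```python
def describe_worker_model(environ):
    """Summarize the concurrency hints a WSGI server puts into environ."""
    multithread = bool(environ.get("wsgi.multithread", False))
    multiprocess = bool(environ.get("wsgi.multiprocess", False))
    if multithread and multiprocess:
        return "threads and processes"
    if multithread:
        return "threads"
    if multiprocess:
        return "processes"
    return "single thread, single process"


def application(environ, start_response):
    """Report the worker model the server announces."""
    body = describe_worker_model(environ).encode("ascii") + b"\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

These keys are only hints: they tell the application whether the callable may be invoked concurrently, not how many workers exist.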
  17. The Application Side

    • often needs to connect to components that outlive the single request
      • databases, caches
    • connection might not be thread safe
    • connection/setup might be expensive
    • all of the above is true for Zope
    • recipe for disaster: choose a WSGI server with an inappropriate worker model
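One common way to reconcile an expensive, non-thread-safe connection with a pool of worker threads is to keep one connection per thread. The following is only a sketch of that idea, not how Zope manages its database connections; `FakeConnection` is a hypothetical stand-in for a real database or cache client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical expensive connection that must stay on one thread."""
    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.created += 1
        self.owner = threading.get_ident()

    def query(self):
        if threading.get_ident() != self.owner:
            raise RuntimeError("connection used from a foreign thread")
        return b"ok"


_per_thread = threading.local()


def get_connection():
    """Create the connection on first use, then reuse it on this thread."""
    conn = getattr(_per_thread, "conn", None)
    if conn is None:
        conn = _per_thread.conn = FakeConnection()
    return conn


def application(environ, start_response):
    body = get_connection().query()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def serve_with_pool(n_requests=20, n_workers=3):
    """Run requests on a fixed worker pool and count connections created."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda _: application({}, lambda status, headers: None),
            range(n_requests)))
    return results, FakeConnection.created
```

With a fixed pool, the number of connections is bounded by the number of workers. A server that spawns a fresh thread per request would pay the setup cost on every request instead.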
  18. Consequence: Limited Choice of WSGI servers suitable for Zope/Plone

    • waitress (the default) with very good overall performance
    • bjoern: fast, non-blocking, single threaded
    • ...
  19. More options please

    Wishlist:
    • multithreaded, 1:1 threading, worker pool
    • PasteDeploy entry point
    • handle the Zope/Plone use case
    • non-blocking
    • File wrapper supporting sendfile
    • competitive performance

    Non Goals:
    • Python 2
    • ASGI (not yet at least)
    • Windows
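The file wrapper item refers to the optional `wsgi.file_wrapper` key from PEP 3333. A minimal sketch of the application side, assuming a fallback to `wsgiref.util.FileWrapper` when the server offers no wrapper (the `io.BytesIO` object stands in for a file opened from disk):

```python
import io
from wsgiref.util import FileWrapper


def file_app(environ, start_response):
    """Return a file-like object through the server's file wrapper."""
    # Stand-in for open(path, "rb"). A server can only use sendfile()
    # when the object has a real file descriptor (fileno()); for
    # anything else it has to fall back to plain iteration.
    body = io.BytesIO(b"static file content\n")
    start_response("200 OK",
                   [("Content-Type", "application/octet-stream")])
    # Use the server-provided wrapper if present, else the stdlib one.
    wrapper = environ.get("wsgi.file_wrapper", FileWrapper)
    return wrapper(body, 8192)
```

The application does not call sendfile itself; it hands the file object to the server, which decides how to transmit it.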
  20. Why Rust?

    Naive expectations:
    • Faster than Python
    • Easier to use than C
  21. Performance

    Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages. https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf
  22. Memory Management through Ownership

    • feature unique to Rust
    • a set of rules that the compiler checks at compile time (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
      • Each value in Rust has a variable that’s called its owner.
      • There can be only one owner at a time.
      • When the owner goes out of scope, the value will be dropped.
    • Drop is a trait; there’s a default implementation that you can override
    • You can still control where (stack or heap) your data is stored.
  27. How is that relevant?

    Example: interfacing with Python
    • Python memory management: reference counting + garbage collection
    • association: increasing an object’s refcount using Py_INCREF
    • should match with corresponding Py_DECREF invocations
    • garbage collection when object refcount goes to 0
    • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
    • 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
    • 1 Py_INCREF in rust-cpython (4 Py_DECREF)
    • very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
    • still possible to create more references than needed
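The reference counting described above can be observed from Python itself. This is a small sketch that assumes CPython, where `sys.getrefcount` reports the count that `Py_INCREF` and `Py_DECREF` manipulate in C extensions; the function name `refcount_delta` is made up for the example.

```python
import sys


def refcount_delta():
    """Show how taking and releasing a reference changes the refcount."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]      # the list stores one more reference (an INCREF)
    during = sys.getrefcount(obj)
    del holder          # the list is freed, its reference is released
    after = sys.getrefcount(obj)
    return during - before, after - before
```

In C, every such increment has to be paired by hand with a decrement. In Rust bindings the pairing is tied to the lifetime of the owning value, which is the point made on the slide.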
  28. Other Rust features

    • strict typing will find many problems at compile time
    • Pattern matching
    • very good documentation, helpful compiler messages
  32. What is Pyruvate from a user perspective

    • a package available from PyPI:

        pip install pyruvate

    • an importable Python module:

        import pyruvate

        def application(environ, start_response):
            """WSGI application"""
            ...

        pyruvate.serve(application, '0.0.0.0:7878', 3)
  33. Using Pyruvate with Zope/Plone

    with plone.recipe.zope2instance:
    • buildout.cfg

        [instance]
        recipe = plone.recipe.zope2instance
        http-address = 127.0.0.1:8080
        eggs =
            Plone
            pyruvate
        wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in

    • pyruvate.ini.in template

        [server:main]
        use = egg:pyruvate#main
        socket = %(http_address)s
        workers = 2
  40. Pyruvate project structure

    • initially created with cargo new --lib
    • Rust sources in src folder
    • Cargo.toml pulls Rust dependencies
    • setup.py
      • uses setuptools_rust to build a RustExtension
      • defines PasteDeploy entry point
    • pyproject.toml to specify build system requirements (PEP 518)
    • tests folder containing (currently only) Python tests (unit tests in Rust modules)
    • __init__.py in pyruvate folder
      • Paste Deploy entry point
      • FileWrapper import
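A PasteDeploy entry point of the kind listed above is typically a small Python function registered in the `paste.server_runner` group. The sketch below shows the general shape only. The names `serve_paste` and the stub `serve` are assumptions for illustration and are not taken from the Pyruvate sources; the option names `socket` and `workers` mirror the ini template shown in the talk.

```python
def serve(application, socket, workers):
    """Hypothetical stand-in for the compiled server function.

    It only records its arguments so the sketch runs without a server.
    """
    return {"application": application, "socket": socket,
            "workers": workers}


def serve_paste(app, global_conf, **local_conf):
    """Entry point in the style of paste.server_runner.

    PasteDeploy passes the WSGI app, the global configuration and the
    options of the [server:main] section. Option values arrive as
    strings, so numeric settings need an explicit conversion.
    """
    return serve(app, local_conf["socket"], int(local_conf["workers"]))


# In setup.py the function would be registered roughly like this
# (assumed layout, shown as a comment because it is configuration):
#   entry_points={
#       "paste.server_runner": ["main = pyruvate:serve_paste"],
#   }
```

With such a registration, `use = egg:pyruvate#main` in the ini file resolves to the function and its options.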
  45. Gitlab Pipeline

    • Two stages: test + build
    • Linting: rustfmt, clippy
    • cargo test
    • coverage report using kcov, uploaded to https://codecov.io
    • Python integration tests with tox
    • build wheels
  48. Binary packages

    • manylinux2010 wheels for Python 3.6-3.9
    • switched from manylinux1 after stable Rust (1.47.0) stopped supporting the old ABI ("ELF file OS ABI invalid" error when loading Rust shared libraries)
    • manylinux2010 needs recent pip and setuptools versions
      • pip >= 19.0; otherwise pip prefers the sdist over the wheel (and there’s no Rust)
      • setuptools >= 42.0.0 (when using zc.buildout)
    • wanted: MacOS
  49. Features

    • rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython)
    • Nonblocking IO using mio (https://github.com/tokio-rs/mio)
      • Nonblocking read
      • blocking or nonblocking write
    • Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading
    • PasteDeploy entry point
    • integrates with Python logging
      • asynchronous logging -> no need to hold the GIL when creating the log message
      • logging configuration in wsgi.ini
    • TCP or Unix Domain sockets
    • supports systemd socket activation
  50. Performance

    [Image credit: Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0]
    • number of requests/amount of data transferred per unit of time
    • Testing and eventually improving it
  57. Approach

    • Static code analysis + refactoring
      • reminder: pyruvate started as a Hello Rust project
      • memory allocations are expensive
    • How to induce socket blocking?
      • limiting socket buffer sizes of a Vagrant box
      • Docker?
    • Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
      • .to_lowercase() is much more expensive than .to_ascii_uppercase()
    • load testing with siege and ab
  58. Performance: Design considerations

    • Python Global Interpreter Lock: Python code can only run when holding the GIL
      • Multiple worker threads need to acquire the GIL in turn
      • acquire GIL only for application execution
      • drop GIL when doing IO
      • more than one possible way to do this
    • IO event polling
      • abstraction: mio Poll instance
      • accepted connections are registered for read events with a Poll instance in the main thread
      • completely read requests + connection are passed to the worker pool
      • iterate over WSGI response chunks (needs GIL)
      • blocking write: loop until response is completely written
      • non-blocking write:
        • write until EAGAIN
        • register connection for write events with per worker Poll instance
        • drop GIL, stash response
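The non-blocking write steps can be sketched in Python with the standard `selectors` module playing the role that a mio Poll instance plays in the Rust code. This is an illustration of the pattern only, not a port of Pyruvate: it waits inline instead of stashing the response, and the function names and the `socketpair` demo are invented for the example.

```python
import selectors
import socket
import threading


def write_nonblocking(sock, payload):
    """Send payload on a non-blocking socket, polling on EAGAIN."""
    selector = selectors.DefaultSelector()
    view = memoryview(payload)
    sent = 0
    try:
        while sent < len(view):
            try:
                # Write as much as the socket buffer accepts right now.
                sent += sock.send(view[sent:])
            except BlockingIOError:
                # EAGAIN: the buffer is full. Register for write
                # events and sleep until the kernel has room again.
                selector.register(sock, selectors.EVENT_WRITE)
                selector.select()
                selector.unregister(sock)
    finally:
        selector.close()
    return sent


def demo(size=1_000_000):
    """Push `size` bytes through a socket pair with a slow-ish reader."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    received = bytearray()

    def drain():
        while len(received) < size:
            chunk = reader.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    thread = threading.Thread(target=drain)
    thread.start()
    sent = write_nonblocking(writer, b"x" * size)
    thread.join()
    writer.close()
    reader.close()
    return sent, len(received)
```

A server following the slide would return to its event loop after registering the connection rather than waiting in place, so that the worker is free while the client catches up.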
  59. Performance: current status

    • Lenovo X390 and Vagrant (2 CPU, 2 G RAM, 8K write buffer size limit)
    • faster than waitress on a Hello world WSGI application
    • faster than waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
    • but slower on /Plone
    • more performance testing needed
  60. Release 1.0

    • Planned for end of this year
    • Reuse connections (keep-alive + chunked transport)
      • Branch on Gitlab, needs some work
    • MacOS support wanted
    • optimize pipeline
      • use a kcov binary package
    • async logging: thread ID
    • More testing + bugfixing
  61. Thanks for your attention

    • Thomas Schorr
    • [email protected]
    • https://gitlab.com/tschorr/pyruvate
    • https://pypi.org/project/pyruvate