Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request • 1:1 threading, 1:n threading
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request • 1:1 threading, 1:n threading • maintain a pool of worker threads
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request • 1:1 threading, 1:n threading • maintain a pool of worker threads • multiprocessing
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request • 1:1 threading, 1:n threading • maintain a pool of worker threads • multiprocessing • ...
Server Side • The server invokes the application callable once for each HTTP request it receives • Many possibilities for handling requests • Single threaded server • Spawn a thread for each incoming request • 1:1 threading, 1:n threading • maintain a pool of worker threads • multiprocessing • ... • The WSGI server can give hints through environ dictionary
Application Side • often needs to connect to components that outlive the single request • databases, caches • connection might not be thread safe • connection/setup might be expensive
Application Side • often needs to connect to components that outlive the single request • databases, caches • connection might not be thread safe • connection/setup might be expensive • all of the above is true for Zope
Application Side • often needs to connect to components that outlive the single request • databases, caches • connection might not be thread safe • connection/setup might be expensive • all of the above is true for Zope • recipe for disaster: choose a WSGI server with an inappropriate worker model
Limited Choice of WSGI servers suitable for Zope/Plone. • waitress (the default) with very good overall performance • bjoern: fast, non-blocking, single threaded • ...
Performance Emmerich, P. et al (2019): The Case for Writing Network Drivers in High-Level Programming Languages. - https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf .
Management through Ownership • feature unique to Rust • a set of rules that the compiler checks at compile time (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) • Each value in Rust has a variable that’s called it’s owner. • There can be only one owner at a time. • When the owner goes out of scope, the value will be dropped. • Drop is a trait; there’s a default implementation that you can override • You can still control where (stack or heap) your data is stored.
is that relevant? Example: interfacing with Python • Python memory management: reference counting + garbage collection • association: increasing an objects’ refcount using Py_INCREF • should match with corresponding Py_DECREF invocations • garbage collection when object refcount goes to 0 • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
is that relevant? Example: interfacing with Python • Python memory management: reference counting + garbage collection • association: increasing an objects’ refcount using Py_INCREF • should match with corresponding Py_DECREF invocations • garbage collection when object refcount goes to 0 • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps • 63 occurences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
is that relevant? Example: interfacing with Python • Python memory management: reference counting + garbage collection • association: increasing an objects’ refcount using Py_INCREF • should match with corresponding Py_DECREF invocations • garbage collection when object refcount goes to 0 • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps • 63 occurences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF) • 1 Py_INCREF in rust-cpython (4 Py_DECREF)
is that relevant? Example: interfacing with Python • Python memory management: reference counting + garbage collection • association: increasing an objects’ refcount using Py_INCREF • should match with corresponding Py_DECREF invocations • garbage collection when object refcount goes to 0 • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps • 63 occurences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF) • 1 Py_INCREF in rust-cpython (4 Py_DECREF) • very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
is that relevant? Example: interfacing with Python • Python memory management: reference counting + garbage collection • association: increasing an objects’ refcount using Py_INCREF • should match with corresponding Py_DECREF invocations • garbage collection when object refcount goes to 0 • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps • 63 occurences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF) • 1 Py_INCREF in rust-cpython (4 Py_DECREF) • very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps • still possible to create more references than needed
is Pyruvate from a user perspective • a package available from PyPI: pip install pyruvate • an importable Python module: import pyruvate def application(environ, start_response): """WSGI application""" ... pyruvate.serve(application, '0.0.0.0:7878', 3)
project structure • initially created with cargo new --lib • Rust sources in src folder • Cargo.toml pulls Rust dependencies • setup.py • uses setuptools_rust to build a RustExtension • defines PasteDeploy entry point
project structure • initially created with cargo new --lib • Rust sources in src folder • Cargo.toml pulls Rust dependencies • setup.py • uses setuptools_rust to build a RustExtension • defines PasteDeploy entry point • pyproject.toml to specify build system requirements (PEP 518)
Pipeline • Two stages: test + build • Linting: rustfmt, clippy • cargo test • coverage report using kcov, uploaded to https://codecov.io • Python integration tests with tox
packages • manylinux2010 wheels for Python 3.6-3.9 • switched from manylinux1 after stable Rust stopped supporting the old ABI (ELF file OS ABI invalid error when loading rust shared libraries) 1.47.0 • manylinux2010 needs recent pip and setuptools versions
packages • manylinux2010 wheels for Python 3.6-3.9 • switched from manylinux1 after stable Rust stopped supporting the old ABI (ELF file OS ABI invalid error when loading rust shared libraries) 1.47.0 • manylinux2010 needs recent pip and setuptools versions • pip >= 19.0 if pip prefers sdist over wheel (and there’s no Rust) • setuptools >= 42.0.0 (when using zc.buildout)
packages • manylinux2010 wheels for Python 3.6-3.9 • switched from manylinux1 after stable Rust stopped supporting the old ABI (ELF file OS ABI invalid error when loading rust shared libraries) 1.47.0 • manylinux2010 needs recent pip and setuptools versions • pip >= 19.0 if pip prefers sdist over wheel (and there’s no Rust) • setuptools >= 42.0.0 (when using zc.buildout) • wanted: MacOS
• rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython) • Nonblocking IO using mio (https://github.com/tokio-rs/mio) • Nonblocking read • blocking or nonblocking write • Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading • PasteDeploy entry point • integrates with Python logging • asynchronous logging -> no need to hold the GIL when creating the log message • logging configuration in wsgi.ini • TCP or Unix Domain sockets • supports systemd socket activation
Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0 • number of requests/amount of data transferred per unit of time • Testing and eventually improving it
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking?
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking? • limiting socket buffer sizes of a Vagrant box
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking? • limiting socket buffer sizes of a Vagrant box • Docker?
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking? • limiting socket buffer sizes of a Vagrant box • Docker? • Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking? • limiting socket buffer sizes of a Vagrant box • Docker? • Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html) • .to_lower() is much more expensive than .to_ascii_uppercase()
• Static code analyis + refactoring • reminder: pyruvate started as a Hello Rust project • memory allocations are expensive • How to induce socket blocking? • limiting socket buffer sizes of a Vagrant box • Docker? • Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html) • .to_lower() is much more expensive than .to_ascii_uppercase() • load testing with siege and ab
Design considerations • Python Global Interpreter Lock: Python code can only run when holding the GIL • Multiple worker threads need to acquire the GIL in turn • acquire GIL only for application execution • drop GIL when doing IO • more than one possible way to do this • IO event polling • abstraction: mio Poll instance • accepted connections are registered for read events with a Poll instance in the main thread • completely read requests + connection are passed to the worker pool • iterate over WSGI response chunks (needs GIL) • blocking write: loop until response is completely written • non-blocking write: • write until EAGAIN • register connection for write events with per worker Poll instance • drop GIL, stash response
current status • Lenovo X390 and Vagrant (2 CPU, 2 G RAM, 8K write buffer size limit) • faster than waitress on a Hello world WSGI application • faster that waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for- recommendations) • but slower on /Plone • more performance testing needed
1.0 • Planned for end of this year • Reuse connections (keep-alive + chunked transport) • Branch on Gitlab, needs some work • MacOS support wanted • optimize pipeline • use a kcov binary package • async logging: thread ID • More testing + bugfixing