Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

Thomas Schorr, Plone Conference 2020

Outline: WSGI • Why Rust? • Project Status • Performance • Demo • Next steps

PEP 3333: Python Web Server Gateway Interface

    def application(environ, start_response):
        """Simplest possible WSGI application"""
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        return [b'Hello World!\n']

The Server Side

• The server invokes the application callable once for each HTTP request it receives
• Many possibilities for handling requests
  • Single threaded server
  • Spawn a thread for each incoming request
  • 1:1 threading, 1:n threading
  • maintain a pool of worker threads
  • multiprocessing
  • ...
• The WSGI server can give hints through environ dictionary
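The server side of the contract can be sketched in a few lines of Python. This is only an illustration, not how any particular server is implemented: handle_one_request is a hypothetical helper, and the environ is filled with test defaults from the standard library's wsgiref. The keys wsgi.multithread, wsgi.multiprocess and wsgi.run_once are the hints defined by PEP 3333.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Report the concurrency hints the server placed in environ."""
    body = "multithread={} multiprocess={} run_once={}\n".format(
        environ["wsgi.multithread"],
        environ["wsgi.multiprocess"],
        environ["wsgi.run_once"],
    ).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def handle_one_request(app, multithread):
    """Hypothetical server-side helper: build environ, call the app once."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI/WSGI keys
    # The server describes its own worker model to the application
    environ["wsgi.multithread"] = multithread
    environ["wsgi.multiprocess"] = False
    environ["wsgi.run_once"] = False

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)  # iterate over the response chunks
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body
```

A real server would call such a helper once per parsed HTTP request; which thread or process makes the call is exactly the design choice listed above.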

Application Side

• often needs to connect to components that outlive the single request
  • databases, caches
  • connection might not be thread safe
  • connection/setup might be expensive
• all of the above is true for Zope
• recipe for disaster: choose a WSGI server with an inappropriate worker model
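Why the worker model matters can be illustrated with a small sketch. It assumes a connection that is expensive to open and must only be used by the thread that created it; FakeConnection, get_connection and run_requests are hypothetical names, and this is not how Zope manages its connections. With a fixed pool of worker threads, a thread-local connection is opened at most once per worker and reused across requests.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_created = []                      # thread ids that opened a connection
_created_lock = threading.Lock()


class FakeConnection:
    """Stand-in for a costly connection that is not thread safe."""

    def __init__(self):
        self.owner = threading.get_ident()

    def query(self):
        # Using the connection from another thread would be a bug
        assert self.owner == threading.get_ident()
        return b"ok\n"


def get_connection():
    """Open one connection per worker thread and reuse it afterwards."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = FakeConnection()
        with _created_lock:
            _created.append(conn.owner)
    return conn


def application(environ, start_response):
    body = get_connection().query()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def run_requests(workers=2, requests=20):
    """Simulate a worker pool invoking the application once per request."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(application, {}, lambda status, headers: None)
                   for _ in range(requests)]
        results = [f.result() for f in futures]
    return results, len(_created)
```

A server that spawns a fresh thread for every request would instead pay the connection setup cost on each request, which is the mismatch the last bullet warns about.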

Consequence: Limited Choice of WSGI servers suitable for Zope/Plone.

• waitress (the default) with very good overall performance
• bjoern: fast, non-blocking, single threaded
• ...

More options please

Wishlist:
• multithreaded, 1:1 threading, workerpool
• PasteDeploy entry point
• handle the Zope/Plone use case
• non-blocking
• File wrapper supporting sendfile
• competitive performance

Non Goals
• Python 2
• ASGI (not yet at least)
• Windows

Why Rust?

Naive expectations:
• Faster than Python
• Easier to use than C

Performance

Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages. https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf

Memory Management through Ownership

• feature unique to Rust
• a set of rules that the compiler checks at compile time (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
  • Each value in Rust has a variable that’s called its owner.
  • There can be only one owner at a time.
  • When the owner goes out of scope, the value will be dropped.
• Drop is a trait; values are dropped automatically, and you can implement Drop to run custom cleanup code
• You can still control where (stack or heap) your data is stored.

How is that relevant?

Example: interfacing with Python
• Python memory management: reference counting + garbage collection
• association: increasing an object’s refcount using Py_INCREF
• should match with corresponding Py_DECREF invocations
• garbage collection when object refcount goes to 0
• Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
• 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
• 1 Py_INCREF in rust-cpython (4 Py_DECREF)
• very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
• still possible to create more references than needed
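The reference counting that Py_INCREF and Py_DECREF manipulate can be observed from Python itself. The following is a small CPython-specific sketch (other interpreters may behave differently); Resource and demo are hypothetical names, and binding or deleting a name is only an analogy to the C macros, not the same operation.

```python
import sys
import weakref


class Resource:
    """Plain object whose reference count we can observe."""


def demo():
    obj = Resource()
    freed = []
    # The callback runs when obj is deallocated; finalize holds no strong ref
    weakref.finalize(obj, freed.append, "freed")

    before = sys.getrefcount(obj)
    alias = obj                       # one more owner, comparable to Py_INCREF
    after_incref = sys.getrefcount(obj)

    del alias                         # owner gone, comparable to Py_DECREF
    after_decref = sys.getrefcount(obj)

    del obj                           # last reference dropped, count reaches 0
    return after_incref - before, after_decref - before, freed
```

In Python code the interpreter keeps increments and decrements paired; in a C extension the programmer has to do so by hand, which is where the mismatches listed above come from.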

Other Rust features

• strict typing will find many problems at compile time
• Pattern matching
• very good documentation, helpful compiler messages

What is Pyruvate from a user perspective

• a package available from PyPI:

    pip install pyruvate

• an importable Python module:

    import pyruvate

    def application(environ, start_response):
        """WSGI application"""
        ...

    pyruvate.serve(application, '0.0.0.0:7878', 3)

Using Pyruvate with Zope/Plone

with plone.recipe.zope2instance:

• buildout.cfg

    [instance]
    recipe = plone.recipe.zope2instance
    http-address = 127.0.0.1:8080
    eggs =
        Plone
        pyruvate
    wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in

• pyruvate.ini.in Template

    [server:main]
    use = egg:pyruvate#main
    socket = %(http_address)s
    workers = 2

Pyruvate project structure

• initially created with cargo new --lib
• Rust sources in src folder
• Cargo.toml pulls Rust dependencies
• setup.py
  • uses setuptools_rust to build a RustExtension
  • defines PasteDeploy entry point
• pyproject.toml to specify build system requirements (PEP 518)
• tests folder containing (currently only) Python tests (unit tests in Rust modules)
• __init__.py in pyruvate folder
  • Paste Deploy entry point
  • FileWrapper import
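What a Paste Deploy entry point in __init__.py might look like can be sketched as follows. This is not Pyruvate's actual code: the names make_server_runner and serve_paste and the default values are assumptions for illustration. What the sketch relies on is that PasteDeploy calls a paste.server_runner entry point as (wsgi_app, global_conf, **kwargs), with the [server:main] options passed as strings, and that the serve function takes (application, address, workers) as in the usage example shown earlier.

```python
def make_server_runner(serve):
    """Wrap a serve(app, addr, workers) function as a paste.server_runner."""

    def serve_paste(app, global_conf, **kw):
        # Options from [server:main] arrive as string keyword arguments
        socket_addr = kw.get("socket", "127.0.0.1:8080")  # hypothetical default
        workers = int(kw.get("workers", 2))               # hypothetical default
        serve(app, socket_addr, workers)
        return 0

    return serve_paste
```

Such a function would then be registered under the paste.server_runner entry point group in setup.py, so that use = egg:pyruvate#main in the ini file can resolve to it.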

Gitlab Pipeline

• Two stages: test + build
• Linting: rustfmt, clippy
• cargo test
• coverage report using kcov, uploaded to https://codecov.io
• Python integration tests with tox
• build wheels

Binary packages

• manylinux2010 wheels for Python 3.6-3.9
• switched from manylinux1 after stable Rust (as of 1.47.0) stopped supporting the old ABI ("ELF file OS ABI invalid" error when loading Rust shared libraries)
• manylinux2010 needs recent pip and setuptools versions
  • pip >= 19.0; older pip versions prefer the sdist over the wheel (which fails if there’s no Rust)
  • setuptools >= 42.0.0 (when using zc.buildout)
• wanted: macOS

Features

• rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython)
• Nonblocking IO using mio (https://github.com/tokio-rs/mio)
  • Nonblocking read
  • blocking or nonblocking write
• Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading
• PasteDeploy entry point
• integrates with Python logging
  • asynchronous logging -> no need to hold the GIL when creating the log message
  • logging configuration in wsgi.ini
• TCP or Unix Domain sockets
• supports systemd socket activation

Performance

(Image: Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0)

• number of requests/amount of data transferred per unit of time
• Testing and eventually improving it

Approach

• Static code analysis + refactoring
  • reminder: pyruvate started as a Hello Rust project
  • memory allocations are expensive
• How to induce socket blocking?
  • limiting socket buffer sizes of a Vagrant box
  • Docker?
• Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
  • .to_lowercase() is much more expensive than .to_ascii_uppercase()
• load testing with siege and ab

Performance: Design considerations

• Python Global Interpreter Lock: Python code can only run when holding the GIL
  • Multiple worker threads need to acquire the GIL in turn
  • acquire GIL only for application execution
  • drop GIL when doing IO
  • more than one possible way to do this
• IO event polling
  • abstraction: mio Poll instance
  • accepted connections are registered for read events with a Poll instance in the main thread
  • completely read requests + connection are passed to the worker pool
  • iterate over WSGI response chunks (needs GIL)
  • blocking write: loop until response is completely written
  • non-blocking write:
    • write until EAGAIN
    • register connection for write events with per worker Poll instance
    • drop GIL, stash response
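The non-blocking write steps can be sketched in Python, with the standard library's selectors module standing in for a mio Poll instance. This is an illustration of the pattern only, not Pyruvate's Rust implementation; write_until_blocked and demo are hypothetical names, and the small send buffer is set just to provoke EAGAIN, similar to the buffer limits mentioned for testing.

```python
import selectors
import socket


def write_until_blocked(sock, data):
    """Write as much as possible; return the unsent remainder (the stash)."""
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
        except BlockingIOError:      # EAGAIN / EWOULDBLOCK
            break
        view = view[sent:]
    return bytes(view)


def demo(payload_size=1024 * 1024):
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    # A small send buffer makes the writer hit EAGAIN quickly
    writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)

    stash = write_until_blocked(writer, b"x" * payload_size)
    blocked_once = len(stash) > 0

    sel = selectors.DefaultSelector()
    sel.register(reader, selectors.EVENT_READ)
    if stash:
        # Register for write events and resume when the socket is writable
        sel.register(writer, selectors.EVENT_WRITE)

    received = 0
    while received < payload_size:
        events = sel.select(timeout=5)
        if not events:
            break                    # give up instead of spinning forever
        for key, _ in events:
            if key.fileobj is reader:
                received += len(reader.recv(65536))
            else:
                stash = write_until_blocked(writer, stash)
                if not stash:
                    sel.unregister(writer)

    sel.close()
    writer.close()
    reader.close()
    return blocked_once, received
```

In a worker thread that also runs Python code, the time spent waiting for write events is time in which the GIL does not need to be held, which is the point of stashing the response.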

Performance: current status

• Lenovo X390 and Vagrant (2 CPU, 2 GB RAM, 8K write buffer size limit)
• faster than waitress on a Hello world WSGI application
• faster than waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
• but slower on /Plone
• more performance testing needed

Live Demo

Release 1.0

• Planned for end of this year
• Reuse connections (keep-alive + chunked transport)
  • Branch on Gitlab, needs some work
• MacOS support wanted
• optimize pipeline
  • use a kcov binary package
• async logging: thread ID
• More testing + bugfixing

Thanks for your attention

• Thomas Schorr
• [email protected]
• https://gitlab.com/tschorr/pyruvate
• https://pypi.org/project/pyruvate
