Checking your work: validating the kernel by building and testing in CI

Checking your work: Linux kernel testing and CI
Scaling reliability across the global upstream community
David Vernet
Kernel Recipes 2022 – Paris, France

Agenda
01 Disclaimers
02 How kernel tests are written
03 How kernel tests are run
04 What can we improve?
05 Q & A
06 Bonus: how to write a kselftest

01 Disclaimers

1. I may be missing details of tools I’m not aware of
2. Presentation was crafted in the middle of the night over the Atlantic

02 How kernel tests are written

Pick your poison, there are a number of options
• kselftests (https://docs.kernel.org/dev-tools/kselftest.html)
• KUnit (https://docs.kernel.org/dev-tools/kunit/index.html)
• xfstests (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/)
• Benchmarks (LKP @ https://github.com/intel/lkp-tests, Phoronix @ https://openbenchmarking.org/tests/pts)
• Fuzzers (https://github.com/google/syzkaller)
• Sanitizers (KASAN, kmemleak, …)
• Linux Test Project (https://github.com/linux-test-project/ltp)
• …

What are kselftests?
• Testcases are instances of userspace programs
• Commonly written in C, but need only be an executable file
• Located in tree at tools/testing/selftests
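Because a testcase need only be an executable file, the idea can be sketched in Python rather than C. This is an illustrative sketch, not a test from the tree: the check against /proc/sys/kernel/ostype and the names check_ostype and tap_report are made up for the example. The exit codes are assumed to mirror KSFT_PASS, KSFT_FAIL and KSFT_SKIP from kselftest.h.

```python
#!/usr/bin/env python3
"""Illustrative kselftest-style testcase: an executable that prints TAP."""
import os

# Exit codes mirroring KSFT_PASS / KSFT_FAIL / KSFT_SKIP in kselftest.h.
KSFT_PASS, KSFT_FAIL, KSFT_SKIP = 0, 1, 4


def check_ostype(path="/proc/sys/kernel/ostype"):
    """Hypothetical check: the kernel reports its OS type as 'Linux'."""
    if not os.path.exists(path):
        return KSFT_SKIP  # nothing to test on this system
    with open(path) as f:
        return KSFT_PASS if f.read().strip() == "Linux" else KSFT_FAIL


def tap_report(name, code):
    """Render the TAP result line for a single test (test number 1)."""
    if code == KSFT_PASS:
        return f"ok 1 {name}"
    if code == KSFT_SKIP:
        return f"ok 1 {name} # SKIP"
    return f"not ok 1 {name}"


def main():
    """Run the check, print TAP output, and return the exit code."""
    code = check_ostype()
    print("TAP version 13")
    print("1..1")
    print(tap_report("ostype", code))
    return code
```

Ending the file with a call such as `raise SystemExit(main())` and marking it executable would make it a complete testcase whose exit status reflects the result.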

06 How to write a kselftest

What are KUnit tests?
• Unit testing framework for testing individual Linux kernel functions
• Compiled into the kernel by specifying kconfig options
• Testcases link directly against kernel symbols and kunit APIs, which are used to make assertions about expected return values of the kernel symbols

What are xfstests?
• Filesystem regression test suite (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/)
• Tests are categorized according to whether they’re global, shared between a subset of FSs, or specific to one FS
• Tests use common logic for bootstrapping block devices, etc.
• Located in a separate repository

And more tests housed in external repositories
• Linux Kernel Performance (https://github.com/intel/lkp-tests)
• Phoronix (https://openbenchmarking.org/tests/pts)
• Linux Test Project (https://github.com/linux-test-project/ltp)

03 How kernel tests are run

Pick your poison, there are a few options
• KernelCI (https://foundation.kernelci.org)
• LKP / kernel test robot (https://01.org/lkp/documentation/0-day-brief-introduction)
• Patchwork + github + extra magic (https://patchwork.kernel.org/project/netdevbpf/list/)
• syzbot (https://syzkaller.appspot.com/upstream)
• Maintainers’ private machines (e.g. Josef Bacik’s btrfs dashboards: http://toxicpanda.com/)
• Thorsten Leemhuis’ regzbot (https://linux-regtracking.leemhuis.info/regzbot/mainline/)

KernelCI – A Linux Foundation project
• Open source test automation system
• Builds and runs kernels across a variety of trees, branches, toolchains, and configs
• Also runs tests on different architectures and SoCs

https://linux.kernelci.org/job/

https://linux.kernelci.org/build/

https://linux.kernelci.org/build/id/6295acad348c04ad65a39bdd/

Kernel module build logs

https://linux.kernelci.org/tests/

https://linux.kernelci.org/soc/

KernelCI – Pros and Cons
Pros
- Builds for multiple architectures
- Tests on multiple architectures
- Builds with multiple toolchains
- Useful information provided with failures and known regressions
- Open source and part of the Linux Foundation
- Emails failures to upstream lists
- Bisects to find culprit patches
Cons
- Only runs on merged patches
- …but new APIs are coming to allow developers to address this
- Web dashboard needs some redesign, still has some bugs

LKP – Linux Kernel Performance / 0 day
• Run by the 0-day team at Intel
• Builds and runs kernels across a variety of trees, branches, toolchains, and configs, including unmerged patches
• Runs build tests, benchmarks, and logical tests (defined out of tree in separate github repo)
• Only builds and tests on and for x86 (though apparently they also build for other architectures on private jobs / branches?)

https://www.intel.com/content/www/us/en/developer/topic-technology/open/linux-kernel-performance/overview.html

https://lists.01.org/hyperkitty/

https://lists.01.org/hyperkitty/list/[email protected]/

LKP / 0 Day – Pros and Cons
Pros
- Builds on patches that have not yet been merged
- Provides strong signal by sending messages to upstream lists
- Runs benchmarks
- Does bisection to find initial broken commit
Cons
- Only runs builds and tests for x86 (or not?)
- Does not build with multiple toolchains
- Error information helpful, but less comprehensive than KernelCI
- Uses Intel / private infrastructure (and source?)

https://patchwork.kernel.org

Patchwork + github – How BPF runs CI tests
• Patchwork is a free, web-based patch tracking system
• Architecture is a combination of Patchwork, GitHub, and Meta infrastructure
• Runs all BPF selftests (https://github.com/torvalds/linux/tree/master/tools/testing/selftests/bpf) on every patch sent to bpf and bpf-next lists
• Only builds and tests for x86 and s390x architectures

https://patchwork.kernel.org/project/netdevbpf/list/

Components
• Patchwork
• Kernel Patches Daemon
• kernel_patches/bpf GitHub repo
• GitHub action runners (x86, s390x)
• kernel_patches/vm_test
Slide copied almost verbatim from BPF CI talk by Mykola Lysenko at LSFMM 2022 (https://docs.google.com/presentation/d/1RQZjLkbXmSFOr_4Sj5BdQsXbUh_vMshXi7w09pUpWsY/edit#slide=id.g127798017a6_0_194)

https://patchwork.kernel.org/project/netdevbpf/list/

Patchwork
Pros
- Patchwork is used by maintainers (one-stop shops can be nice)
- Runs on every patch sent to BPF lists
- Runs on at least 2 architectures, could theoretically add more
- BPF tests in general are easy to run locally – can use script to run in VM
- New BPF tests automatically run
Cons
- Other patchwork suites need their own daemon and other infra to run CI
- Doesn’t send messages to BPF lists for job failures
- Uses Meta / private infrastructure for Kernel Patches daemon
- Doesn’t run tests on SoCs or directly on various non-x86 hardware (uses QEMU for s390x)

syzkaller + syzbot – Fuzzing the kernel
• Continuously fuzzes main Linux kernel branches
• Reports found bugs to upstream lists
• Bisects to find bugs (and fixes) on specific patches
• Runs on multiple architectures

https://syzkaller.appspot.com/upstream

https://lore.kernel.org/lkml/[email protected]/T/

syzbot
Pros
- Great coverage thanks to the nature of fuzzing + sanitizers
- Bisects to find culprit patch, and the patch that fixes an issue
- Runs on multiple architectures (in VMs)
- Sends messages to upstream on failures
Cons
- Doesn’t run on unmerged patches
- Doesn’t run selftests / kunit tests
- Runs on proprietary Google infra
- Configurations are hard-coded per platform in the syzbot repo

Independently managed solutions (e.g. for btrfs)

http://toxicpanda.com

http://toxicpanda.com/results/josefbacik/fedora-rawhide/btrfs_normal_freespacetree/05-30-2022-21:06:02/index.html

http://toxicpanda.com/performance/

http://toxicpanda.com/performance/smallfiles100k.html

Independent solutions
Pros
- Tailored directly to the needs of the subsystem
- Inspires test and benchmark writing
Cons
- No cross-architecture, cross-config, etc. coverage provided by a framework
- Maintainers need to spend a lot of their time getting something like this set up

04 What can be improved?
Note: Lots of discussion expected (and hoped for) during this section. Please feel free to interject.

Let’s start by talking about CI

All of the CI systems we’ve covered have roughly the same, or at least similar, goals
• Run tests on some matrix of configurations and architectures
• When regressions are detected, provide signal:
  - Ideally before patches are merged
  - Otherwise, bisect and detect the bad patch automatically
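The "bisect and detect the bad patch" goal boils down to a binary search over an ordered list of commits. Below is a minimal sketch of that idea, the same one behind git bisect, and not the implementation of any CI system named here. The is_good callback is hypothetical and stands in for "build and test this commit". The sketch assumes the first commit is known good, the last is known bad, and the test is deterministic.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_good: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest sequence.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(commits[mid]):
            good = mid  # the regression was introduced after mid
        else:
            bad = mid   # the regression was introduced at or before mid
    return commits[bad]
```

Each step halves the remaining range, so the number of builds grows with the logarithm of the number of commits. A flaky test breaks the assumption that results are deterministic and can send the search to the wrong commit.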

All of the CI systems do a subset of things well
• KernelCI has a great UI, gets a lot of test coverage and provides detailed information
• LKP / kernel test robot / 0-day detects regressions for all patches sent to the list, and pings vger when a regression is detected. It also runs tests not included in the source tree, including benchmarks
• Patchwork / BPF also has a great UI, makes it easy for developers to test locally, and provides signal for all patches sent to the BPF lists. The signal is also highly reliable, due to BPF selftests being deterministic and fast.

Can we combine forces?
As maintainers / kernel developers, for the purposes of testing the kernel, can we break anything out into shared code?
- Patch bisection
- Invoking kselftests, kunit, interpreting TAP output
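As one example of logic that could be shared, here is a rough sketch of interpreting TAP output. It handles only flat `ok` / `not ok` result lines with an optional `# SKIP` directive. Nested subtests, diagnostics and the plan line are ignored, so this is a simplified illustration and not a complete TAP or KTAP parser. The name summarize_tap is made up for the example.

```python
import re
from collections import Counter

# Matches lines such as "ok 1 test_name" or "not ok 2 test_name # SKIP reason".
RESULT_RE = re.compile(
    r"^(?P<status>ok|not ok)\s+(?P<num>\d+)\s*(?:-\s*)?(?P<name>[^#]*)"
    r"(?:#\s*(?P<directive>\w+))?"
)


def summarize_tap(text: str) -> Counter:
    """Count pass / fail / skip results in flat (non-nested) TAP output."""
    counts = Counter()
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match is None:
            continue  # plan lines, diagnostics, and indented subtests
        directive = (match.group("directive") or "").upper()
        if directive == "SKIP":
            counts["skip"] += 1
        elif match.group("status") == "ok":
            counts["pass"] += 1
        else:
            counts["fail"] += 1
    return counts
```

A CI system could feed the captured stdout of a test run to summarize_tap and treat any non-zero "fail" count as a failing job.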

What about our approach to writing tests?

kselftests is great, but has room for improvement
• Was originally intended as a dumping ground for tests that would otherwise often bit-rot on individual developers’ servers

Allow for more comprehensive kselftest configurations
• The maintainers of each test suite know best how it should be configured
• Allow selftest suites to be configured to advertise:
  - State: Stable, flaky, unstable
  - Support: Supported architectures, unsupported config options (not just what’s necessary to run which is what exists today)
  - Trees and branches to run on
  - Frequency of runs + how to invoke test for each frequency
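To make the proposal concrete, here is a purely hypothetical sketch of what such advertised metadata could look like and how a CI system might consume it. No such format exists in kselftest today. Every name below (SuiteMetadata, should_run, the field names, the frequency labels) is invented for illustration, and skipping suites marked unstable is an assumed policy, not part of the proposal.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    STABLE = "stable"
    FLAKY = "flaky"
    UNSTABLE = "unstable"


@dataclass
class SuiteMetadata:
    """Hypothetical per-suite metadata a maintainer could advertise."""
    name: str
    state: State
    supported_archs: set = field(default_factory=set)
    unsupported_configs: set = field(default_factory=set)
    trees: set = field(default_factory=set)
    # Maps a run frequency (e.g. "per-patch") to the command for that run.
    commands: dict = field(default_factory=dict)


def should_run(suite, arch, enabled_configs, tree, frequency):
    """Decide whether a CI job with these parameters should run the suite."""
    return (
        suite.state is not State.UNSTABLE  # assumed policy for this sketch
        and arch in suite.supported_archs
        and not (suite.unsupported_configs & set(enabled_configs))
        and tree in suite.trees
        and frequency in suite.commands
    )
```

With metadata like this, the decision about where and how often a suite runs stays with the maintainers of the suite, and each CI system only has to evaluate it.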

Add more tests!
• Great way to test your newly added APIs (both design and correctness)
• Leverage the excellent infrastructure being developed in tools like KernelCI
• Add your tests to the tree

Out-of-tree tests
• Nothing at all wrong with having them (in fact they provide a ton of value today), but…
• Tests which inform the "official" stability, performance, etc. of the kernel should probably reside in the kernel tree as a general rule
• Allows tests to be controlled and configured by maintainers
• CI systems can always pull tests from multiple sources

…and what do we need to avoid?

Annoying maintainers
• Having a CI system should alleviate pressure on maintainers
• Things can get tricky though
  - Flaky tests
  - Tests failing after merge
• If tests waste people’s time, they are providing negative value
• If CI systems spam upstream lists, they are providing negative value

Not all tests created equal
• Need a high threshold (which we currently have) for when failing CI runs should email upstream lists
  - Build regressions are a very stable and reliable signal
  - If a test run fails, it’s less clear. Could be a flaky test, a broken test, failing hardware on the host, etc.
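One way to picture such a threshold is sketched below, under assumptions that are not from the talk. Build failures are always reported, while a test failure is reported only if it reproduces on every rerun. The "build" / "test" labels, the rerun callback and the default of three reruns are all hypothetical choices for illustration.

```python
from typing import Callable


def should_email(kind: str, rerun: Callable[[], bool], reruns: int = 3) -> bool:
    """Return True if a failure looks reliable enough to report upstream.

    kind  -- "build" or "test" (hypothetical labels for this sketch)
    rerun -- re-executes the failing job; returns True if it passes
    """
    if kind == "build":
        return True  # build regressions are treated as reliable signal
    # A test failure is reported only if it fails on every rerun; a single
    # pass suggests a flaky test or a problem with the host.
    return all(not rerun() for _ in range(reruns))
```

Rerunning trades CI machine time for maintainer time, which matches the point that tests wasting people’s time provide negative value.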

How failing tests are interpreted should be up to the maintainers of a subsystem
• For subsystems like RCU and BPF, test failures are a strong signal, as tests are actively fixed if flakiness is observed
• For subsystems like cgroup, it’s less clear. Some testcases (such as test_cpu.c and test_memcontrol.c) are validating heuristic behavior

05 Q & A

06 Bonus: How to write a kselftest

Anatomy of a kselftest suite – livepatch
• config file contains kconfig options required to build and run the suite
• Makefile contains recipes for compiling testcases, and variables that are consumed by the kselftest build system
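Since the config file is a list of required kconfig options, a runner could compare it against the configuration of the kernel under test before starting the suite. The sketch below assumes both inputs consist of simple `CONFIG_FOO=value` lines and requires values to match exactly. The function names are made up for the example, and no existing kselftest tooling is implied to work this way.

```python
def parse_config(text):
    """Parse 'CONFIG_FOO=value' lines, ignoring comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(required_text, kernel_config_text):
    """Return required options that the kernel config does not set identically."""
    required = parse_config(required_text)
    actual = parse_config(kernel_config_text)
    return {k: v for k, v in required.items() if actual.get(k) != v}
```

An empty result means the kernel satisfies the suite’s config fragment. Anything else names the options to enable before the run.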

kselftests example – livepatch config file and Makefiles
