
Checking your work: validating the kernel by building and testing in CI

The Linux kernel is one of the most complex pieces of software ever written. Because the kernel runs in ring 0, bugs in it are a big problem, so having confidence in its correctness and robustness is incredibly important. This is difficult enough for a single version and configuration of the kernel, but becomes exponentially more difficult as you consider the need to validate the kernel across various configurations, architectures, trees, and branches.

While there are a number of great tools available to write tests for the kernel, and a number of great resources to run those tests and publish the results, our general testing and CI story is inconsistent. Some subsystems run a 100% stable kselftest suite on patchwork, others run regression tests in datacenters maintained by private companies, and yet others run tests on machines that reside in maintainers’ private datacenters (basements?).

The aim of this talk is to paint a relatively complete picture of the kernel testing story as it exists today, to highlight some of the pain points felt by maintainers and contributors, and to suggest a way forward towards a more comprehensive and consistent testing story for the upstream community.

David Vernet

Kernel Recipes

June 09, 2024

Transcript

  1. Checking your work: Linux kernel testing and CI
    Scaling reliability across the global upstream community
    David Vernet [email protected]
    Kernel Recipes 2022 – Paris, France
  2. Agenda
    01 Disclaimers
    02 How kernel tests are written
    03 How kernel tests are run
    04 What can we improve?
    05 Q & A
    06 Bonus: how to write a kselftest
  3. 01 Disclaimers
    1. I may be missing details of tools I’m not aware of
    2. Presentation was crafted in the middle of the night over the Atlantic
  4. Pick your poison, there are a number of options
    • kselftests (https://docs.kernel.org/dev-tools/kselftest.html)
    • KUnit (https://docs.kernel.org/dev-tools/kunit/index.html)
    • xfstests (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/)
    • Benchmarks (LKP @ https://github.com/intel/lkp-tests, Phoronix @ https://openbenchmarking.org/tests/pts)
    • Fuzzers (https://github.com/google/syzkaller)
    • Sanitizers (KASAN, kmemleak, …)
    • Linux Test Project (https://github.com/linux-test-project/ltp)
    • …
    02 How kernel tests are written
  5. What are kselftests?
    Testcases are instances of userspace programs
    Commonly written in C, but need only be an executable file
    Located in tree at tools/testing/selftests
    02 How kernel tests are written
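Slide 5 notes that a kselftest testcase need only be an executable file. As a minimal sketch of that idea, the Python script below runs two placeholder checks and reports them in TAP format. The check functions, the `run_cases` helper, and the case names are hypothetical and exist only for illustration; the exit codes are assumed to follow the kselftest convention of 0 for pass and 1 for fail.

```python
import os

# Exit codes assumed to mirror the kselftest convention (0 = pass, 1 = fail).
KSFT_PASS = 0
KSFT_FAIL = 1


def check_pid_is_positive():
    # Placeholder check: every process has a positive PID.
    return os.getpid() > 0


def check_cwd_is_absolute():
    # Placeholder check: the working directory is reported as an absolute path.
    return os.path.isabs(os.getcwd())


def run_cases(cases):
    """Run (name, callable) pairs and return (TAP lines, exit code)."""
    lines = ["TAP version 13", f"1..{len(cases)}"]
    failures = 0
    for number, (name, func) in enumerate(cases, start=1):
        try:
            passed = bool(func())
        except Exception:
            passed = False  # a crashing testcase counts as a failure
        if not passed:
            failures += 1
        lines.append(f"{'ok' if passed else 'not ok'} {number} - {name}")
    return lines, (KSFT_FAIL if failures else KSFT_PASS)


CASES = [
    ("pid_is_positive", check_pid_is_positive),
    ("cwd_is_absolute", check_cwd_is_absolute),
]

tap_lines, exit_code = run_cases(CASES)
print("\n".join(tap_lines))
# A standalone script would finish with sys.exit(exit_code) so the runner sees the result.
```

A real testcase would exercise a kernel interface rather than these placeholders; the point is only that anything executable which reports results this way can serve as a testcase.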
  6. What are KUnit tests?
    Unit testing framework for testing individual Linux kernel functions
    Compiled into the kernel by specifying kconfig options
    Testcases link directly against kernel symbols and kunit APIs, which are used to make assertions about expected return values of the kernel symbols
    02 How kernel tests are written
  7. What are xfstests?
    Filesystem regression test suite (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/)
    Tests are categorized according to whether they’re global, shared between a subset of FSs, or specific to one FS
    Tests use common logic for bootstrapping block devices, etc
    Located in a separate repository
    02 How kernel tests are written
  8. And more test repos housed in external repositories
    Linux Kernel Performance (https://github.com/intel/lkp-tests)
    Phoronix (https://openbenchmarking.org/tests/pts)
    Linux Test Project (https://github.com/linux-test-project/ltp)
    02 How kernel tests are written
  9. Pick your poison, there are a few options
    • KernelCI (https://foundation.kernelci.org)
    • LKP / kernel test robot (https://01.org/lkp/documentation/0-day-brief-introduction)
    • Patchwork + github + extra magic (https://patchwork.kernel.org/project/netdevbpf/list/)
    • syzbot (https://syzkaller.appspot.com/upstream)
    • Maintainers’ private machines (e.g. Josef Bacik’s btrfs dashboards: http://toxicpanda.com/)
    • Thorsten Leemhuis’ regzbot (https://linux-regtracking.leemhuis.info/regzbot/mainline/)
    03 How kernel tests are run
  10. KernelCI – A Linux Foundation project
    Open source test automation system
    Builds and runs kernels across a variety of trees, branches, toolchains, and configs
    Also runs tests on different architectures and SoCs
    03 How kernel tests are run
  11. KernelCI – Pros and Cons
    Pros
    - Builds for multiple architectures
    - Tests on multiple architectures
    - Builds with multiple toolchains
    - Useful information provided with failures and known regressions
    - Open source and part of the Linux Foundation
    - Emails failures to upstream lists
    - Bisects to find culprit patches
    Cons
    - Only runs on merged patches
    - …but new APIs are coming to allow developers to address this
    - Web dashboard needs some redesign, still has some bugs
  12. LKP – Linux Kernel Performance / 0 day
    Run by the 0-day team at Intel
    Builds and runs kernels across a variety of trees, branches, toolchains, and configs, including unmerged patches
    Runs build tests, benchmarks, and logical tests (defined out of tree in separate github repo)
    Only builds and tests on and for x86 (though apparently they also build for other architectures on private jobs / branches?)
    03 How kernel tests are run
  13. LKP / 0 Day – Pros and Cons
    Pros
    - Builds on patches that have not yet been merged
    - Provides strong signal by sending messages to upstream lists
    - Runs benchmarks
    - Does bisection to find initial broken commit
    Cons
    - Only runs builds and tests for x86 (or not?)
    - Does not build with multiple toolchains
    - Error information helpful, but less comprehensive than KernelCI
    - Uses Intel / private infrastructure (and source?)
  14. Patchwork + github – How BPF runs CI tests
    Patchwork is a free, web-based patch tracking system
    Architecture is a combination of patchwork, github, Meta infrastructure
    Runs all BPF selftests (https://github.com/torvalds/linux/tree/master/tools/testing/selftests/bpf) on every patch sent to bpf and bpf-next lists
    Only builds and tests for x86 and s390x architectures
    03 How kernel tests are run
  15. Components
    Patchwork
    Kernel Patches Daemon
    kernel_patches/bpf GitHub repo
    GitHub action runners (x86, s390x)
    kernel_patches/vm_test
    Slide copied almost verbatim from BPF CI talk by Mykola Lysenko at LSFMM 2022 (https://docs.google.com/presentation/d/1RQZjLkbXmSFOr_4Sj5BdQsXbUh_vMshXi7w09pUpWsY/edit#slide=id.g127798017a6_0_194)
  16. Patchwork
    Pros
    - Patchwork is used by maintainers (one stop shops can be nice)
    - Runs on every patch sent to BPF lists
    - Runs on at least 2 architectures, could theoretically add more
    - BPF tests in general are easy to run locally – can use script to run in VM
    - New BPF tests automatically run
    Cons
    - Other patchwork suites need their own daemon, etc infra to run CI
    - Doesn’t send messages to BPF lists for job failures
    - Uses Meta / private infrastructure for Kernel Patches daemon
    - Doesn’t run tests on SoCs or directly on various non-x86 hardware (uses QEMU for s390x)
  17. syzkaller + syzbot – Fuzzing the kernel
    Continuously fuzzes main Linux kernel branches
    Reports found bugs to upstream lists
    Bisects to find bugs (and fixes) on specific patches
    Runs on multiple architectures
    03 How kernel tests are run
  18. syzbot
    Pros
    - Great coverage thanks to the nature of fuzzing + sanitizers
    - Bisects to find culprit patch, and the patch that fixes an issue
    - Runs on multiple architectures (in VMs)
    - Sends messages to upstream on failures
    Cons
    - Doesn’t run on unmerged patches
    - Doesn’t run selftests / kunit tests
    - Runs on proprietary Google infra
    - Configurations are hard-coded per platform in the syzbot repo
  19. Independent solutions
    Pros
    - Tailored directly to the need of the subsystem
    - Inspires test and benchmark writing
    Cons
    - No cross architecture, cross-config, etc coverage provided by framework.
    - Maintainers need to spend a lot of their time getting something like this set up
  20. 04 What can be improved?
    Note: Lots of discussion expected (and hoped for) during this section. Please feel free to interject.
  21. All of the CI systems we’ve covered have roughly the same, or at least similar, goals
    Run tests on some matrix of configurations and architectures
    When regressions are detected, provide signal:
    - Ideally before patches are merged
    - Otherwise, bisect and detect the bad patch automatically
    04 What can be improved?
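The "bisect and detect the bad patch automatically" goal on slide 21 can be sketched as a binary search over an ordered commit list. This is only an illustration under stated assumptions: history is linear, the first commit is known good, the last is known bad, and a regression stays present once introduced. The `find_first_bad` function and the commit names are hypothetical, and `is_bad` stands in for building and testing a kernel at a given commit.

```python
def find_first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1  # invariant: commits[good] passes, commits[bad] fails
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle
        else:
            good = middle
    return commits[bad]


# Hypothetical history in which the regression lands in "c5".
history = [f"c{i}" for i in range(8)]
tested = []


def fake_is_bad(commit):
    tested.append(commit)  # record which commits had to be built and tested
    return int(commit[1:]) >= 5


culprit = find_first_bad(history, fake_is_bad)
print(culprit, "found after testing", tested)
```

The number of builds grows with the logarithm of the range length, which is why automated bisection stays affordable even across long commit ranges.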
  22. All of the CI systems do a subset of things well
    KernelCI has a great UI, gets a lot of test coverage and provides detailed information
    LKP / kernel test robot / 0-day detects regressions for all patches sent to the list, and pings vger when a regression is detected. It also runs tests not included in the source tree, including benchmarks
    Patchwork / BPF also has a great UI, makes it easy for developers to test locally, and provides signal for all patches sent to the BPF lists. The signal is also highly reliable, due to BPF selftests being deterministic and fast.
    04 What can be improved?
  23. Can we combine forces?
    As maintainers / kernel developers, for the purposes of testing the kernel, can we break anything out into shared code?
    - Patch bisection
    - Invoking kselftests, kunit, interpreting TAP output
    04 What can be improved?
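Slide 23 lists interpreting TAP output as a candidate for shared code. The sketch below shows roughly what such a shared helper might do for flat TAP output. It is not taken from any existing CI system: `parse_tap`, `summarize`, and the sample test names are hypothetical, and indented (nested) subtest results are deliberately ignored to keep the example short.

```python
import re
from collections import Counter, namedtuple

TapResult = namedtuple("TapResult", ["number", "description", "outcome"])

PLAN_RE = re.compile(r"^1\.\.(\d+)")
RESULT_RE = re.compile(
    r"^(?P<status>ok|not ok)\s+(?P<number>\d+)\s*-?\s*"
    r"(?P<description>[^#]*?)\s*(?:#\s*(?P<directive>\w+).*)?$"
)


def parse_tap(text):
    """Return (planned test count or None, list of TapResult) for flat TAP text."""
    planned = None
    results = []
    for raw in text.splitlines():
        line = raw.rstrip()
        plan = PLAN_RE.match(line)
        if plan:
            planned = int(plan.group(1))
            continue
        match = RESULT_RE.match(line)
        if not match:
            continue  # version line, diagnostics, or indented subtest output
        directive = (match.group("directive") or "").upper()
        if directive == "SKIP":
            outcome = "skip"
        elif match.group("status") == "ok":
            outcome = "pass"
        else:
            outcome = "fail"
        results.append(
            TapResult(int(match.group("number")), match.group("description"), outcome)
        )
    return planned, results


def summarize(planned, results):
    """Count outcomes and note whether the plan matches the results seen."""
    counts = Counter(result.outcome for result in results)
    return {
        "pass": counts["pass"],
        "fail": counts["fail"],
        "skip": counts["skip"],
        "plan_matches": planned == len(results),
    }


SAMPLE = """TAP version 13
1..3
ok 1 - test_a
not ok 2 - test_b
ok 3 - test_c # SKIP needs root
"""

planned, results = parse_tap(SAMPLE)
summary = summarize(planned, results)
print(summary)
```

A mismatch between the plan and the number of results usually means the test program died part-way through, which a CI system would likely want to report differently from an ordinary failure.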
  24. kselftests is great, but has room for improvement
    Was originally intended as a dumping ground for tests that would often bit rot on individual developers’ servers
    04 What can be improved?
  25. Allow for more comprehensive kselftest configurations
    The maintainers of each test suite know best how it should be configured
    Allow selftest suites to be configured to advertise:
    - State: Stable, flaky, unstable
    - Support: Supported architectures, unsupported config options (not just what’s necessary to run which is what exists today)
    - Trees and branches to run on
    - Frequency of runs + how to invoke test for each frequency
    04 What can be improved?
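To make the proposal on slide 25 concrete, here is one possible shape for the metadata a suite could advertise, and how a CI system might consume it. No such format exists in kselftest today, so everything here is an assumption: the `SuiteMetadata` fields, the `should_run` and `should_email_on_failure` helpers, and the example suite, config option, tree names, and commands are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SuiteMetadata:
    """Hypothetical per-suite metadata, maintained by the suite's maintainers."""

    name: str
    state: str  # "stable", "flaky", or "unstable"
    supported_arches: frozenset
    unsupported_configs: frozenset
    trees: frozenset
    invocations: dict = field(default_factory=dict)  # run frequency -> command


def should_run(meta, arch, enabled_configs, tree):
    """Decide whether a CI job with this arch, config set, and tree runs the suite."""
    if arch not in meta.supported_arches:
        return False
    if tree not in meta.trees:
        return False
    if meta.unsupported_configs & set(enabled_configs):
        return False
    return True


def should_email_on_failure(meta):
    """Only suites declared stable are trusted enough to notify upstream lists."""
    return meta.state == "stable"


# Entirely made-up example values.
example = SuiteMetadata(
    name="example_suite",
    state="stable",
    supported_arches=frozenset({"x86_64", "arm64"}),
    unsupported_configs=frozenset({"CONFIG_EXAMPLE_DEBUG"}),
    trees=frozenset({"mainline", "next"}),
    invocations={"per-patch": "./run_quick.sh", "nightly": "./run_full.sh"},
)

print(should_run(example, "x86_64", {"CONFIG_OTHER"}, "mainline"))
```

Keeping this information next to the tests would let maintainers change a suite's state or supported targets in the same patch series that changes the tests themselves.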
  26. Add more tests!
    Great way to test your newly added APIs (both design and correctness)
    Leverage the excellent infrastructure being developed in tools like KernelCI
    Add your tests to the tree
    04 What can be improved?
  27. Out-of-tree tests
    Nothing at all wrong with having them (in fact they provide a ton of value today), but…
    Tests which inform the "official" stability, performance, etc for the kernel should probably reside in the kernel tree as a general rule
    Allows tests to be controlled and configured by maintainers
    CI systems can always pull tests from multiple sources
    04 What can be improved?
  28. Annoying maintainers
    Having a CI system should alleviate pressure on maintainers
    Things can get tricky though
    - Flaky tests
    - Tests failing after merge
    If tests waste people’s time, they are providing negative value
    If CI systems spam upstream lists, they are providing negative value
    04 What can be improved?
  29. Not all tests created equal
    Need a high threshold (which we currently have) for when failing CI runs should email upstream lists
    - Build regressions are a very stable and reliable signal
    - If a testrun fails, it’s less clear. Could be flaky, broken test, failing hardware on the host, etc.
    04 What can be improved?
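One way to picture the "high threshold" on slide 29 is a policy that treats build failures as reliable but reruns a failing test before notifying anyone. The sketch below illustrates that idea only; the rerun count, the `classify` and `should_notify` helpers, and the policy itself are assumptions made for illustration and do not describe how any existing CI system behaves.

```python
def classify(run_test, attempts=3):
    """Run a test several times and label it 'pass', 'fail', or 'flaky'."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    outcomes = [bool(run_test()) for _ in range(attempts)]
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"
    return "flaky"


def should_notify(kind, classification):
    """Hypothetical policy for emailing upstream lists.

    Build results are treated as a reliable signal; test results only
    trigger a notification when the failure is consistent across reruns.
    """
    if kind == "build":
        return classification != "pass"
    return classification == "fail"


# Simulated test that fails once out of three runs.
simulated_outcomes = iter([True, False, True])
label = classify(lambda: next(simulated_outcomes))
print(label, should_notify("test", label))
```

Rerunning cannot distinguish a broken test from a real regression, so a consistent failure is still only a prompt for a human to look, not a verdict.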
  30. How failing tests are interpreted should be up to the maintainers of a subsystem
    For subsystems like RCU and BPF, test failures are a strong signal, as tests are actively fixed if flakiness is observed
    For subsystems like cgroup, it’s less clear. Some testcases (such as test_cpu.c and test_memcontrol.c) are validating heuristic behavior
    04 What can be improved?
  31. Anatomy of a kselftest suite – livepatch
    config file contains kconfig options required to build and run the suite
    Makefile contains recipes for compiling testcases, and variables that are consumed by the kselftest build system
    06 How to write a kselftest