Reproducible Postgres

Reproducible Postgres is an Apache-licensed, secure Postgres distribution that packages vanilla upstream PostgreSQL. It’s built under the guiding principles of reproducible and hermetic builds. These principles aim to guarantee that you will get exactly the same Postgres binaries ("bit-by-bit") whenever and wherever you build them, and that no build artifact is tainted by the host environment where it’s built.

Among other benefits, this means that all artifacts are content-addressable; that they provide a strong defense against supply-chain attacks; and that build and caching efficiencies are really high.

The speakers will dive into the journey towards creating Reproducible Postgres, including, among others:

What are reproducible and hermetic builds and their benefits.
Which are the best tools to produce such builds.
Why the project settled on Bazel, and what are the technical challenges faced.
How to build Postgres and the extensions ecosystem from a single “monorepo”.
Notes on toolchains, cross-compilation, buildfarms and build outputs.
What, if anything, could the current Postgres build system learn from this experience.

OnGres

May 14, 2025

Transcript

  1. Reproducible Postgres

    `whoami`
    Javier Maestro <[email protected]> 2jotas.com
    • Infrastructure Software Engineer with 20+ years of experience
    • Worked at hyperscalers like Facebook and Tuenti Technologies with distributed systems, real-time data, reliability engineering, disaster recovery, and incident management.
  2. `whoami`

    Alvaro Hernandez <[email protected]> aht.es
    • Founder & CEO, OnGres
    • 20+ years Postgres user and DBA
    • Mostly doing R&D to create new, innovative software on Postgres
    • More than 140 tech talks, most about Postgres
    • Founder and President of the NPO Fundación PostgreSQL
    • AWS Data Hero
  3. Open source and supply-chain attacks

    You use open source software, right?
    Yes, for security reasons and to prevent vendor lock-in.
    Do you compile it from source?
    No, I use binary packages.
    Who builds those binary packages? How do you ensure they come from the OSS software you think they do, and that no attacks are injected during the process?
  4. Reproducible builds

    If a binary is built twice* and the resulting binaries are not byte-for-byte identical, the build is not reproducible.

    * the devil is in the details…
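
The byte-for-byte criterion on this slide can be checked by comparing cryptographic digests of two build outputs. The following is a minimal sketch, not the project's tooling: the function names and the byte strings standing in for build outputs are hypothetical.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def is_reproducible(first_build: bytes, second_build: bytes) -> bool:
    """Two builds count as reproducible only if they are byte-for-byte identical."""
    return digest(first_build) == digest(second_build)


# Hypothetical stand-ins for the outputs of independent builds.
build_a = b"\x7fELF postgres binary"
build_b = b"\x7fELF postgres binary"
build_c = b"\x7fELF postgres binary, built 2025-05-14T12:00"

same = is_reproducible(build_a, build_b)       # identical bytes
different = is_reproducible(build_a, build_c)  # one embedded string differs
```

In practice the inputs would be the bytes of real binaries or packages read from disk; a single differing byte, such as an embedded build date, is enough to fail the check.
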
  5. Reproducible builds

    Without reproducible builds:
    • You have little guarantee of how the binary was built (can’t reproduce).
    • You can’t troubleshoot on dev/test environments with the very same binary (since they may be different).
    • Provisioning is much harder and caching degrades (many more binaries).
  6. Hermetic builds

    “When given the same input source code and product configuration, a hermetic build system always returns the same output by isolating the build from changes to the host system”
    https://bazel.build/basics/hermeticity
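
One way to picture "same input source code and product configuration" is a cache key derived only from declared inputs and never from the host. The sketch below is an illustration under that assumption; `action_key` is a hypothetical helper and does not reproduce how Bazel actually computes its keys.

```python
import hashlib
import json
from typing import Mapping


def action_key(sources: Mapping[str, bytes], config: Mapping[str, str]) -> str:
    """Derive a key from declared inputs only (hypothetical helper).

    Nothing from the host (hostname, time, environment) is mixed in, so the
    same sources and configuration always map to the same key.
    """
    h = hashlib.sha256()
    for path in sorted(sources):  # sorted: insertion order must not matter
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(sources[path]).digest())
    h.update(json.dumps(dict(config), sort_keys=True).encode("utf-8"))
    return h.hexdigest()


key_1 = action_key({"a.c": b"int a;", "b.c": b"int b;"}, {"cc": "gcc", "opt": "2"})
key_2 = action_key({"b.c": b"int b;", "a.c": b"int a;"}, {"opt": "2", "cc": "gcc"})
key_3 = action_key({"a.c": b"int a = 1;", "b.c": b"int b;"}, {"cc": "gcc", "opt": "2"})
```

Here `key_1` and `key_2` are equal because only content and configuration are hashed, while `key_3` differs because one source file changed. A key of this kind is what makes artifacts content-addressable and cacheable.
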
  7. Hermetic builds

    Hermetic builds lead to (but don't guarantee):
    • Reproducibility
    • Protection from environment poisoning
    • The ability to create self-contained (or static) packages
  8. Breaking reproducibility/hermeticity

    • System-dependent embeddings in the binary
      ◦ Timestamps
      ◦ RPATH
      ◦ GNU_BUILD_ID
      ◦ strings / debug info with build paths, config flags…
      ◦ code generation (flex and its #line directive)
    • Different versions of dependencies and/or tools
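
The timestamp item can be demonstrated with nothing more than gzip, whose header stores a modification time. This is a small sketch with a made-up payload, not an excerpt from the Postgres build: identical content compressed with different embedded timestamps yields different bytes, while pinning the timestamp restores byte-for-byte equality.

```python
import gzip

payload = b"postgres build artifact"  # hypothetical artifact content

# Same content, different timestamps embedded in the gzip header.
stamped_1 = gzip.compress(payload, mtime=1)
stamped_2 = gzip.compress(payload, mtime=2)

# Same content, timestamp pinned to a fixed value.
pinned_1 = gzip.compress(payload, mtime=0)
pinned_2 = gzip.compress(payload, mtime=0)

timestamps_break_equality = stamped_1 != stamped_2
pinning_restores_equality = pinned_1 == pinned_2
```

Pinning timestamps to an agreed value is the idea behind the `SOURCE_DATE_EPOCH` convention used by many reproducible-build efforts; the other items in the list (RPATH, build IDs, embedded paths) need their own normalization.
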
  9. But Debian is reproducible, isn’t it?

    “Most packages built in sid today are reproducible… under a fixed, predefined, build-path and environment”
    https://wiki.debian.org/ReproducibleBuilds
  10. Monogres: goal

    Create the Postgres monorepo
    A centralized repository where Postgres and all of its extensions are indexed, built and packaged
  11. Monogres: an Open Source, upstream distro

    • Monogres will be Open Source with Apache 2.0 License.
    • An upstream distribution that other downstream distributions can re-use and re-package.
    • Both a binary and (potentially) a source distribution
  12. Monogres: cardinality

    • 5 major versions
    • All minor versions of every major
    • 5 "option sets" (barebones, minimal, regular, full, debug)
    • All extensions (1K+) with multiple versions
    • All extensions compiled against major.minor versions to avoid potential ABI issues
  13. Monogres: high cardinality

    4 major-minor per year
    x (5y + 4y + … + 1y)
    x ( 5 Postgres option sets (barebones, minimal, regular, full, debug) + (1K extensions x ~10 extension versions) )
    x 2 architectures (amd64, arm64)
    = 4 x 15 x (5 + 10K) x 2 ≅ 1.2M

    1M+ packages (and more!)
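
The estimate on this slide can be reproduced step by step. The variable names below are illustrative; the numbers are the slide's own round figures.

```python
# Figures taken from the slide; the names are illustrative only.
releases_per_year = 4              # major-minor releases per year
supported_years = sum(range(1, 6)) # 5y + 4y + 3y + 2y + 1y = 15
option_sets = 5                    # barebones, minimal, regular, full, debug
extensions = 1_000                 # "1K extensions"
versions_per_extension = 10        # "~10 extension versions"
architectures = 2                  # amd64, arm64

postgres_releases = releases_per_year * supported_years               # 4 x 15
packages_per_release = option_sets + extensions * versions_per_extension  # 5 + 10K
total_packages = postgres_releases * packages_per_release * architectures

print(f"{total_packages:,}")
```

The product works out to 1,200,600, which matches the slide's rounded 1.2M.
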
  14. {Monogres, Bazel}: choose two

    https://bazel.build
    A mature (10y), open-source, build and testing tool created by Google and the Bazel community
  15. Bazel: remote builds

    bazelbuild/remote-apis: remote execution, caching, …
    (1) is becoming the de-facto standard
    (2) with industry support
    (3) and no vendor lock-in

    (1) Bazel, Buck2, BuildStream, Pants, Please, Buildbox
    (2) Aspect, BuildBuddy, Engflow, NativeLink
    (3) BuildBarn, BuildBuddy, BuildFarm, BuildGrid, NativeLink
  16. Bazel: extensible, polyglot

    • It’s fast, reliable, hermetic, incremental, parallelized and extensible
    • It has a high-level build language with deterministic evaluation and hermetic execution (Starlark)
    • Polyglot: supports multiple languages, platforms, and architectures (ideal for extensions!)
  17. Bazel: hermeticity, sandboxing

    • Bazel constructs a work directory for each target (the execroot/).
    • It contains all input files and serves as the container for any generated outputs.
    • When possible, Bazel uses an OS mechanism to constrain the action within the execroot/ (e.g. containers on Linux and sandbox-exec on Mac)
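
The execroot idea can be approximated in a few lines: stage only the declared inputs into a fresh directory, run the action there with a fixed environment, and collect whatever it produced. This is a rough sketch assuming a POSIX system. `run_in_execroot` is a hypothetical helper, and it provides directory and environment isolation only, without the OS-level sandboxing Bazel applies.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def run_in_execroot(inputs: Dict[str, bytes], command: List[str]) -> Dict[str, bytes]:
    """Run `command` in a fresh directory holding only the declared inputs.

    Returns the files the command created, keyed by relative path.
    Hypothetical helper: no OS-level sandbox is applied here.
    """
    with tempfile.TemporaryDirectory() as execroot:
        root = Path(execroot)
        for rel_path, data in inputs.items():
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        # Fixed, minimal environment so host variables do not leak in.
        env = {"PATH": os.defpath, "LC_ALL": "C", "TZ": "UTC"}
        subprocess.run(command, cwd=root, env=env, check=True)
        return {
            p.relative_to(root).as_posix(): p.read_bytes()
            for p in root.rglob("*")
            if p.is_file() and p.relative_to(root).as_posix() not in inputs
        }


# Example action: upper-case the single declared input file.
action = "open('out.txt', 'w').write(open('in.txt').read().upper())"
outputs = run_in_execroot({"in.txt": b"hello"}, [sys.executable, "-c", action])
```

Because the action only sees `in.txt` and a fixed environment, its result depends on the declared inputs alone; a real sandbox additionally prevents the action from reading paths outside the work directory.
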
  18. Bazel: community, ecosystem

    Third-party extensions that bring awesome functionality with little effort:
    • toolchains (GCC, LLVM, Zig…)
    • rules_pkg: packaging tar, zip, deb, rpm
    • rules_oci: building OCI images
    • BCR: Bazel Central Registry (discoverability)
  19. Bazel: pain points

    • Abstraction comes with developer complexity, especially when debugging.
    • The hermeticity and reproducibility aspects still lack a simple and easy sandbox integration.
    • In the end, the easy path is to initially use container images, which partially defeat the purpose and complicate reproducibility.
  20. What’s next

    • Publish as open source
    • Monobot: an automatic crawler that will generate repo.json
    • Add more extensions
      ◦ So far we have all contrib and some PGXS extensions
    • Support multiple glibc
    • Support multiple forks (Babelfish, IvorySQL, OrioleDB, OpenHalo, PgEdge, …)