absolutely isn't. Hasn't [X] database had transparent encryption for over a decade? Yes, server-side transparent encryption. With delegated keys. What about secure hardware enclaves, don't they take care of this? Popular solutions like Intel SGX have had… challenges. Plaintext exists inside the enclave, possibly entire databases. First principles or Pinky Promise as a Service? Popular misconceptions around encrypted search
"encrypted databases" is often completely different. Here, we use encrypted search in the way it's used by cryptographers: end-to-end encrypted data that can be privately queried. Queries can be performed against a database server holding the encrypted data but which does not have access to the keys. This is a shared lexicon issue
secure messaging apps, the claims are (mostly?) understood. But what defines an "end"? What is the threat model and what are its actual guarantees? Where do the data flow? Who holds the keys? The turtles problem: who holds the key to the keys? Who can see the secrets? The trust problem
discover? What information does the database leak? Are leaks exploitable, in what context, and over what period of time? Can this leakage be formally described? (spoiler: yes, we believe so) Digging deeper…
Networks are unreliable. Robust replication is hard. Partitions are a problem. Sometimes things just… die. Worse, sometimes client apps only semi-die. Dirty secrets of distributed (cloud) databases
with: DOB == "1989-Dec-13"
The dream: rich, expressive queries on encrypted data
• find all records with: DOB ≥ 1920 AND DOB < 2000
• find all last names starting with "Rodrig"
• find all records with SSN ending with "7192"
• find accounts with credit card # ending in "8210"
• find all complaints containing "slow"
What types of searches are we talking about?
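Equality is the simplest of these: if the client derives a deterministic token for each value with a keyed PRF, the server can match tokens without ever seeing plaintext. Prefix and suffix queries ("starting with Rodrig", "ending with 7192") can be reduced to the same equality check by tokenizing bounded-length prefixes and suffixes at write time. The sketch below shows that generic reduction, assuming HMAC-SHA-256 as the PRF and hypothetical `min_len`/`max_len` bounds; it is an illustration of a known technique, not how any particular product implements these queries, and range or substring queries need different machinery.

```python
import hashlib
import hmac
import secrets


def token(key: bytes, kind: str, text: str) -> bytes:
    """Deterministic, domain-separated search token: HMAC-SHA-256(key, kind || 0x00 || text)."""
    return hmac.new(key, kind.encode() + b"\x00" + text.encode(), hashlib.sha256).digest()


def index_tokens(key: bytes, value: str, min_len: int = 3, max_len: int = 8) -> set[bytes]:
    """Tokens stored next to an encrypted value; equality, prefix and suffix lookups become set membership."""
    tokens = {token(key, "eq", value)}
    for n in range(min_len, min(max_len, len(value)) + 1):
        tokens.add(token(key, "prefix", value[:n]))
        tokens.add(token(key, "suffix", value[-n:]))
    return tokens


if __name__ == "__main__":
    search_key = secrets.token_bytes(32)  # hypothetical per-field search key, held by the client
    stored = index_tokens(search_key, "Rodriguez")
    print(token(search_key, "prefix", "Rodrig") in stored)  # True
    print(token(search_key, "prefix", "Rodx") in stored)    # False
```

Every indexed prefix or suffix adds stored tokens, so the length bounds trade query flexibility against storage.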
service
• 2.5M active clusters
• 200+ data centers around the world
• 8 major cloud providers
• deployments range from tiny developer test/sandbox instances to PB-scale global multi-cloud sharded clusters
• 100M+ cluster node certs generated
of documents most here think about native JSON documents, including complex sub-documents
• distributed architecture by design
• vibrant developer community
• 260M+ downloads to date
• 1.5M online University students
• 7M+ developers globally
View from the front lines
research models of databases prototypes challenges of large, messy networks removing developer pain & false choices apps in every major modern programming language growing an in-house advanced crypto R&D team formal analysis & proofs of real-world systems
& compromises real-world performance security properties that are explicitly not guaranteed everyone has Strong Thoughts on key management usability, usability, usability things we got wrong lessons learned
Encryption (PPE) Oblivious RAM (ORAM) Fully Homomorphic Encryption (FHE) Functional encryption Garbled RAM Structured Encryption (STE) Are any of these suitable to general purpose databases?
Arab cryptanalysis
▪ 9th-century scholar born ca. 805 in Iraq, educated in Baghdad
▪ Polymath: philosopher, scientist, mathematician, musician
▪ Authored the manuscript Treatise on Decrypting Cryptographic Messages
▪ Formal analysis of core "analytic principles" / letter frequencies
▪ Methods & attacks developed by al-Kindi 1,100+ years ago still apply to encrypted systems today
https://plato.stanford.edu/entries/al-kindi/
https://muslimheritage.com/wp-content/uploads/2018/05/cryptology01.pdf
https://membres-ljk.imag.fr/Bernard.Ycart/mel/hm/AlKadi_cryptology.pdf
https://archive.is/R2vvu
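al-Kindi's letter-frequency analysis carries over directly to deterministic (property-preserving) encryption: equal plaintexts produce equal ciphertexts, so an attacker who knows the rough distribution of a column can pair the most frequent ciphertext with the most frequent plaintext, and so on down the ranking. The sketch below assumes HMAC-SHA-256 as a stand-in for deterministic encryption and a made-up auxiliary frequency table; published attacks on encrypted databases are more sophisticated, but the principle is the same.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def det_token(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption: equal inputs always map to equal outputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def frequency_attack(observed: list[str], aux_freq: dict[str, float]) -> dict[str, str]:
    """Guess plaintexts by pairing ciphertexts and plaintexts of equal frequency rank."""
    ct_by_rank = [ct for ct, _ in Counter(observed).most_common()]
    pt_by_rank = [pt for pt, _ in Counter(aux_freq).most_common()]
    return dict(zip(ct_by_rank, pt_by_rank))


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # never seen by the attacker
    column = ["Smith"] * 50 + ["Garcia"] * 30 + ["Nguyen"] * 15 + ["Kindi"] * 5
    observed = [det_token(key, v) for v in column]  # all the attacker sees
    public_stats = {"Smith": 0.48, "Garcia": 0.31, "Nguyen": 0.16, "Kindi": 0.05}  # hypothetical
    guesses = frequency_attack(observed, public_stats)
    recovered = sum(guesses[det_token(key, v)] == v for v in set(column))
    print(f"recovered {recovered} of {len(set(column))} distinct values")
```

No key material is needed for this attack; the ciphertext frequencies alone do the work.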
Encrypt-then-MAC authenticated encryption: AES-256 with HMAC-SHA256
• document encryption/decryption only happens in the client app
• top-level 96-byte composite user key on the client, comprising: a 256-bit AES-CTR key, a 256-bit HMAC key, and a 256-bit key for encrypted search operations
How does it work?
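A minimal sketch of the client-side Encrypt-then-MAC flow, assuming the 96-byte composite key is laid out as encryption key || HMAC key || search key (the order is an assumption). Python's standard library has no AES, so the keystream below is a clearly labeled stand-in: HMAC-SHA-256 in counter mode, i.e. CTR mode over a PRF. A real client would use AES-256-CTR from a vetted library. The random 16-byte nonce and the tag over (nonce || ciphertext) are also assumptions, not a documented wire format.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16   # assumption: random 16-byte nonce per encryption
TAG_LEN = 32     # HMAC-SHA256 output size


def split_user_key(user_key: bytes) -> tuple[bytes, bytes, bytes]:
    """Split the 96-byte composite key (assumed order: encryption || MAC || search)."""
    if len(user_key) != 96:
        raise ValueError("expected a 96-byte composite key")
    return user_key[:32], user_key[32:64], user_key[64:]


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """STAND-IN for AES-256-CTR: concatenated HMAC-SHA256(enc_key, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(user_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the nonce and ciphertext; runs only on the client."""
    enc_key, mac_key, _search_key = split_user_key(user_key)
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(user_key: bytes, blob: bytes) -> bytes:
    """Verify the MAC before decrypting anything (the point of Encrypt-then-MAC)."""
    enc_key, mac_key, _search_key = split_user_key(user_key)
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


if __name__ == "__main__":
    key = secrets.token_bytes(96)
    print(decrypt(key, encrypt(key, b"1989-Dec-13")))  # b'1989-Dec-13'
```

Checking the tag before touching the ciphertext means a tampered document is rejected without ever being decrypted.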
keys ("field keys")
• the database can never access raw key material
• backing vault key management: cloud KMS, KMIP/HSM, HashiCorp Vault, custom key service, local key…
How does it work?
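One common way to keep raw key material away from the database is envelope encryption: each field key is stored in a key vault only in wrapped form, and only a client with access to the KMS can unwrap it. The sketch assumes a hypothetical `LocalKMS` standing in for the "local key" option and a plain dict standing in for the server-hosted key vault. The SHAKE-256 keystream plus HMAC tag is a stdlib-only stand-in for a real key-wrapping algorithm, and none of these names are a product API.

```python
import hashlib
import hmac
import secrets


class LocalKMS:
    """Hypothetical 'local key' provider; a cloud KMS or HSM would keep the master key off the client host."""

    def __init__(self, master_key: bytes):
        # Derive separate wrapping and MAC subkeys from the master key.
        self._enc = hmac.new(master_key, b"wrap-enc", hashlib.sha256).digest()
        self._mac = hmac.new(master_key, b"wrap-mac", hashlib.sha256).digest()

    def _pad(self, nonce: bytes, length: int) -> bytes:
        # Stand-in keystream: SHAKE-256 over (subkey || nonce).
        return hashlib.shake_256(self._enc + nonce).digest(length)

    def wrap(self, field_key: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        body = bytes(a ^ b for a, b in zip(field_key, self._pad(nonce, len(field_key))))
        tag = hmac.new(self._mac, nonce + body, hashlib.sha256).digest()
        return nonce + body + tag

    def unwrap(self, wrapped: bytes) -> bytes:
        if len(wrapped) < 48:
            raise ValueError("wrapped key too short")
        nonce, body, tag = wrapped[:16], wrapped[16:-32], wrapped[-32:]
        expected = hmac.new(self._mac, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrapped key failed authentication")
        return bytes(a ^ b for a, b in zip(body, self._pad(nonce, len(body))))


def client_get_field_key(key_vault: dict[str, bytes], kms: LocalKMS, field: str) -> bytes:
    """Client side only: fetch the wrapped key from the (server-hosted) vault and unwrap it locally."""
    return kms.unwrap(key_vault[field])


if __name__ == "__main__":
    kms = LocalKMS(secrets.token_bytes(32))
    ssn_key = secrets.token_bytes(96)       # hypothetical 96-byte field key
    key_vault = {"ssn": kms.wrap(ssn_key)}  # all the database ever stores
    print(client_get_field_key(key_vault, kms, "ssn") == ssn_key)  # True
```

Swapping `LocalKMS` for a cloud KMS, KMIP/HSM, or Vault-backed provider changes where unwrapping happens, not what the database sees.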
based on a novel Structured Encryption construction
• client is stateless; database maintains (blinded) state
• distributed, highly available, highly scalable by design
• scheme is robust to client failures, high contention, dropped sessions
How does it work?
data structures are Encrypted Multi-Maps (EMMs)
• a type of reverse (inverted) index of encrypted label/tuple pairs
• labels are pseudorandom function (PRF) evaluations
• fast, efficient EMM lookups
• PRFs here == keyed HMACs (HMAC-SHA-256)
How does it work?
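A toy encrypted multi-map in the spirit of the textbook counter-based construction from the structured-encryption literature, not the novel scheme described here: the client derives a per-keyword label key and value key with HMAC-SHA-256, each (keyword, i) pair becomes a PRF label, and the server, given only the two derived keys for one keyword, walks labels i = 0, 1, 2… until a miss. The function names, 8-byte document IDs, and key-derivation strings are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, msg: bytes) -> bytes:
    """PRF instantiated as HMAC-SHA-256."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def keyword_tokens(search_key: bytes, keyword: str) -> tuple[bytes, bytes]:
    """Client side: derive the per-keyword label key and value key."""
    base = prf(search_key, keyword.encode())
    return prf(base, b"label"), prf(base, b"value")


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def build_emm(search_key: bytes, index: dict[str, list[int]]) -> dict[bytes, bytes]:
    """Client side: turn a plaintext inverted index into PRF-label -> encrypted-doc-ID pairs."""
    emm: dict[bytes, bytes] = {}
    for keyword, doc_ids in index.items():
        label_key, value_key = keyword_tokens(search_key, keyword)
        for i, doc_id in enumerate(doc_ids):
            ctr = i.to_bytes(8, "big")
            pad = prf(value_key, ctr)[:8]
            emm[prf(label_key, ctr)] = _xor(doc_id.to_bytes(8, "big"), pad)
    return emm


def server_lookup(emm: dict[bytes, bytes], label_key: bytes, value_key: bytes) -> list[int]:
    """Server side: with only one keyword's derived keys, walk labels i = 0, 1, ... until a miss."""
    results: list[int] = []
    i = 0
    while True:
        ctr = i.to_bytes(8, "big")
        blob = emm.get(prf(label_key, ctr))
        if blob is None:
            return results
        pad = prf(value_key, ctr)[:8]
        results.append(int.from_bytes(_xor(blob, pad), "big"))
        i += 1


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical client-held search key
    emm = build_emm(key, {"Rodriguez": [3, 17], "Smith": [5]})
    print(server_lookup(emm, *keyword_tokens(key, "Rodriguez")))  # [3, 17]
```

In this toy, the server learns which document IDs match each queried keyword; that result pattern is exactly the kind of leakage such schemes aim to describe formally.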
user collection
• still evaluating the storage:performance impact
• collection sizes can be decreased via compaction
• one additional field per document, negligible size
• test example: 3.4M documents → 6.3 GB user collection
Punchline: expect at least 2-4× additional storage cost vs. unencrypted
Trade-Offs: Storage
*performance analysis/optimization for production-sized test workloads is ongoing
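A back-of-the-envelope reading of the storage numbers, assuming decimal units (1 GB = 10^9 bytes), that the 6.3 GB user collection is a fair proxy for the unencrypted size (the one extra field per document is negligible), and that "2-4× additional" means extra storage on top of that baseline:

```latex
\begin{aligned}
\bar{s} &= \frac{6.3\times10^{9}\ \text{B}}{3.4\times10^{6}\ \text{docs}} \approx 1.85\times10^{3}\ \text{B/doc}\\
\Delta S &\geq (2\ \text{to}\ 4)\times 6.3\ \text{GB} = 12.6\ \text{to}\ 25.2\ \text{GB}\\
S_{\text{total}} &\gtrsim 18.9\ \text{to}\ 31.5\ \text{GB}
\end{aligned}
```

If the multiplier is instead meant as total size relative to unencrypted, the estimate is 12.6 to 25.2 GB in total; compaction can bring collection sizes down over time.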
queries: millisecond results on 5M+ document test DBs
Writes
• mixed low-volume (<100) insert() & update(): negligible impact
• high-volume/batch (1K+) insert(): still analyzing throughput
• bulk writes via insertMany() in test
Trade-Offs: Performance
*performance analysis/optimization for production-sized test workloads is ongoing
on every page of docs & tutorials) there will be breaking changes (ibid) experimental protocols / data format changes will be frequent primary focus is usability & features expect architectural changes user experience: where can we reduce or eliminate choices? constant march towards more opinionated UX & safer defaults QE Public Preview
[Node.js, Java (Sync & Async), Go, Python, C/C++, C# .NET, Ruby, PHP, Rust, Scala]
• Compass & mongosh
• All commits are public
• QE support in driver (client) releases is beta / experimental
• releases can be 5+ months behind nightly branches
QE Public Preview
Craig Colby Cynthia Dave Davi Divjot Dmitry Dmitry Duran Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jeremy Jesse Judah Julie Julias Kaitlin Katia Kevin Mark Mat Matt Nathan Naomi Neal Nick Oleg Oz Pramod Preston Ravind Rachael Rachelle Roberto Ross Sam Sara Sergei Shane Shreyas Spencer Vincent