on disk
• Volume Encryption
• Storage Engine Encryption
Network
Data is decrypted when the DB starts up
Reminder: At-rest encryption is (mostly) to protect non-running databases & backups
database
In-use, in memory
Data is vulnerable to insider access and active database breaches:
• Authorized and compromised administrators, DBAs & privileged users
• RAM scraping
• Process inspection
• Cloud providers
This is why we built Client-Side Field Level Encryption!
…to find ssn = “901-10-4312”
{payer: “Acme Corp”, ssn: “901-10-4312”}
{payer: “Jones Inc”, ssn: “901-10-4312”}
…
{payer: “Baker Co”, ssn: “901-10-4312”}
1 million records total; 10 records with ssn = “901-10-4312”
10 records fetched with ssn = “901-10-4312”
• Fast querying
• But data is not secure in use
…ssn = “901-10-4312”
Plaintext field:
{payer: “Acme Corp”, ssn: “901-10-4312”}
{payer: “Jones Inc”, ssn: “901-10-4312”}
…
{payer: “Baker Co”, ssn: “901-10-4312”}
1 million records total; 10 records with ssn = “901-10-4312”
Randomly encrypted field:
{payer: “Acme Corp”, ssn: “3DwK354xz”}
{payer: “Jones Inc”, ssn: “23awW124xz”}
…
{payer: “Baker Co”, ssn: “75fdwswed”}
1 million records total
Instead of the 10 records with ssn = “901-10-4312”, all 1 million records are fetched
• Client-side processing & decryption
• Filtering of records on the client side (performance hit)
Problem: You can't directly search randomly encrypted fields on the server, so every query becomes a full fetch plus client-side filtering. That is not feasible for many use cases.
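A minimal Python sketch of the problem above, assuming a toy randomized cipher: Python's standard library has no AES, so an HMAC-SHA-256 keystream in counter mode stands in for the real cipher. The names `encrypt`, `decrypt`, and `client_side_filter` are hypothetical. Because every encryption draws a fresh random nonce, the server cannot match ciphertexts, and the client ends up decrypting every record.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stand-in for AES-256-CTR: HMAC-SHA-256 over (nonce || counter) blocks.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: str) -> bytes:
    # A fresh random nonce per call makes the ciphertext randomized.
    nonce = secrets.token_bytes(NONCE_LEN)
    data = plaintext.encode()
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode()


def client_side_filter(key: bytes, records: list, ssn: str):
    """The server must return every record; the client decrypts and filters."""
    matches = []
    decrypted = 0
    for rec in records:
        decrypted += 1
        if decrypt(key, rec["ssn"]) == ssn:
            matches.append(rec)
    return matches, decrypted
```

Encrypting “901-10-4312” twice yields two different ciphertexts, so a server-side equality filter on the ciphertext matches nothing, and `client_side_filter` has to decrypt all N records to find the few matches: a cost linear in the collection size, paid on the client.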
MongoDB’s Approach: Queryable Encryption
Query to find ssn = “901-10-4312”
{payer: “Acme Corp”, ssn: “3DwK354xz”}
{payer: “Jones Inc”, ssn: “23awW124xz”}
…
{payer: “Baker Co”, ssn: “75fdwswed”}
1 million records total, with randomly encrypted fields
10 records fetched with ssn = “901-10-4312”
• No crypto experience required
• Encrypted throughout the data lifecycle
• Rich, expressive queries
• MongoDB is the only platform to implement a fast searchable encryption scheme
• Server-side processing of encrypted data
• Server never sees the data in plaintext
…transactions using a range of dates or dollar amounts for fraud detection
Industry: Human Resources
HR system allows searching for employees by the last 4 digits of their social security number
Industry: Health Care
Customer support agents need to find patient records by searching for the first few characters of the patient's name
Ground-breaking query technology, standards-based cryptography
• Run expressive queries like range, equality, prefix, suffix, substring, and more on encrypted data
• Based on strong, standards-based cryptographic primitives
End-to-end fully randomized encryption
• Data never exists in the clear outside of the client
• Dramatically reduces the attack surface
Faster app development
• No crypto experience required
• Intuitive and easy for developers to set up and use
Strong technical controls for critical data privacy use cases
• Meet the strictest data privacy requirements for confidentiality on security-critical workloads
Reduce institutional risk
• Be confident storing and processing your sensitive workloads in MongoDB Atlas (Cloud)
…in Encrypted Search
• 2019: Client-Side Field Level Encryption (CSFLE): equality search on deterministic encryption
• 2021: Formation of the Advanced Cryptography Research group: Seny Kamara, Tarik Moataz, and a team of PhD cryptography researchers
• May 2022: Queryable Encryption Preview: Structured Encryption core functionality; equality search on randomized encryption
• Post 6.0:
◦ Queryable Encryption v1.1: addition of range query capabilities
◦ Queryable Encryption v1.2: addition of prefix, suffix, and substring query capabilities
• Future: new privacy-enhancing cryptography capabilities
…point-in-time access to the entire memory & disk of the database server
◦ At that instant, the adversary can access the entire DB, any keys stored in memory, all CPU state including the L1 through L3 caches, and all logs
…documents are verifiably CCA-secure (secure against chosen-ciphertext attacks)
▪ Ciphertexts don't reveal information about the plaintext, beyond the encrypted document size…
▪ …even to adversaries that can adaptively query encryption and decryption oracles
◦ Encrypted indexes are verifiably adaptively multi-snapshot secure
How does it work?
…encryption
▪ Encrypt-then-MAC authenticated encryption
◦ AES-256-CTR with HMAC-SHA-256
◦ Document encryption/decryption only happens on the client, in the application
◦ Top-level 96-byte composite user key on the client, comprising:
▪ 256-bit AES-CTR key
▪ 256-bit HMAC key
▪ 256-bit key for encrypted search operations
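A minimal sketch of the Encrypt-then-MAC layout described above, assuming the 96-byte composite key is simply the three 256-bit subkeys concatenated in the order listed (the real byte layout is an assumption here). Python's standard library has no AES, so an HMAC-SHA-256 counter-mode keystream stands in for AES-256-CTR; `split_key`, `encrypt_then_mac`, and `verify_then_decrypt` are hypothetical names. Checking the tag before decrypting is what lets Encrypt-then-MAC reject modified ciphertexts outright.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 96  # 3 x 256-bit subkeys, matching the composite key on the slide
IV_LEN = 16
TAG_LEN = 32  # HMAC-SHA-256 output size


def split_key(composite: bytes):
    """Split the composite key into (encryption, MAC, search) subkeys (assumed order)."""
    if len(composite) != KEY_LEN:
        raise ValueError("expected a 96-byte composite key")
    return composite[:32], composite[32:64], composite[64:]


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy stand-in for AES-256-CTR (no AES in Python's stdlib).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_then_mac(composite: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key, _search_key = split_key(composite)
    iv = secrets.token_bytes(IV_LEN)
    stream = _keystream(enc_key, iv, len(plaintext))
    ciphertext = iv + bytes(a ^ b for a, b in zip(plaintext, stream))
    # The MAC is computed over the IV and ciphertext, never over the plaintext.
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def verify_then_decrypt(composite: bytes, blob: bytes) -> bytes:
    enc_key, mac_key, _search_key = split_key(composite)
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Reject tampered input before any decryption (constant-time comparison).
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(enc_key, iv, len(body))))
```

In this sketch the third subkey is split off but unused: it is reserved for deriving search tokens, not for document encryption.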
…functional search index is introduced
◦ Based on a novel Structured Encryption construction
▪ The client is stateless; the database maintains (blinded) state
▪ Distributed, highly available, and highly scalable by design
▪ The scheme is robust to client failures, high contention, and dropped sessions
▪ Index data structures are Encrypted Multi-Maps (EMMs)
◦ A type of reverse (inverted) index of encrypted label/tuple pairs
◦ Labels are pseudorandom function (PRF) evaluations
◦ Fast, efficient EMM lookups
◦ PRFs here are keyed HMACs (HMAC-SHA-256)
Reminder: HMACs are secret-key digests/tags; the database does not have the key
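A minimal Python sketch of an encrypted multi-map, assuming a simplified design rather than MongoDB's actual construction: the client derives per-(field, value) tokens with HMAC-SHA-256 from its search key, and the server stores only PRF labels mapped to masked document ids. The token derivation, the counter scheme, and the names `query_tokens` and `EncryptedMultiMap` are hypothetical. A lookup touches only the entries for the queried value, so its cost grows with the number of matches rather than with the collection size.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, msg: bytes) -> bytes:
    """PRF instantiated as keyed HMAC-SHA-256."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def _counter(i: int) -> bytes:
    return i.to_bytes(8, "big")


def query_tokens(search_key: bytes, field: str, value: str):
    """Client side: derive the label and value tokens for one (field, value) pair."""
    base = field.encode() + b"\x00" + value.encode()
    return prf(search_key, b"label" + base), prf(search_key, b"value" + base)


class EncryptedMultiMap:
    """Server side: maps pseudorandom labels to masked document ids.

    The server never holds the search key; it only sees tokens and labels.
    """

    def __init__(self):
        self.table = {}

    def insert(self, tokens, doc_id: int) -> None:
        label_tok, value_tok = tokens
        i = 0
        # Simplification: the server probes for the next free counter itself,
        # so the client keeps no per-value state.
        while prf(label_tok, _counter(i)) in self.table:
            i += 1
        pad = prf(value_tok, _counter(i))[:8]
        masked = bytes(a ^ b for a, b in zip(doc_id.to_bytes(8, "big"), pad))
        self.table[prf(label_tok, _counter(i))] = masked

    def lookup(self, tokens) -> list:
        label_tok, value_tok = tokens
        results = []
        i = 0
        # Walk labels 0, 1, 2, ... until one is missing: cost ~ number of matches.
        while (label := prf(label_tok, _counter(i))) in self.table:
            pad = prf(value_tok, _counter(i))[:8]
            doc_id = bytes(a ^ b for a, b in zip(self.table[label], pad))
            results.append(int.from_bytes(doc_id, "big"))
            i += 1
        return results


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # stands in for the 256-bit search subkey
    emm = EncryptedMultiMap()
    emm.insert(query_tokens(key, "ssn", "901-10-4312"), 7)
    print(emm.lookup(query_tokens(key, "ssn", "901-10-4312")))  # [7]
```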
• Migration from CSFLE to Queryable Encryption is not (yet) supported
• CSFLE workloads should stay on CSFLE
• Queryable Encryption is for net-new workloads only
• Encrypted fields must be specified at collection creation
CSFLE
…unaware
• Queryability
◦ Equality only (deterministic encryption)
▪ Data leakage on low-entropy fields
• Flexible key usage
◦ Unique key per field
◦ One key for all fields
◦ Per-document keys
• No additional data elements

Queryable Encryption
• Client-side encryption
• Server is integral
• Queryability
◦ New functional search index
◦ Equality (fully randomized encryption)
▪ No snapshot leakage, even on low-entropy fields
◦ Range, prefix, suffix, and substring
• Requires a unique key per field
• Additional data
◦ 1 new field per document
▪ __safeContent__
◦ 3 new system collections: enxcol_.*
◦ Do not modify any of these!
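A minimal sketch of the leakage difference in the comparison above, assuming toy ciphers: deterministic mode derives the nonce from the plaintext (a synthetic-IV-style stand-in for deterministic encryption), randomized mode draws it at random, and an HMAC-SHA-256 keystream stands in for AES because Python's stdlib has none. The blood-type values are made-up sample data, and all names are hypothetical. Counting identical ciphertexts, which is what a database snapshot exposes, shows that deterministic encryption leaks the value distribution of a low-entropy field while randomized ciphertexts are all distinct.

```python
import collections
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _subkey(key: bytes, purpose: bytes) -> bytes:
    # Separate subkeys for nonce derivation and for the keystream.
    return hmac.new(key, purpose, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stand-in for AES-256-CTR.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: str, deterministic: bool) -> bytes:
    data = plaintext.encode()
    if deterministic:
        # Nonce derived from the plaintext: equal inputs give equal ciphertexts.
        nonce = hmac.new(_subkey(key, b"nonce"), data, hashlib.sha256).digest()[:NONCE_LEN]
    else:
        # Fresh random nonce: equal inputs give unrelated ciphertexts.
        nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode()


def ciphertext_histogram(ciphertexts) -> list:
    """What a snapshot reveals: how often each distinct ciphertext occurs."""
    return sorted(collections.Counter(ciphertexts).values(), reverse=True)
```

With 40 × “A+”, 35 × “O+”, 20 × “B+”, and 5 × “AB-”, the deterministic histogram is [40, 35, 20, 5], which mirrors the plaintext frequencies, while the randomized histogram is 100 ones.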
Structured Encryption
• Ideal for the real-time database operations that MongoDB customers need
• Software implementation, hardware agnostic
• Optimized for sublinear searching
Fully or Partially Homomorphic Encryption [FHE]
• Not natively offered in any major commercial general-purpose database
• For encrypted search, FHE is a poor choice due to weak performance: search time is linear in the size of the data
• Queries slow down significantly as the data set grows
• Typically incurs a very heavy computational overhead
• Better suited for certain types of secure private computation, such as sums and statistical means
Secure Enclaves
• Requires specialized hardware, often cloud-provider proprietary
• Keys are still managed by the cloud provider, albeit in hardware
• Enclaves are less powerful than general-purpose CPUs, and their security guarantees are unclear
Key Rotation
…new data encryption keys were protected by a new CMK
• Key Vault rotation replaces all former versions of the CMK seamlessly, via a single API call
Key Migration
…require decrypting and re-encrypting all of your data
• A single API call now seamlessly migrates your keys from any supported key provider to another one, for example:
◦ AWS to GCP
◦ Local to Azure
◦ GCP to KMIP
• With no impact to your application or data
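A minimal sketch of why CMK rotation and migration leave the data untouched, assuming the usual envelope-encryption layout: documents are encrypted under data encryption keys (DEKs), and the key vault stores each DEK wrapped (encrypted) under the CMK. Replacing the CMK, whether with a new version or with a key held by a different provider, only means unwrapping and re-wrapping the DEKs. The dict-based key vault and the names `seal`, `unseal`, and `rewrap_all` are hypothetical; a toy HMAC-SHA-256 keystream stands in for AES, and in practice the KMS performs the wrap and unwrap steps.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32


def _subkeys(key: bytes):
    enc = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc, mac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stand-in for AES-256-CTR.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy Encrypt-then-MAC, used both to wrap DEKs and to encrypt documents."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = nonce + bytes(a ^ b for a, b in zip(plaintext, stream))
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, body, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    nonce, ct = body[:NONCE_LEN], body[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))


def rewrap_all(key_vault: dict, old_cmk: bytes, new_cmk: bytes) -> None:
    """Re-protect every wrapped DEK under a new CMK.

    Only the small wrapped keys in the vault change; documents encrypted
    under the DEKs are never decrypted or re-encrypted.
    """
    for key_id, wrapped in list(key_vault.items()):
        key_vault[key_id] = seal(new_cmk, unseal(old_cmk, wrapped))
```

The cost of the rewrap scales with the number of DEKs in the key vault, not with the amount of encrypted data.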
Craig Dave Davi Divjot Dmitry Elizabeth Emily Eric Erwin Esha Ezra Jacob Jeff Jesse Judah Julie Kaitlin Katia Kevin Mark Mat Nathan Naomi Nick Oz Pramod Ravind Rachael Rachelle Ross Sam Sara Sergei Shane Shreyas Spencer Vincent