how to parse this to yield: host identifiers, authentication credentials, and connection options. A mongodb+srv:// scheme indicates Initial DNS Seedlist Discovery, which may yield additional host identifiers. Atlas uses this to provide shorter, more resilient connection strings.
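As a rough illustration of the three pieces a connection string yields, here is a minimal Python sketch. It is not the parser any driver actually uses: the function name `parse_mongodb_uri` and the shape of the returned dictionary are hypothetical, and it skips IPv6 literals, Unix sockets, option validation, and the DNS lookups that a mongodb+srv:// scheme would trigger.

```python
from urllib.parse import parse_qs, unquote


def parse_mongodb_uri(uri):
    """Split a connection string into hosts, credentials, and options.

    Illustrative only: performs no SRV/TXT lookups and no option validation.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("unsupported scheme: %r" % scheme)

    # Everything before the first "/" is "[user:pass@]host1[,host2...]".
    authority, _, path_and_query = rest.partition("/")
    path, _, query = path_and_query.partition("?")

    username = password = None
    if "@" in authority:
        userinfo, _, authority = authority.rpartition("@")
        username, _, password = userinfo.partition(":")
        username, password = unquote(username), unquote(password)

    hosts = []
    for host in authority.split(","):
        name, _, port = host.partition(":")
        # 27017 is MongoDB's default port; with mongodb+srv:// the real
        # ports would come from the SRV records instead.
        hosts.append((name, int(port) if port else 27017))

    # parse_qs returns a list per key; keep the last value given.
    options = {key: values[-1] for key, values in parse_qs(query).items()}

    return {
        "srv": scheme == "mongodb+srv",
        "hosts": hosts,
        "username": username,
        "password": password,
        "database": unquote(path) or None,
        "options": options,
    }
```

Calling it on `mongodb://user:secret@a.example.com,b.example.com/app?replicaSet=rs0` would return two host entries, the credentials, the database name `app`, and a one-entry options dictionary.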
newly established connections. This uses OP_QUERY instead of OP_MSG for backwards compatibility. Drivers can also provide client metadata. The isMaster response reports the server’s min and max wire versions, which are used for protocol negotiation, feature discovery, and detecting imposters. No authentication or compression takes place at this step.
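The wire version check can be pictured as a range overlap test. The sketch below is an assumption-laden illustration, not driver code: `negotiate_wire_version` is a hypothetical helper, the driver's supported range is passed in by the caller, and treating missing fields as 0 is an assumption made here for very old servers. Only the `minWireVersion` and `maxWireVersion` field names come from the isMaster response itself.

```python
def negotiate_wire_version(is_master_reply, driver_min, driver_max):
    """Return the highest wire version both sides support, or raise.

    driver_min/driver_max are whatever range the (hypothetical) driver
    implements; missing reply fields are assumed to mean version 0.
    """
    server_min = is_master_reply.get("minWireVersion", 0)
    server_max = is_master_reply.get("maxWireVersion", 0)

    # The two ranges must overlap for the driver to talk to this server.
    if server_min > driver_max or server_max < driver_min:
        raise RuntimeError(
            "incompatible wire versions: server [%d, %d], driver [%d, %d]"
            % (server_min, server_max, driver_min, driver_max)
        )
    return min(server_max, driver_max)
```

The returned value is what a driver could consult for feature discovery, for example to decide whether a newer opcode or command is available.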
and compression protocols (if any) are supported by the server. Drivers also advertise what compression they support in client metadata. The auth spec defines command conversations for various auth mechanisms. The compression spec defines OP_COMPRESSED as an envelope for other opcodes. Compression is never used for certain commands (e.g. isMaster, auth).
server descriptions, a strategy for periodic monitoring, and a state machine for updating descriptions. Drivers can infer the initial topology type and servers from the connection string. Unknown types address ambiguity (e.g. a seed list without the replicaSet option). The isMaster response affirms a server’s type and may also update the topology.
of workers, each responsible for serving one request at a time. Multi-threaded and async applications have a limited number of app servers responsible for serving incoming requests concurrently. (Figure: different application deployments, showing app servers connected to a cluster.)
topology in a background “thread” and maintain a separate connection pool for application usage. The monitoring thread does not share sockets with the connection pool (rationale). Single-threaded drivers share sockets for monitoring and application usage, and perform monitoring during server selection (i.e. when procuring a socket). Separate sockets would be redundant and/or costly, so these drivers forgo connection pools in favor of persistent sockets.
Retry isMaster once to quickly recover dropped sockets (rationale). Drivers internally invoke monitoring as needed (e.g. after a “not master” error). Optimizations for single-threaded drivers: ignore inaccessible servers for cooldownMS (five seconds); monitoring can be parallelized with async IO.
the topology and its servers. Server Selection uses a loop to filter the topology to a server description. This is a straightforward algorithm for multi-threaded and async drivers, but single-threaded drivers must invoke SDAM during the loop. If multiple servers are eligible, one is selected at random within a latency window. A server description can be exchanged for a socket.
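The "random selection within a latency window" step can be sketched in a few lines of Python. This is a simplified illustration rather than any driver's implementation: `select_within_latency_window` is a hypothetical name, candidates are plain `(address, round_trip_ms)` tuples, and the 15 ms default mirrors the threshold quoted elsewhere in this talk.

```python
import random


def select_within_latency_window(candidates, local_threshold_ms=15.0, rng=random):
    """Pick one server at random among those close to the fastest one.

    candidates: list of (address, round_trip_ms) tuples that have already
    passed type and read-preference filtering (not shown here).
    """
    if not candidates:
        return None

    fastest = min(rtt for _, rtt in candidates)
    # The window holds every server within the threshold of the fastest RTT.
    window = [addr for addr, rtt in candidates
              if rtt <= fastest + local_threshold_ms]
    return rng.choice(window)
```

Choosing randomly inside the window, instead of always taking the fastest server, spreads load across nearby members while still avoiding distant ones.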
monitoring activity (defaults to 10 seconds). Consider tuning it closer to the expected max latency of the database servers. socketTimeoutMS pertains to application operations (defaults to 300 seconds). It is comparable to PHP’s own default_socket_timeout. Be mindful of PHP’s max_execution_time. (Configuring socket timeouts)
single-threaded drivers; minimum is 500ms). socketCheckIntervalMS determines when a socket is considered inactive and must be re-checked before use (defaults to 5 seconds). This applies specifically to single-threaded drivers. Like retrying isMaster, this helps insulate applications from network errors. (Configuring monitoring)
an eligible server (defaults to 15ms). serverSelectionTimeoutMS is the maximum amount of time to spend in the server selection loop (defaults to 30 seconds). serverSelectionTryOnce allows the application to “fail fast”. This applies specifically to single-threaded drivers, where it defaults to true. Disabling try-once behavior can improve resiliency at the expense of time.
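To show how the timeout and try-once settings interact, here is a simplified Python sketch of a single-threaded selection loop. It is an illustration under stated assumptions, not the specified algorithm: `select_server`, `scan_topology`, `pick_server`, and `retry_interval_s` are hypothetical names, and a real driver would only rescan when its view of the topology is stale.

```python
import time


def select_server(scan_topology, pick_server, timeout_s=30.0, try_once=True,
                  retry_interval_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Loop until pick_server() returns a server or the time budget runs out.

    scan_topology: callable that refreshes server descriptions (monitoring
                   happens inside the loop for single-threaded drivers).
    pick_server:   callable returning a suitable server or None.
    """
    deadline = clock() + timeout_s
    while True:
        scan_topology()
        server = pick_server()
        if server is not None:
            return server
        if try_once:
            # Fail fast: one scan, one attempt.
            raise TimeoutError("no suitable server after a single scan")
        if clock() >= deadline:
            raise TimeoutError("no suitable server within %.1fs" % timeout_s)
        sleep(retry_interval_s)
```

With `try_once=True` the caller learns immediately that no server is available; with `try_once=False` the loop keeps rescanning, which gives a replica set election time to finish before the deadline.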
that the data has been acknowledged by a majority. linearizable provides additional guarantees over majority to avoid returning stale data. Introduced in MongoDB 3.4 to satisfy the Jepsen test framework. Peter Bailis provides an accessible definition of linearizability. snapshot may be used with majority-committed transactions to guarantee that reads within that transaction use a snapshot of majority-committed data.
time with maxTimeMS. The server will track processing time and abort at the next interrupt point. Socket timeouts can be expensive for both the client and server. Write concerns can also use wtimeout to limit waiting time for replication. Distinguish write concern errors from write errors.
a preceding operation. Causal consistency comes with several guarantees: read your own writes, monotonic reads/writes, and writes follow reads. These are satisfied by majority read and write concerns (when durability is required). Applications can obtain causal consistency by using explicit sessions (examples).
their operations. In earlier versions of MongoDB, state was tied to connection objects. Sessions live throughout a cluster and are not tied to connection objects. Sessions can be created and used as an explicit option for database operations. Group operations by passing the same session (e.g. for causal consistency). By default, drivers will use an implicit session for single operations.
of the system. Reads or writes may continue to run on the server after the client moves on. Write operations may not be idempotent and safe to execute multiple times. At best, retrying may waste time or consume resources. At worst, retrying may inadvertently alter the data itself.
of failure: transient network error, persistent outage, and command error. A retry attempt may be necessary to differentiate transience from persistence. If a command response reports failure, retrying probably isn’t going to help.
safe to retry. Short-running queries that return a single batch of documents (i.e. will not leave behind a cursor) may be safe to retry. Drivers will aim to retry most read commands in MongoDB 4.2. This requires server functionality to detect dropped sockets and abort operations. getMore cannot be retried, since cursor iteration is forward only.
beyond the scope of a connection. Each write can be uniquely identified by a session and statement ID. Drivers can rely on SDAM and server selection to re-select the primary. Drivers can safely retry single-document writes (or bulks thereof) by resending the original command to the primary and trusting the server to Do the Right Thing™: if the write already executed, return the result we missed; if the write never executed, do it now and return its result.
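The "already executed, return the result we missed" behavior can be modeled with a toy in-memory server. Everything below is a hypothetical stand-in written for illustration (`FakePrimary`, `FlakyConnection`, `insert_with_one_retry`); it only demonstrates why identifying a write by session and statement ID makes a single retry safe, and says nothing about how the real server stores this state.

```python
class FakePrimary:
    """Toy primary that remembers write results by (session, statement)."""

    def __init__(self):
        self.documents = []
        self._results = {}

    def insert_one(self, session_id, statement_id, document):
        key = (session_id, statement_id)
        if key in self._results:
            # Already executed: return the result the client missed.
            return self._results[key]
        # Never executed: do it now and remember the result.
        self.documents.append(document)
        self._results[key] = {"ok": 1, "n": 1}
        return self._results[key]


class FlakyConnection:
    """Applies the write, then loses the first reply to simulate a dropped socket."""

    def __init__(self, primary):
        self.primary = primary
        self.drop_next_reply = True

    def insert_one(self, session_id, statement_id, document):
        result = self.primary.insert_one(session_id, statement_id, document)
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionError("socket dropped before the reply arrived")
        return result


def insert_with_one_retry(conn, session_id, statement_id, document):
    """Resend the identical command once after a network error."""
    try:
        return conn.insert_one(session_id, statement_id, document)
    except ConnectionError:
        # A real driver would re-run server selection here to find the primary.
        return conn.insert_one(session_id, statement_id, document)
```

Even though the first reply is lost after the write was applied, the retry returns the stored result and the document is inserted exactly once.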
for each retry attempt. PHP’s default try-once behavior is unlikely to find a new primary after a failover, since replica set elections can take a few seconds (electionTimeoutMillis). Reducing election times for planned maintenance is tracked in SERVER-35624. Combining retryWrites=true with serverSelectionTryOnce=false can fully insulate an application’s writes from replica set failovers (https://git.io/fNbW0).
string, disable try-once behavior (serverSelectionTryOnce=false), and tune serverSelectionTimeoutMS closer to the expected election time (e.g. 15 seconds). Atlas already advises this, which helps with its automated maintenance. Use the driver as you would normally. Multi-document writes (e.g. updateMany) may still fail; you’re no worse off. Single-document writes may still fail after one retry attempt.
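Put together, the advice above amounts to three connection string options. The Python sketch below merely assembles such a string; `resilient_uri` is a hypothetical helper, the host names and replica set name in the usage note are placeholders, and the 15000 ms default simply follows the "e.g. 15 seconds" suggestion.

```python
from urllib.parse import urlencode


def resilient_uri(hosts, replica_set, election_budget_ms=15000):
    """Build a connection string with retryable writes and no try-once."""
    options = {
        "replicaSet": replica_set,
        "retryWrites": "true",
        "serverSelectionTryOnce": "false",
        # Give server selection roughly one election's worth of time.
        "serverSelectionTimeoutMS": str(election_budget_ms),
    }
    return "mongodb://%s/?%s" % (",".join(hosts), urlencode(options))
```

For example, `resilient_uri(["a.example.com:27017", "b.example.com:27017"], "rs0")` yields a string ending in `?replicaSet=rs0&retryWrites=true&serverSelectionTryOnce=false&serverSelectionTimeoutMS=15000`.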
engine. MongoDB 3.2 made WiredTiger the default, introduced read concerns, and made significant improvements to the replication protocol. MongoDB 3.6 introduced logical sessions, which became the underlying framework for causal consistency and retryable writes. MongoDB 4.0 introduced multi-document transactions for replica sets by leveraging the logical session API and WiredTiger storage engine. MongoDB 4.2 will add transaction support for sharded clusters.
route to the same member (i.e. primary). Read and write concerns are specified once, when starting a transaction. While many operations are supported, there are some restrictions (e.g. DDL). Databases and collections must exist prior to starting the transaction. Cursors created outside a transaction cannot be used within, and vice versa.
Applications can retry commits additional times if desired. Other read and write operations are not retried. Transactions and retryable writes (i.e. retryWrites=true) are mutually exclusive. Entire transactions may be retried if an operation fails with a transient error. Use a majority write concern when retrying transactions for durability.
driver or library may be associated with one or more error labels, which can be checked using the hasErrorLabel() method. TransientTransactionError implies the entire transaction can be retried. UnknownTransactionCommitResult implies a commit can be retried. Applications can, and should, handle both cases (example).
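Handling both labels usually means two nested retry loops: one around the commit and one around the whole transaction. The following Python sketch shows that shape only. `LabeledError`, `commit_with_retry`, `run_transaction_with_retry`, the session methods, and the `max_attempts` cap are all hypothetical names and choices made for illustration; only the two label strings come from the talk.

```python
class LabeledError(Exception):
    """Stand-in for a driver exception that carries error labels."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self.labels = frozenset(labels)

    def has_error_label(self, label):
        return label in self.labels


def commit_with_retry(session, max_attempts=3):
    """Retry only the commit while its outcome is unknown."""
    for attempt in range(max_attempts):
        try:
            session.commit_transaction()
            return
        except LabeledError as exc:
            last = attempt == max_attempts - 1
            if last or not exc.has_error_label("UnknownTransactionCommitResult"):
                raise


def run_transaction_with_retry(session, body, max_attempts=3):
    """Re-run the entire transaction after a transient error."""
    for attempt in range(max_attempts):
        session.start_transaction()
        try:
            body(session)  # the application's reads and writes
            commit_with_retry(session, max_attempts)
            return
        except LabeledError as exc:
            session.abort_transaction()
            last = attempt == max_attempts - 1
            if last or not exc.has_error_label("TransientTransactionError"):
                raise
```

Capping the attempts is a design choice made here so the sketch cannot loop forever; an application might prefer a time budget instead.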
https://php.net/mongodb
https://docs.mongodb.com/php-library/
https://github.com/mongodb/specifications
MongoDB Manual (CRUD concepts, retryable writes, transactions):
https://docs.mongodb.com/manual/core/crud/
https://docs.mongodb.com/manual/core/retryable-writes/
https://docs.mongodb.com/manual/core/transactions/
How to Write Resilient MongoDB Applications, by A. Jesse Jiryu Davis: https://emptysqua.re/blog/how-to-write-resilient-mongodb-applications/
It’s 10pm: Do You Know Where Your Writes Are?, by Jeremy Mikola: https://speakerdeck.com/jmikola/its-10pm-do-you-know-where-your-writes-are