Commits (showing changes from 21 of 39 commits)
588f1f2
initial commit
baileympearson Nov 25, 2025
e467f5b
new files
baileympearson Nov 26, 2025
d55fdb9
add tests for handshake changes
baileympearson Nov 26, 2025
8e74b41
add generated tests
baileympearson Dec 1, 2025
072b453
test fixes and add prose test
baileympearson Dec 1, 2025
52e2a35
fix run on requirements
baileympearson Dec 2, 2025
391c951
fix run on requirements?
baileympearson Dec 2, 2025
92501c0
fix CI
baileympearson Dec 2, 2025
0fdef39
comments
baileympearson Dec 3, 2025
82acab8
Fix broken unified tests
baileympearson Dec 3, 2025
b3a7b6c
fix UTR linting failures
baileympearson Dec 3, 2025
60a87b8
remove broken deleteMany() from unified tests
baileympearson Dec 3, 2025
399a56b
add backwards compat section
baileympearson Dec 10, 2025
0545e15
Jeff's and Jib's comments
baileympearson Dec 11, 2025
ff5475a
adjust backpressure spec phrasing to make it clear retryable errors a…
baileympearson Dec 11, 2025
6211624
squash: jeremy's casing comments
baileympearson Dec 11, 2025
c1001bc
squash: other comments
baileympearson Dec 11, 2025
def5fbd
squash: other comments
baileympearson Dec 11, 2025
08da5c4
last round comments
baileympearson Dec 11, 2025
034b85e
unified retry loop, handshake phrasing change
baileympearson Dec 15, 2025
779e171
Jeremy's last comments
baileympearson Dec 15, 2025
e5d4de6
Other misc comments
baileympearson Dec 16, 2025
1cd95fc
update transaction spec and add unified tests
blink1073 Dec 17, 2025
6912e45
update transaction logic and add more tests
blink1073 Dec 17, 2025
88e6067
verify commitTransaction fails after 5 backoff attempts
blink1073 Dec 18, 2025
2019678
clean up transactions spec
blink1073 Dec 18, 2025
d3ce32b
update test names
blink1073 Dec 18, 2025
de7e862
address writeconcern on retries
blink1073 Dec 18, 2025
ab85a5b
add retryable get more tests
baileympearson Dec 18, 2025
eb10ddb
transaction test fixes
baileympearson Dec 18, 2025
2b90697
deduplicate ids
baileympearson Dec 18, 2025
0abf373
update transaction writeconcern logic and add changelog entry
blink1073 Dec 18, 2025
d07d49e
last few comments
baileympearson Dec 18, 2025
747c18c
add runOnRequirement for getMore tests
blink1073 Dec 19, 2025
be27bc0
updated formula
baileympearson Dec 19, 2025
92e479a
Merge branch 'master' of github.com:mongodb/specifications into DRIVE…
blink1073 Dec 22, 2025
b674f13
lint
blink1073 Dec 22, 2025
03065ad
address review
blink1073 Dec 23, 2025
1b7f6df
fix link
blink1073 Dec 23, 2025
314 changes: 314 additions & 0 deletions source/client-backpressure/client-backpressure.md
Member

This specification introduces the overload retry policy, but similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it's better if it does that by clearly expressing a principle from which the answers may be easily derived, rather than answering each question explicitly, as there may be more questions that have to be answered that I did not think about at the moment):

  1. Is it possible to encounter a failed attempt that is eligible for a retry by both the current and the overload policy?
    1.1. I suspect it currently is not, because the overload retry policy for now requires both RetryableError and SystemOverloadedError to be present. However, the specification should make the answer clear.
  2. What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the overload retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the current retry policy?
    2.1. The same question applies to two attempts a(n), a(n+1) where the latter immediately[1] follows the former, with the former, a(n), not being the first attempt.
    2.1.1. Note that such a situation may be encountered more than once for a single operation.
  3. What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the current retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the overload retry policy?
    3.1. The same question applies to two attempts a(n), a(n+1) where the latter immediately[1] follows the former, with the former, a(n), not being the first attempt.
    3.1.1. Note that such a situation may be encountered more than once for a single operation.
  4. The current retry policies for reads and writes specify which error is to be propagated to an application (if all attempts fail, there are multiple errors to choose from). The proposed overload retry policy does not do this even within itself; it should further specify which error is to be propagated to an application when some attempts of the same requested operation are done according to the current retry policy, while others are done according to the overload retry policy.

[1] In terms of ordering relations, not in the temporal sense.

Contributor Author

The updated pseudocode should answer these questions - let me know if there's anything else you'd like clarified.

@@ -0,0 +1,314 @@
# Client Backpressure

- Status: Accepted
- Minimum Server Version: N/A

______________________________________________________________________

## Abstract

This specification adds the ability for drivers to automatically retry requests that fail due to server overload errors
while applying backpressure to avoid further overloading the server.

## META

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

### Terms

#### Ingress Connection Rate Limiter

A token-bucket-based system introduced in MongoDB 8.2 to admit, reject, or queue connection requests. It aims to prevent
connection spikes from overloading the system.

#### Ingress Request Rate Limiter

A token-bucket-based system introduced in MongoDB 8.2 to admit an operation or reject it with a System Overload Error at
the front door of a mongod/s. It aims to prevent operation spikes from overloading the system.

#### MongoTune

MongoTune is a policy engine outside the server (mongod or mongos) which monitors a set of metrics (MongoDB or system
host) to dynamically configure MongoDB settings. MongoTune is deployed to Atlas clusters and will dynamically configure
the connection and request rate limiters to prevent and mitigate overloading the system.

#### RetryableError label

An error is considered retryable if it includes the "RetryableError" label. This error label indicates that an operation
is safely retryable regardless of the type of operation, its metadata, or any of its arguments.

Note that for the initial draft of the spec, only errors that have both the RetryableError label and the
SystemOverloadedError label are eligible for the retry backoff loop.
Member

What is "initial draft of the spec"? As far as I understand, once the changed proposed in this PR are approved and merged, they become part of the driver specifications, and not "initial draft".

Contributor Author

removed


#### SystemOverloadedError label

An error is considered overloaded if it includes the "SystemOverloadedError" label. This error label indicates that the
server is overloaded. If this error label is present, drivers will backoff before attempting a retry.

#### Overload Errors

An overload error is any command or network error that occurs due to a server overload. For example, when a request
exceeds the ingress request rate limit:

```js
{
  'ok': 0.0,
  'errmsg': "Rate limiter 'ingressRequestRateLimiter' rate exceeded",
  'code': 462,
  'codeName': 'IngressRequestRateLimitExceeded',
  'errorLabels': ['SystemOverloadedError', 'RetryableError'],
}
```

When a new connection attempt exceeds the ingress connection rate limit, the server closes the TCP connection before the
TLS
handshake is complete. Drivers will observe this as a network error (e.g. "connection reset by peer" or "connection
closed").

When a new connection attempt is queued by the server for so long that the driver-side timeout expires, drivers will
observe this as a network timeout error.

Note that there is no guarantee that all errors with the SystemOverloadedError label are retryable, or that all errors
with the RetryableError label also carry the SystemOverloadedError label.
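
For illustration only (the helper below is hypothetical and not part of this specification), a driver might classify an
error for the backoff retry loop roughly as follows, using the same label checks as the pseudocode later in this
document:

```python
def is_backoff_retryable(exc) -> bool:
    # Only errors carrying both labels enter the backoff retry loop described below.
    # Network errors raised while establishing a connection carry no labels, so they
    # cannot be identified as overload errors by label alone.
    return exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError")
```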

#### Goodput

The throughput of positive, useful output. In the context of drivers, this refers to the number of non-error results
that the driver processes per unit of time.
Member
@stIncMale Dec 11, 2025

"the number of non-error results that the driver processes per unit of time" is neither throughput, nor the "good throughput" ("goodput"). Throughput is the characteristic of a system (the combination of the application, the driver, the DBMS, their configuration, the network connecting them, the hardware, etc.), which is a constant for a given system, and tells about system capacity at its peak. SPECjbb2012: Updated Metrics for a Business Benchmark explains nicely what a throughput is, and how it may be measured.

"the number of non-error results
that the driver processes per unit of time"
is not a characteristic of a system, but rather a metric whose value may vary. Trivially, if an application does not request any operations via the driver, then "the number of non-error results
that the driver processes per unit of time"
is zero, but the throughput is still not.

Member
@stIncMale Dec 11, 2025

If, however, we want to define "throughput"/"goodput" the way it is currently proposed, then when we use the term in

  • "negatively affect goodput"
  • "stable but lowered throughput"

we have to say "max goodput" / "max throughput", or something like that, instead of just "goodput"/"throughput".

Contributor Author

Do you have an alternative definition you think makes more sense here?

Contributor

Throughput is the characteristic of a system (the combination of the application, the driver, the DBMS, their configuration, the network connecting them, the hardware, etc.), which is a constant for a given system, and tells about system capacity at its peak.

I think the bounds of throughput can be scoped to whatever end-to-end system you're defining as the receiver. In our case, that's couched in the "output" which is just the data received from the server. We can have that say "packets retrieved from the server". As well, we can change the language to say "achieved" or "realized" throughput/goodput everywhere we use the word in a sentence.

I think the only correction needed here is:

  • Remove the non-error results "processed" phrase. Change that to say "The rate at which actionable data is sent from the server and retrieved at the application layer of the driver code"
  • Change the places with "negatively affect goodput" to "lower the achieved goodput rate" and "lower the achieved/realized throughput rate"


See [goodput](https://en.wikipedia.org/wiki/Goodput).

### Requirements for Client Backpressure

#### Overload retry policy

This specification expands the driver's retry ability to all commands if the error indicates that it is both an overload
Member

"it" is missing

Suggested change
This specification expands the driver's retry ability to all commands if the error indicates that is both an overload
This specification expands the driver's retry ability to all commands if the error indicates that it is both an overload

Contributor Author

done

error and that it is retryable, including those not currently considered retryable such as updateMany, create
collection, getMore, and generic runCommand. The new command execution method obeys the following rules:
Member
@stIncMale Dec 12, 2025

runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case:

  1. Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.
  2. Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not). Applying the overload retry policy to runCommand, however, necessitates analyzing the server response.
    2.1. Furthermore, given that the internal retry mechanisms likely require a server response containing errors to be represented as some kind of an exception, a driver will not only have to do that for runCommand, but, if all retry attempts fail, will then have to replace the propagated exception back with the server response, because runCommand should return the response, if one is present, instead of propagating an exception.

I don't think retrying runCommand is worth the effort. If we, nonetheless, still want the specification to require retries for runCommand, let's make the corresponding specification change in a separate DRIVERS ticket included in the different epic, DRIVERS-3337: Client Backpressure Improvements. That way, drivers will be able to complete DRIVERS-3160: Client Backpressure Support, and postpone implementing retries for runCommand if they so desire.

Contributor Author

Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).

I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?

--

runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.

I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.

r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.


1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens.
- The value is 0.1 and non-configurable.
2. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens.
3. If a retry attempt fails with an error that does not include the `SystemOverloadedError` label, drivers MUST deposit 1
token.
- A non-SystemOverloaded error indicates that the server is healthy enough to handle requests. For the purposes of
retry budget tracking, this counts as a success.
4. A retry attempt will only be permitted if the error is eligible for retryable reads or writes, the error has a
`SystemOverloadedError` label, we have not reached `MAX_ATTEMPTS`, the CSOT deadline has not expired, and a token
can be acquired from the token bucket.
- The value of `MAX_ATTEMPTS` is 5 and non-configurable.
- This intentionally changes the behavior of CSOT, which would otherwise retry an unlimited number of times within the
timeout, in order to avoid retry storms.
5. A retry attempt consumes 1 token from the token bucket.
Contributor

This wording is difficult to understand. Consider a clearer sentence:

Suggested change
5. If a retry attempt is to be attempted, a token will be consumed from the token bucket.
5. A retry attempt consumes 1 token from the token bucket.

Contributor Author

Sure, done

6. If the request is eligible for retry (as outlined in step 4), the client MUST apply exponential backoff according to
the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`
- `i` is the retry attempt number (starting with 0 for the first retry).
Member
@sanych-sun Dec 19, 2025

Why did we decide to have the retry number start at 0? It makes the requirements confusing. Why don't we start with 1? In fact we would then have the same formula as for withTransaction:

delayMS = j * min(maxBackoff, baseBackoff * 2^(i-1))

The only difference is that here we have 2 as the base for the pow function, while in the convenient transaction API we have 1.5.

Member
@sanych-sun Dec 19, 2025

Here is the formula from withTransaction:
jitter * min(BACKOFF_INITIAL * 1.5 ** (transactionAttempt - 1), BACKOFF_MAX)

Where transactionAttempt starts at 0 and is incremented AFTER the delay, but before executing the callback attempt. Which is also confusing... but in the C# implementation we wait AFTER the attempt so it's more natural.

Contributor Author

The phrasing, including "Retries start at 0", was just taken from the design. There's no need to keep it this way if it causes confusion.

I can adjust the phrasing to more closely align with the transaction spec, if that's preferable?

Member
@sanych-sun Dec 19, 2025

Can we use the following formula?
delayMS = j * min(maxBackoff, baseBackoff * 2^(i-1))

or

we can keep the formula as is, but adjust the baseBackoff :
delayMS = j * min(maxBackoff, baseBackoff * 2^i)
where baseBackoff is 50 instead of 100.

It produces the same results, but starts i at 1.

Contributor

I think it's fine to use any of those three formulas in individual driver implementations if it increases readability so long as they result in the same outputs. As far as reducing confusion, I can say it didn't substantially change my understanding of the formula.

+1 to Bailey changing the phrasing, which I think would suffice.

- `j` is a random jitter value between 0 and 1.
- `baseBackoff` is constant 100ms.
- `maxBackoff` is 10000ms.
- This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter (a worked example follows this list).
7. If the request is eligible for retry (as outlined in step 4), the client MUST add the previously used server's
address to the list of deprioritized server addresses for server selection.
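
As a non-normative illustration of the formula in rule 6, the following evaluates the nominal delay for the first five
retries with jitter fixed at 1:

```python
BASE_BACKOFF_MS = 100
MAX_BACKOFF_MS = 10_000

def backoff_delay_ms(i, jitter=1):
    # delayMS = j * min(maxBackoff, baseBackoff * 2^i), with i = 0 for the first retry.
    return jitter * min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** i)

print([backoff_delay_ms(i) for i in range(5)])  # [100, 200, 400, 800, 1600]
```

In practice `j` is drawn per retry as a random value between 0 and 1, so the actual delays are shorter than these nominal
values.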

#### Interaction with Existing Retry Behavior

The retryability API defined in this specification is separate from the existing retryability behaviors defined in the
retryable reads and retryable writes specifications. Drivers MUST ensure that:

- Only retryable errors with the `SystemOverloadedError` label consume tokens from the token bucket before retrying.
- Only retryable errors with the `SystemOverloadedError` label apply backoff and jitter.

#### Pseudocode

The following pseudocode describes the overload retry policy:

```python
# Note: the values below have been scaled down by a factor of 1000 because
# Python's sleep API takes a duration in seconds, not milliseconds.
BASE_BACKOFF = 0.1 # 100ms
MAX_BACKOFF = 10 # 10s

RETRY_TOKEN_RETURN_RATE = 0.1
MAX_ATTEMPTS = 5

def execute_command_retryable(command, ...):
    deprioritized_servers = []
    attempt = 0
    # Without CSOT, retryable reads/writes allow a single retry (two attempts).
    # With CSOT, retries are unlimited within the timeout, until capped below
    # for overload errors.
    attempts = math.inf if is_csot else 2

    while True:
        try:
            server = select_server(deprioritized_servers)
            connection = server.getConnection()
            res = execute_command(connection, command)
            # Return tokens to the bucket on success.
            tokens = RETRY_TOKEN_RETURN_RATE
            if attempt > 0:
                tokens += 1
            token_bucket.deposit(tokens)
            return res
        except PyMongoError as exc:
            is_retryable = is_retryable_read() or is_retryable_write() or (
                exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError")
            )
            is_overload = exc.has_error_label("SystemOverloadedError")

            # If a retry fails with a non-SystemOverloadedError error, deposit 1 token.
            if attempt > 0 and not is_overload:
                token_bucket.deposit(1)

            # Raise if the error is non-retryable.
            if not is_retryable:
                raise

            attempt += 1
            if is_overload:
                attempts = MAX_ATTEMPTS

            if attempt >= attempts:
                raise

            deprioritized_servers.append(server.address)

            if is_overload:
                jitter = random.random()  # Random float in [0.0, 1.0).
                backoff = jitter * min(BASE_BACKOFF * (2 ** (attempt - 1)), MAX_BACKOFF)

                # If the delay exceeds the deadline, bail early.
                if _csot.get_timeout():
                    if time.monotonic() + backoff > _csot.get_deadline():
                        raise

                if not token_bucket.consume(1):
                    raise

                time.sleep(backoff)
```
Member

The specification says "i is the retry attempt (starting with 0 for the first retry)." The pseudocode deviates from it by using the next attempt number (attempt) here instead of using the next retry attempt number (all attempts but the first one are retry attempts). The minimal attempt value at this point in execution is 1, making the minimal backoff equal to BASE_BACKOFF * (2^1) = 200 ms (before accounting for jitter), instead of the expected 100 ms.

To correctly illustrate the specification, the pseudocode should be

backoff = jitter * min(BASE_BACKOFF * (2^(attempt - 1)), MAX_BACKOFF)

Contributor Author

Done



### Token Bucket

The overload retry policy introduces a per-client token bucket to limit SystemOverloaded retry attempts. Although the
server rejects excess operations as quickly as possible, doing so costs CPU and creates extra contention on the
connection pool which can eventually negatively affect goodput. To reduce this risk, the token bucket will limit retry
attempts during a prolonged overload.
Comment on lines +198 to +201
Contributor


This seems to directly contradict the assertions in the CSOT spec section Why don't drivers use backoff/jitter between retry attempts?.

  1. Should we remove that section from the CSOT spec?
  2. Is there a reason not to apply backoff+jitter to all retry attempts, not just those with the SystemOverloadedError label?

P.S. I realize both of these are arguably out of the scope of this change, mostly wanted to know if there are existing answers to these questions.

Contributor Author

re 1. - I think the section is still accurate because we only apply backoff and jitter to a particular subset of errors (system overloaded errors). I could update the phrasing to mention the client backpressure specification if you'd like.
re 2. - Not that I can think of but I also don't know that it is necessary. Especially with Noah's server selection changes, because now there's no chance of selecting the same server again.

The exception would be really extreme server overload (what this feature addresses) or other extreme driver scenarios where requests to all nodes are failing, exhausting the deprioritized server list. The only such non-overload scenario I can think of is a complete network outage to all nodes but I don't think backoff and jitter would help much in this scenario at all.

Contributor

Thanks for the responses! Consider this resolved.


The token bucket capacity is set to 1000 for consistency with the server.

#### Pseudocode

The token bucket is implemented via a thread-safe counter. For languages without atomics, this can be implemented via a
lock, for example:

```python
from threading import Lock

DEFAULT_RETRY_TOKEN_CAPACITY = 1000

class TokenBucket:
    """A token bucket implementation for rate limiting."""

    def __init__(
        self,
        capacity: float = DEFAULT_RETRY_TOKEN_CAPACITY,
    ):
        self.lock = Lock()
        self.capacity = capacity
        self.tokens = capacity

    def consume(self, n: float) -> bool:
        """Consume n tokens from the bucket if available."""
        with self.lock:
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

    def deposit(self, n: float) -> None:
        """Deposit n tokens back into the bucket."""
        with self.lock:
            self.tokens = min(self.capacity, self.tokens + n)
```
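
As a usage illustration (non-normative; the helper names are hypothetical and `RETRY_TOKEN_RETURN_RATE` is the constant
from the pseudocode above), the bucket ties into the retry rules roughly like this:

```python
RETRY_TOKEN_RETURN_RATE = 0.1

bucket = TokenBucket()  # starts full: 1000 tokens

def record_success(was_retry: bool) -> None:
    # Rules 1 and 2: deposit 0.1 tokens after a first-attempt success, 1.1 after a retry success.
    bucket.deposit(RETRY_TOKEN_RETURN_RATE + (1 if was_retry else 0))

def may_retry_overloaded() -> bool:
    # Rules 4 and 5: a SystemOverloadedError retry is only permitted if a token can be consumed.
    return bucket.consume(1)
```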

#### Handshake changes

Drivers conforming to this spec MUST add `"backpressure": True` to the connection handshake. This flag allows the server
to identify clients which do and do not support backpressure. Currently, this flag is unused but in the future the
server may offer different rate limiting behavior for clients that do not support backpressure.
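
As a sketch only (the surrounding handshake fields are abbreviated and the exact placement of the flag is an assumption
of this example, not something defined here), the flag would appear on the handshake command along these lines:

```python
# Hypothetical handshake (hello) command document with the backpressure flag added.
handshake = {
    "hello": 1,
    "backpressure": True,  # advertises that this client supports backpressure
    "client": {
        "driver": {"name": "example-driver", "version": "1.0.0"},
    },
}
```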

#### Implementation notes

On some platforms sleep() can have a very low precision, meaning an attempt to sleep for 50ms may actually sleep for a
much larger time frame. Drivers are not required to work around this limitation.
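
For example, a quick check of the platform's sleep precision (illustrative only) might look like this; on some platforms
the measured duration can noticeably exceed the requested one:

```python
import time

requested = 0.05  # seconds
start = time.monotonic()
time.sleep(requested)
print(time.monotonic() - start)  # may be noticeably larger than 0.05 on some platforms
```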

### Logging Retry Attempts

[As with retryable writes](../retryable-writes/retryable-writes.md#logging-retry-attempts), drivers MAY choose to log
retry attempts for load shed operations. This specification does not define a format for such log messages.

### Command Monitoring

[As with retryable writes](../retryable-writes/retryable-writes.md#command-monitoring), in accordance with the
[Command Logging and Monitoring](../command-logging-and-monitoring/command-logging-and-monitoring.md) specification,
drivers MUST guarantee that each `CommandStartedEvent` has either a correlating `CommandSucceededEvent` or
`CommandFailedEvent` and that every "command started" log message has either a correlating "command succeeded" log
message or "command failed" log message. If the first attempt of a retryable operation encounters a retryable error,
drivers MUST fire a `CommandFailedEvent` and emit a "command failed" log message for the retryable error and fire a
separate `CommandStartedEvent` and emit a separate "command started" log message when executing the subsequent retry
attempt. Note that the second `CommandStartedEvent` and "command started" log message may have a different
`connectionId`, since a server is reselected for a retry attempt.
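
For example (illustrative only; the host names, and any ordering beyond the stated requirement, are assumptions), an
insert that fails once with a retryable error and then succeeds on the retry would produce an event sequence like:

```python
expected_events = [
    ("CommandStartedEvent", "insert", "host-a:27017"),
    ("CommandFailedEvent", "insert", "host-a:27017"),
    ("CommandStartedEvent", "insert", "host-b:27017"),   # retry, possibly on a different server
    ("CommandSucceededEvent", "insert", "host-b:27017"),
]
```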

### Documentation

1. Drivers MUST document that all operations support retries on server overload.
2. Driver release notes MUST make it clear to users that they may need to adjust custom retry logic to prevent an
application from inadvertently retrying for too long (see [Backwards Compatibility](#backwards-compatibility) for
details).

### Backwards Compatibility

The server's rate limiting can introduce higher error rates than would previously have been exposed to users during
periods of extreme server overload. The increased error rate is a tradeoff: given the choice between an overloaded
server (a potential crash, or at minimum dramatically slower query execution) and a stable but lowered throughput with a
higher error rate as the server sheds load, we have chosen the latter.

The changes in this specification help smooth out the impact of the server's rate limiting on users by reducing the
number of errors users see during spikes or burst workloads and help prevent retry storms by spacing out retries.
However, older drivers do not have this benefit. Drivers MUST document that:

- Users SHOULD upgrade to driver versions that officially support backpressure to avoid any impacts of server changes.
- Users who do not upgrade might need to update application error handling to handle higher error rates of
SystemOverloadedErrors.

## Test Plan

See the [README](./tests/README.md) for tests.

## Motivation for Change

New load shedding mechanisms are being introduced to the server that improve its ability to remain available under
extreme load; however, clients do not know how to handle the errors returned when one of their requests has been rejected.
As a result, such overload errors would currently either be propagated back to applications, increasing
externally-visible command failure rates, or be retried immediately, increasing the load on already overburdened
servers. To minimize these effects, this specification enables clients to retry requests that have been load shed in a
way that does not overburden already overloaded servers. This retry behavior allows for more aggressive and effective
load shedding policies to be deployed in the future. This will also help unify the currently-divergent retry behavior
between drivers and the server (mongos).

## Reference Implementation

The Node and Python drivers will provide the reference implementations. See
[NODE-7142](https://jira.mongodb.org/browse/NODE-7142) and [PYTHON-5528](https://jira.mongodb.org/browse/PYTHON-5528).

## Future work

1. [DRIVERS-3333](https://jira.mongodb.org/browse/DRIVERS-3333) Add a backoff state into the connection pool.
2. [DRIVERS-3241](https://jira.mongodb.org/browse/DRIVERS-3241) Add diagnostic metadata to retried commands.
3. [DRIVERS-3352](https://jira.mongodb.org/browse/DRIVERS-3352) Add support for RetryableError labels to retryable reads
and writes.

## Q&A

### Why are drivers not required to work around timing limitations in their language's sleep() APIs?

The client backpressure retry loop is primarily concerned with spreading out retries to avoid retry storms. The exact
sleep duration is not critical to the intended behavior, so long as we sleep at least as long as we say we will.

## Changelog

- 2025-XX-XX: Initial version.
Contributor

Not sure how we handle the date... Is there an automation for this?

Contributor Author

Not that I know of. Usually the spec author fills it out before merging

I'll just leave this thread open to remind myself to add changelog dates before merging once all changes are completed.

61 changes: 61 additions & 0 deletions source/client-backpressure/tests/README.md
@@ -0,0 +1,61 @@
# Client Backpressure Tests

______________________________________________________________________

## Introduction

The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of
client backpressure. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md).

Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to
be manually implemented by each driver.

### Prose Tests

#### Test 1: Operation Retry Uses Exponential Backoff

Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered.

1. Let `client` be a `MongoClient`
2. Let `collection` be a collection
3. Now, run the operation without backoff:
1. Configure the random number generator used for jitter to always return `0` -- this effectively disables backoff.

2. Configure the following failPoint:

```javascript
{
  configureFailPoint: 'failCommand',
  mode: 'alwaysOn',
  data: {
    failCommands: ['insert'],
    errorCode: 2,
    errorLabels: ['SystemOverloadedError', 'RetryableError']
  }
}
```

3. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution.

```javascript
const start = performance.now();
expect(
  await coll.insertOne({ a: 1 }).catch(e => e)
).to.be.an.instanceof(MongoServerError);
const end = performance.now();
```

4. Configure the random number generator used for jitter to always return `1`.

5. Execute step 3 again.

6. Compare the durations of the two runs.
```python
assertTrue(with_backoff_time - no_backoff_time >= 2.1)
```
The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two
runs.
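
The 3.1-second figure can be checked against the backoff formula with jitter fixed at 1 (a non-normative sanity check):

```python
base_ms, max_ms = 100, 10_000
delays_ms = [min(max_ms, base_ms * 2 ** i) for i in range(5)]
print(delays_ms, sum(delays_ms))  # [100, 200, 400, 800, 1600] 3100 -> 3.1 s of backoff
# The assertion allows 1 second of slack, hence the 2.1-second threshold.
```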

## Changelog

- 2025-XX-XX: Initial version.
Member

Is this a TODO item you'll update before merging? I think most files just create an empty changelog for new copies.

Contributor Author

It's intended as a TODO, yeah.

If I understand correctly: you're saying new specs generally just leave the changelog empty? That's fine with me as well if that's the usual practice with new specs and test readmes.

Member

Either is fine, but I've definitely seen this section omitted entirely or left empty instead of having a placeholder entry.

Contributor Author

I removed the changelog
