DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862

baileympearson · 2025-12-02T17:15:50Z

Overview

This PR adds support for a new class of errors (SystemOverloadedError) to drivers' operation retry logic, as outlined in the design document.

Additionally, it includes a new argument to the MongoDB handshake (also defined in the design document).

Python will be second implementer.
Node implementation: mongodb/node-mongodb-native#4806

Testing

The testing strategy is two-fold:

Building off of Ezra's work to generate unified tests for retryable handshake errors, this PR generates unified tests to confirm that:
- operations are retried using the new SystemOverloadedError label
- operations are retried no more than 5 (current MAX_ATTEMPTS, as defined in the spec) times
Following Iris's work in DRIVERS-1934: withTransaction API retries too frequently #1851, this PR adds a prose test that ensures drivers apply exponential backoff in the retryability loop.
Update changelog.
Test changes in at least one language driver.
Test these changes against all server versions and topologies (including standalone, replica set, and sharded
clusters).

- add prose test - add assertions on the number of retries for maxAttempts tests - don't run clientBulkWrite tests on <8.0 servers

baileympearson · 2025-12-02T18:57:17Z

source/logging/logging.md

    > - If the value is "stderr" (case-insensitive), log to stderr.
    > - Else, if direct logging to files is supported, log to a file at the specified path. If the file already exists, it
-    >     MUST be appended to.
+    >   MUST be appended to.


this is failing lint on main. I'll fix separately and rebase this PR.

source/client-backpressure/client-backpressure.md

source/retryable-reads/retryable-reads.md

source/retryable-writes/retryable-writes.md

source/client-backpressure/client-backpressure.md

blink1073 · 2025-12-03T16:31:19Z

It looks like you also need to bump the schema version:

source/client-backpressure/tests/backpressure-retry-loop.yml invalid
[
  {
    instancePath: '/tests/0/operations/3/expectError',
    schemaPath: '#/definitions/expectedError/type',
    keyword: 'type',
    params: { type: 'object' },
    message: 'must be object'
  }
]
 using schema v1.3
source/client-backpressure/tests/backpressure-retry-max-attempts.yml invalid
[
  {
    instancePath: '/tests/0/operations/1/expectError',
    schemaPath: '#/definitions/expectedError/type',
    keyword: 'type',
    params: { type: 'object' },
    message: 'must be object'
  }
]
 using schema v1.3source/client-backpressure/tests/backpressure-retry-loop.yml invalid
[
  {
    instancePath: '/tests/0/operations/3/expectError',
    schemaPath: '#/definitions/expectedError/type',
    keyword: 'type',
    params: { type: 'object' },
    message: 'must be object'
  }
]
 using schema v1.3
source/client-backpressure/tests/backpressure-retry-max-attempts.yml invalid
[
  {
    instancePath: '/tests/0/operations/1/expectError',
    schemaPath: '#/definitions/expectedError/type',
    keyword: 'type',
    params: { type: 'object' },
    message: 'must be object'
  }
]
 using schema v1.3

blink1073 · 2025-12-03T21:05:18Z

WIP Python implementation: mongodb/mongo-python-driver#2635

blink1073 · 2025-12-03T23:11:38Z

All unified and prose tests are passing in the Python implementation.

Edit: we're still failing one unified test, "client.clientBulkWrite retries using operation loop", investigating...

Edit 2: we're all good now

jyemin

I only reviewed the specification changes, not the pseudocode or tests. Those are best reviewed by implementers.

jyemin · 2025-12-10T19:26:49Z

source/client-backpressure/client-backpressure.md

+    - This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within the
+        timeout to avoid retry storms.
+5. If the previous error includes the `SystemOverloadedError` label, the client MUST apply exponential backoff according
+    to according to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`


Suggested change

to according to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`

to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`

jyemin · 2025-12-10T19:33:11Z

source/client-backpressure/client-backpressure.md

+
+This specification expands the driver's retry ability to all commands, including those not currently considered
+retryable such as updateMany, create collection, getMore, and generic runCommand. The new command execution method obeys
+the following rules:


Since the rules include all the deposits into the token bucket, consider adding withdrawals as well.

jyemin · 2025-12-10T19:36:28Z

source/client-backpressure/client-backpressure.md

+
+## Q&A
+
+TODO


Anything to add here, or just remove?

Nothing yet .. I'll remove it for now. It can always be added back if there are items worth mentioning here.

jyemin · 2025-12-10T19:37:07Z

source/client-backpressure/client-backpressure.md

+
+## Changelog
+
+- 2025-XX-XX: Initial version.


Not sure how we handle the date... Is there an automation for this?

Not that I know of. Usually the spec author fills it out before merging

I'll just leave this thread open to remind myself to add changelog dates before merging once all changes are completed.

Jibola · 2025-12-10T20:34:13Z

source/client-backpressure/tests/README.md

+    ```python
+    assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 3.1 seconds)) < 1)
+    ```


Thoughts:
Could this stick to being javascript from top-to-bottom?
Can we also do BIG_TIME - SMALL_TIME >= 2.1?

To me that's more human-readable.

It maintains that BIG_TIME must always be bigger (removing the need for absolute_value).

It captures the 1 second variation whilst still being rigid to the 3.1 second window and stays minimally invasive.

For sure. This phrasing was taken from the withTransaction backoff tests. Would you want me to update it there as well?

Jibola · 2025-12-10T20:42:12Z

source/client-backpressure/client-backpressure.md

+the following rules:
+
+1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens.
+    - The value is 0.1 and non-configurable.


Per the concerns for golang, I thought updating these values by a scalar of 10 in those cases was fine?
cc: @matthewdale

I think the outcome was the opposite: https://docs.google.com/document/d/1teqNgeWbW6dpRQOALrJTEBRoO6sYcrpq9T3_NJE0QfU/edit?disco=AAABtSVfUxI

Jibola · 2025-12-10T20:51:37Z

source/client-backpressure/client-backpressure.md

+to identify clients which do and do not support backpressure. Currently, this flag is unused but in the future the
+server may offer different rate limiting behavior for clients that do not support backpressure.
+
+##### Implementation notes


Suggested change

##### Implementation notes

#### Implementation notes

source/retryable-writes/retryable-writes.md

stIncMale · 2025-12-11T23:20:46Z

source/client-backpressure/client-backpressure.md

+    deprioritized server address for server selection. This behavior is the same as existing behavior for retryable
+    reads and writes.
+
+Note: drivers MUST share deprioritized servers between retries used for the exponential backoff loop and regular


Let's make this note part of item 7 above.

stIncMale · 2025-12-11T23:22:46Z

source/client-backpressure/client-backpressure.md

+            if attempt >= MAX_ATTEMPTS:
+                raise
+
+            # Raise if the error if non retryable.


Suggested change

# Raise if the error if non retryable.

# Raise if the error is non-retryable.

stIncMale · 2025-12-11T23:25:52Z

source/client-backpressure/client-backpressure.md

-                backoff = jitter * min(BASE_BACKOFF * (2 ** attempt), MAX_BACKOFF)
+            # Raise if the error if non retryable.
+            if exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError"):
+                raise


Shouldn't we continue preparing the next attempt instead of raising if both labels are present?

stIncMale · 2025-12-12T16:36:01Z

source/client-backpressure/client-backpressure.md

+However, older drivers do not have this benefit. Drivers MUST document that:
+
+- Users SHOULD upgrade to driver versions that officially support backpressure to avoid any impacts of server changes.
+- Users who do not upgrade might see increased might need to update application error handling to handle higher error


""might see increased might need to" - something is wrong here.

stIncMale · 2025-12-12T17:41:47Z

source/client-backpressure/client-backpressure.md

+Some drivers might not have retryability implementations that allow easy separation of the existing retryable
+reads/writes mechanisms from the exponential backoff and jitter retry algorithm. An example pseudocode is defined below
+that demonstrates a combined retryable reads/writes implementation with the corresponding backpressure changes (adapted
+from the Node driver's implementation):


It is irrelevant for the purpose of the specification whether an implementation may separate some pieces of logic on not. However, there is indeed a very important relevant piece that is currently omitted from both the design and the specification: how should the current retry policy and the overload retry policy coexist? (I elaborate on this issue in #1862 (comment))

A reader cannot be expected to extract that missing part of the specification from the large code piece below:

a pseudocode piece present in a specification is nothing but an illustration to a part of the specification expressed in plain English; There is no piece of the specification illustrated by the code below.

a pseudocode illustration should not contain irrelevant syntax; The code below contains async– and generics-related syntax, which has no business being there.

a pseudocode piece should not contain irrelevant detais; The code below not only duplicates the values of the constants defined above (10_000, // MAX_BACKOFF, 100, // base backoff), but also uses constants which are not explained in the specification (2 // backoff rate).

if the illustrated specification introduces named constants, the pseudocode should use constants with the same names and the same semantics; The code below uses maxSystemOverloadRetryAttempts = 5 instead of MAX_ATTEMPTS. Not only it's the wrong name, but it is the wrong semantics - MAX_ATTEMPTS specifies the maximum number of attempts (first attempt + four retry attempts), while maxSystemOverloadRetryAttempts suggests that five retry attempts are allowed.

it is reasonable to expect the pseudocode pieces within the same specification, especially when introduced within the same PR, to have the same syntax; The pseudocode below uses syntax different from the pseudocode above and from the pseudocode used in the "Token Bucket" further below.

the size of a pseudocode should be such that it can reasonably be reviewed; I suspect that virtually no PR reviewers thoroughly reviewed the huge piece of pseudocode below. Reviewability is clearly important, given how many issues were identified in the much smaller trivial pseudocode above.

stIncMale · 2025-12-12T17:44:10Z

source/client-backpressure/client-backpressure.md

+The following pseudocode describes the overload retry policy:
+
+```python
+BASE_BACKOFF = 0.1


The specification uses a different name and specifies a different value:

baseBackoff

100 ms

The pseudocode and the specification should use the same names and values. A reader should not need to do any mental exercises to figure out that different names actually supposed to point to the same constant, or that different values are actually the same value expressed using different units. Furthermore, such discrepancies hamper reader's ability to search for a name/value within the specification text.

source/client-backpressure/client-backpressure.md

stIncMale · 2025-12-12T22:17:28Z

source/client-backpressure/client-backpressure.md

+
+This specification expands the driver's retry ability to all commands if the error indicates that is both an overload
+error and that it is retryable, including those not currently considered retryable such as updateMany, create
+collection, getMore, and generic runCommand. The new command execution method obeys the following rules:


runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case:

Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.

Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not). Applying the overload retry policy to runCommand, however, necessitates analyzing the server response.
2.1. Furthermore, given that the internal retry mechanisms likely require a server response containing errors to be represented as some kind of an exception, a driver will not only have to do that for runCommand, but, if all retry attempts fail, will then have to replace the propagated exception back with the server response, because runCommand should return the response, if one is present, instead of propagating an exception.

I don't think retrying runCommand is worth the effort. If we, nonetheless, still want the specification to require retries for runCommand, let's make the corresponding specification change in a separate DRIVERS ticket included in the different epic, DRIVERS-3337: Client Backpressure Improvements. That way, drivers will be able to complete DRIVERS-3160: Client Backpressure Support, and postpone implementing retries for runCommand if they so desire.

Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).

I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?

--

runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.

I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.

r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.

stIncMale · 2025-12-12T23:26:42Z

source/client-backpressure/client-backpressure.md

This specification introduces the overload retry policy, but similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it's better if it does that by clearly expressing a principle from which the answers may be easily derived, rather than answering each question explicitly, as there may be more questions that have to be answered that I did not think about at the moment):

Is it possible to encounter a failed attempt that is is eligible for a retry by both the current and the overload policy?
1.1. I suspect, it currently is not, because the overload retry policy for now requires both RetryableError and SystemOverloadedError to be present. However, he specification should make the answer clear.

What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the overload retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the current retry policy?
2.1. The same question is for two attempts a(n), a(n+1) where the latter immediately¹ follows the former, with the former, a(n), not being the first attempt.
2.1.1. Note that such a situation may be encountered more than once for a single operation.

What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the current retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the overload retry policy?
3.1. The same question is for two attempts a(n), a(n+1) where the latter immediately¹ follows the former, with the former, a(n), not being the first attempt.
3.1.1. Note that such a situation may be encountered more than once for a single operation.

The current retry policy for reads and writes specify which error is to be propagated to an application (if all attempts fail, there are multiple errors to choose from). The proposed overload retry policy does not do this even within itself; it should further specify which error is to be propagated to an application when some attempts of the same requested operation are done according to the current retry policy, while others are done according to the overload retry policy.

¹ In terms of ordering relations, not in the temporal sense.

source/client-backpressure/client-backpressure.md

baileympearson

submitting comments

baileympearson · 2025-12-10T21:05:52Z

source/client-backpressure/client-backpressure.md

+
+## Q&A
+
+TODO


Nothing yet .. I'll remove it for now. It can always be added back if there are items worth mentioning here.

baileympearson · 2025-12-10T21:09:57Z

source/client-backpressure/tests/README.md

+    ```python
+    assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 3.1 seconds)) < 1)
+    ```


For sure. This phrasing was taken from the withTransaction backoff tests. Would you want me to update it there as well?

source/retryable-writes/retryable-writes.md

baileympearson · 2025-12-11T20:00:48Z

source/client-backpressure/client-backpressure.md

+#### Goodput
+
+The throughput of positive, useful output. In the context of drivers, this refers to the number of non-error results
+that the driver processes per unit of time.


Do you have an alternative definition you think makes more sense here?

source/client-backpressure/client-backpressure.md

baileympearson · 2025-12-12T20:16:59Z

source/client-backpressure/tests/README.md

+
+## Changelog
+
+- 2025-XX-XX: Initial version.


It's intended as a TODO, yeah.

If I understand correctly: you're saying new specs generally just leave the changelog empty? That's fine with me as well if that's the usual practice with new specs and test readmes.

baileympearson · 2025-12-15T19:06:25Z

source/client-backpressure/client-backpressure.md

+
+This specification expands the driver's retry ability to all commands if the error indicates that is both an overload
+error and that it is retryable, including those not currently considered retryable such as updateMany, create
+collection, getMore, and generic runCommand. The new command execution method obeys the following rules:


Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).

I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?

--

runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.

I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.

r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.

baileympearson · 2025-12-15T23:06:59Z

source/client-backpressure/tests/backpressure-retry-loop.yml.template

+        object: *{{operation.object}}
+        name: {{operation.operation_name}}
+        {%- if operation.arguments|length > 0 %}
+        arguments:


Yeah, the yaml parser complains if you remove the if-statement.

baileympearson · 2025-12-15T23:07:56Z

source/client-backpressure/tests/backpressure-retry-max-attempts.yml.template

+          client: *failPointClient
+          failPoint:
+            configureFailPoint: failCommand
+            mode: alwaysOn


baileympearson · 2025-12-15T23:08:05Z

source/client-backpressure/tests/backpressure-retry-max-attempts.yml.template

+          {%- endfor -%}
+        {%- endif %}
+        expectError: 
+          isError: true


Yeah, good idea

vbabanin · 2025-12-16T21:11:24Z

source/client-backpressure/client-backpressure.md

+            if attempt > attempts:
+                raise
+
+            deprioritized_servers.append(server.address)


There appears to be a potential inconsistency between the prose specification and the pseudocode regarding server deprioritization.

Step 7:

"If the request is eligible for retry (as outlined in step 4), the client MUST add the previously used server's address to the list of deprioritized server addresses"

Step 4 requires both SystemOverloadedError and RetryableError labels.

However, the pseudocode uses deprioritized_servers.append(server.address) unconditionally - which deprioritizes the server for all retries, including traditional retryable reads and retryable writes.

If the pseudocode in this specification is intended to show the unified retry behavior combining both the new overload retry policy and existing retryable reads/writes logic (as stated in retryable-writes.md:341-346), then we should note that the existing specifications (retryable-writes/reads) restrict server deprioritization to sharded clusters only.

Line 325-326 in retryable-writes.md:

"In a sharded cluster, the server on which the operation failed MUST be provided to the server selection mechanism as a deprioritized server."

My reading is that the unified behavior should:

Deprioritize the server when the error has both SystemOverloadedError and RetryableError labels (matching Step 7), and

Continue to deprioritize only mongos in sharded clusters for existing retryable reads/writes behavior.

Could you confirm if this understanding is correct?

No, that is incorrect. All topologies and all retry mechanisms now always deprioritize servers: 3e577e8

@NoahStapp We missed a reference specifying that sharded clusters behavior in our last PR - would you mind opening a follow up that removes the sentence that Slav pointed out?

That sentence was already fixed in the DRIVERS-3344 commit:

specifications/source/retryable-writes/retryable-writes.md

Lines 320 to 321 in 3e577e8

For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be

provided to the server selection mechanism as a member of the deprioritized server address list.

I'd use the latest commit of master as a reference instead of Bailey's fork.

vbabanin · 2025-12-16T21:35:26Z

source/client-backpressure/client-backpressure.md

+
+#### Pseudocode
+
+The following pseudocode describes the overload retry policy:


retryable-writes.md:341-346 states:

"[!NOTE]
The rules above and the pseudocode below only demonstrate the rules for retryable writes... the pseudocode was intentionally unmodified. For a pseudocode block that contains both retryable writes logic as defined in this specification and backoff retryabilitity as defined in the client backpressure specification, see the pseudocode in the Backpressure Specification."

So retryable-writes tells readers go to the backpressure spec for the unified pseudocode.

However, client-backpressure.md does not seem to say it's providing a unified pseudocode. It just:

Says it's "separate from" existing behavior

Says the pseudocode shows "overload retry policy", which reads as if it only describes the overload retry policy.

If the intent is for client-backpressure.md pseudo-code to present the unified pseudocode, I’d suggest making that explicit in the text and adding links back to the relevant specs, similar to retryable-writes.md:341-346. What do you think?

The pseudocode in the client backpressure specification is unified - it handles both existing retryable reads/writes, and the new client backpressure retryability. It doesn't include all of the same detail as the pseudocodes in each retryability spec though, but I intentionally omitted things like error handling for brevity.

ex:

is_retryable = is_retryable_write() or is_retryable_read() or (exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError"))

Is there something specific you feel is missing? I can add more detail to this pseudocode if there are things missing.

matthewdale · 2025-12-17T00:25:57Z

source/client-backpressure/client-backpressure.md

+    - The value of `MAX_ATTEMPTS` is 5 and non-configurable.
+    - This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within the
+        timeout to avoid retry storms.
+5. If a retry attempt is to be attempted, a token will be consumed from the token bucket.


This wording is difficult to understand. Consider a clearer sentence:

Suggested change

5. If a retry attempt is to be attempted, a token will be consumed from the token bucket.

5. A retry attempt consumes 1 token from the token bucket.

matthewdale · 2025-12-17T00:58:15Z

source/client-backpressure/client-backpressure.md

+The overload retry policy introduces a per-client token bucket to limit SystemOverloaded retry attempts. Although the
+server rejects excess operations as quickly as possible, doing so costs CPU and creates extra contention on the
+connection pool which can eventually negatively affect goodput. To reduce this risk, the token bucket will limit retry
+attempts during a prolonged overload.


This seems to directly contradict the assertions in the CSOT spec section Why don't drivers use backoff/jitter between retry attempts?.

Should we remove that section from the CSOT spec?

Is there a reason not to apply backoff+jitter to all retry attempts, not just those with the SystemOverloadedError label?

P.S. I realize both of these are arguably out of the scope of this change, mostly wanted to know if there are existing answers to these questions.

baileympearson added 8 commits December 1, 2025 13:44

initial commit

588f1f2

new files

e467f5b

add tests for handshake changes

d55fdb9

add generated tests

8e74b41

test fixes and add prose test

072b453

- add prose test - add assertions on the number of retries for maxAttempts tests - don't run clientBulkWrite tests on <8.0 servers

fix run on requirements

52e2a35

fix run on requirements?

391c951

fix CI

92501c0

baileympearson commented Dec 2, 2025

View reviewed changes

source/client-backpressure/client-backpressure.md Outdated Show resolved Hide resolved

baileympearson marked this pull request as ready for review December 2, 2025 18:59

baileympearson requested review from a team as code owners December 2, 2025 18:59

baileympearson requested review from jmikola and jyemin and removed request for a team December 2, 2025 18:59

blink1073 reviewed Dec 3, 2025

View reviewed changes

source/retryable-reads/retryable-reads.md Outdated Show resolved Hide resolved

source/retryable-writes/retryable-writes.md Outdated Show resolved Hide resolved

source/client-backpressure/client-backpressure.md Outdated Show resolved Hide resolved

baileympearson added 3 commits December 3, 2025 11:36

comments

0fdef39

Fix broken unified tests

82acab8

fix UTR linting failures

b3a7b6c

blink1073 mentioned this pull request Dec 3, 2025

PYTHON-5528 & PYTHON-5651 Add exponential backoff to operation retry loop for server overloaded errors mongodb/mongo-python-driver#2635

Draft

11 tasks

remove broken deleteMany() from unified tests

60a87b8

add backwards compat section

399a56b

jyemin requested changes Dec 10, 2025

View reviewed changes

Jibola requested changes Dec 10, 2025

View reviewed changes

stIncMale reviewed Dec 11, 2025

View reviewed changes

source/retryable-writes/retryable-writes.md Show resolved Hide resolved

stIncMale reviewed Dec 11, 2025

View reviewed changes

stIncMale reviewed Dec 12, 2025

View reviewed changes

source/client-backpressure/client-backpressure.md Outdated Show resolved Hide resolved

stIncMale reviewed Dec 12, 2025

View reviewed changes

baileympearson added 2 commits December 15, 2025 15:52

unified retry loop, handshake phrasing change

034b85e

Jeremy's last comments

779e171

blink1073 reviewed Dec 16, 2025

View reviewed changes

source/client-backpressure/client-backpressure.md Outdated Show resolved Hide resolved

Other misc comments

e5d4de6

baileympearson requested review from Jibola, blink1073, jmikola, jyemin and stIncMale December 16, 2025 18:19

baileympearson commented Dec 16, 2025

View reviewed changes

vbabanin reviewed Dec 16, 2025

View reviewed changes

matthewdale reviewed Dec 17, 2025

View reviewed changes

update transaction spec and add unified tests

1cd95fc

blink1073 requested a review from a team as a code owner December 17, 2025 12:41

blink1073 requested review from durran and removed request for a team December 17, 2025 12:41

blink1073 added 5 commits December 17, 2025 10:18

update transaction logic and add more tests

6912e45

verify commitTransaction fails after 5 backoff attempts

88e6067

clean up transactions spec

2019678

update test names

d3ce32b

address writeconcern on retries

de7e862

	to according to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`
	to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)`

	# Raise if the error if non retryable.
	# Raise if the error is non-retryable.

	For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be
	provided to the server selection mechanism as a member of the deprioritized server address list.


		#### Pseudocode

		The following pseudocode describes the overload retry policy:

	5. If a retry attempt is to be attempted, a token will be consumed from the token bucket.
	5. A retry attempt consumes 1 token from the token bucket.


		## Q&A

		TODO


		## Q&A

		TODO

DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862

Are you sure you want to change the base?

DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862

Conversation

baileympearson commented Dec 2, 2025 • edited by blink1073 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Testing

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blink1073 commented Dec 3, 2025

Uh oh!

blink1073 commented Dec 3, 2025

Uh oh!

blink1073 commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jyemin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stIncMale Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stIncMale Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

stIncMale Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

baileympearson left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

baileympearson commented Dec 2, 2025 •

edited by blink1073

Loading

blink1073 commented Dec 3, 2025 •

edited

Loading

stIncMale Dec 12, 2025 •

edited

Loading

stIncMale Dec 12, 2025 •

edited

Loading

stIncMale Dec 12, 2025 •

edited

Loading

vbabanin Dec 16, 2025 •

edited

Loading

NoahStapp Dec 17, 2025 •

edited

Loading

vbabanin Dec 16, 2025 •

edited

Loading