-
Notifications
You must be signed in to change notification settings - Fork 245
DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- add prose test - add assertions on the number of retries for maxAttempts tests - don't run clientBulkWrite tests on <8.0 servers
| > - If the value is "stderr" (case-insensitive), log to stderr. | ||
| > - Else, if direct logging to files is supported, log to a file at the specified path. If the file already exists, it | ||
| > MUST be appended to. | ||
| > MUST be appended to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is failing lint on main. I'll fix separately and rebase this PR.
|
It looks like you also need to bump the schema version: |
|
WIP Python implementation: mongodb/mongo-python-driver#2635 |
|
All unified and prose tests are passing in the Python implementation. Edit: we're still failing one unified test, "client.clientBulkWrite retries using operation loop", investigating... Edit 2: we're all good now |
jyemin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the specification changes, not the pseudocode or tests. Those are best reviewed by implementers.
| - This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within the | ||
| timeout to avoid retry storms. | ||
| 5. If the previous error includes the `SystemOverloadedError` label, the client MUST apply exponential backoff according | ||
| to according to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| to according to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)` | |
| to the following formula: `delayMS = j * min(maxBackoff, baseBackoff * 2^i)` |
|
|
||
| This specification expands the driver's retry ability to all commands, including those not currently considered | ||
| retryable such as updateMany, create collection, getMore, and generic runCommand. The new command execution method obeys | ||
| the following rules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the rules include all the deposits into the token bucket, consider adding withdrawals as well.
|
|
||
| ## Q&A | ||
|
|
||
| TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything to add here, or just remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing yet .. I'll remove it for now. It can always be added back if there are items worth mentioning here.
|
|
||
| ## Changelog | ||
|
|
||
| - 2025-XX-XX: Initial version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how we handle the date... Is there an automation for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know of. Usually the spec author fills it out before merging
I'll just leave this thread open to remind myself to add changelog dates before merging once all changes are completed.
| ```python | ||
| assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 3.1 seconds)) < 1) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts:
Could this stick to being javascript from top-to-bottom?
Can we also do BIG_TIME - SMALL_TIME >= 2.1?
- To me that's more human-readable.
- It maintains that
BIG_TIMEmust always be bigger (removing the need for absolute_value). - It captures the 1 second variation whilst still being rigid to the 3.1 second window and stays minimally invasive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure. This phrasing was taken from the withTransaction backoff tests. Would you want me to update it there as well?
| the following rules: | ||
|
|
||
| 1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens. | ||
| - The value is 0.1 and non-configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the concerns for golang, I thought updating these values by a scalar of 10 in those cases was fine?
cc: @matthewdale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the outcome was the opposite: https://docs.google.com/document/d/1teqNgeWbW6dpRQOALrJTEBRoO6sYcrpq9T3_NJE0QfU/edit?disco=AAABtSVfUxI
| to identify clients which do and do not support backpressure. Currently, this flag is unused but in the future the | ||
| server may offer different rate limiting behavior for clients that do not support backpressure. | ||
|
|
||
| ##### Implementation notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ##### Implementation notes | |
| #### Implementation notes |
| deprioritized server address for server selection. This behavior is the same as existing behavior for retryable | ||
| reads and writes. | ||
|
|
||
| Note: drivers MUST share deprioritized servers between retries used for the exponential backoff loop and regular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this note part of item 7 above.
| if attempt >= MAX_ATTEMPTS: | ||
| raise | ||
|
|
||
| # Raise if the error if non retryable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Raise if the error if non retryable. | |
| # Raise if the error is non-retryable. |
| backoff = jitter * min(BASE_BACKOFF * (2 ** attempt), MAX_BACKOFF) | ||
| # Raise if the error if non retryable. | ||
| if exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError"): | ||
| raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we continue preparing the next attempt instead of raising if both labels are present?
| However, older drivers do not have this benefit. Drivers MUST document that: | ||
|
|
||
| - Users SHOULD upgrade to driver versions that officially support backpressure to avoid any impacts of server changes. | ||
| - Users who do not upgrade might see increased might need to update application error handling to handle higher error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""might see increased might need to" - something is wrong here.
| Some drivers might not have retryability implementations that allow easy separation of the existing retryable | ||
| reads/writes mechanisms from the exponential backoff and jitter retry algorithm. An example pseudocode is defined below | ||
| that demonstrates a combined retryable reads/writes implementation with the corresponding backpressure changes (adapted | ||
| from the Node driver's implementation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is irrelevant for the purpose of the specification whether an implementation may separate some pieces of logic on not. However, there is indeed a very important relevant piece that is currently omitted from both the design and the specification: how should the current retry policy and the overload retry policy coexist? (I elaborate on this issue in #1862 (comment))
A reader cannot be expected to extract that missing part of the specification from the large code piece below:
- a pseudocode piece present in a specification is nothing but an illustration to a part of the specification expressed in plain English; There is no piece of the specification illustrated by the code below.
- a pseudocode illustration should not contain irrelevant syntax; The code below contains async– and generics-related syntax, which has no business being there.
- a pseudocode piece should not contain irrelevant detais; The code below not only duplicates the values of the constants defined above (
10_000, // MAX_BACKOFF,100, // base backoff), but also uses constants which are not explained in the specification (2 // backoff rate). - if the illustrated specification introduces named constants, the pseudocode should use constants with the same names and the same semantics; The code below uses
maxSystemOverloadRetryAttempts = 5instead ofMAX_ATTEMPTS. Not only it's the wrong name, but it is the wrong semantics -MAX_ATTEMPTSspecifies the maximum number of attempts (first attempt + four retry attempts), whilemaxSystemOverloadRetryAttemptssuggests that five retry attempts are allowed. - it is reasonable to expect the pseudocode pieces within the same specification, especially when introduced within the same PR, to have the same syntax; The pseudocode below uses syntax different from the pseudocode above and from the pseudocode used in the "Token Bucket" further below.
- the size of a pseudocode should be such that it can reasonably be reviewed; I suspect that virtually no PR reviewers thoroughly reviewed the huge piece of pseudocode below. Reviewability is clearly important, given how many issues were identified in the much smaller trivial pseudocode above.
| The following pseudocode describes the overload retry policy: | ||
|
|
||
| ```python | ||
| BASE_BACKOFF = 0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specification uses a different name and specifies a different value:
baseBackoff- 100 ms
The pseudocode and the specification should use the same names and values. A reader should not need to do any mental exercises to figure out that different names actually supposed to point to the same constant, or that different values are actually the same value expressed using different units. Furthermore, such discrepancies hamper reader's ability to search for a name/value within the specification text.
|
|
||
| This specification expands the driver's retry ability to all commands if the error indicates that is both an overload | ||
| error and that it is retryable, including those not currently considered retryable such as updateMany, create | ||
| collection, getMore, and generic runCommand. The new command execution method obeys the following rules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case:
- Retrying
runCommandrequires changing its implementation such that it uses whatever internal retry mechanisms a driver has. - Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via
runCommand(unless we are talking aboutrunCursorCommand, which I am not). Applying the overload retry policy torunCommand, however, necessitates analyzing the server response.
2.1. Furthermore, given that the internal retry mechanisms likely require a server response containing errors to be represented as some kind of an exception, a driver will not only have to do that forrunCommand, but, if all retry attempts fail, will then have to replace the propagated exception back with the server response, becauserunCommandshould return the response, if one is present, instead of propagating an exception.
I don't think retrying runCommand is worth the effort. If we, nonetheless, still want the specification to require retries for runCommand, let's make the corresponding specification change in a separate DRIVERS ticket included in the different epic, DRIVERS-3337: Client Backpressure Improvements. That way, drivers will be able to complete DRIVERS-3160: Client Backpressure Support, and postpone implementing retries for runCommand if they so desire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).
I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?
--
runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.
I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.
r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specification introduces the overload retry policy, but similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it's better if it does that by clearly expressing a principle from which the answers may be easily derived, rather than answering each question explicitly, as there may be more questions that have to be answered that I did not think about at the moment):
- Is it possible to encounter a failed attempt that is is eligible for a retry by both the current and the overload policy?
1.1. I suspect, it currently is not, because the overload retry policy for now requires bothRetryableErrorandSystemOverloadedErrorto be present. However, he specification should make the answer clear. - What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the overload retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the current retry policy?
2.1. The same question is for two attemptsa(n),a(n+1)where the latter immediately1 follows the former, with the former,a(n), not being the first attempt.
2.1.1. Note that such a situation may be encountered more than once for a single operation. - What happens if the first attempt (so not a retry attempt) fails in a way that triggers a retry attempt according to the current retry policy, and then the second attempt (the first retry attempt) fails in a way that could have triggered a retry attempt according to the overload retry policy?
3.1. The same question is for two attemptsa(n),a(n+1)where the latter immediately1 follows the former, with the former,a(n), not being the first attempt.
3.1.1. Note that such a situation may be encountered more than once for a single operation. - The current retry policy for reads and writes specify which error is to be propagated to an application (if all attempts fail, there are multiple errors to choose from). The proposed overload retry policy does not do this even within itself; it should further specify which error is to be propagated to an application when some attempts of the same requested operation are done according to the current retry policy, while others are done according to the overload retry policy.
1 In terms of ordering relations, not in the temporal sense.
baileympearson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitting comments
|
|
||
| ## Q&A | ||
|
|
||
| TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing yet .. I'll remove it for now. It can always be added back if there are items worth mentioning here.
| ```python | ||
| assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 3.1 seconds)) < 1) | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure. This phrasing was taken from the withTransaction backoff tests. Would you want me to update it there as well?
| #### Goodput | ||
|
|
||
| The throughput of positive, useful output. In the context of drivers, this refers to the number of non-error results | ||
| that the driver processes per unit of time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an alternative definition you think makes more sense here?
|
|
||
| ## Changelog | ||
|
|
||
| - 2025-XX-XX: Initial version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intended as a TODO, yeah.
If I understand correctly: you're saying new specs generally just leave the changelog empty? That's fine with me as well if that's the usual practice with new specs and test readmes.
|
|
||
| This specification expands the driver's retry ability to all commands if the error indicates that is both an overload | ||
| error and that it is retryable, including those not currently considered retryable such as updateMany, create | ||
| collection, getMore, and generic runCommand. The new command execution method obeys the following rules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).
I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?
--
runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.
I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.
r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.
| object: *{{operation.object}} | ||
| name: {{operation.operation_name}} | ||
| {%- if operation.arguments|length > 0 %} | ||
| arguments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the yaml parser complains if you remove the if-statement.
| client: *failPointClient | ||
| failPoint: | ||
| configureFailPoint: failCommand | ||
| mode: alwaysOn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup!
| {%- endfor -%} | ||
| {%- endif %} | ||
| expectError: | ||
| isError: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good idea
| if attempt > attempts: | ||
| raise | ||
|
|
||
| deprioritized_servers.append(server.address) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a potential inconsistency between the prose specification and the pseudocode regarding server deprioritization.
Step 7:
"If the request is eligible for retry (as outlined in step 4), the client MUST add the previously used server's address to the list of deprioritized server addresses"
Step 4 requires both SystemOverloadedError and RetryableError labels.
However, the pseudocode uses deprioritized_servers.append(server.address) unconditionally - which deprioritizes the server for all retries, including traditional retryable reads and retryable writes.
If the pseudocode in this specification is intended to show the unified retry behavior combining both the new overload retry policy and existing retryable reads/writes logic (as stated in retryable-writes.md:341-346), then we should note that the existing specifications (retryable-writes/reads) restrict server deprioritization to sharded clusters only.
Line 325-326 in retryable-writes.md:
"In a sharded cluster, the server on which the operation failed MUST be provided to the server selection mechanism as a deprioritized server."
My reading is that the unified behavior should:
- Deprioritize the server when the error has both
SystemOverloadedErrorandRetryableErrorlabels (matching Step 7), and - Continue to deprioritize only mongos in sharded clusters for existing retryable reads/writes behavior.
Could you confirm if this understanding is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that is incorrect. All topologies and all retry mechanisms now always deprioritize servers: 3e577e8
@NoahStapp We missed a reference specifying that sharded clusters behavior in our last PR - would you mind opening a follow up that removes the sentence that Slav pointed out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sentence was already fixed in the DRIVERS-3344 commit:
specifications/source/retryable-writes/retryable-writes.md
Lines 320 to 321 in 3e577e8
| For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be | |
| provided to the server selection mechanism as a member of the deprioritized server address list. |
I'd use the latest commit of master as a reference instead of Bailey's fork.
|
|
||
| #### Pseudocode | ||
|
|
||
| The following pseudocode describes the overload retry policy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retryable-writes.md:341-346 states:
"[!NOTE]
The rules above and the pseudocode below only demonstrate the rules for retryable writes... the pseudocode was intentionally unmodified. For a pseudocode block that contains both retryable writes logic as defined in this specification and backoff retryabilitity as defined in the client backpressure specification, see the pseudocode in the Backpressure Specification."
So retryable-writes tells readers go to the backpressure spec for the unified pseudocode.
However, client-backpressure.md does not seem to say it's providing a unified pseudocode. It just:
- Says it's "separate from" existing behavior
- Says the pseudocode shows "overload retry policy", which reads as if it only describes the overload retry policy.
If the intent is for client-backpressure.md pseudo-code to present the unified pseudocode, I’d suggest making that explicit in the text and adding links back to the relevant specs, similar to retryable-writes.md:341-346. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pseudocode in the client backpressure specification is unified - it handles both existing retryable reads/writes, and the new client backpressure retryability. It doesn't include all of the same detail as the pseudocodes in each retryability spec though, but I intentionally omitted things like error handling for brevity.
ex:
is_retryable = is_retryable_write() or is_retryable_read() or (exc.has_error_label("RetryableError") and exc.has_error_label("SystemOverloadedError"))
Is there something specific you feel is missing? I can add more detail to this pseudocode if there are things missing.
| - The value of `MAX_ATTEMPTS` is 5 and non-configurable. | ||
| - This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within the | ||
| timeout to avoid retry storms. | ||
| 5. If a retry attempt is to be attempted, a token will be consumed from the token bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording is difficult to understand. Consider a clearer sentence:
| 5. If a retry attempt is to be attempted, a token will be consumed from the token bucket. | |
| 5. A retry attempt consumes 1 token from the token bucket. |
| The overload retry policy introduces a per-client token bucket to limit SystemOverloaded retry attempts. Although the | ||
| server rejects excess operations as quickly as possible, doing so costs CPU and creates extra contention on the | ||
| connection pool which can eventually negatively affect goodput. To reduce this risk, the token bucket will limit retry | ||
| attempts during a prolonged overload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to directly contradict the assertions in the CSOT spec section Why don't drivers use backoff/jitter between retry attempts?.
- Should we remove that section from the CSOT spec?
- Is there a reason not to apply backoff+jitter to all retry attempts, not just those with the
SystemOverloadedErrorlabel?
P.S. I realize both of these are arguably out of the scope of this change, mostly wanted to know if there are existing answers to these questions.
DRIVERS-3239
Overview
This PR adds support for a new class of errors (
SystemOverloadedError) to drivers' operation retry logic, as outlined in the design document.Additionally, it includes a new argument to the MongoDB handshake (also defined in the design document).
Python will be second implementer.
Node implementation: mongodb/node-mongodb-native#4806
Testing
The testing strategy is two-fold:
Building off of Ezra's work to generate unified tests for retryable handshake errors, this PR generates unified tests to confirm that:
SystemOverloadedErrorlabelFollowing Iris's work in DRIVERS-1934: withTransaction API retries too frequently #1851, this PR adds a prose test that ensures drivers apply exponential backoff in the retryability loop.
Update changelog.
Test changes in at least one language driver.
Test these changes against all server versions and topologies (including standalone, replica set, and sharded
clusters).