-
Notifications
You must be signed in to change notification settings - Fork 245
DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 21 commits
588f1f2
e467f5b
d55fdb9
8e74b41
072b453
52e2a35
391c951
92501c0
0fdef39
82acab8
b3a7b6c
60a87b8
399a56b
0545e15
ff5475a
6211624
c1001bc
def5fbd
08da5c4
034b85e
779e171
e5d4de6
1cd95fc
6912e45
88e6067
2019678
d3ce32b
de7e862
ab85a5b
eb10ddb
2b90697
0abf373
d07d49e
747c18c
be27bc0
92e479a
b674f13
03065ad
1b7f6df
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,314 @@ | ||||||
| # Client Backpressure | ||||||
|
|
||||||
| - Status: Accepted | ||||||
| - Minimum Server Version: N/A | ||||||
|
|
||||||
| ______________________________________________________________________ | ||||||
|
|
||||||
| ## Abstract | ||||||
|
|
||||||
| This specification adds the ability for drivers to automatically retry requests that fail due to server overload errors | ||||||
| while applying backpressure to avoid further overloading the server. | ||||||
|
|
||||||
| ## META | ||||||
|
|
||||||
| The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | ||||||
| "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). | ||||||
|
|
||||||
| ## Specification | ||||||
|
|
||||||
| ### Terms | ||||||
|
|
||||||
| #### Ingress Connection Rate Limiter | ||||||
|
|
||||||
| A token-bucket based system introduced in MongoDB 8.2 to admit, reject or queue connection requests. It aims to prevent | ||||||
| connection spikes from overloading the system. | ||||||
|
|
||||||
| #### Ingress Request Rate Limiter | ||||||
|
|
||||||
| A token bucket based system introduced in MongoDB 8.2 to admit an operation or reject it with a System Overload Error at | ||||||
| the front door of a mongod/s. It aims to prevent operations spikes from overloading the system. | ||||||
|
|
||||||
| #### MongoTune | ||||||
|
|
||||||
| Mongotune is a policy engine outside the server (mongod or mongos) which monitors a set of metrics (MongoDB or system | ||||||
| host) to dynamically configure MongoDB settings. MongoTune is deployed to Atlas clusters and will dynamically configure | ||||||
| the connection and request rate limiters to prevent and mitigate overloading the system. | ||||||
|
|
||||||
| #### RetryableError label | ||||||
|
|
||||||
| An error is considered retryable if it includes the "RetryableError" label. This error label indicates that an operation | ||||||
| is safely retryable regardless of the type of operation, its metadata, or any of its arguments. | ||||||
|
|
||||||
| Note that for the initial draft of the spec, only errors that have both the RetryableError label and the | ||||||
| SystemOverloadedError label are eligible for the retry backoff loop. | ||||||
|
||||||
|
|
||||||
| #### SystemOverloadedError label | ||||||
|
|
||||||
| An error is considered overloaded if it includes the "SystemOverloadError" label. This error label indicates that the | ||||||
| server is overloaded. If this error label is present, drivers will backoff before attempting a retry. | ||||||
|
|
||||||
| #### Overload Errors | ||||||
|
|
||||||
| An overload error is any command or network error that occurs due to a server overload. For example, when a request | ||||||
| exceeds the ingress request rate limit: | ||||||
|
|
||||||
| ```js | ||||||
| { | ||||||
| 'ok': 0.0, | ||||||
| 'errmsg': "Rate limiter 'ingressRequestRateLimiter' rate exceeded", | ||||||
| 'code': 462, | ||||||
| 'codeName': 'IngressRequestRateLimitExceeded', | ||||||
| 'errorLabels': ['SystemOverloadedError', 'RetryableError'], | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| When a new connection attempt exceeds the ingress connection rate limit, the server closes the TCP connection before TLS | ||||||
| handshake is complete. Drivers will observe this as a network error (e.g. "connection reset by peer" or "connection | ||||||
| closed"). | ||||||
|
|
||||||
| When a new connection attempt is queued by the server for so long that the driver-side timeout expires, drivers will | ||||||
| observe this as a network timeout error. | ||||||
|
|
||||||
| Note that there is no guarantee that all SystemOverloaded errors are retryable or that all RetryableErrors also have the | ||||||
| SystemOverloaded error label. | ||||||
|
|
||||||
| #### Goodput | ||||||
|
|
||||||
| The throughput of positive, useful output. In the context of drivers, this refers to the number of non-error results | ||||||
| that the driver processes per unit of time. | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "the number of non-error results that the driver processes per unit of time" is neither throughput, nor the "good throughput" ("goodput"). Throughput is the characteristic of a system (the combination of the application, the driver, the DBMS, their configuration, the network connecting them, the hardware, etc.), which is a constant for a given system, and tells about system capacity at its peak. SPECjbb2012: Updated Metrics for a Business Benchmark explains nicely what a throughput is, and how it may be measured. "the number of non-error results
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If, however, we want to define "throughput"/"goodput" the way it is currently proposed, then when we use the term in
we have to say "max goodput" / "max throughput", or something like that, instead of just "goodput"/"throughput".
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you have an alternative definition you think makes more sense here?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think the bounds of throughput can be scoped to whatever end-to-end system you're defining as the receiver. In our case, that's couched in the "output" which is just the data received from the server. We can have that say "packets retrieved from the server". As well, we can change the language to say "achieved" or "realized" throughput/goodput everywhere we use the word in a sentence. I think the only correction needed here is:
|
||||||
|
|
||||||
| See [goodput](https://en.wikipedia.org/wiki/Goodput). | ||||||
|
|
||||||
| ### Requirements for Client Backpressure | ||||||
|
|
||||||
| #### Overload retry policy | ||||||
|
|
||||||
| This specification expands the driver's retry ability to all commands if the error indicates that is both an overload | ||||||
|
||||||
| This specification expands the driver's retry ability to all commands if the error indicates that is both an overload | |
| This specification expands the driver's retry ability to all commands if the error indicates that it is both an overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case:
- Retrying
runCommandrequires changing its implementation such that it uses whatever internal retry mechanisms a driver has. - Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via
runCommand(unless we are talking aboutrunCursorCommand, which I am not). Applying the overload retry policy torunCommand, however, necessitates analyzing the server response.
2.1. Furthermore, given that the internal retry mechanisms likely require a server response containing errors to be represented as some kind of an exception, a driver will not only have to do that forrunCommand, but, if all retry attempts fail, will then have to replace the propagated exception back with the server response, becauserunCommandshould return the response, if one is present, instead of propagating an exception.
I don't think retrying runCommand is worth the effort. If we, nonetheless, still want the specification to require retries for runCommand, let's make the corresponding specification change in a separate DRIVERS ticket included in the different epic, DRIVERS-3337: Client Backpressure Improvements. That way, drivers will be able to complete DRIVERS-3160: Client Backpressure Support, and postpone implementing retries for runCommand if they so desire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently (before back pressure), there seem to be no reason for a driver to inspect the server response to a command run via runCommand (unless we are talking about runCursorCommand, which I am not).
I thought that all drivers inspected the response to throw an exception if the response includes: ok: 0. Is this not the case in Java?
--
runCommand is mentioned here as if the overload retry policy is trivially applicable to it, but I am not sure that's the case
and
Retrying runCommand requires changing its implementation such that it uses whatever internal retry mechanisms a driver has.
I don't think the assumption is that it is trivial for drivers to implement for all commands, because we're explicitly adding support for commands that previously were not retryable in addition to runCommand (ex: getMore). Debating whether or not it is trivial to implement misses the point imo: this is a new feature that drivers are building, so it is understood that it requires code changes to implement.
r.e. including runCommand in this project: I'm going to keep it for now, unless there's a strong technical argument against including it. I don't think we can say that drivers handle backpressure if there is a user-facing API that doesn't include the backpressure retry logic.
Jibola marked this conversation as resolved.
Show resolved
Hide resolved
stIncMale marked this conversation as resolved.
Show resolved
Hide resolved
blink1073 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
stIncMale marked this conversation as resolved.
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wording is difficult to understand. Consider a clearer sentence:
| 5. If a retry attempt is to be attempted, a token will be consumed from the token bucket. | |
| 5. A retry attempt consumes 1 token from the token bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we decided to have retry number start we 0? It makes the requirements confusing. Why don't we start with 1 and in fact we will have the same formula as for withTransaction:
delayMS = j * min(maxBackoff, baseBackoff * 2^(i-1))
With the only difference here we have 2 as the base for pow function, in convinientTransaction API we have 1.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the formula from withTransaction:
jitter * min(BACKOFF_INITIAL * 1.5 ** (transactionAttempt - 1), BACKOFF_MAX)
Where transactionAttempt started with 0 and is being incremented AFTER the delay, but before executing the callback attempt. Which is also confusing... but in C# implementation we wait AFTER the attempt so it's more natural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrasing, including "Retries start at 0", was just taken from the design. There's no need to keep it this way if it causes confusion.
I can adjust the phrasing to more closely align with the transaction spec, if that's preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the following formula?
delayMS = j * min(maxBackoff, baseBackoff * 2^(i-1))
or
we can keep the formula as is, but adjust the baseBackoff :
delayMS = j * min(maxBackoff, baseBackoff * 2^i)
where baseBackoff is 50 instead of 100.
It produces the same results, but starts i with 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to use any of those three formulas in individual driver implementations if it increases readability so long as they result in the same outputs. As far as reducing confusion, I can say it didn't substantially change my understanding of the formula.
+1 to Bailey changing the phrasing, which I think would suffice.
vbabanin marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
stIncMale marked this conversation as resolved.
Show resolved
Hide resolved
vbabanin marked this conversation as resolved.
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specification says "i is the retry attempt (starting with 0 for the first retry)." The pseudocode deviates from it by using the next attempt number (attempt) here instead of using the next retry attempt number (all attempts but the first one are retry attempts). The minimal attempt value at this point in execution is 1, making the minimal backoff equal to BASE_BACKOFF * (2^1) = 200 ms (before accounting for jitter), instead of the expected 100 ms.
To correctly illustrate the specification, the pseudocode should be
backoff = jitter * min(BASE_BACKOFF * (2^(attempt - 1)), MAX_BACKOFF)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to directly contradict the assertions in the CSOT spec section Why don't drivers use backoff/jitter between retry attempts?.
- Should we remove that section from the CSOT spec?
- Is there a reason not to apply backoff+jitter to all retry attempts, not just those with the
SystemOverloadedErrorlabel?
P.S. I realize both of these are arguably out of the scope of this change, mostly wanted to know if there are existing answers to these questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re 1. - I think the section is still accurate because we only apply backoff and jitter to a particular subset of errors (system overloaded errors). I could update the phrasing to mention the client backpressure specification if you'd like.
re 2. - Not that I can think of but I also don't know that it is necessary. Especially with Noah's server selection changes, because now there's no chance of selecting the same server again.
The exception would be really extreme server overload (what this feature addresses) or other extreme driver scenarios where requests to all nodes are failing, exhausting the deprioritized server list. The only such non-overload scenario I can think of is a complete network outage to all nodes but I don't think backoff and jitter would help much in this scenario at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the responses! Consider this resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how we handle the date... Is there an automation for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I know of. Usually the spec author fills it out before merging
I'll just leave this thread open to remind myself to add changelog dates before merging once all changes are completed.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # Client Backpressure Tests | ||
|
|
||
| ______________________________________________________________________ | ||
|
|
||
| ## Introduction | ||
|
|
||
| The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of | ||
| retryable reads. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md). | ||
|
|
||
| Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to | ||
| be manually implemented by each driver. | ||
|
|
||
| ### Prose Tests | ||
|
|
||
| #### Test 1: Operation Retry Uses Exponential Backoff | ||
|
|
||
| Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. | ||
|
|
||
| 1. Let `client` be a `MongoClient` | ||
| 2. Let `collection` be a collection | ||
| 3. Now, run transactions without backoff: | ||
| 1. Configure the random number generator used for jitter to always return `0` -- this effectively disables backoff. | ||
|
|
||
| 2. Configure the following failPoint: | ||
|
|
||
| ```javascript | ||
| { | ||
| configureFailPoint: 'failCommand', | ||
| mode: 'alwaysOn', | ||
| data: { | ||
| failCommands: ['insert'], | ||
| errorCode: 2, | ||
| errorLabels: ['SystemOverloadedError', 'RetryableError'] | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| 3. Execute the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution. | ||
|
|
||
| ```javascript | ||
| const start = performance.now(); | ||
| expect( | ||
| await coll.insertOne({ a: 1 }).catch(e => e) | ||
| ).to.be.an.instanceof(MongoServerError); | ||
| const end = performance.now(); | ||
| ``` | ||
|
|
||
| 4. Configure the random number generator used for jitter to always return `1`. | ||
|
|
||
| 5. Execute step 3 again. | ||
|
|
||
| 6. Compare the two time between the two runs. | ||
| ```python | ||
| assertTrue(with_backoff_time - no_backoff_time >= 2.1) | ||
| ``` | ||
| The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two | ||
| runs. | ||
|
|
||
| ## Changelog | ||
|
|
||
| - 2025-XX-XX: Initial version. | ||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specification introduces the overload retry policy, but similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it's better if it does that by clearly expressing a principle from which the answers may be easily derived, rather than answering each question explicitly, as there may be more questions that have to be answered that I did not think about at the moment):
1.1. I suspect, it currently is not, because the overload retry policy for now requires both
RetryableErrorandSystemOverloadedErrorto be present. However, he specification should make the answer clear.2.1. The same question is for two attempts
a(n),a(n+1)where the latter immediately1 follows the former, with the former,a(n), not being the first attempt.2.1.1. Note that such a situation may be encountered more than once for a single operation.
3.1. The same question is for two attempts
a(n),a(n+1)where the latter immediately1 follows the former, with the former,a(n), not being the first attempt.3.1.1. Note that such a situation may be encountered more than once for a single operation.
1 In terms of ordering relations, not in the temporal sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated pseudocode should answer these questions - let me know if there's anything else you'd like clarified.