DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862
Open: baileympearson wants to merge 33 commits into mongodb:master from baileympearson:DRIVERS-3239
+13,124
−9
33 commits
588f1f2 initial commit (baileympearson)
e467f5b new files (baileympearson)
d55fdb9 add tests for handshake changes (baileympearson)
8e74b41 add generated tests (baileympearson)
072b453 test fixes and add prose test (baileympearson)
52e2a35 fix run on requirements (baileympearson)
391c951 fix run on requirements? (baileympearson)
92501c0 fix CI (baileympearson)
0fdef39 comments (baileympearson)
82acab8 Fix broken unified tests (baileympearson)
b3a7b6c fix UTR linting failures (baileympearson)
60a87b8 remove broken deleteMany() from unified tests (baileympearson)
399a56b add backwards compat section (baileympearson)
0545e15 Jeff's and Jib's comments (baileympearson)
ff5475a adjust backpressure spec phrasing to make it clear retryable errors a… (baileympearson)
6211624 squash: jeremy's casing comments (baileympearson)
c1001bc squash: other comments (baileympearson)
def5fbd squash: other comments (baileympearson)
08da5c4 last round comments (baileympearson)
034b85e unified retry loop, handshake phrasing change (baileympearson)
779e171 Jeremy's last comments (baileympearson)
e5d4de6 Other misc comments (baileympearson)
1cd95fc update transaction spec and add unified tests (blink1073)
6912e45 update transaction logic and add more tests (blink1073)
88e6067 verify commitTransaction fails after 5 backoff attempts (blink1073)
2019678 clean up transactions spec (blink1073)
d3ce32b update test names (blink1073)
de7e862 address writeconcern on retries (blink1073)
ab85a5b add retryable get more tests (baileympearson)
eb10ddb transaction test fixes (baileympearson)
2b90697 deduplicate ids (baileympearson)
0abf373 update transaction writeconcern logic and add changelog entry (blink1073)
d07d49e last few comments (baileympearson)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| # Client Backpressure Tests | ||
|
|
||
| ______________________________________________________________________ | ||
|
|
||
| ## Introduction | ||
|
|
||
| The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of | ||
| client backpressure. These tests utilize the [Unified Test Format](../../unified-test-format/unified-test-format.md). | ||
|
|
||
| Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to | ||
| be manually implemented by each driver. | ||
|
|
||
| ### Prose Tests | ||
|
|
||
| #### Test 1: Operation Retry Uses Exponential Backoff | ||
|
|
||
| Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. | ||
|
|
||
| 1. Let `client` be a `MongoClient` | ||
| 2. Let `collection` be a collection | ||
| 3. Now, run the insert operation without backoff: | ||
| 1. Configure the random number generator used for jitter to always return `0` -- this effectively disables backoff. | ||
|
|
||
| 2. Configure the following failPoint: | ||
|
|
||
| ```javascript | ||
| { | ||
| configureFailPoint: 'failCommand', | ||
| mode: 'alwaysOn', | ||
| data: { | ||
| failCommands: ['insert'], | ||
| errorCode: 2, | ||
| errorLabels: ['SystemOverloadedError', 'RetryableError'] | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| 3. Insert the document `{ a: 1 }`. Expect that the command errors. Measure the duration of the command execution. | ||
|
|
||
| ```javascript | ||
| const start = performance.now(); | ||
| expect( | ||
| await collection.insertOne({ a: 1 }).catch(e => e) | ||
| ).to.be.an.instanceof(MongoServerError); | ||
| const end = performance.now(); | ||
| ``` | ||
|
|
||
| 4. Configure the random number generator used for jitter to always return `1`. | ||
|
|
||
| 5. Execute step 3 again. | ||
|
|
||
| 6. Compare the durations of the two runs. | ||
| ```python | ||
| assertTrue(with_backoff_time - no_backoff_time >= 2.1) | ||
| ``` | ||
| The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two | ||
| runs. |
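The 3.1-second figure in step 6 can be checked with a little arithmetic. The sketch below is not taken from the specification: it assumes a base delay of 100 ms that doubles on each retry, scaled by a jitter factor between 0 and 1, with no cap reached within 5 retries. The names `BASE_BACKOFF_S`, `backoff_delays`, and `total_backoff` are hypothetical. Under those assumptions, pinning jitter to `0` yields no delay at all and pinning it to `1` yields the full 3.1 seconds, which is why a threshold of 2.1 leaves roughly one second of slack.

```python
from typing import Callable, List

BASE_BACKOFF_S = 0.1  # assumed base delay (100 ms); not stated in the excerpt
MAX_RETRIES = 5       # the prose test refers to "5 backoffs"


def backoff_delays(jitter: Callable[[], float],
                   retries: int = MAX_RETRIES,
                   base: float = BASE_BACKOFF_S) -> List[float]:
    """Delay before each retry, assuming delay = jitter() * base * 2**attempt."""
    return [jitter() * base * (2 ** attempt) for attempt in range(retries)]


def total_backoff(jitter: Callable[[], float]) -> float:
    """Total time spent sleeping across all retries."""
    return sum(backoff_delays(jitter))


# Jitter pinned to 0: every delay collapses to zero (backoff disabled).
no_backoff_time = total_backoff(lambda: 0.0)

# Jitter pinned to 1: 0.1 + 0.2 + 0.4 + 0.8 + 1.6 seconds.
with_backoff_time = total_backoff(lambda: 1.0)

print(no_backoff_time)              # 0.0
print(round(with_backoff_time, 6))  # 3.1
```

A real driver test measures wall-clock time around the failing insert instead of summing delays, so the measured difference also includes network and server time; the one-second window is meant to absorb that variance.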
This specification introduces the overload retry policy, but similarly to the design, omits a very important piece: how should the current retry policy and the overload retry policy coexist? At the very least, the specification should cover the following (generally speaking, it's better if it does that by clearly expressing a principle from which the answers may be easily derived, rather than answering each question explicitly, as there may be more questions that have to be answered that I did not think about at the moment):
1.1. I suspect it currently is not, because the overload retry policy for now requires both `RetryableError` and `SystemOverloadedError` to be present. However, the specification should make the answer clear.
2.1. The same question applies to two attempts `a(n)`, `a(n+1)` where the latter immediately[1] follows the former, with the former, `a(n)`, not being the first attempt.
2.1.1. Note that such a situation may be encountered more than once for a single operation.
3.1. The same question applies to two attempts `a(n)`, `a(n+1)` where the latter immediately[1] follows the former, with the former, `a(n)`, not being the first attempt.
3.1.1. Note that such a situation may be encountered more than once for a single operation.
[1] In terms of ordering relations, not in the temporal sense.
The updated pseudocode should answer these questions - let me know if there's anything else you'd like clarified.