Skip to content

Fix data race in NetworkTransport send/receive continuations#207

Closed
1amageek wants to merge 3 commits intomodelcontextprotocol:mainfrom
1amageek:fix/network-transport-data-race-clean
Closed

Fix data race in NetworkTransport send/receive continuations#207
1amageek wants to merge 3 commits intomodelcontextprotocol:mainfrom
1amageek:fix/network-transport-data-race-clean

Conversation

@1amageek
Copy link

@1amageek 1amageek commented Mar 6, 2026

Summary

  • Remove Task { @MainActor in } wrapper and mutable var continuation guard flags (sendContinuationResumed / receiveContinuationResumed) that caused data races under Swift 6 strict concurrency
  • Resume CheckedContinuation directly in NWConnection callbacks — Apple guarantees these fire exactly once, making double-resume guards unnecessary
  • Extract reconnection logic into actor-isolated handleSendError(_:) method for safe access to isStopping, isConnected, and reconnectionConfig

Problem

The send() and receiveData() methods use local var flags captured by @Sendable closures to guard against double continuation resume:

var sendContinuationResumed = false

connection.send(..., completion: .contentProcessed { error in
    Task { @MainActor in
        if !sendContinuationResumed {
            sendContinuationResumed = true  // data race: mutable var in @Sendable closure
            continuation.resume(...)
        }
    }
})

These flags are neither actor-isolated nor otherwise synchronized, resulting in data race diagnostics under Swift 6 strict concurrency checking.

Fix

Since NWConnection.send(completion: .contentProcessed) and NWConnection.receive callbacks are each invoked exactly once per call, the double-resume guard is unnecessary. The fix:

  1. Removes the var flags and Task { @MainActor in } wrappers
  2. Calls continuation.resume() directly in the NWConnection callback
  3. Moves reconnection logic to handleSendError(_:), an actor-isolated method that safely accesses actor state

Test plan

  • Existing 16 tests pass
  • 5 new tests added and passing (21 total):
    • Concurrent sends do not cause data races — 100 parallel sends to verify no continuation race
    • Send error resumes continuation and triggers reconnection — error resumes immediately, handleSendError reconnects asynchronously
    • Send error without reconnection enabled does not reconnect — no reconnection when disabled
    • Receive error resumes continuation without crash — receive error surfaces correctly
    • Concurrent sends with intermittent errors — mixed success/failure verifies continuation always resumes exactly once
  • No data race warnings under Swift 6 strict concurrency

1amageek added 3 commits March 6, 2026 14:19
Remove Task { @mainactor in } wrapper and mutable continuation guard
flags that caused data races under Swift 6 strict concurrency.

NWConnection completion handlers are guaranteed to fire exactly once,
making the double-resume guard unnecessary.

- Extract reconnection logic into actor-isolated handleSendError()
- Resume continuations directly in NWConnection callbacks
- Simplify receiveData() by removing unnecessary guard pattern
- Concurrent sends: 100 parallel sends to surface continuation races
- Send error with reconnection: verify continuation resumes and
  handleSendError triggers reconnection asynchronously
- Send error without reconnection: verify no reconnection attempt
- Receive error: verify continuation resumes without crash
- Concurrent sends with intermittent errors: mix of success/failure
  to verify continuation is always resumed exactly once
- handleSendError: remove fixed 500ms sleep before reconnection.
  connect() already handles backoff via ReconnectionConfiguration.
  Properly catch connect() errors instead of swallowing with try?.
- Tests: remove Task.sleep-based assertions. Verify send() throws
  immediately; reconnection behavior is covered by existing tests.
@movetz movetz requested a review from yehorsobko-mac March 11, 2026 14:55
@yehorsobko-mac
Copy link
Collaborator

Thanks for the PR, I suppose the issue should be fixed here - https://github.com/modelcontextprotocol/swift-sdk/pull/206 but you PR definitely give some thoughts about general improvement of NetworkTransport. I will close this PR because the issue should be fixed already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants