Skip to content

fix(orch): keep the sandbox in the map even after it starts shutting down to handle ip reverse lookups#2116

Draft
jakubno wants to merge 9 commits intomainfrom
chore/error-finding-sandbox-after-kill
Draft

fix(orch): keep the sandbox in the map even after it starts shutting down to handle ip reverse lookups#2116
jakubno wants to merge 9 commits intomainfrom
chore/error-finding-sandbox-after-kill

Conversation

@jakubno
Copy link
Member

@jakubno jakubno commented Mar 12, 2026

Note

Medium Risk
Changes sandbox lifecycle state and eviction timing, which can affect routing/firewall lookups and cleanup correctness under concurrent deletes or slot reuse. Risk is moderated by CAS idempotency and guarded eviction, but timing-based behavior could still cause edge-case leaks or mis-association.

Overview
Keeps sandboxes discoverable by host IP for a short grace period after shutdown by introducing an explicit SandboxStatus (starting/running/dead) and replacing immediate map removal with MarkDead plus delayed eviction; live queries (Get, Items, Count) now ignore non-running entries while GetByHostPort can still resolve dead sandboxes. Also adds safeguards to prevent stale IP reuse and accidental eviction of replacement sandboxes (CAS state transition, pointer-checked eviction, and insert-time removal of older entries sharing the same IP).

Written by Cursor Bugbot for commit f923bee. This will update automatically on new commits. Configure here.

@jakubno jakubno changed the base branch from main to chore/simplify-sandobx-map-cleanup-flow March 12, 2026 08:50
// MarkRunning transitions a sandbox from starting to running and notifies OnInsert subscribers
func (m *Map) MarkRunning(sbx *Sandbox) {
sbx.started.Store(true)
sbx.status.Store(int32(Status`Running))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax error / compilation failure: Status followed by a backtick is an unterminated raw string literal in Go, so the package will not compile. This should be StatusRunning:

sbx.status.Store(int32(StatusRunning))

// notified immediately (so the proxy / firewall limiter can clean up), but the
// entry stays in the map for deadEvictionGracePeriod so that IP-based lookups
// still resolve while the Firecracker process finishes shutting down.
func (m *Map) MarkDead(ctx context.Context, sandboxID string) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarkDead is not idempotent, so OnRemove can fire twice in a concurrent scenario. The old Remove used RemoveCb(return exists) which was naturally idempotent: a second concurrent call would see exists=false and skip subscriber notification.

With MarkDead, the entry intentionally stays in the map for 30 s. If two goroutines both pass the Get/IsRunning() check before either calls MarkDead (a TOCTOU race on concurrent Delete RPCs for the same sandbox), both find the entry still present, both call sbx.status.Store(StatusDead), and both fire go m.trigger(OnRemove), notifying every subscriber twice.

Using CAS would prevent this:

if !sbx.status.CompareAndSwap(int32(StatusRunning), int32(StatusDead)) {
    return
}

@jakubno jakubno force-pushed the chore/error-finding-sandbox-after-kill branch from c14e526 to 7562220 Compare March 12, 2026 08:55
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Base automatically changed from chore/simplify-sandobx-map-cleanup-flow to main March 13, 2026 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants