
chore(api): log reason for failing to connect, resume sandboxes on state change #2145

Open

matthewlouisbrockman wants to merge 8 commits into main from pause-resume-failure-logs

Conversation

@matthewlouisbrockman
Contributor

matthewlouisbrockman commented Mar 16, 2026

Currently, we drop the err from err = a.orchestrator.WaitForStateChange(ctx, teamID, sandboxID). This makes it hard to know why requests during pausing errored.

This change logs the dropped err so we can get better confidence in why we're getting 500 errors during state transitions.

It also updates the error logging to use telemetry.ReportCriticalError, which preserves the error text while also populating the spans.


Note

Low risk: changes are limited to additional telemetry/error reporting on existing failure paths, without altering core sandbox state logic.

Overview
Adds telemetry.ReportCriticalError calls in sandbox connect/resume handlers to capture the underlying errors (including WaitForStateChange failures, unexpected non-running/unknown states, snapshot fetch failures, and secure envd token errors) and to attach sandbox/team/build/template identifiers for easier debugging of 500s during state transitions.

Written by Cursor Bugbot for commit ab21f5a. This will update automatically on new commits.


err = a.orchestrator.WaitForStateChange(ctx, teamID, sandboxID)
if err != nil {
logger.L().Error(ctx, "Error waiting for sandbox state change",

WaitForStateChange returns ctx.Err() when the polling loop context is cancelled (state_change.go:250). A normal client disconnect will trigger this path, logging a spurious Error and attempting to send a 500 to a client that is already gone.

Consider skipping the Error log when the error is a context cancellation or deadline exceeded.

Contributor Author

making 'em debug

logger.L().Debug(ctx, "Waiting for sandbox to pause", logger.WithSandboxID(sandboxID))
err = a.orchestrator.WaitForStateChange(ctx, teamID, sandboxID)
if err != nil {
logger.L().Error(ctx, "Error waiting for sandbox to pause",

Same issue: WaitForStateChange returns ctx.Err() on context cancellation, so client disconnects will be logged at Error level. Filter context.Canceled / context.DeadlineExceeded before logging.

Contributor Author

making 'em debug, i want to know they're happening

@matthewlouisbrockman marked this pull request as ready for review March 16, 2026 22:09
Contributor

@dobrac left a comment

you can use the telemetry.ReportError and telemetry.ReportCriticalError instead to also populate the traces correctly. It does logging automatically

@matthewlouisbrockman
Contributor Author

you can use the telemetry.ReportError and telemetry.ReportCriticalError instead to also populate the traces correctly. It does logging automatically

should we add him to logger.L().Error(ctx, "Error getting last snapshot", logger.WithSandboxID(sandboxID), zap.Error(err)) too to debug those guys? (i think it's a 30s timeout somewhere but can't find it yet)

@matthewlouisbrockman
Contributor Author

eh, can worry about getting the reason for the failed db guys later, pretty sure those are all just timing out on the 30 second callback from the redis route

@matthewlouisbrockman changed the title from "log reason for failing to connect, resume sandboxes on state change" to "observability: log reason for failing to connect, resume sandboxes on state change" Mar 17, 2026
@matthewlouisbrockman changed the title from "observability: log reason for failing to connect, resume sandboxes on state change" to "chore[api]: log reason for failing to connect, resume sandboxes on state change" Mar 17, 2026
@matthewlouisbrockman changed the title from "chore[api]: log reason for failing to connect, resume sandboxes on state change" to "chore(api): log reason for failing to connect, resume sandboxes on state change" Mar 17, 2026
@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

