Stream volume file uploads and downloads instead of buffering in memory#1453
Conversation
…memory
- Volume.writeFile/write_file: stream ReadableStream (JS, non-browser) and
file-like objects (Python) to the API instead of buffering them in memory
- Sandbox.files.write with octet-stream upload: stream ReadableStream data
(JS, non-browser) and file-like objects (Python), with chunked gzip
compression
- Python Sandbox.files.read(format="stream"): stream the response body
instead of downloading it into memory before iterating (sync and async)
- JS Sandbox.files.read({ format: 'stream' }): bound only the initial
handshake by the request timeout instead of killing an actively-consumed
stream; the user signal can still cancel it mid-stream
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Empty files short-circuit response parsing (Content-Length: 0), so Sandbox.files.read() with format 'blob' returned '' and Volume.readFile() with 'blob'/'stream' returned undefined. Return an empty Blob or ReadableStream matching the requested format instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- js-sdk: handle empty files explicitly in the bytes path of read() instead of relying on new Uint8Array(undefined) coercion - python-sdk: bound the request timeout to the initial handshake for read(format="stream") in both sync and async implementations, matching the JS SDK behavior; document the semantics in the stream overloads Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ite-file-upload # Conflicts: # packages/js-sdk/src/sandbox/filesystem/index.ts # packages/python-sdk/e2b/sandbox_async/filesystem/filesystem.py # packages/python-sdk/e2b/sandbox_sync/filesystem/filesystem.py
read(format="stream") returned a bare (async) generator whose finally only closed the response if iteration had begun. A reader that was created but never consumed (or never started) held its pooled connection open until the client was closed, leaking connections. Wrap the streamed response in FileStreamReader / AsyncFileStreamReader, which: - release the connection when the stream is fully consumed or errors, - expose deterministic cleanup via close()/aclose() and (async) context manager support, - register a weakref.finalize safety net so an abandoned reader releases its connection on garbage collection (the async variant schedules aclose() on the running loop). Both remain Iterator[bytes] / AsyncIterator[bytes], so existing usage is unchanged. Adds credential-free unit tests covering consume/context manager/close/GC, plus live-sandbox tests for the context manager and partial-then-close paths. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Match the Python SDK's connection lifecycle for read(format='stream'): - explicitly cancel the unconsumed error body before propagating, instead of relying solely on the abort controller (parity with r.close()) - add a FinalizationRegistry safety net so an abandoned stream releases its connection on GC, mirroring Python's weakref.finalize on FileStreamReader Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eader The async stream reader's garbage-collection safety net (_schedule_response_aclose via weakref.finalize) was best-effort at best: loop.create_task(aclose()) from a finalizer is not thread-safe, has no guarantee of running before loop teardown, and is useless once the loop is gone. Remove it and rely on the cleanup that actually works—auto-close on full consume / read error, aclose(), and the async context manager. The sync FileStreamReader keeps its weakref.finalize(response.close) net, which is reliable because close() is synchronous. Document that an abandoned async stream holds its pooled connection until the client is closed, and update the unit test accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-ups on the stream upload/download work, applying the established stream policy consistently and addressing review findings: - volumes: drop the client read timeout on `read_file`/`readFile` streams (Python `httpx.Timeout(..., read=None)`, JS handshake-bounded controller + `wrapStreamWithConnectionCleanup`), matching the sandbox files stream path and the RPC streams. The request timeout now bounds only the handshake, not body consumption. - JS sandbox streaming uploads: use the file-transfer timeout (1h) instead of the 60s request default so large streamed uploads aren't aborted mid-transfer; buffered uploads keep the short default. Centralize `FILE_TIMEOUT_MS` in connectionConfig and reuse it from volume. - JS: factor the stream cleanup + GC-finalizer logic into a shared `wrapStreamWithConnectionCleanup` used by both sandbox files and volumes. - stream handshake error mapping (Bugbot): map dropped connections during the stream handshake to typed, health-checked errors — JS via `handleEnvdApiFetchError`, Python via the `httpx.RemoteProtocolError` wrapper — mirroring the non-stream read paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streaming an upload only happens on the octet-stream path; the multipart path buffers (`toBlob` in JS, `.read()` for text file-likes in Python), so with the old `useOctetStream`/`use_octet_stream` default of false a streamed write was silently buffered into memory. Default the flag to auto-detect instead: use octet-stream when any write entry is streamable (JS `ReadableStream`; Python file-like / non-str-bytes), and `multipart/form-data` otherwise. Browsers stay on multipart since they can't stream request bodies. An explicit flag value still wins, gzip still implies octet-stream, and the old-envd fallback is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streamed (file-like) sandbox writes used the 60s request timeout for the write phase, so a large or slow streamed upload could trip WriteTimeout while the body was still being sent — inconsistent with the JS SDK (1h) and Python volume writes (1h). Relax the write timeout to FILE_TIMEOUT (1h) when any write entry is streamable, keeping connection setup and the response read bounded by the request timeout. Buffered str/bytes uploads keep the request timeout. FILE_TIMEOUT is shared via e2b/connection_config.py, mirroring the JS SDK's FILE_TIMEOUT_MS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Revert the volume read/write streaming changes so this PR is scoped to the sandbox files streaming work. The volume changes land in a follow-up PR that builds on the shared streaming infrastructure introduced here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stream `Volume.writeFile`/`write_file` request bodies and bound only the `Volume.readFile`/`read_file` stream handshake with the request timeout, reusing the shared streaming infrastructure from the sandbox files PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
🦋 Changeset detectedLatest commit: 90ae65f The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
PR SummaryMedium Risk Overview
Reviewed by Cursor Bugbot for commit 90ae65f. Bugbot is set up for automated code reviews on this repo. Configure here. |
Package ArtifactsBuilt from 859bb7b. Download artifacts from this workflow run. JS SDK ( CLI ( Python SDK ( |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
…ite-volumes # Conflicts: # packages/js-sdk/src/connectionConfig.ts
…ite-file-upload-v1 # Conflicts: # packages/js-sdk/src/connectionConfig.ts
A fully consumed stream returns its connection to the pool, where it can linger as an idle keep-alive entry until the server-side close is observed. Asserting on total pool size therefore flaked under load (test_sync_full_ consume_releases_connection saw 1 instead of 0 on CI). Count only checked-out (non-idle) connections, which is what the helper name promises and is the actual leak condition. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The FinalizationRegistry safety net for `read({ format: 'stream' })` only ran
`cleanup()` (aborting the handshake AbortController), unlike the cancel and
error paths which explicitly cancel the response body to release the pooled
envd connection. Abandoned streams could leave connections checked out until
the client was torn down. Mirror the cancel/error paths (and the Python sync
finalizer's `response.close`) by cancelling the body reader before cleanup.
Adds unit tests for wrapStreamWithConnectionCleanup, including a GC-abandonment
test (needs --expose-gc, enabled for the connectionConfig vitest project).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the FinalizationRegistry (JS) and weakref.finalize (sync Python) safety nets on streamed reads in favor of a deterministic idle-read timeout that reclaims a stalled stream's pooled connection. Python maps it to httpx's per-chunk read timeout; JS arms a per-chunk timer that aborts the request controller. Configurable via streamIdleTimeoutMs / stream_idle_timeout (default 60s, 0/None disables), and the consume/close contract is now documented consistently across all three readers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bound a stalled streamed transfer with a per-chunk idle timeout (default the request timeout, configurable via streamIdleTimeoutMs / stream_idle_timeout, 0/None disables) on both reads and writes, so a producer or consumer that stops making progress no longer holds the pooled connection. Reads map it to httpx's per-chunk read timeout / a JS idle-abort wrapper; writes use httpx's per-write timeout / a JS upload idle-abort wrapper. The total-transfer cap is intentionally left to the server (envd): a client cap is advisory, can't protect against non-conforming clients, and would mean maintaining the same ceiling across three SDKs. The pre-existing client-side 1h upload total is removed for consistency with reads. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aiter_io_chunks / agzip_iter were async generators doing synchronous file reads and zlib compression inline, stalling the asyncio event loop for the duration of those operations on large AsyncSandbox uploads. Offload both to a worker thread via asyncio.to_thread. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Arm the JS read idle timer only around the network read and clear it the moment a chunk arrives, so a slow or paused consumer no longer trips it; it fires only when the server stops sending mid-stream (a held-but-unread stream is reclaimed server-side). Matches Python's httpx read timeout, which only counts during socket reads. Drop the JS upload idle wrapper: it bounded producer latency (local), not the upload wire (not observable through fetch). Stalled uploads are bounded server-side or via the caller's signal; Python keeps its per-write httpx timeout, which does bound the wire. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After the filesystem streaming revert, FILE_TIMEOUT is used only by the volume client; sandbox filesystem streaming bounds each chunk by the request timeout and leaves the total to the server. Reword the comment so it no longer reads as a general streaming-transfer timeout (addresses a Bugbot review note). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imeout The per-write httpx timeout on streamed uploads guards a stuck socket write (server stops reading); it can't observe the opposite direction. Record that envd >= 0.6.7's per-read idle timeout backstops that case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stack the volume streaming work on top of file-upload's refactored shared infrastructure instead of the pre-refactor snapshot it was cut from. - connectionConfig: replace the GC FinalizationRegistry safety net with the idle-read-timeout `wrapStreamWithConnectionCleanup`. - volume.readFile(stream): pass `controller` + `idleTimeoutMs` and expose a `streamIdleTimeoutMs` option (JS) / `stream_idle_timeout` (Python), defaulting to the request timeout. - volume.writeFile: drop the client-side request timeout for streamed uploads (bounded server-side / via signal), matching the sandbox upload path. - volume/client: define FILE_TIMEOUT_MS locally now that connectionConfig no longer exports it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dc59e62 to
24f99d2
Compare
Introduces a named VolumeReadOpts type for the readFile stream options
(replacing the inline `{ streamIdleTimeoutMs?: number }` literal) and
renames VolumeWriteOptions -> VolumeWriteOpts and VolumeMetadataOptions
-> VolumeMetadataOpts, matching the sandbox filesystem `*Opts`
convention (FilesystemReadOpts / FilesystemWriteOpts).
The old VolumeWriteOptions and VolumeMetadataOptions names are kept as
deprecated type aliases, so this is non-breaking.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
44eebd6 to
11254d8
Compare
matthewlouisbrockman
left a comment
There was a problem hiding this comment.
hmm, don't think there's timeouts on this is there?
…1433) ## Description Removes full in-memory buffering from the SDK **sandbox** file-transfer paths, in both JS and Python (sync + async). **Streamed uploads** — `Sandbox.files.write` / `write_files` streams `ReadableStream` (JS, outside the browser) and file-like (Python) input to the sandbox with chunk-by-chunk gzip compression, instead of buffering the whole body in memory. `useOctetStream`/`use_octet_stream` now defaults to auto-detect — octet-stream when any entry is streamable (so streamed uploads aren't silently buffered), `multipart/form-data` otherwise; browsers always use `multipart/form-data` since streaming request bodies aren't supported there. A streamed upload is bounded by a per-chunk timeout on the wire (Python's per-write `httpx` timeout, default the request timeout); a stalled upload the wire can't observe is bounded server-side. On Python's `AsyncSandbox`, the blocking file reads and gzip compression of a streamed upload now run in a worker thread so a large upload doesn't stall the event loop. **Streamed downloads** — `Sandbox.files.read(format="stream")` now streams the response body from the sandbox instead of downloading it into memory before iterating (Python sync + async), and the 60s request timeout no longer kills the stream while it's being consumed: - The request timeout now bounds only the initial handshake. - The body is bounded by a per-chunk **idle-read timeout** on the wire — a per-`read()` option (`streamIdleTimeoutMs` in JS, `stream_idle_timeout` in Python; default the request timeout — 60s — `0`/`None` to disable). It's armed only while waiting on a network read and cleared the moment a chunk arrives, so it aborts only when the server stops sending mid-stream; a slow or paused consumer never trips it (a held-but-unread stream is reclaimed server-side, not by this timer). - A dropped connection during the handshake surfaces the same typed, health-checked error as non-stream reads. In JS, `signal` can still cancel an in-flight stream. - The stream holds its pooled connection until it is consumed to the end, cancelled/closed, errors, or the idle timeout fires — consume it fully, use the context manager, or close it. (This replaces the earlier GC-finalizer net.) Python returns a `FileStreamReader`/`AsyncFileStreamReader` supporting deterministic cleanup via `close()`/`aclose()` and (async) context-manager use; both still satisfy `Iterator[bytes]`/`AsyncIterator[bytes]`, so existing iteration is unchanged. **Empty files** — JS `Sandbox.files.read()` with `blob` or `stream` format now returns a format-correct empty value (empty `Blob` / empty `ReadableStream`) for empty files instead of `""`. > [!NOTE] > The equivalent **volume** streaming changes (`Volume.writeFile`/`write_file`, `Volume.readFile`/`read_file` streams) live in a follow-up PR, #1453, which is based on this branch. ## Usage ```ts // JS: upload a large file without holding it in memory const file = createReadStream('large.bin') await sandbox.files.write('large.bin', Readable.toWeb(file), { gzip: true }) // JS: consume a download for longer than 60s without it being killed const stream = await sandbox.files.read('large.bin', { format: 'stream' }) for await (const chunk of stream) { /* ... */ } // JS: tune (or disable) the per-chunk idle-read timeout for a read const stream = await sandbox.files.read('large.bin', { format: 'stream', streamIdleTimeoutMs: 120_000, // 0 to disable }) // JS: empty files now return format-correct empty values const blob = await sandbox.files.read('empty.txt', { format: 'blob' }) // Blob (size 0), not '' ``` ```python # Python: streamed upload and download with open("large.bin", "rb") as f: sandbox.files.write("large.bin", f, gzip=True) for chunk in sandbox.files.read("large.bin", format="stream"): ... # Python: deterministic cleanup when not reading the stream to the end with sandbox.files.read("large.bin", format="stream") as stream: first_chunk = next(iter(stream)) # connection released on block exit # Python: tune (or disable) the per-chunk idle-read timeout for a read for chunk in sandbox.files.read( "large.bin", format="stream", stream_idle_timeout=120.0 # None to disable ): ... ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want fixes drafted automatically? Bugbot Autofix can create code changes for findings. A team admin can enable Autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit b9039a2. Configure here.
The constant was dead code: every consumer imports FILE_TIMEOUT from e2b.volume.connection_config, which has its own copy. Keeping a second definition risked the two drifting apart. Move the explanatory comment to the volume constant that's actually used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-up to #1433. Builds on the shared streaming infrastructure introduced there (
FILE_TIMEOUT_MS, request-controller/stream-cleanup helpers inconnectionConfig,io_utilschunk iterators, theruntimeguard) and applies the same streaming model to volumes.Note
Based on
mishushakov/stream-write-file-upload(#1433). Merge that PR first; this PR's diff will then retarget tomainautomatically.What changed
Volume.writeFile()/Volume.write_file()— stream the request body instead of buffering it in memory.ReadableStreamdata is streamed outside the browser (half-duplex); browsers still buffer since they can't stream request bodies.Volume.readFile(format="stream")/read_file(format="stream")— the request timeout now bounds only the initial handshake, not the body read, matching the sandboxfiles.readstream path. A dropped connection during the handshake surfaces the same typed, health-checked error; JS supportssignalto cancel an in-flight stream and cancels unconsumed bodies on error so the pooled connection is released.Usage
JS — stream a file straight to a volume without buffering:
Python — stream a file-like object:
Testing
pnpm run format,pnpm run lint,pnpm run typecheckpass.tests/volume/file.test.ts; Python sync/asynctest_file.pytext-stream cases).🤖 Generated with Claude Code