feat: dynamic ingress control for running sandboxes#2093
Open
feat: dynamic ingress control for running sandboxes#2093
Conversation
Add API endpoint to dynamically update sandbox egress network rules on a running sandbox. The handler updates the API sandbox store atomically; the orchestrator gRPC call is stubbed for now (stage 2). Includes integration tests for success, not-found, unauthorized, clear rules, replace rules, and multiple sequential updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Found The previous ID "nonexistent-sandbox-id" had too many hyphens, failing ID validation (400) before reaching the not-found path (404). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full gRPC flow for runtime sandbox network config updates: - Proto: SandboxUpdateNetworkRequest + UpdateNetwork RPC - Orchestrator: UpdateNetwork handler updates nftables (non-TCP) and in-memory egress config (TCP proxy) with RWMutex thread safety - API: normalize bare IPs to CIDR notation before gRPC call - Slot.UpdateInternet: reset + re-apply firewall rules in sandbox netns - Thread-safe networkEgress field on Metadata (same pattern as endAt) - Integration tests verify actual connectivity changes (not just 204s) - test-network-update Makefile target for convenience Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dedicated test target Remove unused *sandbox.Sandbox return value from UpdateSandboxNetworkConfig since no caller uses it. Remove dedicated test-network-update Makefile target in favor of the generic test/% pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The creation path splits AllowOut entries into CIDRs and domains via ParseAddressesAndDomains, but the update path only handled CIDRs. Now the update path mirrors the creation path: domains are extracted, the default nameserver is added when domains are present, and domains are passed through gRPC to the orchestrator's in-memory config for the TCP proxy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that network egress rules set via PUT /sandboxes/{id}/network
survive a pause/resume cycle — the resumed sandbox should still enforce
the same allow/deny rules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify that dynamically adding a domain to AllowOut (via TLS SNI matching in the TCP proxy) works, and that removing the domain re-blocks traffic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pdate API docs Consolidate startedAtMu/endAtMu/networkEgressMu into single rwmu. GetNetwork() now always returns non-nil with non-nil Ingress, removing nil checks in proxy.go and tcpfirewall. Update openapi.yml allowOut/denyOut descriptions to document domain support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se nil-safe getters Validate all CIDRs upfront in UpdateInternet to prevent partial firewall state if an invalid CIDR is passed after rules have been reset. Add missing 409 Conflict response to the PUT /network OpenAPI spec. Use GetIngress() nil-safe getter in proxy.go for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After refactoring GetNetwork() to read egress from the mutex-protected networkEgress field, tests that construct Metadata directly must call SetNetworkEgress() to mirror what the constructor does. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o lev-allow-deny-dynamic
Move the sandbox store mutation after the gRPC call to the orchestrator node. Previously the store was updated first, so a failed gRPC call left the store with the new config while the sandbox still enforced the old rules. Now: Get (read-only) → gRPC → Update (persist). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…at-a-time functions All firewall set mutations (ConfigureInternet, UpdateInternet, ResetInternet, and the constructor) now go through ReplaceUserRules which buffers all 4 set replacements and issues a single atomic nftables Flush. Removes AddAllowedCIDR, AddDeniedCIDR, Reset, ResetDeniedSets, ResetAllowedSets, and addCIDRToSet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The firewall_toolkit's ValidateAddress rejects 0.0.0.0 as "unspecified", which broke sandbox creation with internetAccess=false after the atomic ReplaceUserRules refactoring. The old addCIDRToSet bypassed the toolkit for this case by creating raw nftables elements directly; restore that bypass in clearAndReplaceCIDRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the create path's validation in the PUT /sandboxes/{id}/network
handler: reject non-IP/CIDR values in denyOut (400 instead of 500),
and require 0.0.0.0/0 in denyOut when allowOut contains domains.
Add integration tests for both cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… table-driven suite Single sandbox, sequential subtests covering: 8 rejected inputs, 10 accepted inputs, 12 firewall rule steps (deny-all, allow precedence, replace semantics, domain/IP/CIDR, clear/reapply, allow-without-deny), and pause/resume rule preservation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract validateEgressRules from validateNetworkConfig so the network update handler reuses the same validation logic as sandbox creation. Switch to ReportErrorByCode so error severity matches the HTTP status. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reuse the existing SandboxNetworkEgressConfig message in SandboxUpdateNetworkRequest instead of repeating flat fields. Change UpdateInternet to accept the egress proto message directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aradigm Eliminates TOCTOU race by checking sandbox state atomically inside the store update function, and reorders to store-first then node-call, consistent with KeepAliveFor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Once UpdateInternet is called, the slot has been explicitly touched and ResetInternet should always clean up, even if the update cleared all rules. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tional apply/revert Combine the separate UpdateNetwork and Update gRPC RPCs into a single Update RPC that handles both end_time and egress changes. Updates are applied sequentially using a functional updater pattern — on failure, already-applied changes are rolled back in reverse order. Publish a SandboxUpdatedEvent for any successful update, including network egress details (allowed/denied CIDRs, allowed domains) in the event data alongside the existing set_timeout field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The networkEgress field is set explicitly by the Update handler after UpdateInternet succeeds. Initializing it in ResumeSandbox was not needed by any change in this PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit (ed87f50) incorrectly removed networkEgress initialization from ResumeSandbox, assuming it was only needed for the dynamic update path. However, the API's Create handler also calls ResumeSandbox, so the egress config was never populated — causing the TCP proxy to see nil egress and allow all traffic through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- buildEgressConfig uses ParseAddressesAndDomains again instead of inlined IP/domain split loop - Replace undefined ErrMsg* const references in tests with actual message strings - Fix stale IPv6 comment in validateIngressRules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove 4 duplicate/redundant validation tests, keeping the more interesting variant of each (wildcard domain over plain domain, mixed domain+CIDR over single domain). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant nil check in parseIngressRules (make handles empty) - Set AllowPublicAccess on ingressUpdate before assigning to sbx.Network Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace mutex-protected proto config with atomic Egress/Ingress pointers on sandbox.Config. Proto types are now only used at the gRPC boundary. - Add Egress and Ingress structs in sandbox-network with domain matching, rule checking, and port filtering - Config.SetNetworkEgress/SetNetworkIngress swap atomically and return the old value for rollback - Separate egress/ingress update and rollback paths in the server handler - Move matchDomain from tcpfirewall into Egress.MatchDomain - Extract network functions from sandbox.go into sandbox/network.go - Rename rule.go to access_control.go - Rewrite TestIsEgressAllowed to use runtime types instead of proto - Remove AllPorts/HasPort; fold logic into PortInRange Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete IngressRule proto message; ingress now uses repeated string allowed_cidrs/denied_cidrs in CIDR[:port[-port]] format, matching egress. Merge parseEgressRules and parseIngressRules into a single ParseValidRules in sandbox-network. Strings pass through from API to orchestrator without intermediate conversion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…duplicated Restore original allowOut/denyOut/maskRequestHost descriptions per review. Keep PUT /network endpoint descriptions self-contained instead of referencing SandboxNetworkConfig. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always send an IPv4 client IP in X-Forwarded-For for ingress tests. CI runners may connect via IPv6, which is unconditionally denied by the IPv6 fail-closed guard. Use TEST-NET-2 (198.51.100.99) as the default client IP when no specific fromIP is provided. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ev-ingress-control
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jakub Dobry <jakub.dobry8@gmail.com>
…ev-ingress-control
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eral Replace all "0.0.0.0/0" string literals with sandbox_network.AllTraffic in tests and production code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove proto.Clone — EgressFromProto/IngressFromProto create new values, original proto is not modified after handoff - Build Egress/Ingress before Config, set via SetNetworkEgress/Ingress - Remove NewConfigWithNetwork — no longer needed - Deny-all assignment when internet disabled now uses parsed rules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntIP
ExtractExternalClientIP reads from XFF/RemoteAddr (used by client-proxy
at the edge). ExtractE2BClientIP reads only the internal E2B header
(used by orchestrator proxy). Missing header returns empty string,
which triggers fail-closed deny via net.ParseIP("") == nil.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
levb
commented
Mar 20, 2026
| wantErrMsg: "invalid denied CIDR not-a-cidr", | ||
| wantErrMsg: `invalid denied CIDR not-a-cidr`, | ||
| }, | ||
| // Domain validation tests |
Contributor
Author
There was a problem hiding this comment.
Removed because these are already covered by the wildcard tests, redundant
…nsts - Fix internal/ to pkg/ import paths after orchestrator restructure - Revert AllTraffic rename back to AllInternetTrafficCIDR to minimize diff from main - Restore ErrMsgDomainsRequireBlockAll and ErrMsgAllowInRequiresBlockAll consts for test readability Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reduce assertBlockedHTTPRequest to 1s connect / 2s max (from 3s/5s) - Reduce assertSuccessfulHTTPRequest to 1s connect / 2s max (from 5s/10s) - Replace google.com with IP in step 8 to avoid 15s DNS resolver timeout - Step 8 drops from 20s to 0.13s, total TestUpdateNetworkConfig from 36s to 10s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f912e52 to
35ae165
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Extends
PUT /sandboxes/{sandboxID}/networkwith ingress filtering and host masking. All fields (egress + ingress) are replaced on every PUT — clients must send the full desired network config.Changes
Ingress filtering (
allowIn/denyIn)10.0.0.0/8:80-443)allowInrequiresdenyInto include0.0.0.0/0(explicit deny-all-first)Host masking (
maskRequestHost){port}placeholder for dynamic port substitutionnullto clearClient IP flow
X-Forwarded-For(second-to-last entry, prevents spoofing)X-E2B-Client-IPheader, stripped by the orchestrator proxy before reaching the sandboxPre-parsed ACLs
Egress/Ingressstructs (shared/pkg/sandbox-network/)Other
AllowPublicAccessis preserved across updates (not changeable via PUT)