Skip to content

feat: dynamic ingress control for running sandboxes#2093

Open
levb wants to merge 168 commits intomainfrom
lev-ingress-control
Open

feat: dynamic ingress control for running sandboxes#2093
levb wants to merge 168 commits intomainfrom
lev-ingress-control

Conversation

@levb
Copy link
Copy Markdown
Contributor

@levb levb commented Mar 11, 2026

Summary

Extends PUT /sandboxes/{sandboxID}/network with ingress filtering and host masking. All fields (egress + ingress) are replaced on every PUT — clients must send the full desired network config.

Changes

Ingress filtering (allowIn / denyIn)

  • Allow/deny rules by client IP/CIDR V4 with optional port or port-range (e.g. 10.0.0.0/8:80-443)
  • allowIn requires denyIn to include 0.0.0.0/0 (explicit deny-all-first)
  • Allow takes precedence over deny; unmatched traffic is allowed by default
  • Domains not supported
  • envd port is exempt from ingress restrictions

Host masking (maskRequestHost)

  • Replaces the Host header for all proxied sandbox requests
  • Supports {port} placeholder for dynamic port substitution
  • Set to null to clear

Client IP flow

  • Client proxy extracts real client IP from X-Forwarded-For (second-to-last entry, prevents spoofing)
  • Sets internal X-E2B-Client-IP header, stripped by the orchestrator proxy before reaching the sandbox

Pre-parsed ACLs

  • Egress and ingress rules parsed once at config time into Egress/Ingress structs (shared/pkg/sandbox-network/)
  • Egress ACL used by TCP firewall; ingress ACL evaluated per-request in orchestrator HTTP proxy

Other

  • AllowPublicAccess is preserved across updates (not changeable via PUT)
  • Network update applied transactionally on orchestrator (egress + ingress); rolled back on failure
  • Browser-friendly HTML error page for denied ingress requests

levb and others added 30 commits March 8, 2026 10:04
Add API endpoint to dynamically update sandbox egress network rules on
a running sandbox. The handler updates the API sandbox store atomically;
the orchestrator gRPC call is stubbed for now (stage 2).

Includes integration tests for success, not-found, unauthorized, clear
rules, replace rules, and multiple sequential updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Found

The previous ID "nonexistent-sandbox-id" had too many hyphens, failing
ID validation (400) before reaching the not-found path (404).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full gRPC flow for runtime sandbox network config updates:
- Proto: SandboxUpdateNetworkRequest + UpdateNetwork RPC
- Orchestrator: UpdateNetwork handler updates nftables (non-TCP) and
  in-memory egress config (TCP proxy) with RWMutex thread safety
- API: normalize bare IPs to CIDR notation before gRPC call
- Slot.UpdateInternet: reset + re-apply firewall rules in sandbox netns
- Thread-safe networkEgress field on Metadata (same pattern as endAt)
- Integration tests verify actual connectivity changes (not just 204s)
- test-network-update Makefile target for convenience

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dedicated test target

Remove unused *sandbox.Sandbox return value from UpdateSandboxNetworkConfig
since no caller uses it. Remove dedicated test-network-update Makefile target
in favor of the generic test/% pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The creation path splits AllowOut entries into CIDRs and domains via
ParseAddressesAndDomains, but the update path only handled CIDRs. Now
the update path mirrors the creation path: domains are extracted, the
default nameserver is added when domains are present, and domains are
passed through gRPC to the orchestrator's in-memory config for the TCP
proxy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that network egress rules set via PUT /sandboxes/{id}/network
survive a pause/resume cycle — the resumed sandbox should still enforce
the same allow/deny rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify that dynamically adding a domain to AllowOut (via TLS SNI matching
in the TCP proxy) works, and that removing the domain re-blocks traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pdate API docs

Consolidate startedAtMu/endAtMu/networkEgressMu into single rwmu.
GetNetwork() now always returns non-nil with non-nil Ingress, removing
nil checks in proxy.go and tcpfirewall. Update openapi.yml allowOut/denyOut
descriptions to document domain support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se nil-safe getters

Validate all CIDRs upfront in UpdateInternet to prevent partial firewall
state if an invalid CIDR is passed after rules have been reset. Add
missing 409 Conflict response to the PUT /network OpenAPI spec. Use
GetIngress() nil-safe getter in proxy.go for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After refactoring GetNetwork() to read egress from the mutex-protected
networkEgress field, tests that construct Metadata directly must call
SetNetworkEgress() to mirror what the constructor does.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the sandbox store mutation after the gRPC call to the orchestrator
node. Previously the store was updated first, so a failed gRPC call
left the store with the new config while the sandbox still enforced the
old rules. Now: Get (read-only) → gRPC → Update (persist).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…at-a-time functions

All firewall set mutations (ConfigureInternet, UpdateInternet, ResetInternet,
and the constructor) now go through ReplaceUserRules which buffers all 4 set
replacements and issues a single atomic nftables Flush. Removes AddAllowedCIDR,
AddDeniedCIDR, Reset, ResetDeniedSets, ResetAllowedSets, and addCIDRToSet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The firewall_toolkit's ValidateAddress rejects 0.0.0.0 as "unspecified",
which broke sandbox creation with internetAccess=false after the atomic
ReplaceUserRules refactoring. The old addCIDRToSet bypassed the toolkit
for this case by creating raw nftables elements directly; restore that
bypass in clearAndReplaceCIDRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the create path's validation in the PUT /sandboxes/{id}/network
handler: reject non-IP/CIDR values in denyOut (400 instead of 500),
and require 0.0.0.0/0 in denyOut when allowOut contains domains.
Add integration tests for both cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… table-driven suite

Single sandbox, sequential subtests covering: 8 rejected inputs, 10 accepted inputs,
12 firewall rule steps (deny-all, allow precedence, replace semantics, domain/IP/CIDR,
clear/reapply, allow-without-deny), and pause/resume rule preservation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract validateEgressRules from validateNetworkConfig so the network
update handler reuses the same validation logic as sandbox creation.
Switch to ReportErrorByCode so error severity matches the HTTP status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reuse the existing SandboxNetworkEgressConfig message in
SandboxUpdateNetworkRequest instead of repeating flat fields.
Change UpdateInternet to accept the egress proto message directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aradigm

Eliminates TOCTOU race by checking sandbox state atomically inside the
store update function, and reorders to store-first then node-call,
consistent with KeepAliveFor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Once UpdateInternet is called, the slot has been explicitly touched
and ResetInternet should always clean up, even if the update cleared
all rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tional apply/revert

Combine the separate UpdateNetwork and Update gRPC RPCs into a single
Update RPC that handles both end_time and egress changes. Updates are
applied sequentially using a functional updater pattern — on failure,
already-applied changes are rolled back in reverse order.

Publish a SandboxUpdatedEvent for any successful update, including
network egress details (allowed/denied CIDRs, allowed domains) in
the event data alongside the existing set_timeout field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The networkEgress field is set explicitly by the Update handler after
UpdateInternet succeeds. Initializing it in ResumeSandbox was not
needed by any change in this PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit (ed87f50) incorrectly removed networkEgress
initialization from ResumeSandbox, assuming it was only needed for the
dynamic update path. However, the API's Create handler also calls
ResumeSandbox, so the egress config was never populated — causing the
TCP proxy to see nil egress and allow all traffic through.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
levb and others added 20 commits March 19, 2026 14:18
- buildEgressConfig uses ParseAddressesAndDomains again instead of
  inlined IP/domain split loop
- Replace undefined ErrMsg* const references in tests with actual
  message strings
- Fix stale IPv6 comment in validateIngressRules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove 4 duplicate/redundant validation tests, keeping the more
interesting variant of each (wildcard domain over plain domain,
mixed domain+CIDR over single domain).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant nil check in parseIngressRules (make handles empty)
- Set AllowPublicAccess on ingressUpdate before assigning to sbx.Network

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace mutex-protected proto config with atomic Egress/Ingress pointers
on sandbox.Config. Proto types are now only used at the gRPC boundary.

- Add Egress and Ingress structs in sandbox-network with domain
  matching, rule checking, and port filtering
- Config.SetNetworkEgress/SetNetworkIngress swap atomically and return
  the old value for rollback
- Separate egress/ingress update and rollback paths in the server handler
- Move matchDomain from tcpfirewall into Egress.MatchDomain
- Extract network functions from sandbox.go into sandbox/network.go
- Rename rule.go to access_control.go
- Rewrite TestIsEgressAllowed to use runtime types instead of proto
- Remove AllPorts/HasPort; fold logic into PortInRange

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete IngressRule proto message; ingress now uses repeated string
allowed_cidrs/denied_cidrs in CIDR[:port[-port]] format, matching
egress. Merge parseEgressRules and parseIngressRules into a single
ParseValidRules in sandbox-network. Strings pass through from API
to orchestrator without intermediate conversion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…duplicated

Restore original allowOut/denyOut/maskRequestHost descriptions per review.
Keep PUT /network endpoint descriptions self-contained instead of
referencing SandboxNetworkConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always send an IPv4 client IP in X-Forwarded-For for ingress tests.
CI runners may connect via IPv6, which is unconditionally denied by
the IPv6 fail-closed guard. Use TEST-NET-2 (198.51.100.99) as the
default client IP when no specific fromIP is provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jakub Dobry <jakub.dobry8@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eral

Replace all "0.0.0.0/0" string literals with sandbox_network.AllTraffic
in tests and production code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove proto.Clone — EgressFromProto/IngressFromProto create new
  values, original proto is not modified after handoff
- Build Egress/Ingress before Config, set via SetNetworkEgress/Ingress
- Remove NewConfigWithNetwork — no longer needed
- Deny-all assignment when internet disabled now uses parsed rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntIP

ExtractExternalClientIP reads from XFF/RemoteAddr (used by client-proxy
at the edge). ExtractE2BClientIP reads only the internal E2B header
(used by orchestrator proxy). Missing header returns empty string,
which triggers fail-closed deny via net.ParseIP("") == nil.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wantErrMsg: "invalid denied CIDR not-a-cidr",
wantErrMsg: `invalid denied CIDR not-a-cidr`,
},
// Domain validation tests
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because these are already covered by the wildcard tests, redundant

levb and others added 4 commits March 20, 2026 10:51
…nsts

- Fix internal/ to pkg/ import paths after orchestrator restructure
- Revert AllTraffic rename back to AllInternetTrafficCIDR to minimize
  diff from main
- Restore ErrMsgDomainsRequireBlockAll and ErrMsgAllowInRequiresBlockAll
  consts for test readability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reduce assertBlockedHTTPRequest to 1s connect / 2s max (from 3s/5s)
- Reduce assertSuccessfulHTTPRequest to 1s connect / 2s max (from 5s/10s)
- Replace google.com with IP in step 8 to avoid 15s DNS resolver timeout
- Step 8 drops from 20s to 0.13s, total TestUpdateNetworkConfig from 36s to 10s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@levb levb requested a review from dobrac March 20, 2026 18:56
@levb levb force-pushed the lev-ingress-control branch from f912e52 to 35ae165 Compare March 25, 2026 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants