This design document covers the V0 implementation of persistent volumes that can be mounted in sandboxes and accessed via SDK/API:

- NFS v3 proxy server using willscott/go-nfs
- GCS backend for file storage
- Database schema for filesystem metadata
- API endpoints for filesystem CRUD and file operations
- SDK interface for TypeScript/JavaScript and Python
- Volume mount configuration for sandbox creation
- Security and observability considerations

Based on discussion in #proj-filesystem-persistence Slack channel.

Co-authored-by: vasek <vasek@e2b.dev>
```sql
team_id UUID NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
size_bytes BIGINT NOT NULL, -- Quota/limit for the volume
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
```
The SQL schema defines size_bytes BIGINT NOT NULL but the API spec uses sizeGB (integer gigabytes). This creates a conversion point that could lead to precision issues or overflow. Consider using bytes consistently throughout the API and SDK, or clearly document the conversion behavior (e.g., does 1.5GB get truncated to 1GB?).
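One way to keep that boundary explicit is to validate and convert exactly once at the API layer and store only bytes everywhere below it. A minimal sketch (function and constant names are hypothetical, not from the design doc):

```go
package main

import (
	"errors"
	"fmt"
)

// maxSizeGB mirrors the API spec's `maximum: 100`.
const maxSizeGB = 100

// gbToBytes converts the API's integer sizeGB into the size_bytes
// column value. Inputs are validated against the spec's bounds first,
// so the shift can never overflow int64 (100 GiB is far below 2^63-1).
func gbToBytes(sizeGB int64) (int64, error) {
	if sizeGB < 1 || sizeGB > maxSizeGB {
		return 0, errors.New("sizeGB out of range [1, 100]")
	}
	return sizeGB << 30, nil // 1 GiB = 2^30 bytes
}

func main() {
	b, err := gbToBytes(5)
	fmt.Println(b, err) // 5368709120 <nil>
}
```

Because the API type is integer gigabytes, fractional sizes like 1.5GB are rejected at parse time rather than silently truncated, which sidesteps the precision question entirely.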
```sql
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),

UNIQUE(team_id, name)
```
The schema is missing an `updated_at` trigger. Based on existing patterns in the codebase (e.g., migrations/20231124185944_create_schemas_and_tables.sql), tables typically need a trigger to auto-update the `updated_at` field on every UPDATE. Without it, the timestamp never changes after creation.
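A sketch of such a trigger, assuming the table is named `filesystems`; the function name here is hypothetical, and the migration should reuse whatever trigger function the existing migrations already define:

```sql
-- Hypothetical names; mirror the existing migration pattern.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER filesystems_set_updated_at
  BEFORE UPDATE ON filesystems
  FOR EACH ROW
  EXECUTE FUNCTION set_updated_at();
```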
```sql
UNIQUE(team_id, name)
);

-- Index for efficient team lookups
```
Missing RLS (Row Level Security) policy. Based on existing patterns in the codebase, tables should have ENABLE ROW LEVEL SECURITY and appropriate policies to prevent cross-team data access. Without this, the security model relies solely on application-level checks, which is less secure than database-level enforcement.
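A sketch of the database-level enforcement, assuming the table is named `filesystems`; the policy predicate below is a placeholder, and the real policy should mirror whatever mechanism the existing team-scoped tables use to identify the caller's team:

```sql
-- Sketch only; predicate is a placeholder, not the codebase's pattern.
ALTER TABLE filesystems ENABLE ROW LEVEL SECURITY;

CREATE POLICY filesystems_team_isolation ON filesystems
  USING (team_id = current_setting('app.current_team_id')::uuid);
```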
```yaml
type: integer
minimum: 1
maximum: 100
description: Storage quota in gigabytes
```
The API spec allows quotas from 1GB (`minimum: 1`) up to 100GB (`maximum: 100`). Consider whether 100GB is appropriate for a PoC - this could lead to high storage costs if many users create max-size volumes.
```yaml
tags: [filesystems]
security:
  - ApiKeyAuth: []
parameters:
```
The path parameter in file operations endpoints uses wildcard matching but lacks validation constraints. File paths need sanitization to prevent directory traversal attacks (e.g., ../../etc/passwd). The design should specify: (1) path validation rules, (2) whether absolute vs relative paths are allowed, and (3) how to handle symbolic links.
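One common approach for (1) and (2) is to treat every client path as relative to the volume root and lexically collapse `..`/`.` segments before use, so traversal cannot escape the root; symlinks (3) still need a separate check at resolution time. A sketch (function name hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// sanitizePath resolves a client-supplied file path against the volume
// root. Rooting the input with "/" before path.Clean means ".." segments
// collapse against the root instead of escaping it, so the result is
// always confined to the volume.
func sanitizePath(p string) (string, error) {
	if strings.Contains(p, "\x00") {
		return "", errors.New("invalid NUL byte in path")
	}
	clean := path.Clean("/" + p)
	if clean == "/" {
		return "", errors.New("path resolves to volume root")
	}
	return clean, nil
}

func main() {
	fmt.Println(sanitizePath("foo/bar.txt"))      // /foo/bar.txt <nil>
	fmt.Println(sanitizePath("../../etc/passwd")) // /etc/passwd <nil> (confined under the volume root)
}
```

Note the traversal attempt is neutralized rather than rejected: `../../etc/passwd` maps to `<volume-root>/etc/passwd`, never to the host's `/etc/passwd`. Rejecting any input containing `..` outright is an equally valid, stricter policy.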
```markdown
### Concurrency Considerations

For V0, we allow concurrent access via NFS (no explicit locking):
```
With no locking above GCS, concurrent writes from multiple sandboxes to the same object simply race: the last write wins, with no ordering guarantee, so data can be silently lost or corrupted. (GCS itself is strongly consistent for object operations, but that does not serialize concurrent writers.) This is a significant data integrity risk for concurrent access scenarios. Consider adding a coordination mechanism, e.g., GCS generation preconditions (`ifGenerationMatch`), or documenting this limitation prominently.
```markdown
2. **Orchestrator** resolves filesystem IDs to NFS server addresses
3. **Firecracker VM** starts with NFS client configured
4. **Init script** mounts NFS shares to specified paths:
```
Using the NFS `nolock` option (line 236) disables file locking entirely. Combined with the absence of any coordination layer over GCS, there is no mechanism to serialize concurrent access. This could lead to: (1) corrupted files from concurrent writes, (2) partial reads of files being written, (3) race conditions in file creation/deletion. The design needs either locking support or very clear documentation of these limitations.
```json
"createdAt": "2026-01-20T12:00:00Z"
}
]
```
The design doesn't specify what happens when a filesystem is deleted while it's still mounted in active sandboxes. This could lead to: (1) sandboxes experiencing I/O errors, (2) stale NFS mounts, (3) orphaned data. Consider adding reference counting or preventing deletion of in-use filesystems.
```typescript
// Create a filesystem
const filesystem = await Filesystem.create('my-volume', '5gb')
```
The SDK allows mounting by filesystem name ('my-volume') in addition to ID, but name resolution isn't specified. Since names are only unique within a team, the orchestrator needs to resolve names to IDs during sandbox creation. This adds latency and a potential point of failure. Consider whether name-based mounting is necessary for v0 or if ID-only would be simpler.
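If name-based mounting stays in v0, the resolution step can be a single team-scoped lookup at sandbox creation. A sketch, assuming IDs carry a distinguishing `fs_` prefix (that prefix and all identifiers here are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveFilesystemID maps a team-scoped name (or a literal ID) to a
// filesystem ID at sandbox creation time. The byName map stands in for
// a database query scoped to the caller's team.
func resolveFilesystemID(byName map[string]string, ref string) (string, error) {
	// Assumed convention: IDs are prefixed "fs_" and pass through unchanged.
	if strings.HasPrefix(ref, "fs_") {
		return ref, nil
	}
	id, ok := byName[ref]
	if !ok {
		return "", errors.New("unknown filesystem name: " + ref)
	}
	return id, nil
}

func main() {
	names := map[string]string{"my-volume": "fs_abc123"} // one team's names
	fmt.Println(resolveFilesystemID(names, "my-volume")) // fs_abc123 <nil>
	fmt.Println(resolveFilesystemID(names, "fs_def456")) // fs_def456 <nil>
}
```

Because names are unique per team (the `UNIQUE(team_id, name)` constraint above), the lookup is unambiguous, but it does add one query and one failure mode to the sandbox-creation path, which is the trade-off the comment raises.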
```markdown
### NFS Proxy → GCS Mapping

| NFS Operation | GCS Operation |
|---------------|---------------|
```
NFS WRITE operations are mapped to GCS objects.insert, but GCS objects are immutable: every WRITE becomes a full-object rewrite (a new generation, and a retained noncurrent version if bucket versioning is enabled). For frequently modified files this will: (1) rewrite the entire object for each small write, (2) increase request and storage costs significantly, (3) require lifecycle policies for cleanup when versioning is on. Note that objects.patch only updates object metadata, not content, so it cannot help here; consider a write-back cache that batches WRITEs and flushes once on COMMIT/close instead.
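The write-back idea can be sketched in a few lines: buffer WRITEs in memory and upload the whole object once on COMMIT. All names below are hypothetical, and `flush` stands in for the real GCS upload:

```go
package main

import (
	"fmt"
	"sync"
)

// writeBackFile buffers NFS WRITEs in memory and uploads the object
// once, on COMMIT/close, instead of one GCS rewrite per WRITE.
type writeBackFile struct {
	mu    sync.Mutex
	buf   []byte
	dirty bool
	flush func(data []byte) error // stand-in for the GCS upload
}

// WriteAt applies a WRITE into the in-memory buffer, growing it as needed.
func (f *writeBackFile) WriteAt(p []byte, off int64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if need := int(off) + len(p); need > len(f.buf) {
		f.buf = append(f.buf, make([]byte, need-len(f.buf))...)
	}
	copy(f.buf[off:], p)
	f.dirty = true
}

// Commit uploads the buffered contents once, and only if anything changed.
func (f *writeBackFile) Commit() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if !f.dirty {
		return nil
	}
	if err := f.flush(f.buf); err != nil {
		return err
	}
	f.dirty = false
	return nil
}

func main() {
	uploads := 0
	f := &writeBackFile{flush: func([]byte) error { uploads++; return nil }}
	f.WriteAt([]byte("hello"), 0)
	f.WriteAt([]byte("world"), 5)
	f.Commit()
	fmt.Println(uploads, string(f.buf)) // 1 helloworld
}
```

The obvious cost is durability: buffered data is lost if the proxy dies before COMMIT, so this trades crash safety for write amplification, which is worth stating explicitly in the design.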
Add a design document for the persistent filesystem feature to outline its architecture, API, and implementation plan.