Commit 26cf668

update skill
1 parent 66b0f58 commit 26cf668

1 file changed

Lines changed: 26 additions & 0 deletions

.agents/skills/memory-load-check/SKILL.md

@@ -49,10 +49,35 @@ Read these when doing a deeper pass:
  - cap downloads and parsed output separately
  - preserve partial results when a later item exceeds the cap
  - never read untrusted response bodies without a byte cap
- KB connector file downloads in `apps/sim/connectors/utils.ts`
  - `CONNECTOR_MAX_FILE_BYTES`: shared per-file cap (aligned with the manual KB upload limit)
  - `readBodyWithLimit`: stream a download body to a Buffer with a hard byte cap (null on overflow)
  - `stubOrSkipBySize`: listing-time skip when the reported size exceeds the cap
  - `markSkipped` / `sizeLimitSkipReason`: surface oversized files as failed (skipped) KB rows
  - `ConnectorFileTooLargeError`: thrown mid-download when the listing under-reported size
- Large workflow value payloads
  - prefer durable references/manifests over inlining large arrays or files
  - materialize refs only behind an explicit byte budget
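
The "explicit byte budget" and "preserve partial results" bullets above can be illustrated with a minimal TypeScript sketch. Everything in it (`PayloadRef`, `MaterializeResult`, `materializeWithinBudget`, the `load` callback) is a hypothetical name chosen for illustration and does not come from the repository; it assumes each reference carries a trustworthy `byteLength`.

```typescript
// Hypothetical shapes for illustration; not taken from the repository.
interface PayloadRef {
  id: string;
  byteLength: number; // size reported by the store that holds the payload
}

interface MaterializeResult<T> {
  values: T[]; // items loaded before the budget ran out
  deferred: PayloadRef[]; // refs left unresolved, still usable as references
}

function materializeWithinBudget<T>(
  refs: PayloadRef[],
  maxBytes: number,
  load: (ref: PayloadRef) => T,
): MaterializeResult<T> {
  const values: T[] = [];
  const deferred: PayloadRef[] = [];
  let used = 0;
  let exhausted = false;
  for (const ref of refs) {
    // Check the size before loading. Once one item does not fit, stop
    // loading, but keep everything that was already materialized.
    if (exhausted || used + ref.byteLength > maxBytes) {
      exhausted = true;
      deferred.push(ref);
      continue;
    }
    values.push(load(ref));
    used += ref.byteLength;
  }
  return { values, deferred };
}
```

The caller gets back both the loaded values and the refs that were left alone, so an oversized later item does not discard earlier work.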

## KB Connector File Size Handling

The connector size pattern in `apps/sim/connectors/utils.ts` (`CONNECTOR_MAX_FILE_BYTES` + `readBodyWithLimit` + `stubOrSkipBySize`/`markSkipped`) exists for one risk: a knowledge-base connector downloading **arbitrary, user-controlled file bytes** that the source does not hard-cap. Apply it by that risk, not by the connector's name.

Use the pattern when the connector downloads file content via a stream/`download_url` where the user controls the size:
- file-storage connectors: Dropbox, OneDrive, SharePoint, Google Drive, S3, GitHub, GitLab, Azure DevOps
- any connector that fetches a file via a download URL even if it is not a "storage" service (e.g. the Zoom transcript `.vtt`)

For those, require all three:
- stream the body with `readBodyWithLimit(resp, CONNECTOR_MAX_FILE_BYTES)`; never call raw `response.text()`/`response.arrayBuffer()`
- skip oversize at listing (`stubOrSkipBySize` with the reported size) and again at fetch time (overflow -> `markSkipped`), since the listing size can be missing or under-reported
- never drop or truncate silently: oversized files become content-less failed rows carrying `skippedReason`, so they stay visible in the KB UI instead of vanishing from the index
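
The three requirements above can be sketched roughly as follows. This is not the code in `apps/sim/connectors/utils.ts`, whose signatures this diff does not show: `readChunksWithLimit`, `fetchFileRow`, and `FileRow` are hypothetical stand-ins, and the download body is modeled as an `AsyncIterable<Uint8Array>` rather than a real `Response`.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the repository's API.
interface FileRow {
  name: string;
  content: Uint8Array | null; // null for skipped files
  skippedReason?: string; // set when the file was skipped for size
}

async function readChunksWithLimit(
  chunks: AsyncIterable<Uint8Array>,
  maxBytes: number,
): Promise<Uint8Array | null> {
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    total += chunk.byteLength;
    // Enforce the cap before keeping the chunk, so the buffered chunks
    // never exceed maxBytes. Returning also stops reading the stream.
    if (total > maxBytes) return null;
    parts.push(chunk);
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.byteLength;
  }
  return out;
}

async function fetchFileRow(
  name: string,
  reportedSize: number | undefined,
  open: () => AsyncIterable<Uint8Array>,
  maxBytes: number,
): Promise<FileRow> {
  // Listing-time check: skip without downloading when the reported size is too large.
  if (reportedSize !== undefined && reportedSize > maxBytes) {
    return { name, content: null, skippedReason: `reported size ${reportedSize} exceeds ${maxBytes} bytes` };
  }
  // Fetch-time check: the reported size may be missing or too small.
  const body = await readChunksWithLimit(open(), maxBytes);
  if (body === null) {
    return { name, content: null, skippedReason: `download exceeded ${maxBytes} bytes` };
  }
  return { name, content: body };
}
```

In both overflow paths the caller still receives a row, so the oversized file is recorded with a reason rather than disappearing.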

Skip the pattern when the source already bounds the payload:
- pure API/structured-data connectors (Jira, Linear, Notion, Confluence, Sentry, Slack, Zendesk, Gmail, ...): paginated JSON/text; apply normal pagination + concurrency bounds instead of a per-file byte cap
- native-document connectors capped by the platform (Google Docs ~50 MB, Google Sheets via `MAX_ROWS`, Evernote ~25 MB/note): a 100 MB cap can never fire, and wrapping a `response.json()`/Thrift parse in `readBodyWithLimit` is cargo-culting

Litmus test: "Can a user make this one fetch arbitrarily large, with nothing upstream stopping it?" Yes -> use the pattern. No (platform hard-cap, or already paginated) -> a per-file byte cap adds noise, not safety. Borderline: a user-configured/self-hosted endpoint with no platform cap (e.g. Obsidian); bound it only if the content is genuinely unbounded.
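
One way to read the litmus test is as a small predicate. The sketch below is only an interpretation: `FetchProfile` and `needsPerFileByteCap` are hypothetical names, and treating "platform hard-cap" as "a cap at or below our own limit" is an assumption drawn from the "can never fire" remark, not something the skill states outright.

```typescript
// Hypothetical description of a connector fetch; field names are illustrative only.
interface FetchProfile {
  userControlsSize: boolean; // can the user make a single fetch arbitrarily large?
  paginated: boolean; // does the source already return bounded pages?
  platformCapBytes?: number; // hard cap enforced upstream, if any
}

function needsPerFileByteCap(profile: FetchProfile, maxFileBytes: number): boolean {
  if (!profile.userControlsSize || profile.paginated) return false;
  // A platform cap at or below our own cap means our cap could never fire.
  if (profile.platformCapBytes !== undefined && profile.platformCapBytes <= maxFileBytes) {
    return false;
  }
  return true;
}
```

The borderline self-hosted case maps to `userControlsSize`: set it only when the content is genuinely unbounded.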

## Review Workflow

1. Identify every changed data source:
@@ -96,6 +121,7 @@ Read these when doing a deeper pass:
- fetches all pages from an external API before processing
- reads an entire file, HTTP response, or stream without a max byte budget
- checks size only after `Buffer.concat`, `arrayBuffer`, `text`, `JSON.parse`, or parse expansion
- a KB connector silently drops or truncates an oversized file instead of recording it as a failed (skipped) row
- chunks only after loading the complete dataset
- paginates with unbounded/deep `OFFSET` on a mutable or large table
- creates one queue job per row without batching or a queue-level concurrency key
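
For the "chunks only after loading the complete dataset" and "one queue job per row" items, a possible alternative is to batch lazily while iterating. The sketch below is a generic illustration; `inBatches` is a hypothetical helper name, not something defined in the repository.

```typescript
// Hypothetical helper: groups rows into fixed-size batches without
// first collecting the whole input in memory.
function* inBatches<T>(rows: Iterable<T>, batchSize: number): Generator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === batchSize) {
      yield batch; // hand off a full batch before reading further rows
      batch = [];
    }
  }
  // Emit the final, possibly shorter batch.
  if (batch.length > 0) yield batch;
}
```

A caller could then enqueue one job per yielded batch instead of one per row, and memory use stays bounded by the batch size rather than the dataset size.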
