feat(ai-proxy): add Snowflake integration with PAT authentication #1573

Open
christophebrun-forest wants to merge 3 commits into main from feat/ai-proxy-snowflake-mcp-tool

Member

christophebrun-forest commented Apr 29, 2026

Summary

Adds a new Forest integration in @forestadmin/ai-proxy that exposes 3 MCP tools backed by Snowflake's REST API v2, authenticated via Programmatic Access Tokens (PAT).

Tools shipped:

  • snowflake_cortex_search — semantic search via a Cortex Search service (caller passes database/schema/service as arguments).
  • snowflake_cortex_analyst — natural-language analytical Q&A backed by a semantic model file (@stage/model.yaml) or a semantic view (XOR validated at runtime).
  • snowflake_execute_query — synchronous SQL execution restricted to read-only statements.
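
The "XOR validated at runtime" rule for snowflake_cortex_analyst could look roughly like the sketch below. The argument names `semanticModelFile` and `semanticView` and the function name are hypothetical; the PR description only says that exactly one of the two sources must be supplied.

```typescript
// Hypothetical argument shape; the real tool schema may use different names.
interface CortexAnalystSource {
  semanticModelFile?: string; // e.g. "@stage/model.yaml"
  semanticView?: string;
}

// Throws unless exactly one of the two sources is provided (XOR).
function assertExactlyOneSource(args: CortexAnalystSource): void {
  const hasFile = Boolean(args.semanticModelFile);
  const hasView = Boolean(args.semanticView);
  if (hasFile === hasView) {
    throw new Error('Provide exactly one of semanticModelFile or semanticView');
  }
}
```

Comparing the two booleans for equality rejects both the "neither" and the "both" case with a single check.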

Authentication: PAT bearer token (Authorization: Bearer <pat> + X-Snowflake-Authorization-Token-Type: PROGRAMMATIC_ACCESS_TOKEN). The token is passed per-request via the integration config — no shared OAuth flow.
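
As an illustration of the header pair described above, a minimal sketch of a header builder follows. The function name `buildPatHeaders` is hypothetical, and the `Content-Type` and `Accept` headers are assumptions for a JSON API; only the `Authorization` and `X-Snowflake-Authorization-Token-Type` values come from this description.

```typescript
// Hypothetical helper: builds the request headers for PAT authentication.
function buildPatHeaders(programmaticAccessToken: string): Record<string, string> {
  if (programmaticAccessToken.trim() === '') {
    throw new Error('programmaticAccessToken must not be empty');
  }
  return {
    // These two headers are the ones named in the PR description.
    Authorization: `Bearer ${programmaticAccessToken}`,
    'X-Snowflake-Authorization-Token-Type': 'PROGRAMMATIC_ACCESS_TOKEN',
    // Assumed: the REST API exchanges JSON.
    'Content-Type': 'application/json',
    Accept: 'application/json',
  };
}
```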

Config shape:
```ts
{
  accountIdentifier: string; // e.g. "myorg-myaccount" or account locator
  programmaticAccessToken: string;
  defaultWarehouse?: string;
  defaultDatabase?: string;
  defaultSchema?: string;
  defaultRole?: string;
}
```

Read-only SQL guard (defense-in-depth):

  • Allowlist of leading keywords: `SELECT | SHOW | DESCRIBE | DESC | EXPLAIN`.
  • Multi-statement detection runs after stripping comments and string/identifier literals (no false positive on `SELECT 'a;b' FROM t`).
  • Forbidden-keyword scan over the normalized SQL covers bypasses like `EXPLAIN INSERT …`, `WITH cte AS (…) DELETE …`, and `SELECT 1 -- ;\nDROP …`.
  • Primary safety net remains the Snowflake role bound to the PAT — give it SELECT/USAGE only.
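
The three checks above can be sketched as follows. This is not the PR's `assertReadOnlySql`: the function name and the exact forbidden-keyword list are assumptions, and the input is assumed to be SQL that has already had comments and string/identifier literals stripped.

```typescript
// Leading keywords allowed by the guard (from the list above).
const ALLOWED_LEADING = new Set(['SELECT', 'SHOW', 'DESCRIBE', 'DESC', 'EXPLAIN']);

// Assumed forbidden list; the PR's FORBIDDEN_KEYWORD_RE may differ.
const FORBIDDEN_RE = /\b(INSERT|UPDATE|DELETE|MERGE|DROP|CREATE|ALTER|TRUNCATE|GRANT|REVOKE|WITH)\b/i;

// `normalized` must already be free of comments and literals.
function checkNormalizedSql(normalized: string): void {
  // A single trailing semicolon is harmless; any other one means a second statement.
  const trimmed = normalized.trim().replace(/;\s*$/, '');
  if (trimmed.includes(';')) {
    throw new Error('Multiple statements are not allowed');
  }
  const leading = (/^[A-Za-z]+/.exec(trimmed)?.[0] ?? '').toUpperCase();
  if (!ALLOWED_LEADING.has(leading)) {
    throw new Error(`Statement must start with one of: ${[...ALLOWED_LEADING].join(', ')}`);
  }
  const match = FORBIDDEN_RE.exec(trimmed);
  if (match) {
    throw new Error(`Forbidden keyword: ${String(match[0]).toUpperCase()}`);
  }
}
```

The word-boundary anchors keep identifiers such as `updated_at` from tripping the scan, while `EXPLAIN INSERT …` is still rejected because `INSERT` appears as a standalone word.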

Test plan

  • `yarn workspace @forestadmin/ai-proxy lint`
  • `yarn workspace @forestadmin/ai-proxy build`
  • `yarn workspace @forestadmin/ai-proxy test` (328 passed, 64 new)
  • End-to-end smoke test against a real Snowflake account with a PAT (warehouse + role with read-only privileges)
  • Verify error path: invalid PAT → `McpConnectionError` with status + URL in the message
  • Verify SQL guard blocks `WITH cte … DELETE …` end-to-end via the agent

🤖 Generated with Claude Code

Note

Add Snowflake integration with PAT authentication to ai-proxy

  • Adds a Snowflake integration to ForestIntegrationClient, wiring up loadTools and checkConnection alongside existing integrations.
  • Introduces three LangChain DynamicStructuredTool instances: snowflake_cortex_search, snowflake_cortex_analyst, and snowflake_execute_query, each calling Snowflake's REST API with Programmatic Access Token (PAT) auth headers.
  • snowflake_execute_query enforces read-only SQL by rejecting mutating keywords and multi-statement queries before any network call via assertReadOnlySql in utils.ts.
  • validateSnowflakeConfig performs a live connectivity check by executing SELECT 1 against the Snowflake statements API, throwing McpConnectionError on failure.
  • Risk: normalizeSql in utils.ts contains a reported syntax error (stray leading +) that may cause a load-time SyntaxError in the utils module.
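
For illustration, the `SELECT 1` connectivity check could be prepared along these lines. This sketch only builds the request and does not send it. The type and function names are hypothetical, the `timeout` value is an assumption, and the error message format is a guess at "status + URL in the message"; the endpoint path `/api/v2/statements` is Snowflake's SQL API statements endpoint.

```typescript
// Hypothetical subset of the integration config.
interface SnowflakeConfigSketch {
  accountIdentifier: string;
  programmaticAccessToken: string;
  defaultWarehouse?: string;
  defaultRole?: string;
}

interface PreparedRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the request used to verify connectivity.
function buildConnectivityCheck(config: SnowflakeConfigSketch): PreparedRequest {
  const body: Record<string, unknown> = { statement: 'SELECT 1', timeout: 10 };
  if (config.defaultWarehouse) body['warehouse'] = config.defaultWarehouse;
  if (config.defaultRole) body['role'] = config.defaultRole;
  return {
    url: `https://${config.accountIdentifier}.snowflakecomputing.com/api/v2/statements`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${config.programmaticAccessToken}`,
      'X-Snowflake-Authorization-Token-Type': 'PROGRAMMATIC_ACCESS_TOKEN',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
}

// Assumed message format for a failed check.
function describeFailure(status: number, url: string): string {
  return `Snowflake connection check failed with HTTP ${status} at ${url}`;
}
```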

Changes since #1573 opened

  • Added account identifier validation that throws AIBadRequestError for invalid Snowflake account identifiers [534eaae]
  • Refactored Snowflake base URL construction to centralize validation logic and remove local implementations [534eaae]
  • Extended SQL statement restrictions to forbid UPDATE operations in addition to existing forbidden keywords [534eaae]
  • Updated test suites to cover new account identifier validation behavior and removed tests for deprecated functions [534eaae]

Macroscope summarized 7d4ba41.

Exposes 3 MCP tools backed by Snowflake REST API v2 (Cortex Search,
Cortex Analyst, read-only SQL execution), authenticated via Programmatic
Access Tokens. The execute-query tool enforces a defense-in-depth
read-only SQL guard on top of Snowflake role privileges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

qltysh Bot commented Apr 29, 2026

3 new issues

| Tool | Category | Rule | Count |
|------|----------|------|-------|
| qlty | Structure | Function with many returns (count = 4): checkConnection | 1 |
| qlty | Structure | Function with high complexity (count = 35): normalizeSql | 1 |
| qlty | Structure | Deeply nested control flow (level = 4) | 1 |

qltysh Bot commented Apr 29, 2026

Coverage Impact

⬆️ Merging this pull request will increase total coverage on main by 0.03%.

Modified Files with Diff Coverage (6)

| Rating | File | % Diff | Uncovered Line #s |
|--------|------|--------|-------------------|
| A, A | packages/ai-proxy/src/forest-integration-client.ts | 100.0% | |
| New, A | ...ges/ai-proxy/src/integrations/snowflake/tools/execute-query.ts | 100.0% | |
| New, A | packages/ai-proxy/src/integrations/snowflake/tools.ts | 100.0% | |
| New, A | ...ges/ai-proxy/src/integrations/snowflake/tools/cortex-search.ts | 100.0% | |
| New, A | ...es/ai-proxy/src/integrations/snowflake/tools/cortex-analyst.ts | 100.0% | |
| New, A | packages/ai-proxy/src/integrations/snowflake/utils.ts | 98.5% | 69 |
| Total | | 99.2% | |

Comment thread packages/ai-proxy/src/integrations/snowflake/utils.ts
Comment thread packages/ai-proxy/src/integrations/snowflake/utils.ts

linear Bot commented May 5, 2026

The previous regex pipeline stripped comments before string literals, so
attacker-crafted statements like `SELECT '--' DELETE FROM users` had the
forbidden `DELETE` consumed as part of a "line comment" and the guard
accepted the query. SQL is context-sensitive: a `--` may live inside a
string, and a `'` may live inside a comment. A regex pipeline cannot
disambiguate the two in any order. Replace it with a single-pass lexer
that tracks string/identifier/comment state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
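
To make the single-pass idea concrete, here is a minimal sketch of a lexer that blanks out comments and literals while tracking which context it is in. It is not the PR's `normalizeSql`: the function names are hypothetical, and it only handles `'…'` strings (with `''` and backslash escapes), `"…"` identifiers (with `""` escapes), `--` line comments, and `/* … */` block comments. Other Snowflake lexical forms are not covered.

```typescript
// Returns the index just past the closing quote, or sql.length if unterminated.
function skipQuoted(sql: string, start: number, quote: string): number {
  let i = start + 1;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    if (ch === '\\' && quote === "'") { i += 2; continue; } // backslash escape in strings
    if (ch === quote) {
      if (sql.charAt(i + 1) === quote) { i += 2; continue; } // doubled quote stays inside
      return i + 1;
    }
    i += 1;
  }
  return sql.length;
}

// Single pass: each character is classified by the state it is read in,
// so "--" inside a string and "'" inside a comment are never misread.
function stripLiteralsAndComments(sql: string): string {
  let out = '';
  let i = 0;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    const next = sql.charAt(i + 1);
    if (ch === '-' && next === '-') {
      while (i < sql.length && sql.charAt(i) !== '\n') i += 1; // newline itself is kept
      out += ' ';
    } else if (ch === '/' && next === '*') {
      const end = sql.indexOf('*/', i + 2);
      i = end === -1 ? sql.length : end + 2;
      out += ' ';
    } else if (ch === "'" || ch === '"') {
      i = skipQuoted(sql, i, ch);
      out += ch + ch; // keep an empty literal as a placeholder
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}
```

With this approach `SELECT '--' DELETE FROM users` normalizes to `SELECT '' DELETE FROM users`, so a later forbidden-keyword scan still sees `DELETE`, while `SELECT 'a;b' FROM t` loses the semicolon that lived inside the string.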
name: 'snowflake_execute_query',
description:
  'Execute a read-only SQL query on Snowflake and return the results. ' +
  'Only SELECT, WITH, SHOW, DESCRIBE, and EXPLAIN statements are allowed. ' +
Member

WITH is in FORBIDDEN_KEYWORD_RE, so it should be removed from this description.

Member Author

done


export function getSnowflakeValidationBaseUrl(accountIdentifier: string): string {
  return `https://${accountIdentifier}.snowflakecomputing.com`;
}
Member

nbouliol commented May 7, 2026

accountIdentifier is interpolated into the URL without validation — accountIdentifier: "victim.com#" would route the bearer token to victim.com. Should validate against /^[A-Za-z0-9_-]+(?:[.-][A-Za-z0-9_-]+)*$/ at config time.

Maybe we can do this later, and for the Zendesk MCP too.
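
A small sketch of the problem and of the suggested check. The regex is the one proposed in the comment above; the function name `checkedBaseUrl` and the plain `Error` are placeholders, not the helper or error type the PR ends up using.

```typescript
// Regex proposed in the review comment.
const ACCOUNT_IDENTIFIER_RE = /^[A-Za-z0-9_-]+(?:[.-][A-Za-z0-9_-]+)*$/;

// Hypothetical helper: validate first, interpolate second.
function checkedBaseUrl(accountIdentifier: string): string {
  if (!ACCOUNT_IDENTIFIER_RE.test(accountIdentifier)) {
    throw new Error(`Invalid Snowflake account identifier: ${JSON.stringify(accountIdentifier)}`);
  }
  return `https://${accountIdentifier}.snowflakecomputing.com`;
}

// Why the check matters: without it, "#" starts the URL fragment, so the
// intended Snowflake suffix is no longer part of the host.
const unsafeUrl = new URL(`https://${'victim.com#'}.snowflakecomputing.com`);
// unsafeUrl.hostname is "victim.com"; ".snowflakecomputing.com" ends up in unsafeUrl.hash.
```

Because the regex only admits letters, digits, `_`, `-` and `.`, characters that change URL structure (`#`, `/`, `?`, `@`, `:`) are rejected before the string is ever interpolated.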

Member Author

Added assertValidAccountIdentifier, a validation function, in utils.ts. We can then look into creating a shared helper that can be reused for Zendesk, Snowflake, etc.


export function buildSnowflakeBaseUrl(config: SnowflakeConfig): string {
  return `https://${config.accountIdentifier}.snowflakecomputing.com`;
}
Member

getSnowflakeValidationBaseUrl and buildSnowflakeBaseUrl return the same URL — could collapse into one helper.

Member Author

  • Deleted buildSnowflakeBaseUrl.
  • Renamed getSnowflakeValidationBaseUrl to getSnowflakeBaseUrl.
  • Used this method in tools.ts.

- Validate accountIdentifier against a strict regex before interpolating
  it into the base URL. Without this, a value like `attack.com#` made the
  URL resolve to `https://attack.com` (the `#` opens the URL fragment),
  redirecting validation and tool calls to attacker-controlled hosts.
- Consolidate getSnowflakeValidationBaseUrl and buildSnowflakeBaseUrl
  (now identical) into a single getSnowflakeBaseUrl in utils.ts.
- Drop WITH from the snowflake_execute_query description: WITH is in
  FORBIDDEN_KEYWORD_RE and the description was misleading the LLM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants