safe-outputs: no permissive / reputation mode — research workflows produce all-redacted URLs #33970

@benissimo

Summary

The safe-outputs content sanitizer (actions/setup/js/sanitize_content_core.cjs) redacts any URL whose domain isn't on the merged network.allowed + safe-outputs.allowed-domains allowlist, rewriting it to (domain)/redacted in the final discussion / issue body. There is no permissive escape hatch for workflows that legitimately need to surface URLs from an open set of domains — research, news, and competitive-analysis use cases in particular.
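To make the behavior concrete, here is a minimal sketch of allowlist-based URL redaction. It is not the code in sanitize_content_core.cjs: the function names, the URL regex, and the subdomain-matching rule are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only; NOT the actual sanitize_content_core.cjs logic.
// isAllowed and redactUrls are hypothetical names.

// Assumption: an allowlist entry matches the exact host and any subdomain of it.
function isAllowed(hostname, allowedDomains) {
  const host = hostname.toLowerCase();
  return allowedDomains.some((entry) => {
    const domain = entry.toLowerCase();
    return host === domain || host.endsWith("." + domain);
  });
}

// Replace every http(s) URL whose host is not allowlisted with "(host)/redacted".
function redactUrls(text, allowedDomains) {
  return text.replace(/https?:\/\/[^\s<>()\[\]"']+/gi, (url) => {
    let hostname;
    try {
      hostname = new URL(url).hostname;
    } catch {
      return "(redacted)"; // unparseable URL: redact it entirely
    }
    return isAllowed(hostname, allowedDomains) ? url : `(${hostname})/redacted`;
  });
}

console.log(
  redactUrls("See https://github.com/a and https://evil.example/x", ["github.com"])
);
```

With a closed allowlist like this, every citation from a publisher the workflow author did not anticipate is rewritten, which is the failure mode described above.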

For an example of the resulting output, see RealPage/ai-internal-enablement#521: a weekly-research run where every external citation is rendered as [text]((domain.com/redacted), which makes the report very hard to read. Related upstream issue: githubnext/agentics#309.

What I tried

Verified against current source on main:

  • safe-outputs.allowed-domains: ["*"] — explicitly rejected by pkg/workflow/network_firewall_validation.go:211-218 (wildcard-only domain '*' is not allowed).
  • network.allowed: ["*"] — special-cased in pkg/workflow/firewall.go:185 to disable the egress firewall, but the sanitizer matcher (actions/setup/js/sanitize_content_core.cjs:276-292) does not honor "*", so output URLs still get redacted.
  • network: {} — denies egress and falls back to the hardcoded GitHub-only sanitizer set (sanitize_content_core.cjs:112); makes things worse.
  • Manually allowlisting domains — the workaround we just shipped (RealPage/ai-internal-enablement#563) — works but isn't tractable for open-web research workflows.

There is no URL-reputation hook in the source (no Safe Browsing / URLhaus / VirusTotal / blocklist integration).

Why this matters

weekly-research.md ships as a sample workflow in githubnext/agentics. It is structurally incompatible with a static domain allowlist — research surfaces URLs from arbitrary publishers. Any user who installs it sees the same redacted output. The current docs ("If you see (redacted) in workflow outputs, add the domain to your network.allowed list") imply this is tractable; it isn't, for this class of workflow.

The current design optimizes hard for one threat (URL-based exfiltration through agent output) and produces unusable output for any open-web reporting use case. There should be a way to opt in to a different threat model.

Proposed solutions (in priority order)

1. safe-outputs.url-policy mode (preferred)

Add a safe-outputs.url-policy: field with values:

  • allowlist — current behavior (default; backwards-compatible).
  • audit — pass all URLs through unchanged, but emit a workflow log line for any URL whose domain isn't on the allowlist. Lets users see what would have been redacted without breaking the output.
  • reputation — call a pluggable URL reputation service (Google Safe Browsing API is the obvious default; URLhaus, PhishTank, VirusTotal could be alternatives via config). Redact only entries flagged as malicious. Configurable via safe-outputs.reputation: { provider: google-safe-browsing, api-key-secret: SB_API_KEY } or similar.

The audit mode alone would cover most cases and is cheap to implement.
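A rough sketch of how the three modes could dispatch, under stated assumptions: applyUrlPolicy, its option names, and the isMalicious callback are hypothetical and do not exist in the project today. A real reputation provider call would be asynchronous; a synchronous stub keeps the sketch short.

```javascript
// Hypothetical sketch of the proposed safe-outputs.url-policy modes.
// applyUrlPolicy, its options, and isMalicious are illustrative names only.
function applyUrlPolicy(urls, options = {}) {
  const {
    policy = "allowlist",      // proposed default: current behavior
    allowedDomains = [],
    isMalicious = () => false, // stand-in for a reputation lookup
    log = console.warn,
  } = options;

  // Assumption: an entry matches the exact host and its subdomains.
  const onAllowlist = (hostname) =>
    allowedDomains.some((d) => hostname === d || hostname.endsWith("." + d));

  return urls.map((url) => {
    const hostname = new URL(url).hostname;
    const redacted = `(${hostname})/redacted`;
    switch (policy) {
      case "allowlist": // redact anything off the allowlist
        return onAllowlist(hostname) ? url : redacted;
      case "audit": // pass everything through, but log what would be redacted
        if (!onAllowlist(hostname)) {
          log(`url-policy audit: ${hostname} is not on the allowlist`);
        }
        return url;
      case "reputation": // redact only URLs the provider flags
        return isMalicious(url) ? redacted : url;
      default:
        throw new Error(`unknown url-policy: ${policy}`);
    }
  });
}
```

The audit branch differs from allowlist by a single return value, which is why it looks like the cheapest mode to add.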

2. Accept "*" as a valid value in safe-outputs.allowed-domains

If the policy-mode approach is too large a surface, the minimal change is to have both the validator and the sanitizer matcher accept "*" to mean "pass any URL through unchanged". This mirrors the existing network.allowed: ["*"] semantics on the egress side. Users can pair it with network: defaults to keep the egress firewall as the remaining defensive gate.
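On the sanitizer side, the change could be as small as a short-circuit in the domain matcher. The sketch below is an assumption about the shape of that change, not the existing matcher; matchesAllowedDomain is a hypothetical name, and the validator in network_firewall_validation.go would need a matching change that is not shown here.

```javascript
// Hypothetical matcher sketch; not the existing code in sanitize_content_core.cjs.
function matchesAllowedDomain(hostname, allowedDomains) {
  // Proposed: a bare "*" entry means every domain passes through unchanged.
  if (allowedDomains.includes("*")) return true;

  // Assumption: otherwise match the exact host or any subdomain of an entry.
  const host = hostname.toLowerCase();
  return allowedDomains.some((entry) => {
    const domain = entry.toLowerCase();
    return host === domain || host.endsWith("." + domain);
  });
}

console.log(matchesAllowedDomain("anything.example", ["*"]));          // true
console.log(matchesAllowedDomain("anything.example", ["github.com"])); // false
```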

3. Document the limitation clearly

Independent of the above, the (redacted) paragraph in docs/.../reference/network.md should call out explicitly that open-web research workflows are not viable under the current model, and link to whichever fix lands. Right now the docs read as if "add the domain to the allowlist" is a complete answer.

Threat model note

The user's real-world concern when adopting an audit or * mode is malicious URLs in agent output (phishing, drive-by). That's a real risk, but:

  • It's already mitigated on the egress side by the firewall; the sanitizer is a second layer on top of that.
  • Allowlisting doesn't actually defend against it once the attacker compromises an allowlisted domain.
  • A reputation-based mode addresses the actual threat far better than an allowlist does.

Happy to send a PR for option 2 (smallest change) if there's directional agreement. Option 1 is the more correct fix but a larger spec change.
