Skip to content

guard: GraphQL response labeling, backend enrichment, and sub-method skip#2335

Merged
lpcox merged 5 commits intomainfrom
feat/guard-graphql-response-labeling
Mar 22, 2026
Merged

guard: GraphQL response labeling, backend enrichment, and sub-method skip#2335
lpcox merged 5 commits intomainfrom
feat/guard-graphql-response-labeling

Conversation

@lpcox
Copy link
Collaborator

@lpcox lpcox commented Mar 22, 2026

Summary

Fixes integrity filtering bugs in the GitHub guard that caused false positives when processing responses from the GitHub MCP Server.

Problem

The GitHub MCP Server returns responses in different formats depending on the API used (GraphQL vs REST), and several fields critical for integrity labeling are missing:

  1. GraphQL format not handledlist_issues and list_pull_requests return GraphQL-shaped responses ({issues: [...]}, {pullRequests: [...]}) that the guard's response labeling didn't recognize
  2. Missing author_association — GraphQL IssueFragment doesn't request authorAssociation; MinimalPullRequest lacks the field entirely (github/github-mcp-server#2250)
  3. Sub-method response mismatchpull_request_read(method: get_check_runs) returns check run objects, but the guard tried to parse them as PRs → pr:#unknown with none integrity → incorrectly filtered
  4. Error responses mislabeled — 404 Not Found responses were parsed as domain objects → pr:#unknown → filtered

Fixes

  1. GraphQL response format support (response_items.rs, response_paths.rs) — Handle {issues: [...]}, {pullRequests: [...]}, and GraphQL single-node formats alongside existing REST array format

  2. Backend enrichment (helpers.rs) — When author_association is missing on public repos, the guard now makes backend calls to fetch it:

    • Issues: calls issue_read (REST endpoint includes the field)
    • PRs: calls get_pull_request_facts for association + fork/merge status
  3. Sub-method skip (response_items.rs, response_paths.rs) — For pull_request_read and issue_read sub-methods (get_check_runs, get_files, get_comments, etc.), skip per-item response labeling. These return non-PR/issue objects. Resource-level labels from tool_rules provide correct integrity.

  4. Error response skip (response_items.rs, response_paths.rs) — When isError=true, skip response labeling entirely. Resource-level labels handle error cases.

  5. Search scope extraction (tool_rules.rs) — search_issues and search_pull_requests now extract repo:owner/repo from query strings to apply scoped secrecy/integrity labels.

Validation

Tested via 5 sequential repo-assist CI runs:

Run DIFC-FILTERED Result
23407151206 (v0.1.24 baseline) 1
23408145677 (GraphQL fix) 6 ❌ lpcox admin issues filtered
23410257857 (+enrichment) 4 ✅ lpcox fixed; check_runs + search remaining
23410932567 (+sub-method skip) 1 404 error mislabeled
23411168049 (+isError skip) 0 ✅ All clean

Related

Note

repo-assist.lock.yml is temporarily set to build a local container for testing. Must be reverted before merge.

lpcox and others added 5 commits March 22, 2026 10:07
The guard's response labeling code only recognized REST-like formats
(arrays, {items: [...]}) but not GraphQL nested responses like
data.repository.pullRequests.nodes. This caused the proxy to apply
coarse-grained filtering (blocking entire responses) instead of
per-item integrity filtering for gh CLI calls.

Changes:
- extract_items_array() now traverses GraphQL nested paths:
  data.repository.{issues,pullRequests,discussions,...}.nodes
  and data.search.nodes
- extract_graphql_nodes() extracts collection arrays from GraphQL
- extract_graphql_single_object() extracts singular objects
  (data.repository.issue, data.repository.pullRequest)
- is_graphql_wrapper() prevents treating {data:...} as a single item
- response_items.rs PR and issue handlers check GraphQL formats
  before the single-object fallback
- 9 new tests covering GraphQL collection, single object, search,
  path labeling, and wrapper detection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the GitHub MCP Server omits author_association from list responses
(e.g. list_issues via GraphQL, list_pull_requests via REST), the guard
now fetches the individual item via a backend call to obtain the correct
author_association value. This prevents incorrectly assigning 'none'
integrity to repo members/collaborators.

Changes:
- Add has_author_association() helper to detect absent field
- issue_integrity(): enrich via get_issue_author_association when field
  is missing on public repos (private repos already get writer integrity)
- pr_integrity(): enrich via get_pull_request_facts when field is missing,
  also backfills is_forked and is_merged status
- Add 6 tests for has_author_association and enrichment scenarios

Upstream fix filed: github/github-mcp-server#2250

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- response_items.rs/response_paths.rs: Skip per-item response labeling
  for pull_request_read and issue_read sub-methods (get_check_runs,
  get_files, get_comments, etc.) that return non-PR/issue objects.
  Resource-level labels from tool_rules provide correct integrity.

- tool_rules.rs: Extract repo scope from search_issues and
  search_pull_requests query strings using extract_repo_info_from_search_query.
  When repo:owner/repo is present, apply scoped secrecy and integrity
  instead of deferring entirely to response labeling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the MCP backend returns isError=true (e.g. 404 Not Found), skip
per-item response labeling in both response_items.rs and
response_paths.rs. Error responses don't contain valid domain objects
and would be mislabeled as pr:#unknown or issue:#unknown with none
integrity, causing incorrect DIFC filtering.

Resource-level labels from tool_rules still apply to error responses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 22, 2026 20:15
@lpcox lpcox changed the title Feat/guard graphql response labeling guard: GraphQL response labeling, backend enrichment, and sub-method skip Mar 22, 2026
@lpcox lpcox merged commit b2b0527 into main Mar 22, 2026
23 checks passed
@lpcox lpcox deleted the feat/guard-graphql-response-labeling branch March 22, 2026 20:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds GraphQL-aware response labeling to the GitHub guard so PR/issue results can be labeled correctly even when returned in GraphQL wrapper shapes, and adjusts workflow wiring to run against a locally built MCPG container.

Changes:

  • Improve repo scoping for search_issues / search_pull_requests by parsing repo:owner/repo from the search query.
  • Extend response labeling to handle GraphQL collection/single-object shapes and skip per-item labeling for *_read sub-methods and error responses.
  • Add backend “enrichment” in pr_integrity / issue_integrity when author_association is missing, and update the repo-assist workflow to build/run a local MCPG image.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
guards/github-guard/rust-guard/src/labels/tool_rules.rs Adds repo-qualified search scoping for issues/PR search tools.
guards/github-guard/rust-guard/src/labels/response_paths.rs Adds error-response skip + GraphQL items-path support + sub-method skip logic.
guards/github-guard/rust-guard/src/labels/response_items.rs Adds GraphQL node/single-object extraction + sub-method skip logic.
guards/github-guard/rust-guard/src/labels/mod.rs Re-exports new helpers and adds tests for GraphQL extraction helpers.
guards/github-guard/rust-guard/src/labels/helpers.rs Implements GraphQL extraction helpers and backend enrichment for missing author_association.
.github/workflows/repo-assist.lock.yml Switches repo-assist to build and run gh-aw-mcpg:local instead of a pinned image.
Comments suppressed due to low confidence (4)

.github/workflows/repo-assist.lock.yml:421

  • Switching the proxy container from a pinned release image to ghcr.io/github/gh-aw-mcpg:local makes this workflow dependent on the local docker build step and removes the reproducibility benefits of the lock file. If the intent is to run published code, keep the pinned tag; if the intent is to test PR builds, consider limiting this to PR-triggered runs (not scheduled) or make it configurable via workflow inputs.
          docker run -d --name awmg-proxy --network host \
            -e GH_TOKEN \
            -e DEBUG='*' \
            -v "$PROXY_LOG_DIR:$PROXY_LOG_DIR" \
            ghcr.io/github/gh-aw-mcpg:local proxy \
              --policy "$POLICY" \

.github/workflows/repo-assist.lock.yml:517

  • The image pre-pull step no longer includes the MCPG image; combined with the shift to a locally built MCPG image, this reduces cache usefulness and can increase runtime variability. If the workflow should rely on released images, keep MCPG in the pre-pull list; if it should rely on local builds, consider adjusting the step name/documentation and ensuring scheduled runs aren’t impacted.
      - name: Download container images
        run: bash ${RUNNER_TEMP}/gh-aw/actions/download_docker_images.sh ghcr.io/github/gh-aw-firewall/agent:0.24.3 ghcr.io/github/gh-aw-firewall/api-proxy:0.24.3 ghcr.io/github/gh-aw-firewall/squid:0.24.3 ghcr.io/github/github-mcp-server:v0.32.0 node:lts-alpine

guards/github-guard/rust-guard/src/labels/helpers.rs:985

  • pr_integrity now does a backend get_pull_request call per item when author_association is missing. With MAX_ITEMS_PER_RESPONSE = 100, this can turn a single list/search response into up to 100 extra GitHub API calls, increasing latency and risk of rate limiting. Consider adding caching for PR facts (keyed by repo+number), lowering the max for enrichment, or only enriching when is_forked/merge state is unknown and the integrity outcome would otherwise remain empty.
    // Backend enrichment: when author_association is absent from the response
    // (e.g. GitHub MCP Server omits it from MinimalPullRequest), fetch the
    // individual PR via REST to obtain the correct association, fork status,
    // and merge status.
    if integrity.is_empty() && !has_author_association(item) && !repo_private {
        if let Some(number) = item.get("number").and_then(|v| v.as_u64()) {
            let (owner, repo) = repo_full_name.split_once('/').unwrap_or(("", ""));
            if !owner.is_empty() && !repo.is_empty() {
                let number_str = number.to_string();
                if let Some(facts) =
                    super::backend::get_pull_request_facts(owner, repo, &number_str)
                {
                    crate::log_debug(&format!(
                        "[integrity] pr:{}#{} enriched: author_association={:?}, is_forked={:?}, is_merged={}",
                        repo_full_name, number, facts.author_association, facts.is_forked, facts.is_merged
                    ));

.github/workflows/repo-assist.lock.yml:865

  • MCP_GATEWAY_DOCKER_COMMAND is updated to run ghcr.io/github/gh-aw-mcpg:local, coupling gateway startup to the earlier local docker build. This can be fragile on reruns/cached steps and makes the lock workflow non-reproducible. Prefer the pinned release image for scheduled/default runs, or gate the local image path behind an explicit debug/PR-only switch.
          export GH_AW_ENGINE="copilot"
          export MCP_GATEWAY_DOCKER_COMMAND='docker run -i --rm --network host -v /var/run/docker.sock:/var/run/docker.sock -e MCP_GATEWAY_PORT -e MCP_GATEWAY_DOMAIN -e MCP_GATEWAY_API_KEY -e MCP_GATEWAY_PAYLOAD_DIR -e MCP_GATEWAY_PAYLOAD_SIZE_THRESHOLD -e DEBUG -e MCP_GATEWAY_LOG_DIR -e GH_AW_MCP_LOG_DIR -e GH_AW_SAFE_OUTPUTS -e GH_AW_SAFE_OUTPUTS_CONFIG_PATH -e GH_AW_SAFE_OUTPUTS_TOOLS_PATH -e GH_AW_ASSETS_BRANCH -e GH_AW_ASSETS_MAX_SIZE_KB -e GH_AW_ASSETS_ALLOWED_EXTS -e DEFAULT_BRANCH -e GITHUB_MCP_SERVER_TOKEN -e GITHUB_MCP_GUARD_MIN_INTEGRITY -e GITHUB_MCP_GUARD_REPOS -e GITHUB_REPOSITORY -e GITHUB_SERVER_URL -e GITHUB_SHA -e GITHUB_WORKSPACE -e GITHUB_TOKEN -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e GITHUB_JOB -e GITHUB_ACTION -e GITHUB_EVENT_NAME -e GITHUB_EVENT_PATH -e GITHUB_ACTOR -e GITHUB_ACTOR_ID -e GITHUB_TRIGGERING_ACTOR -e GITHUB_WORKFLOW -e GITHUB_WORKFLOW_REF -e GITHUB_WORKFLOW_SHA -e GITHUB_REF -e GITHUB_REF_NAME -e GITHUB_REF_TYPE -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GH_AW_SAFE_OUTPUTS_PORT -e GH_AW_SAFE_OUTPUTS_API_KEY -v /tmp/gh-aw/mcp-payloads:/tmp/gh-aw/mcp-payloads:rw -v /opt:/opt:ro -v /tmp:/tmp:rw -v '"${GITHUB_WORKSPACE}"':'"${GITHUB_WORKSPACE}"':rw ghcr.io/github/gh-aw-mcpg:local'

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1109 to +1130
// Backend enrichment: when author_association is absent from the response
// (e.g. GitHub MCP Server's list_issues GraphQL path omits it), fetch the
// individual issue via REST to obtain the correct value. This avoids
// incorrectly assigning "none" integrity to members/collaborators.
if integrity.is_empty() && !has_author_association(item) && !repo_private {
if let Some(number) = item.get("number").and_then(|v| v.as_u64()) {
let (owner, repo) = repo_full_name.split_once('/').unwrap_or(("", ""));
if !owner.is_empty() && !repo.is_empty() {
let number_str = number.to_string();
if let Some(association) =
super::backend::get_issue_author_association(owner, repo, &number_str)
{
crate::log_debug(&format!(
"[integrity] issue:{}#{} enriched author_association='{}'",
repo_full_name, number, association
));
// Re-check trusted bot status with enriched login
let enriched_floor =
author_association_floor_from_str(repo_full_name, Some(&association), ctx);
integrity =
max_integrity(repo_full_name, integrity, enriched_floor, ctx);
} else {
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue_integrity adds per-item backend enrichment via get_issue_author_association when author_association is missing. This can create an N+1 pattern (up to MAX_ITEMS_PER_RESPONSE extra backend calls) on list/search issue responses. Consider caching associations (repo+issue number) or otherwise bounding/avoiding enrichment to reduce latency and rate-limit exposure.

This issue also appears on line 970 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines 401 to +406
# NOTE: Local container build kept for debugging. Uncomment to test unpublished changes.
# - name: Build local MCPG container (debugging only)
# run: |
# rustup target add wasm32-wasip1
# cd guards/github-guard/rust-guard && ./build.sh && cd ../../..
# docker build . -t ghcr.io/github/gh-aw-mcpg:local
- name: Build local MCPG container (debugging only)
run: |
rustup target add wasm32-wasip1
cd guards/github-guard/rust-guard && ./build.sh && cd ../../..
docker build . -t ghcr.io/github/gh-aw-mcpg:local
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step is labeled “debugging only” and the preceding note says it should be uncommented for debugging, but it is now always enabled. Building the MCPG container in this scheduled workflow will add significant runtime and introduces additional failure modes (toolchain install, docker build). If this is meant for occasional debugging, gate it behind an input/flag and keep the default path using the pinned published image.

This issue also appears in the following locations of the same file:

  • line 416
  • line 516
  • line 864

Copilot uses AI. Check for mistakes.
"[integrity] issue:{}#{} enriched author_association='{}'",
repo_full_name, number, association
));
// Re-check trusted bot status with enriched login
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment // Re-check trusted bot status with enriched login is misleading here: the enrichment path only fetches author_association (no author login), so there is no bot re-check happening. Please update/remove the comment to reflect what the code actually does.

Suggested change
// Re-check trusted bot status with enriched login
// Recompute integrity floor using enriched author_association

Copilot uses AI. Check for mistakes.
desc = format!("search_issues:{}", q_repo_id);
secrecy =
apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the repo-qualified search path, repo_private is computed from extract_repo_info(tool_args) (which is typically empty for search tools), so private_writer_integrity(&q_repo_id, repo_private, ctx) will never grant writer integrity even when the repo:owner/repo target is private. Compute repo visibility for q_owner/q_repo and pass that result into private_writer_integrity (or return the privacy flag from apply_repo_visibility_secrecy). Also update baseline_scope to q_repo_id when you emit repo-scoped integrity so ensure_integrity_baseline doesn’t downgrade labels.

Suggested change
integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
// Derive repo privacy from the applied secrecy labels for this repo scope
let repo_private_for_search =
secrecy.contains(&policy_private_scope_label(&q_repo_id, ctx));
integrity = private_writer_integrity(&q_repo_id, repo_private_for_search, ctx);
// Ensure the integrity baseline is applied at the repo scope for this query
baseline_scope = q_repo_id.clone();

Copilot uses AI. Check for mistakes.
Comment on lines +233 to +236
desc = format!("search_pull_requests:{}", q_repo_id);
secrecy =
apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as search_issues: repo_private here is derived from tool_args owner/repo (usually absent for search), so private_writer_integrity(&q_repo_id, repo_private, ctx) won’t grant writer integrity for private repos even when the query includes repo:owner/repo. Determine privacy for q_owner/q_repo and set baseline_scope = q_repo_id when applying repo-scoped integrity.

Suggested change
desc = format!("search_pull_requests:{}", q_repo_id);
secrecy =
apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
integrity = private_writer_integrity(&q_repo_id, repo_private, ctx);
// Determine privacy for the repository referenced in the query and
// scope integrity baselines to that repository.
let (_, _, _, q_repo_private) = extract_repo_info(&q_owner, &q_repo, ctx);
desc = format!("search_pull_requests:{}", q_repo_id);
secrecy =
apply_repo_visibility_secrecy(&q_owner, &q_repo, &q_repo_id, secrecy, ctx);
baseline_scope = q_repo_id.clone();
integrity = private_writer_integrity(&q_repo_id, q_repo_private, ctx);

Copilot uses AI. Check for mistakes.
lpcox added a commit that referenced this pull request Mar 22, 2026
Switch repo-assist.lock.yml from local container build to published
`ghcr.io/github/gh-aw-mcpg:v0.1.25` image (includes GraphQL response
labeling, backend enrichment, sub-method skip, and error response
handling fixes from #2335).

### Changes
- Comment out local container build step (kept for debugging)
- Update proxy image: `:local` → `:v0.1.25`
- Update gateway image: `:local` → `:v0.1.25`
lpcox added a commit that referenced this pull request Mar 22, 2026
## Summary

Switch repo-assist from local container build to published image
`v0.1.26`, which includes all DIFC integrity filtering fixes:

- **Backend enrichment tool name fix** (PR #2340): corrected
`get_pull_request` → `pull_request_read` and added missing `method:
"get"` parameters
- **Sub-method response labeling skip** (PR #2335): prevents mislabeling
`get_check_runs`/`get_comments` sub-method responses
- **isError response skip** (PR #2335): avoids labeling 404/error
responses as domain objects
- **Search scope extraction** (PR #2335): enables filtering for
`search_issues`/`search_pull_requests`

## Changes

- Comment out local container build step (kept as comment for future
debugging)
- Update proxy image tag: `:local` → `:v0.1.26`
- Update MCP gateway image tag: `:local` → `:v0.1.26`

## Validation

v0.1.26 was validated with **0 DIFC-FILTERED events** in run
[23412918230](https://github.com/github/gh-aw-mcpg/actions/runs/23412918230).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants