[POC] Add AI submission audit system by msaroufim · Pull Request #452 · gpu-mode/kernelbot

msaroufim · 2026-03-01T20:03:29Z

Summary

Adds automated LLM-based auditing of submissions for cheating (reward hacking, hardcoded outputs, eval bypasses)
Uses OpenRouter (gpt-4o-mini) and runs as fire-and-forget after each submission, so it never blocks or breaks the submission flow
If OPENROUTER_API_KEY is unset, auditing is silently skipped (graceful degradation)
Admins review flagged submissions via two new API endpoints

Changes

File	What
`src/migrations/20260301_01_audit-add-submission-audit.py`	New `leaderboard.submission_audit` table
`src/libkernelbot/audit.py`	New module — sends reference code + submission to OpenRouter, stores verdict
`src/libkernelbot/leaderboard_db.py`	4 new DB methods (create audit, get audits, mark reviewed, get task by id)
`src/libkernelbot/backend.py`	Fire-and-forget `asyncio.create_task` after `mark_submission_done`
`src/kernelbot/api/main.py`	`GET /admin/audits` and `POST /admin/audits/{id}/reviewed`
`pyproject.toml`	Add `openai` dependency (OpenAI SDK used as OpenRouter client)

What this does NOT do

No score-based filtering: every completed submission is audited (a threshold can be added later)
No retry logic: if the OpenRouter call fails, the audit is skipped
No Discord integration: admins review audits via the API only
No batch/backfill: only new submissions are audited going forward

Test plan

Run migration: yoyo apply src/migrations -d $DATABASE_URL
Run existing tests: uv run pytest tests/ -v (all 80 passing tests still pass)
Verify graceful skip: without OPENROUTER_API_KEY set, submissions work normally with no errors
Submit a kernel with the key set and check that leaderboard.submission_audit has a new row
curl -H "Authorization: Bearer $ADMIN_TOKEN" localhost:8000/admin/audits
curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" localhost:8000/admin/audits/1/reviewed
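The two `curl` checks can also be scripted. This is a stdlib-only sketch that builds the same requests; the host, port, and Bearer token scheme mirror the curl lines, while the helper names are hypothetical and the JSON response shape of `GET /admin/audits` is an assumption, since the PR does not document it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host/port as the curl examples


def _request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    # Admin endpoints authenticate with a Bearer token, as in the curl lines.
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def list_audits_request(token: str) -> urllib.request.Request:
    return _request("/admin/audits", token)


def mark_reviewed_request(audit_id: int, token: str) -> urllib.request.Request:
    # POST with an empty body, like `curl -X POST`.
    return _request(f"/admin/audits/{audit_id}/reviewed", token, method="POST")


def send(req: urllib.request.Request):
    """Send the request and decode the body (assumes a JSON response)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(list_audits_request(os.environ["ADMIN_TOKEN"]))` would list audits against a running server (it also needs `import os`).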

Automatically audit submissions for cheating using an LLM (gpt-4o-mini via OpenRouter). Runs as fire-and-forget after each submission completes. Admins can review flagged submissions via API.

github-actions · 2026-03-01T20:04:33Z

Coverage report


File	Lines missing
src/libkernelbot/audit.py	48, 51, 61-126
src/libkernelbot/backend.py	248-249
src/libkernelbot/leaderboard_db.py	1242-1248
src/libkernelbot/utils.py	(none listed)

This report was generated by python-coverage-comment-action

Add AI submission audit system via OpenRouter

3d0ec1b

Clean up submission audit feature and add coverage

2367e19

msaroufim changed the title ~~Add AI submission audit system~~ [POC] Add AI submission audit system Mar 2, 2026

msaroufim marked this pull request as draft March 3, 2026 03:11

msaroufim wants to merge 2 commits into main from submission-audit
