Caution: Review failed. The pull request was closed or merged during review.

Walkthrough

Updated the perf GitHub Actions workflow to use a configurable artifact directory and a detached git worktree for the baseline checkout; benchmark outputs (JSON + Markdown) are written to and uploaded from that artifact directory.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as Runner
    participant Git as Git (worktree)
    participant Script as benchmark_report.py
    participant Art as Artifact Store
    participant PR as PR Comment Action
    GH->>Runner: trigger perf workflow (bench-related push/PR)
    Runner->>Runner: create BENCHMARK_ARTIFACT_DIR & BENCHMARK_BASE_WORKTREE
    Runner->>Git: git worktree add --detach -> BENCHMARK_BASE_WORKTREE
    Runner->>Runner: run baseline tests in base worktree -> write benchmark-base.json to artifact dir
    Runner->>Runner: run current benchmarks -> write benchmark-head.json and markdowns to artifact dir
    Runner->>Script: run benchmark_report.py (compare, flags)
    Script-->>Runner: emit comparison markdown(s) and exit status
    Runner->>Art: upload BENCHMARK_ARTIFACT_DIR
    alt run is PR
        Runner->>PR: post/refresh sticky benchmark comment from artifact markdown
    else non-PR run
        Runner->>Runner: write benchmark-summary.md and append to $GITHUB_STEP_SUMMARY
    end
    Runner->>Git: cleanup detached worktree (always)
```
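The compare step in the diagram reduces to a median-ratio check against a threshold. A minimal sketch, assuming flat name-to-median dicts rather than the full pytest-benchmark JSON schema; `find_regressions` is a hypothetical helper, not the script's actual API:

```python
def find_regressions(
    current: dict[str, float],
    baseline: dict[str, float],
    threshold: float = 0.15,
) -> tuple[list[tuple[str, float]], list[str]]:
    """Flag benchmarks whose median slowed by more than `threshold`.

    `current` and `baseline` map benchmark name -> median runtime in seconds.
    Benchmarks present in the baseline but missing from the current run are
    returned separately; a --fail-on-missing style gate would reject them.
    """
    regressions: list[tuple[str, float]] = []
    missing: list[str] = []
    for name, base_median in sorted(baseline.items()):
        if name not in current:
            missing.append(name)
            continue
        ratio = current[name] / base_median - 1.0  # fractional slowdown
        if ratio > threshold:
            regressions.append((name, ratio))
    return regressions, missing
```

With `threshold=0.15`, a benchmark whose median rises from 100 ms to 120 ms is flagged (+20%), while one that rises to 110 ms (+10%) passes.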
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/perf.yml:
- Around line 43-45: Job-level env currently uses runner.temp expressions which
are invalid there; move the BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE
assignments into the first step of the job (e.g., at the top step of the
benchmarks job) and export them to subsequent steps with echo
"BENCHMARK_ARTIFACT_DIR=$RUNNER_TEMP/modelaudit-benchmarks" >> $GITHUB_ENV and
echo "BENCHMARK_BASE_WORKTREE=$RUNNER_TEMP/modelaudit-base" >> $GITHUB_ENV so
all later steps that reference BENCHMARK_ARTIFACT_DIR and
BENCHMARK_BASE_WORKTREE get the resolved values; remove the runner.temp uses
from the job-level env block.
In `@tests/test_benchmark_report.py`:
- Around line 81-83: The current assertions use summary_text.index(...) which
can be satisfied by mentions outside the results table; instead extract the
benchmark table rows from summary_text (e.g., parse the "Slowest
median"/"Fastest median" table section or use the same row-extraction helper
used elsewhere in this test file) and assert that the row list contains
"test_scan_pytorch_zip" and "test_scan_safe_pickle" and that the index of
"test_scan_pytorch_zip" in that extracted rows list is less than the index of
"test_scan_safe_pickle"; update the assertions around summary_text,
summary_file.read_text, and the two benchmark names to validate order against
the parsed table rows.
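The row extraction the comment asks for can be sketched as a small helper; `extract_table_rows` is a hypothetical name for illustration, not the repository's actual test utility:

```python
def extract_table_rows(markdown: str, header: str) -> list[str]:
    """Return the data rows of the first Markdown table whose header matches.

    Skips the header and the |---| separator line, then collects consecutive
    lines that still look like table rows.
    """
    lines = markdown.splitlines()
    try:
        start = lines.index(header)
    except ValueError:
        return []
    rows: list[str] = []
    for line in lines[start + 2:]:  # skip header and separator
        if not line.startswith("|"):
            break
        rows.append(line)
    return rows
```

A test can then assert ordering against the returned rows, so a benchmark name mentioned in the "Slowest benchmarks" bullets cannot satisfy the check by accident.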
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Repository UI
- Review profile: ASSERTIVE
- Plan: Pro
- Run ID: 3ffd15f9-d30a-47cc-9745-c59e76575128

📒 Files selected for processing (4)
- .github/workflows/README.md
- .github/workflows/perf.yml
- scripts/benchmark_report.py
- tests/test_benchmark_report.py
Actionable comments posted: 1
♻️ Duplicate comments (2)

tests/test_benchmark_report.py (1)

Lines 101-106: ⚠️ Potential issue | 🟡 Minor: Assert ordering from the table rows, not the whole markdown blob.

Both benchmark names also appear in the "Slowest benchmarks" bullets, so `summary_text.index(...)` can still pass even if the table rows regress. Parse the benchmark rows and assert their order directly.

More targeted assertion:

```diff
 summary_text = summary_file.read_text(encoding="utf-8")
 assert "| Benchmark | Target | Size | Files | Median | Mean | Rounds |" in summary_text
 assert "safe_model.pkl" in summary_text
 assert "2.0 KiB" in summary_text
-assert "test_scan_safe_pickle" in summary_text
-assert summary_text.index("test_scan_pytorch_zip") < summary_text.index("test_scan_safe_pickle")
+table_rows = [
+    line
+    for line in summary_text.splitlines()
+    if line.startswith("| `tests/benchmarks/test_scan_benchmarks.py::")
+]
+assert len(table_rows) == 2
+assert table_rows[0].startswith(
+    "| `tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip` |"
+)
+assert table_rows[1].startswith(
+    "| `tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle` |"
+)
```

As per coding guidelines: "Assert the actual signal in tests — never `assert result is not None` alone; verify specific check names, issue messages, or detail fields."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_benchmark_report.py` around lines 101-106: the current test uses summary_text.index(...), which can be fooled by non-table occurrences; instead parse the Markdown table rows from summary_text (e.g., split by newline and locate the table header line `"| Benchmark | Target | Size | Files | Median | Mean | Rounds |"`, then collect subsequent lines that start with `"|"` as table rows) and assert the order of benchmark names within those rows (check that the row containing "test_scan_pytorch_zip" appears before the row containing "test_scan_safe_pickle"). Use the existing variables summary_text/summary_file and replace the global index assertion with this table-row-based order check.

.github/workflows/perf.yml (1)
Lines 44-45: ⚠️ Potential issue | 🔴 Critical: `runner.temp` is invalid at job scope.

GitHub Actions does not allow the `runner` context in `jobs.<job_id>.env`, and actionlint is already failing on lines 44-45. That means `BENCHMARK_ARTIFACT_DIR` and `BENCHMARK_BASE_WORKTREE` never resolve before the later steps consume them. Move these assignments into an early `run` step and export them via `$GITHUB_ENV`.

Minimal fix:

```diff
 env:
-  BENCHMARK_ARTIFACT_DIR: ${{ runner.temp }}/modelaudit-benchmarks
-  BENCHMARK_BASE_WORKTREE: ${{ runner.temp }}/modelaudit-base
   BENCHMARK_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
@@
 - name: Prepare benchmark temp directories
   run: |
-    rm -rf "$BENCHMARK_ARTIFACT_DIR" "$BENCHMARK_BASE_WORKTREE"
-    mkdir -p "$BENCHMARK_ARTIFACT_DIR"
+    echo "BENCHMARK_ARTIFACT_DIR=$RUNNER_TEMP/modelaudit-benchmarks" >> "$GITHUB_ENV"
+    echo "BENCHMARK_BASE_WORKTREE=$RUNNER_TEMP/modelaudit-base" >> "$GITHUB_ENV"
+    rm -rf "$RUNNER_TEMP/modelaudit-benchmarks" "$RUNNER_TEMP/modelaudit-base"
+    mkdir -p "$RUNNER_TEMP/modelaudit-benchmarks"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/perf.yml around lines 44-45: the env assignments using runner.temp (BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE) are invalid at job scope; remove them from jobs.<job_id>.env and instead create an early step (e.g., a setup step) that computes those paths from runner.temp and writes them to the job environment via $GITHUB_ENV (export lines for BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE), so subsequent steps can consume them; reference the variable names BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE and the runner.temp context when locating where to move the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/perf.yml:
- Around line 100-104: The workflow invocation of benchmark_report.py should
pass the --fail-on-missing flag and capture its exit code so missing baseline
benchmarks cause the job to fail only after the markdown summary is appended;
update the uv run command that calls benchmark_report.py to include
--fail-on-missing and store the script's exit status, then continue to append
"$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md" to the PR summary and finally
exit non-zero if the captured exit code indicates missing benchmarks. Target the
invocation that builds the compare (the uv run ... benchmark_report.py call and
its use of "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md") and ensure the
script’s exit code is propagated after writing the report.
---
Duplicate comments:
In @.github/workflows/perf.yml:
- Around line 44-45: The env assignments using runner.temp
(BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE) are invalid at job scope;
remove them from jobs.<job_id>.env and instead create an early step (e.g., a
setup step) that computes those paths from runner.temp and writes them to the
job environment via $GITHUB_ENV (export lines for BENCHMARK_ARTIFACT_DIR and
BENCHMARK_BASE_WORKTREE), so subsequent steps can consume them; reference the
variable names BENCHMARK_ARTIFACT_DIR and BENCHMARK_BASE_WORKTREE and the
runner.temp context when locating where to move the logic.
In `@tests/test_benchmark_report.py`:
- Around line 101-106: The current test uses summary_text.index(...) which can
be fooled by non-table occurrences; instead parse the Markdown table rows from
summary_text (e.g., split by newline and locate the table header line `"|
Benchmark | Target | Size | Files | Median | Mean | Rounds |"`, then collect
subsequent lines that start with `"|"` as table rows) and assert the order of
benchmark names within those rows (check that the row containing
"test_scan_pytorch_zip" appears before the row containing
"test_scan_safe_pickle"). Use the existing variables summary_text/summary_file
and replace the global index assertion with this table-row-based order check.
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Repository UI
- Review profile: ASSERTIVE
- Plan: Pro
- Run ID: a5e237e5-27dd-42cd-9809-1d4427b5a2e8

📒 Files selected for processing (5)
- .github/workflows/README.md
- .github/workflows/perf.yml
- scripts/benchmark_report.py
- tests/benchmarks/test_scan_benchmarks.py
- tests/test_benchmark_report.py
```diff
   uv run --locked python scripts/benchmark_report.py \
-    --current /tmp/benchmark-head.json \
-    --baseline /tmp/benchmark-base.json \
+    --current "$BENCHMARK_ARTIFACT_DIR/benchmark-head.json" \
+    --baseline "$BENCHMARK_ARTIFACT_DIR/benchmark-base.json" \
     --threshold 0.15 \
-    --summary-file /tmp/benchmark-comment.md
-  compare_exit=$?
-  set -e
+    --summary-file "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md"
```
Wire `--fail-on-missing` into the PR comparison.

`benchmark_report.py` only returns a non-zero exit code for dropped baseline benchmarks when `--fail-on-missing` is passed. This invocation never sets it, so the compare step stays green on exactly the case the PR is supposed to block. Capture the exit code and fail after the markdown has been appended, so the PR comment and step summary still include the report.

One way to restore the missing-benchmark gate without losing the summary output:
```diff
 - name: Compare against base
   id: compare
   if: github.event_name == 'pull_request'
   run: |
+    compare_status=0
     {
       echo "[Workflow run and artifacts]($BENCHMARK_RUN_URL)"
       echo
     } > "$BENCHMARK_ARTIFACT_DIR/benchmark-comment.md"
     if [ -f "$BENCHMARK_ARTIFACT_DIR/benchmark-base.json" ]; then
       uv run --locked python scripts/benchmark_report.py \
         --current "$BENCHMARK_ARTIFACT_DIR/benchmark-head.json" \
         --baseline "$BENCHMARK_ARTIFACT_DIR/benchmark-base.json" \
         --threshold 0.15 \
-        --summary-file "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md"
+        --summary-file "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md" \
+        --fail-on-missing || compare_status=$?
     else
       {
         echo "Base branch does not include the benchmark suite yet; showing current results only."
         echo
@@
     if [ -f "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md" ]; then
       cat "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md" >> "$BENCHMARK_ARTIFACT_DIR/benchmark-comment.md"
     fi
     cat "$BENCHMARK_ARTIFACT_DIR/benchmark-comment.md" >> "$GITHUB_STEP_SUMMARY"
+    exit "$compare_status"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/perf.yml around lines 100 - 104, The workflow invocation
of benchmark_report.py should pass the --fail-on-missing flag and capture its
exit code so missing baseline benchmarks cause the job to fail only after the
markdown summary is appended; update the uv run command that calls
benchmark_report.py to include --fail-on-missing and store the script's exit
status, then continue to append "$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md"
to the PR summary and finally exit non-zero if the captured exit code indicates
missing benchmarks. Target the invocation that builds the compare (the uv run
... benchmark_report.py call and its use of
"$BENCHMARK_ARTIFACT_DIR/benchmark-compare.md") and ensure the script’s exit
code is propagated after writing the report.
Performance Benchmarks

Compared
Summary

`main` and address follow-up review feedback by moving runner temp path resolution into a setup step, asserting the actual Markdown table order in tests, and stabilizing the merged cache micro-benchmark threshold under xdist.

Validation
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation