test: add benchmark comparison metadata fallback coverage#825

Merged
mldangelo merged 1 commit into main from
automation/test-gap-detection-20260331
Mar 31, 2026

Conversation

@mldangelo-oai mldangelo-oai commented Mar 31, 2026

Summary

  • preserve baseline metadata fields (size/files/target) when current benchmark entries contain only partial extra_info
  • add focused regression test covering the partial-metadata comparison path in benchmark reporting
  • keep change scoped to benchmark report comparison logic only

Validation

  • /Users/mdangelo/.virtualenvs/openai/bin/ruff format scripts/benchmark_report.py tests/test_benchmark_report.py
  • /Users/mdangelo/.virtualenvs/openai/bin/ruff check scripts/benchmark_report.py tests/test_benchmark_report.py
  • /Users/mdangelo/.virtualenvs/openai/bin/mypy tests/test_benchmark_report.py
  • pytest run is blocked in sandbox by ddtrace PermissionError

Summary by CodeRabbit

  • Bug Fixes
    • Benchmark comparison reports now handle incomplete current benchmark data. When a current entry is missing a metadata field (target, size, or files), the report falls back to the baseline value for that field so the comparison row stays complete.

coderabbitai bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f2b505c-6fba-46fe-b682-2b636bdcc714

📥 Commits

Reviewing files that changed from the base of the PR and between d0d4a2d and fce85f5.

📒 Files selected for processing (2)
  • scripts/benchmark_report.py
  • tests/test_benchmark_report.py

Walkthrough

Added a helper function _merged_record_context() that merges benchmark metadata (target, size, files) from the current and baseline records. For each field it uses the current value and falls back to the baseline value when the current one is absent (marked as "-"). Updated the summary builder to use this function and added test coverage for the fallback behavior.
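The fallback rule described in the walkthrough can be sketched roughly as follows. This is an illustration only, not the code from scripts/benchmark_report.py: the `RecordContext` type, the `MISSING` constant, and the function name without the leading underscore are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: "-" is the placeholder for an absent value, as the walkthrough states.
MISSING = "-"


@dataclass(frozen=True)
class RecordContext:
    """Hypothetical stand-in for the metadata carried by a benchmark record."""

    target: str = MISSING
    size: str = MISSING
    files: str = MISSING


def merged_record_context(current: RecordContext, baseline: RecordContext) -> RecordContext:
    """Prefer each current field; use the baseline field when current is "-"."""

    def pick(current_value: str, baseline_value: str) -> str:
        return baseline_value if current_value == MISSING else current_value

    return RecordContext(
        target=pick(current.target, baseline.target),
        size=pick(current.size, baseline.size),
        files=pick(current.files, baseline.files),
    )
```

With this rule, a current record that only knows its target keeps that target and inherits size and files from the baseline; a field missing on both sides stays "-".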

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Benchmark Metadata Merging: scripts/benchmark_report.py | Added _merged_record_context() function that selects benchmark context fields (target, size, files) from current record unless they are "-", in which case it falls back to baseline values. Updated _build_summary() to use this function when building ComparisonRow entries for shared benchmarks. |
| Test Coverage: tests/test_benchmark_report.py | Added test_benchmark_report_uses_baseline_size_when_current_metadata_partial() to verify that when current benchmark metadata is partial (missing size/file count), the generated output correctly uses baseline values while retaining the current record's path information. |
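A regression test for the partial-metadata path might look something like the sketch below. It is not the test added in this PR: `context_row`, the dictionary shapes, and the example paths are hypothetical, chosen only to show a current entry whose extra_info carries a target but no size or file count.

```python
FIELDS = ("target", "size", "files")
MISSING = "-"


def context_row(current_info: dict, baseline_info: dict) -> dict:
    """Hypothetical helper: take each field from current, else baseline, else "-"."""
    row = {}
    for field in FIELDS:
        value = current_info.get(field, MISSING)
        if value == MISSING:
            value = baseline_info.get(field, MISSING)
        row[field] = str(value)
    return row


def test_uses_baseline_size_when_current_metadata_partial() -> None:
    baseline_info = {"target": "baseline/safe_model.pkl", "size": "49.4 KiB", "files": 1}
    current_info = {"target": "current/safe_model.pkl"}  # partial extra_info

    row = context_row(current_info, baseline_info)

    # The current target is retained; size and files come from the baseline.
    assert row == {
        "target": "current/safe_model.pkl",
        "size": "49.4 KiB",
        "files": "1",
    }
```

Run under pytest, the test function is collected by its `test_` prefix; it can also be called directly.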

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a test to cover metadata fallback scenarios in benchmark comparison logic. |


@github-actions

Workflow run and artifacts

Performance Benchmarks

Compared 6 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 1 improved, 5 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 900.71ms -> 752.49ms (-16.5%).
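The percentage change and status labels in this report can be reproduced with a small calculation. The sketch below assumes the status is derived by comparing the relative change against the threshold symmetrically (slower by more than the threshold is a regression, faster by more than the threshold is an improvement); the report does not show the actual rule, and the function names are illustrative.

```python
def percent_change(baseline_ms: float, current_ms: float) -> float:
    """Relative change of current versus baseline, in percent."""
    return (current_ms - baseline_ms) / baseline_ms * 100.0


def classify(change_pct: float, threshold_pct: float = 15.0) -> str:
    """Assumed rule: symmetric threshold around zero change."""
    if change_pct > threshold_pct:
        return "regression"
    if change_pct < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 900.71ms -> 752.49ms
aggregate = percent_change(900.71, 752.49)
print(f"{aggregate:+.1f}%")  # prints "-16.5%"
```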

Top improvements:

  • tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle -95.9% (155.08ms -> 6.44ms, safe_model.pkl, size=49.4 KiB, files=1)

| Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 155.08ms | 6.44ms | -95.9% | improved |
| tests/benchmarks/test_scan_benchmarks.py::test_detect_file_format_safe_pickle | safe_model.pkl | 49.4 KiB | 1 | 125.6us | 127.7us | +1.6% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_directory | duplicate-corpus | 840.0 KiB | 81 | 123.54ms | 124.68ms | +0.9% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 290.11ms | 288.72ms | -0.5% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_scan_mixed_directory | mixed-corpus | 1.7 MiB | 54 | 331.81ms | 332.49ms | +0.2% | stable |
| tests/benchmarks/test_scan_benchmarks.py::test_validate_file_type_pytorch_zip | state_dict.pt | 1.5 MiB | 1 | 42.3us | 42.2us | -0.2% | stable |

@mldangelo mldangelo merged commit ca33c83 into main Mar 31, 2026
24 checks passed
@mldangelo mldangelo deleted the automation/test-gap-detection-20260331 branch March 31, 2026 23:08