feat(benchmarking): adding gas burner test #3115
…mark
- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)
go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.
…nfig
collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes. Replace with direct header iteration over [startBlock, endBlock] and add Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and tx/block p50/p99.
Optimize spamoor configuration for 100ms block time:
- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each
Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
- Move startBlock capture after spammer creation to exclude warm-up - Replace 20s drain sleep with smart poll (waitForDrain) - Add deleteAllSpammers cleanup to handle stale spamoor DB entries - Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140) - fix OTLP config: remove duplicate env vars, use http/protobuf protocol - use require.Eventually for host readiness polling - rename requireHTTP to requireHostUp - use non-fatal logging in resultWriter.flush deferred context - fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv) - rename loop variable to avoid shadowing testing.TB convention - add block/internal/executing/** to CI path trigger - remove unused require import from output.go
# Conflicts: # scripts/test.mk
# Conflicts: # test/e2e/benchmark/suite_test.go
# Conflicts: # test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv so all benchmark tests share the same reth configuration. lower ERC20 spammer count from 5 to 2 to reduce resource contention on local hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods - add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead - add collectTraces() suite method to deduplicate trace collection pattern - add addEntries() convenience method on resultWriter - slim TestERC20Throughput from ~217 to ~119 lines - reuse collectTraces in TestSpamoorSmoke
📝 Walkthrough

Introduces a new end-to-end benchmark test, TestGasBurner, that measures gas throughput using deterministic gas-burning workloads. The test configures spammers, polls metrics through a warmup phase, collects block metrics, and derives seconds_per_gigagas performance results. Helper functions are extended to track average block intervals during measurement windows.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as TestGasBurner
    participant Spamoor as Spamoor API
    participant Metrics as Metrics Poller
    participant Blocks as Block Headers
    participant Tracer as Trace Collector
    participant Results as Result Writer
    Test->>Spamoor: Configure & create spammers
    Test->>Metrics: Poll for warmup completion
    Metrics-->>Test: Warmup complete
    Test->>Blocks: Capture start block & time
    Test->>Spamoor: Wait for all txs sent
    Test->>Blocks: Drain pending transactions
    Test->>Blocks: Collect block metrics (window)
    Blocks-->>Test: Block intervals & gas data
    Test->>Test: Summarize metrics<br/>(calculate avg block interval)
    Test->>Tracer: Collect traces
    Tracer-->>Test: Trace data & ev-node overhead
    Test->>Results: Append performance metrics<br/>(seconds_per_gigagas)
    Results-->>Test: Results recorded
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 of 5 passed
Claude finished @chatton's task in 2m 19s — Code Review
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #3115      +/-   ##
==========================================
- Coverage   61.00%   60.94%   -0.06%
==========================================
  Files         113      113
  Lines       11700    11700
==========================================
- Hits         7137     7131       -6
- Misses       3753     3759       +6
  Partials      810      810
==========================================
```
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)
386-398: Include last poll error in timeout failure output.

On timeout, this helper currently reports only that the target wasn't reached. Including the last polling error makes benchmark failures much easier to triage.
💡 Suggested improvement
```diff
 func waitForMetricTarget(t testing.TB, name string, poll func() (float64, error), target float64, timeout time.Duration) {
 	t.Helper()
 	deadline := time.Now().Add(timeout)
+	var lastErr error
 	for time.Now().Before(deadline) {
 		v, err := poll()
 		if err == nil && v >= target {
 			t.Logf("metric %s reached %.0f (target %.0f)", name, v, target)
 			return
 		}
+		if err != nil {
+			lastErr = err
+		}
 		time.Sleep(2 * time.Second)
 	}
+	if lastErr != nil {
+		t.Fatalf("metric %s did not reach target %.0f within %v (last poll error: %v)", name, target, timeout, lastErr)
+	}
 	t.Fatalf("metric %s did not reach target %.0f within %v", name, target, timeout)
 }
```

As per coding guidelines, "Use structured logging and include relevant context in log messages".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/helpers.go` around lines 386 - 398, The timeout failure should include the last polling error: modify waitForMetricTarget to capture the last error returned by poll (e.g., add a local variable lastErr error and update it each iteration when poll() returns an error), and when timing out use t.Fatalf to include that lastErr (and optionally last observed value v) in the message so the failure reports the final poll error context; the change should be made inside the waitForMetricTarget function around the poll() call and the final t.Fatalf.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c547032-e97b-4885-aaff-bf680aa80fef
📒 Files selected for processing (2)
test/e2e/benchmark/gasburner_test.go
test/e2e/benchmark/helpers.go
```go
// wait for pending txs to drain
drainCtx, drainCancel := context.WithTimeout(ctx, 30*time.Second)
defer drainCancel()
waitForDrain(drainCtx, t.Logf, e.ethClient, 10)
```
Handle waitForDrain errors instead of dropping them.
Line 82 ignores the returned error, so the test can proceed with a non-drained mempool and publish skewed throughput numbers.
✅ Proposed fix
```diff
-	waitForDrain(drainCtx, t.Logf, e.ethClient, 10)
+	s.Require().NoError(waitForDrain(drainCtx, t.Logf, e.ethClient, 10), "failed waiting for mempool drain")
```

As per coding guidelines, "Return errors early" and "Ensure tests are deterministic".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/benchmark/gasburner_test.go` at line 82, The call to
waitForDrain(drainCtx, t.Logf, e.ethClient, 10) currently discards its error;
update the test to capture the returned error from waitForDrain and fail the
test on non-nil (e.g., if err != nil then t.Fatalf("waitForDrain failed: %v",
err)) so the test stops on drain failures and avoids running with an undrained
mempool. Ensure you reference the existing symbols waitForDrain, drainCtx,
t.Logf, and e.ethClient when making the change.
```go
endHeader, err := e.ethClient.HeaderByNumber(ctx, nil)
s.Require().NoError(err, "failed to get end block header")
endBlock := endHeader.Number.Uint64()
t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock)
```
Fix inclusive block-range count in log message.
Line 88 reports endBlock-startBlock as “range X blocks”, but metrics are collected over an inclusive range [startBlock, endBlock].
🛠️ Proposed fix
```diff
-	t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock)
+	t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock+1)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/benchmark/gasburner_test.go` at line 88, The log message reports the
number of blocks as endBlock-startBlock but the range is inclusive [startBlock,
endBlock]; update the t.Logf call that currently prints "end block: %d (range %d
blocks)" to compute the range as endBlock - startBlock + 1 so the logged block
count matches the inclusive range used for metrics collection in
gasburner_test.go.
Overview
closes: #3123
Adds TestGasBurner to measure gas throughput using a deterministic gasburner workload. Reports seconds_per_gigagas as the primary metric.