
feat(benchmarking): adding gas burner test #3115

Merged
chatton merged 25 commits into main from cian/gass-burner-2 on Mar 9, 2026

Conversation

@chatton
Contributor

@chatton chatton commented Mar 2, 2026

Overview

closes: #3123

Adds TestGasBurner to measure gas throughput using a deterministic gasburner workload. Reports seconds_per_gigagas as the primary metric.
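For orientation, seconds_per_gigagas is just the reciprocal of throughput: at m MGas/s, one gigagas (1000 MGas) takes 1000/m seconds. A minimal sketch of the conversion (the helper name is illustrative, not the PR's actual code):

```go
package main

import "fmt"

// secsPerGigagas converts an achieved throughput in MGas/s into the
// benchmark's primary metric: seconds needed to process one gigagas.
// 1 Ggas = 1000 MGas, so at m MGas/s a gigagas takes 1000/m seconds.
func secsPerGigagas(achievedMGasPerSec float64) float64 {
	if achievedMGasPerSec <= 0 {
		return 0 // callers should treat 0 as "no measurement", not a record
	}
	return 1000.0 / achievedMGasPerSec
}

func main() {
	// The run reported later in this thread achieved ~55 MGas/s.
	fmt.Printf("%.2f s/Ggas\n", secsPerGigagas(55))
}
```

Lower is better on this metric, which is why a zero value (discussed in the review below) would be misleading on a dashboard.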

Summary by CodeRabbit

  • Tests
    • Added new end-to-end EVM gas burn benchmarking test with comprehensive workload configuration and performance metric collection.
    • Enhanced benchmark metrics to track average block interval, enabling more detailed performance analysis and dashboard reporting.

…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON
  (now in benchmark/output.go)

go test sets the working directory to the package under test, so the
env var should be relative to test/e2e/benchmark/, not test/e2e/.

go test treats all arguments after an unknown flag (--evm-binary) as
test binary args, so ./benchmark/ was never recognized as a package
pattern.

go test sets the cwd to the package directory (test/e2e/benchmark/),
so the binary path needs an extra parent traversal.

The benchmark package doesn't define the --binary flag that test-e2e
passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes.
Replace with direct header iteration over [startBlock, endBlock] and add
Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and
tx/block p50/p99.
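The p50/p99 figures mentioned above can be produced with a simple nearest-rank percentile over the collected per-block samples; a minimal sketch (illustrative, not the PR's actual helper):

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100)
// of vals. It copies and sorts, so the caller's slice is untouched.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	s := append([]float64(nil), vals...)
	sort.Float64s(s)
	rank := int(p/100*float64(len(s))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(s) {
		rank = len(s) - 1
	}
	return s[rank]
}

func main() {
	// Block intervals in seconds: mostly on-target, one straggler.
	intervals := []float64{0.1, 0.1, 0.1, 0.2, 0.9}
	fmt.Println(percentile(intervals, 50), percentile(intervals, 99))
}
```

The same function works for gas/block and tx/block once the per-block values are collected from the header iteration.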

Optimize spamoor configuration for 100ms block time:
- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk

# Conflicts:
#	test/e2e/benchmark/suite_test.go

# Conflicts:
#	test/e2e/benchmark/suite_test.go

move EV_RETH_TAG resolution and rpc connection limits into setupEnv
so all benchmark tests share the same reth configuration. lower ERC20
spammer count from 5 to 2 to reduce resource contention on local
hardware while keeping the loop for easy scaling on dedicated infra.

- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
@coderabbitai
Contributor

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Walkthrough

Introduces a new end-to-end benchmark test, TestGasBurner, that measures gas throughput using deterministic gas-burning workloads. The test configures spammers, polls metrics through a warmup phase, collects block metrics, and derives seconds_per_gigagas performance results. Helper functions are extended to track average block intervals during measurement windows.

Changes

Cohort / File(s) Summary
New Benchmark Test
test/e2e/benchmark/gasburner_test.go
Introduces TestGasBurner method implementing EVM gas throughput benchmarking with spamoor workload configuration, metric polling, block metric collection, and performance result generation.
Helper Function Enhancements
test/e2e/benchmark/helpers.go
Adds AvgBlockInterval field to blockMetricsSummary struct and updates summarization logic to compute and report average block interval across measured blocks.

Sequence Diagram

sequenceDiagram
    participant Test as TestGasBurner
    participant Spamoor as Spamoor API
    participant Metrics as Metrics Poller
    participant Blocks as Block Headers
    participant Tracer as Trace Collector
    participant Results as Result Writer

    Test->>Spamoor: Configure & create spammers
    Test->>Metrics: Poll for warmup completion
    Metrics-->>Test: Warmup complete
    Test->>Blocks: Capture start block & time
    Test->>Spamoor: Wait for all txs sent
    Test->>Blocks: Drain pending transactions
    Test->>Blocks: Collect block metrics (window)
    Blocks-->>Test: Block intervals & gas data
    Test->>Test: Summarize metrics<br/>(calculate avg block interval)
    Test->>Tracer: Collect traces
    Tracer-->>Test: Trace data & ev-node overhead
    Test->>Results: Append performance metrics<br/>(seconds_per_gigagas)
    Results-->>Test: Results recorded

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • alpe
  • julienrbrt

Poem

🐰 A burner test to measure gas,
With metrics flowing, quick and fast!
Block intervals now tracked with care,
Gigagas measured in the air! ⛽✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a gas burner test for benchmarking purposes, which matches the primary objective of the PR.
Description check ✅ Passed The description includes the required overview section with issue reference (closes: #3123) and a concise explanation of what the PR adds, meeting the template requirements.
Linked Issues check ✅ Passed The PR successfully implements TestGasBurner with gas throughput measurement and seconds_per_gigagas metric reporting, directly addressing all coding requirements from issue #3123.
Out of Scope Changes check ✅ Passed All changes are focused on implementing the gas burner benchmark test and its supporting metrics infrastructure, staying within the scope of issue #3123.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@chatton chatton mentioned this pull request Mar 2, 2026
9 tasks
@claude
Contributor

claude bot commented Mar 2, 2026

Claude finished @chatton's task in 2m 19s


Code Review: feat(benchmarking): adding gas burner test

  • Gather context and read changed files
  • Review test/e2e/benchmark/gasburner_test.go
  • Review test/e2e/benchmark/helpers.go changes
  • Post review feedback

Overall this is a solid benchmark addition. The blockMetrics / blockMetricsSummary abstraction in helpers.go is clean, well-documented, and will make it easy to add future workloads. A few issues worth addressing before merge:

Issues

1. No validation of transaction success/failure (gasburner_test.go)

TestERC20Throughput guards against silent failures with:

s.Require().Greater(sent, float64(0), "at least one transaction should have been sent")
s.Require().Zero(failed, "no transactions should have failed")

TestGasBurner has no equivalent assertions. The test currently passes even if spamoor sends 0 transactions or all of them fail, because the only assertion after the metric-wait is on summary.SteadyState > 0 — which could theoretically be satisfied by unrelated traffic. Consider adding similar success/failure guards.

2. waitForDrain error is silently discarded (gasburner_test.go:82)

waitForDrain(drainCtx, t.Logf, e.ethClient, 10)  // return value dropped

Compare to TestERC20Throughput:77 which at least logs a warning:

if err := waitForDrain(drainCtx, t.Logf, e.ethClient, 10); err != nil {
    t.Logf("warning: %v", err)
}

A drain timeout silently skips the drain phase and could cause gas metrics to include post-load empty blocks, skewing secsPerGigagas.

3. waitForMetricTarget is not context-aware (helpers.go:386-398)

func waitForMetricTarget(t testing.TB, ..., timeout time.Duration) {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        ...
        time.Sleep(2 * time.Second)  // not cancellable
    }
    t.Fatalf(...)
}

The function uses a hard-coded time.Sleep, which cannot be interrupted by test-context cancellation (e.g. t.Deadline()). On CI, if the test binary receives SIGTERM, this goroutine will block for up to 2 seconds past the signal. A time.NewTicker with a select on ctx.Done() would be more robust — see the pattern already used in waitForSpamoorDone and waitForDrain.

4. secsPerGigagas of 0 is written to benchmark results without a warning (gasburner_test.go:99-120)

var secsPerGigagas float64
if summary.AchievedMGas > 0 {
    secsPerGigagas = 1000.0 / summary.AchievedMGas
}
// ... secsPerGigagas (still 0) written to results
w.addEntry(entry{Name: ..., Unit: "s/Ggas", Value: secsPerGigagas})

If gas throughput is somehow zero (e.g. all blocks empty), a value of 0 is written to the dashboard. Since the dashboard tracks "lower is better", 0 would appear as a record-breaking result. At minimum, log a warning (or t.Fatal) when AchievedMGas == 0.

Nits

5. Missing comments on gasburnerCfg values (gasburner_test.go:38-50)

TestERC20Throughput annotates its config: "throughput": 50, // 50 tx per 100ms slot = 500 tx/s per spammer. The gasburner config is unannotated — especially gas_units_to_burn: 5_000_000 and throughput: 25. A line comment explaining the intended per-block gas target or effective send rate would help future readers calibrate changes.

6. Style inconsistency: api local vs direct field access (gasburner_test.go:34)

api := e.spamoorAPI  // gasburner uses a local alias

TestERC20Throughput uses e.spamoorAPI directly throughout. The alias adds an extra name for no functional benefit — prefer the direct field access for consistency.

7. waitForMetricTarget silently ignores poll errors (helpers.go:390-394)

v, err := poll()
if err == nil && v >= target {  // errors are simply retried indefinitely

If poll() returns a persistent error (e.g. spamoor API is down), the loop will spin until timeout with no indication of why. At minimum, log the error on each failed poll so CI logs are diagnosable.


Summary

The core logic, metric collection, and block-stats abstraction are well-done. The main items to address before merge are: (1) add sent/failed assertions at the end of the test, (2) surface the waitForDrain error, and (3) guard against a zero secsPerGigagas polluting the dashboard. The context-aware waitForMetricTarget is a quality-of-life improvement worth doing but not strictly blocking.

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build ✅ passed | Format ⏩ skipped | Lint ✅ passed | Breaking ✅ passed | Updated (UTC): Mar 5, 2026, 10:29 AM

@codecov

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.94%. Comparing base (b1e3010) to head (3eb733a).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3115      +/-   ##
==========================================
- Coverage   61.00%   60.94%   -0.06%     
==========================================
  Files         113      113              
  Lines       11700    11700              
==========================================
- Hits         7137     7131       -6     
- Misses       3753     3759       +6     
  Partials      810      810              
Flag     | Coverage Δ
combined | 60.94% <ø> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown.


Base automatically changed from cian/erc20-benchmark to main March 4, 2026 11:08
@chatton chatton marked this pull request as ready for review March 5, 2026 10:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e/benchmark/helpers.go (1)

386-398: Include last poll error in timeout failure output.

On timeout, this helper currently reports only that the target wasn’t reached. Including the last polling error makes benchmark failures much easier to triage.

💡 Suggested improvement
 func waitForMetricTarget(t testing.TB, name string, poll func() (float64, error), target float64, timeout time.Duration) {
 	t.Helper()
 	deadline := time.Now().Add(timeout)
+	var lastErr error
 	for time.Now().Before(deadline) {
 		v, err := poll()
 		if err == nil && v >= target {
 			t.Logf("metric %s reached %.0f (target %.0f)", name, v, target)
 			return
 		}
+		if err != nil {
+			lastErr = err
+		}
 		time.Sleep(2 * time.Second)
 	}
+	if lastErr != nil {
+		t.Fatalf("metric %s did not reach target %.0f within %v (last poll error: %v)", name, target, timeout, lastErr)
+	}
 	t.Fatalf("metric %s did not reach target %.0f within %v", name, target, timeout)
 }

As per coding guidelines, "Use structured logging and include relevant context in log messages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/helpers.go` around lines 386 - 398, The timeout failure
should include the last polling error: modify waitForMetricTarget to capture the
last error returned by poll (e.g., add a local variable lastErr error and update
it each iteration when poll() returns an error), and when timing out use
t.Fatalf to include that lastErr (and optionally last observed value v) in the
message so the failure reports the final poll error context; the change should
be made inside the waitForMetricTarget function around the poll() call and the
final t.Fatalf.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c547032-e97b-4885-aaff-bf680aa80fef

📥 Commits

Reviewing files that changed from the base of the PR and between f0eba0d and 3eb733a.

📒 Files selected for processing (2)
  • test/e2e/benchmark/gasburner_test.go
  • test/e2e/benchmark/helpers.go

// wait for pending txs to drain
drainCtx, drainCancel := context.WithTimeout(ctx, 30*time.Second)
defer drainCancel()
waitForDrain(drainCtx, t.Logf, e.ethClient, 10)

⚠️ Potential issue | 🟠 Major

Handle waitForDrain errors instead of dropping them.

Line 82 ignores the returned error, so the test can proceed with a non-drained mempool and publish skewed throughput numbers.

✅ Proposed fix
-	waitForDrain(drainCtx, t.Logf, e.ethClient, 10)
+	s.Require().NoError(waitForDrain(drainCtx, t.Logf, e.ethClient, 10), "failed waiting for mempool drain")

As per coding guidelines, "Return errors early" and "Ensure tests are deterministic".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/gasburner_test.go` at line 82, The call to
waitForDrain(drainCtx, t.Logf, e.ethClient, 10) currently discards its error;
update the test to capture the returned error from waitForDrain and fail the
test on non-nil (e.g., if err != nil then t.Fatalf("waitForDrain failed: %v",
err)) so the test stops on drain failures and avoids running with an undrained
mempool. Ensure you reference the existing symbols waitForDrain, drainCtx,
t.Logf, and e.ethClient when making the change.

endHeader, err := e.ethClient.HeaderByNumber(ctx, nil)
s.Require().NoError(err, "failed to get end block header")
endBlock := endHeader.Number.Uint64()
t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock)

⚠️ Potential issue | 🟡 Minor

Fix inclusive block-range count in log message.

Line 88 reports endBlock-startBlock as “range X blocks”, but metrics are collected over an inclusive range [startBlock, endBlock].

🛠️ Proposed fix
-	t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock)
+	t.Logf("end block: %d (range %d blocks)", endBlock, endBlock-startBlock+1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/benchmark/gasburner_test.go` at line 88, The log message reports the
number of blocks as endBlock-startBlock but the range is inclusive [startBlock,
endBlock]; update the t.Logf call that currently prints "end block: %d (range %d
blocks)" to compute the range as endBlock - startBlock + 1 so the logged block
count matches the inclusive range used for metrics collection in
gasburner_test.go.

@chatton chatton added this pull request to the merge queue Mar 9, 2026
Merged via the queue into main with commit 3823efd Mar 9, 2026
41 checks passed
@chatton chatton deleted the cian/gass-burner-2 branch March 9, 2026 09:47


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add gas burner benchmark test

2 participants