Memoize Runnable#mode to stop allocating a String + StringInquirer pe…#744
Open
s01ipsist wants to merge 1 commit intorails:mainfrom
Open
Memoize Runnable#mode to stop allocating a String + StringInquirer pe…#744s01ipsist wants to merge 1 commit intorails:mainfrom
s01ipsist wants to merge 1 commit intorails:mainfrom
Conversation
…r poll Before 960694e (Dec 2023, "Extract Supervised concern from Runnable"), the mode was set once in #start as `@mode = mode.to_s.inquiry` and read back via `attr_reader :mode`. After that commit, the reader was replaced with a method that re-runs `(@mode || DEFAULT_MODE).to_s.inquiry` on every call. `#mode` is invoked in the poll-loop hot path via `running_async?` / `running_as_fork?` / `running_inline?`, themselves called from `shutting_down?` on every iteration of the dispatcher, worker, and scheduler loops. With the default `queue.yml` (1s dispatcher, 0.1s worker), this is ~3600 redundant allocations per hour per process — a String and an ActiveSupport::StringInquirer for a value that doesn't change after the supervisor sets it once. Restoring the original "compute the inquiry at write time, cache it afterwards" semantic via a custom `mode=` writer eliminates the allocations with no behaviour change. The default-mode fallback (used when `mode=` was never called) memoizes on first read of `#mode`. Reproduction harness and validation: https://github.com/<TODO>/solid-queue-idle-memory-repro heapy diff with `GC.start(full_mark: true, immediate_sweep: true)` before each `ObjectSpace.dump_all` confirms the `runnable.rb` line disappears from retained-allocation reports after this change. Full test suite (227 runs, 1299 assertions) passes with no regressions. Refs rails#262, rails#330, rails#405.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Memoize
Runnable#modeto stop allocating per pollSummary
SolidQueue::Processes::Runnable#modere-runs(@mode || DEFAULT_MODE).to_s.inquiryon every call, allocating a fresh
StringandActiveSupport::StringInquirer.Because the poll loops invoke
running_as_fork?/running_async?/running_inline?(all of which call#mode) fromshutting_down?on everyiteration, every dispatcher / worker / scheduler tick produces redundant
allocations for a value that never changes after the supervisor sets it.
This PR restores the pre-960694e8 semantic of "compute the inquiry once,
cache it" by moving the inquiry into a custom
mode=writer and switchingthe reader to memoize the default fallback.
How this regressed
The current per-call allocation was introduced in
960694e8("Extract Supervised concern from Runnable"). Before that commit, the mode
was set once in
#start:After the refactor, the reader was replaced with:
The intent of the change (allowing the writer to be called separately) is
preserved by the patch — the inquiry is just done at write time.
The diff
Behaviour is identical:
instance.mode = :inlinestill results ininstance.mode.inline?returning true, and an unset mode still defaults to:async. The only change is when the inquiry wrapping happens — once atwrite time instead of once per read.
Reproduction harness
solid-queue-idle-memory-reprois two minimal Rails 8.1 apps (one with
--skip-solid, one with thein-Puma SQ plugin), run side-by-side at zero traffic with summed-RSS
sampling and SIGURG-triggered
ObjectSpace.dump_all. The README walksthrough methodology, results, and caveats; the validation patch in
app-sq-idle/config/initializers/sq_runnable_memoize.rbapplies theequivalent of this PR as a monkeypatch and was used to confirm the fix
lands cleanly before this PR was opened.
Headline numbers from the harness:
app-async(control): RSS flat at idle.app-sq-idle(default config): ~30–50 kB/min RSS growth at idle.runnable.rb:36shows up inheapy diffwith 376 retained Strings and 367 retained StringInquirers in
the dispatcher process alone (per hour).
GC.start(full_mark: true)before each heap dump, the line disappears from the diff entirely.
The bigger-picture observation (~3 GB/month allocator-fragmentation drift
from poll-loop churn) is best addressed by process recycling, which is
already proposed as #262 — this PR isn't trying to fix that, just to
remove one specifically-Solid-Queue-side contributor.
Related issues
This is a small, narrow fix. The broader memory-growth reports in #262,
#330, and #405 likely have additional contributors (allocator
fragmentation under the polling architecture, cumulative AR query cache
growth, etc.) that aren't addressed here. But this is one allocation
hotspot that's clearly attributable to a Solid Queue source line and has
a trivial fix.
Methodology note
For anyone else doing heap diffs on poll-loop processes:
heapy diffwill dramatically over-report retention unless you call
GC.start(full_mark: true, immediate_sweep: true)immediately beforeObjectSpace.dump_all. Without that, young-generation objects allocatedsince the last GC appear in both snapshots and look like leaks. With it,
real retention shows up. (My initial diffs against this codebase pointed
at
ActiveSupport::Notifications::FanoutandActiveRecord::Relationretention; both turned out to be young-gen artifacts that GC cleanly.)
Tests
Full test suite passes against SQLite (
227 runs, 1299 assertions, 0 failures, 0 errors). No new tests added — the existing dispatcher /worker / scheduler tests already exercise the modes that read this
value.