Skip to content

SOLR-17875: Ensure SolrMetricsContext is closed#4226

Open
dsmiley wants to merge 8 commits intoapache:mainfrom
dsmiley:SOLR-17857-trackSolrMetricsContext
Open

SOLR-17875: Ensure SolrMetricsContext is closed#4226
dsmiley wants to merge 8 commits intoapache:mainfrom
dsmiley:SOLR-17857-trackSolrMetricsContext

Conversation

@dsmiley
Copy link
Contributor

@dsmiley dsmiley commented Mar 18, 2026

SolrMetricsContext is now AutoCloseable. Renamed unregister() to close().

https://issues.apache.org/jira/browse/SOLR-17875

  • Ensure all SolrMetricsContext instances are properly closed to prevent resource leaks
  • Add ObjectReleaseTracker tracking to SolrMetricsContext constructors so leaked instances cause test failures
  • Add registerCloseable(AutoCloseable) to SolrMetricsContext for lifecycle-managing related resources (e.g. AttributedInstrumentFactory's node-level context)
  • Fix several production code bugs where SolrMetricsContext was never closed:
    • RequestHandlerBase.initializeMetrics(): close old context on re-registration instead of asserting null
    • AuditLoggerPlugin.close(): SolrInfoBean.super.close() was guarded by async-only condition
    • Metrics (blockcache): close() override didn't close its metrics context
    • AttributedInstrumentFactory: node-level SolrMetricsContext was never closed
    • CaffeineCache/ThinCache: NPE on close() when not fully initialized
  • Fix ~10 test suites that leaked SolrMetricsContext instances

SolrMetricsContext is now AutoCloseable.  Renamed unregister() to close().

customThreadPool.execute(replayUpdatesExecutor::shutdownAndAwaitTermination);

// Shutdown GPU metrics service if it was initialized
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shutdown/close should do its work in reverse order of construction. Since metrics is initialized super early, I moved the compensating shutdown logic to the end.


shutdownGpuMetricsService(); // Shutdown GPU metrics service if it was initialized

IOUtils.closeQuietly(solrMetricsContext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is new

}
org.apache.lucene.util.IOUtils.closeWhileHandlingException(loader); // best effort

containerHandlers.close();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is new

private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());

private HandlerMetrics metricsShard = HandlerMetrics.NO_OP;
private HandlerMetrics metricsShard; // nocommit NO_OP
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll remove the nocommit ; I wanted to see if we could eliminate NO_OP entirely, and we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more convinced than ever that the NO_OP was a bad move. It was making problems that should have been root-caused. Handlers cannot be used until they have been initialized (including metrics). A follow-up commit here will show that InfoHandler was creating sub-handlers and not fully initializing them, thus any metrics such sub-handlers wanted to create were not getting closed. PublicKeyHandler seems affected as well.


protected boolean publishCpuTime = Boolean.getBoolean(ThreadCpuTimer.ENABLE_CPU_TIME);

@SuppressForbidden(reason = "Need currentTimeMillis, used only for stats output")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused


@Override
public void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes) {
IOUtils.closeQuietly(this.solrMetricsContext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure of this... I suppose if it's already set. Lemme see why that could be...
Ah; it's not needed. Will replace with a comment.

this.nodeMetricsContext =
new SolrMetricsContext(
primaryMetricsContext.getMetricManager(), SolrMetricManager.NODE_REGISTRY);
primaryMetricsContext.registerCloseable(this.nodeMetricsContext);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, this is the only non-test caller of the registerCloseable method I added

assert ObjectReleaseTracker.track(this);
}

public SolrMetricsContext(SolrMetricManager metricManager, String registryName, Object child) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more info helps debug if there's a failure


@Override
public void close() throws IOException {
IOUtils.closeQuietly(ulog);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a gap; not sure of the consequences

Copy link
Contributor

@HoustonPutman HoustonPutman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments, overall looks good to me, and it should be easy to tell if the tests lose all the cruft in the logging.

@After
public void clearBufferStores() {
BufferStore.clearBufferStores();
org.apache.solr.common.util.IOUtils.closeQuietly(metrics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like this can be imported...

String lfuCacheName = "lfu_cache";
lfuCache.setBacking(cacheScope, backing);
SolrMetricsContext solrMetricsContext = new SolrMetricsContext(metricManager, registry);
var solrMetricsContext = new SolrMetricsContext(metricManager, registry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to put this in a try-with-resources so that we don't get leaks on a failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about that. Rather than be super anal about this detail in tests, which is annoying, I'd rather we configure the ORT mechanism to just log a one-liner warning when the test fails, since if a test fails then there's generally a decent chance that something is not closed. I'll do a separate PR for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm; I suppose for some time, ObjectReleaseTracker won't complain if the test fails (for other reasons, not due to itself). The code only checks when afterIsSuccessful So nothing more to do here.

HoustonPutman and others added 2 commits March 18, 2026 13:37
is called before handling requests, preventing memory leaks from
untracked SolrMetricsContext instances.

Fix PluginBag to track all plugins (V1, V2, and defaults) in allPlugins
list, ensuring proper cleanup during replacement and shutdown. Make the
list thread-safe when needed.

Update InfoHandler and SchemaHandler to initialize metrics on
dynamically-created sub-handlers, registering them for cleanup.

Fix tests that create handlers directly to initialize metrics and close
handlers in tearDown (InfoHandlerTest, MaxSizeAutoCommitTest).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor Author

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK the only issue left is {{org.apache.solr.cloud.MissingSegmentRecoveryTest}} which has proved a bit of a nightmare for Claude Opus ... so much so that the best action may be to figure out how to conditionally disable the SolrMetricsContext tracking for just this test.

solrMetricsContext.instrumentedExecutorService(
coreAdminAsyncTracker.expensiveExecutor,
"solr_node_executor",
"solr_node_expensive_executor",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah; a fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I forgot to mention this, this is not the only instance of this. There are at least a few others. And I wanted to check with you or @mlbiscoc to see if this was the right kind of fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InfoHandler has sub-handlers, and must tend to their lifecycle, including metrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I really appreciate the improvements here @HoustonPutman :-)


@Override
public void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes) {
// Store parent context so we can use it in inform() to initialize sub-handlers as siblings
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish inform() calls happened before initializeMetrics... but that's another issue and isn't easy

Copy link
Contributor

@HoustonPutman HoustonPutman Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

On a somewhat related note, I didn't want to go and change everything here, but I really really wished that initializeMetrics worked the same across all classes. It seems like half of the classes that implement it treat parentContext like its own context (and generally in those cases initializeMetrics is called by the constructor which creates the childContext itself), and then half actually treat it like a parentContext and initialize a childContext in the method. This can get quite confusing trying to figure out where and how these contexts are created and managed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree! Worth a JIRA IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants