SOLR-17875: Ensure SolrMetricsContext is closed#4226
SOLR-17875: Ensure SolrMetricsContext is closed#4226dsmiley wants to merge 8 commits intoapache:mainfrom
Conversation
SolrMetricsContext is now AutoCloseable. Renamed unregister() to close().
|
|
||
| customThreadPool.execute(replayUpdatesExecutor::shutdownAndAwaitTermination); | ||
|
|
||
| // Shutdown GPU metrics service if it was initialized |
There was a problem hiding this comment.
Shutdown/close should do its work in reverse order of construction. Since metrics is initialized super early, I moved the compensating shutdown logic to the end.
|
|
||
| shutdownGpuMetricsService(); // Shutdown GPU metrics service if it was initialized | ||
|
|
||
| IOUtils.closeQuietly(solrMetricsContext); |
| } | ||
| org.apache.lucene.util.IOUtils.closeWhileHandlingException(loader); // best effort | ||
|
|
||
| containerHandlers.close(); |
| private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); | ||
|
|
||
| private HandlerMetrics metricsShard = HandlerMetrics.NO_OP; | ||
| private HandlerMetrics metricsShard; // nocommit NO_OP |
There was a problem hiding this comment.
i'll remove the nocommit ; I wanted to see if we could eliminate NO_OP entirely, and we can.
There was a problem hiding this comment.
I'm more convinced than ever that the NO_OP was a bad move. It was making problems that should have been root-caused. Handlers cannot be used until they have been initialized (including metrics). A follow-up commit here will show that InfoHandler was creating sub-handlers and not fully initializing them, thus any metrics such sub-handlers wanted to create were not getting closed. PublicKeyHandler seems affected as well.
|
|
||
| protected boolean publishCpuTime = Boolean.getBoolean(ThreadCpuTimer.ENABLE_CPU_TIME); | ||
|
|
||
| @SuppressForbidden(reason = "Need currentTimeMillis, used only for stats output") |
|
|
||
| @Override | ||
| public void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes) { | ||
| IOUtils.closeQuietly(this.solrMetricsContext); |
There was a problem hiding this comment.
Unsure of this... I suppose if it's already set. Lemme see why that could be...
Ah; it's not needed. Will replace with a comment.
| this.nodeMetricsContext = | ||
| new SolrMetricsContext( | ||
| primaryMetricsContext.getMetricManager(), SolrMetricManager.NODE_REGISTRY); | ||
| primaryMetricsContext.registerCloseable(this.nodeMetricsContext); |
There was a problem hiding this comment.
Admittedly, this is the only non-test caller of the registerCloseable method I added
| assert ObjectReleaseTracker.track(this); | ||
| } | ||
|
|
||
| public SolrMetricsContext(SolrMetricManager metricManager, String registryName, Object child) { |
There was a problem hiding this comment.
more info helps debug if there's a failure
|
|
||
| @Override | ||
| public void close() throws IOException { | ||
| IOUtils.closeQuietly(ulog); |
There was a problem hiding this comment.
this was a gap; not sure of the consequences
HoustonPutman
left a comment
There was a problem hiding this comment.
Small comments, overall looks good to me, and it should be easy to tell if the tests lose all the cruft in the logging.
| @After | ||
| public void clearBufferStores() { | ||
| BufferStore.clearBufferStores(); | ||
| org.apache.solr.common.util.IOUtils.closeQuietly(metrics); |
There was a problem hiding this comment.
Feel like this can be imported...
| String lfuCacheName = "lfu_cache"; | ||
| lfuCache.setBacking(cacheScope, backing); | ||
| SolrMetricsContext solrMetricsContext = new SolrMetricsContext(metricManager, registry); | ||
| var solrMetricsContext = new SolrMetricsContext(metricManager, registry); |
There was a problem hiding this comment.
Best to put this in a try-with-resources so that we don't get leaks on a failure.
There was a problem hiding this comment.
I had thought about that. Rather than be super anal about this detail in tests, which is annoying, I'd rather we configure the ORT mechanism to just log a one-liner warning when the test fails, since if a test fails then there's generally a decent chance that something is not closed. I'll do a separate PR for that.
There was a problem hiding this comment.
Hmm; I suppose for some time, ObjectReleaseTracker won't complain if the test fails (for other reasons, not due to itself). The code only checks when afterIsSuccessful So nothing more to do here.
is called before handling requests, preventing memory leaks from untracked SolrMetricsContext instances. Fix PluginBag to track all plugins (V1, V2, and defaults) in allPlugins list, ensuring proper cleanup during replacement and shutdown. Make the list thread-safe when needed. Update InfoHandler and SchemaHandler to initialize metrics on dynamically-created sub-handlers, registering them for cleanup. Fix tests that create handlers directly to initialize metrics and close handlers in tearDown (InfoHandlerTest, MaxSizeAutoCommitTest). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dsmiley
left a comment
There was a problem hiding this comment.
AFAIK the only issue left is {{org.apache.solr.cloud.MissingSegmentRecoveryTest}} which has proved a bit of a nightmare for Claude Opus ... so much so that the best action may be to figure out how to conditionally disable the SolrMetricsContext tracking for just this test.
| solrMetricsContext.instrumentedExecutorService( | ||
| coreAdminAsyncTracker.expensiveExecutor, | ||
| "solr_node_executor", | ||
| "solr_node_expensive_executor", |
There was a problem hiding this comment.
Oh I forgot to mention this, this is not the only instance of this. There are at least a few others. And I wanted to check with you or @mlbiscoc to see if this was the right kind of fix.
There was a problem hiding this comment.
InfoHandler has sub-handlers, and must tend to their lifecycle, including metrics
There was a problem hiding this comment.
BTW I really appreciate the improvements here @HoustonPutman :-)
|
|
||
| @Override | ||
| public void initializeMetrics(SolrMetricsContext parentContext, Attributes attributes) { | ||
| // Store parent context so we can use it in inform() to initialize sub-handlers as siblings |
There was a problem hiding this comment.
I wish inform() calls happened before initializeMetrics... but that's another issue and isn't easy
There was a problem hiding this comment.
Yeah, I agree.
On a somewhat related note, I didn't want to go and change everything here, but I really really wished that initializeMetrics worked the same across all classes. It seems like half of the classes that implement it treat parentContext like its own context (and generally in those cases initializeMetrics is called by the constructor which creates the childContext itself), and then half actually treat it like a parentContext and initialize a childContext in the method. This can get quite confusing trying to figure out where and how these contexts are created and managed.
There was a problem hiding this comment.
Totally agree! Worth a JIRA IMO.
SolrMetricsContext is now AutoCloseable. Renamed unregister() to close().
https://issues.apache.org/jira/browse/SOLR-17875