HBASE-29802 NPE when shutting down mini cluster cause tests hang #7604
Let's see whether there are other side effects.
It seems the FileSystem is still being closed by someone... I added a debug log to see if we can find the root cause, and converted the PR to a draft.
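The debug log itself is not shown in this thread. A common way to answer "who closed it?" is to capture a stack trace at the moment `close()` is called and chain it onto any later use-after-close error, so a single log line shows both the failing caller and the closing caller. The sketch below is a self-contained toy, not HBase or Hadoop code; `TracedHandle`, `checkOpen`, and `closedBy` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical handle that remembers who closed it. A use after close()
 * throws an exception whose cause is the stack trace captured in close().
 */
final class TracedHandle implements AutoCloseable {
  // Stack trace of the first close() call; null while the handle is open.
  private final AtomicReference<Throwable> closedBy = new AtomicReference<>();

  @Override
  public void close() {
    // Only the first close is recorded; later calls leave the original trace in place.
    closedBy.compareAndSet(null,
        new Throwable("handle closed by thread " + Thread.currentThread().getName()));
  }

  /** Throws if already closed, chaining the trace of the close() call as the cause. */
  void checkOpen() {
    Throwable closer = closedBy.get();
    if (closer != null) {
      throw new IllegalStateException("handle already closed", closer);
    }
  }

  /** Returns the recorded close() trace, or null if the handle is still open. */
  Throwable closedBy() {
    return closedBy.get();
  }
}
```

Logging the exception from `checkOpen()` prints a `Caused by:` section pointing at whichever code path called `close()`.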
OK, after fixing the MiniMRCluster issue, TestExportSnapshot passed. Let me check what the problem is with TestAcid...
At least the TestAcid failure is not related to the NPE during shutdown, so let's get this merged first. I filed HBASE-29817 to fix the log flooding issue in TestAcid.
Remove the 'shutdownHook' in SingleProcessHBaseCluster that closes the FileSystem instance of a region server. Disable the DistributedFileSystem cache when creating MiniMRCluster, so that shutting down MiniMRCluster does not close the shared FileSystem instance. Also re-enable TestExportSnapshot.
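Why the cache matters: Hadoop's `FileSystem.get` hands every caller with the same scheme, authority, and user the same cached instance, so one component closing "its" FileSystem closes it for everyone. In Hadoop the cache can be bypassed per scheme with the `fs.<scheme>.impl.disable.cache` property (for HDFS, `conf.setBoolean("fs.hdfs.impl.disable.cache", true)`); that this is exactly the switch the PR flips is an assumption. The sketch below is a heavily simplified, stdlib-only model of that behavior; `FsCacheModel` and `Fs` are hypothetical names, not Hadoop classes, and the cache is keyed by URI alone.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a cached FileSystem factory (hypothetical, not Hadoop's API). */
final class FsCacheModel {
  /** Minimal stand-in for a FileSystem handle. */
  static final class Fs implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  private final Map<String, Fs> cache = new HashMap<>();

  /** Returns the shared instance for the URI unless caching is disabled. */
  synchronized Fs get(String uri, boolean disableCache) {
    if (disableCache) {
      // Private instance: closing it cannot affect other users.
      return new Fs();
    }
    // Shared instance: every caller with the same URI gets this object.
    return cache.computeIfAbsent(uri, k -> new Fs());
  }
}
```

With caching on, closing the MiniMRCluster's handle also closes the region server's handle; with caching disabled for the MiniMRCluster, the shared handle stays open.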
ndimiduk left a comment
Really nice find! This issue has plagued us for years. I hope you've got it 🤞
🎊 +1 overall
This message was automatically generated.
@ndimiduk This is the failure message for TestNettyTlsIPC... It seems to be related to the initialization of OpenTelemetry?
A bug in the lifecycle of the otel JUnit extension? Or in how we've configured it, at least?
The Rule is non-static, so it should be fresh for each method: https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/AbstractTestIPC.java#L135-L136
I think there could be a race: after you reset globalOpenTelemetry but before you set it again, another thread calls get and sets globalOpenTelemetry to a non-null value... Maybe we should add some retries here?
I think it's the test Rule that manages the global instance. Are parallel test executions stepping on each other's static state?
GlobalOpenTelemetry.getPropagators() may lead to a GlobalOpenTelemetry.set call; see the stack trace above. There could be other threads that are still running after a test is marked as finished, and these threads may race with the main test thread. I guess this is the problem?
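The suspected race and the retry idea can be modeled with a set-once global whose getter lazily installs a default, which is the shape described in the comments here (a get on the global can end up calling set). The sketch below is a stdlib-only toy, not the OpenTelemetry API; `GlobalResetRetry`, `GlobalHolder`, `resetAndSetWithRetry`, and the `afterReset` hook are hypothetical, and the hook exists only so the interleaving can be injected deterministically.

```java
import java.util.concurrent.atomic.AtomicReference;

final class GlobalResetRetry {
  /** Toy set-once global with a lazily initializing getter (hypothetical). */
  static final class GlobalHolder {
    private final AtomicReference<String> instance = new AtomicReference<>();

    /** Installs a default if nothing is set yet, then returns the current value. */
    String get() {
      instance.compareAndSet(null, "default");
      return instance.get();
    }

    /** Set-once semantics: fails if any instance is already installed. */
    void set(String value) {
      if (!instance.compareAndSet(null, value)) {
        throw new IllegalStateException("global already set to " + instance.get());
      }
    }

    /** Test-only reset to the uninitialized state. */
    void reset() {
      instance.set(null);
    }
  }

  /**
   * Resets the global and installs value, retrying when something calls get()
   * between reset() and set(). Returns false if every attempt loses the race.
   */
  static boolean resetAndSetWithRetry(
      GlobalHolder holder, String value, int maxAttempts, Runnable afterReset) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      holder.reset();
      afterReset.run(); // simulates a stray thread touching the global
      try {
        holder.set(value);
        return true;
      } catch (IllegalStateException e) {
        // Lost the race to a concurrent get(); try again.
      }
    }
    return false;
  }
}
```

Retries only narrow the window: a leftover thread that keeps calling get() will still exhaust the attempts, so stopping such threads before the next test starts remains the more robust fix.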
🎊 +1 overall
This message was automatically generated.
Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close the FileSystem instance of a region server. Set DistributedFileSystem cache to false when creating MiniMRCluster so when shutting down MiniMRCluster it will not close the shared FileSystem instance. Also re-enable TestExportSnapshot. Signed-off-by: Nick Dimiduk <[email protected]> (cherry picked from commit d11ee6a)
Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close the FileSystem instance of a region server.
Also re-enable TestExportSnapshot.