@Apache9
Contributor

@Apache9 Apache9 commented Jan 7, 2026

Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close the FileSystem instance of a region server.
Also re-enable TestExportSnapshot.

@Apache9 Apache9 self-assigned this Jan 7, 2026
@Apache9
Contributor Author

Apache9 commented Jan 7, 2026

Let's see whether there are other side effects.

@Apache9 Apache9 marked this pull request as draft January 8, 2026 08:35
@Apache9
Contributor Author

Apache9 commented Jan 8, 2026

It seems the file system was still being closed by something...

Added a debug log to help find the root cause, and converted the PR to a draft.
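The debug-log technique mentioned above can be sketched like this (the method name below is hypothetical, not the actual patch, which would log via slf4j): passing a Throwable to the close path makes whoever closes the shared FileSystem leave a stack trace in the log, pointing at the call site.

```java
// Illustrative sketch: attach a Throwable in the close path so the caller
// that closes the shared FileSystem shows up as a stack trace in the log.
public class Main {
  // Hypothetical stand-in for the region server's FileSystem close path.
  static void closeSharedFileSystem() {
    // In real code this would be something like:
    //   LOG.debug("closing shared fs", new Throwable("close() call site"));
    new Throwable("FileSystem closed from").printStackTrace(System.out);
  }

  public static void main(String[] args) {
    closeSharedFileSystem();
    System.out.println("done");
  }
}
```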

@Apache9
Contributor Author

Apache9 commented Jan 9, 2026

OK, after fixing the MiniMRCluster issue, TestExportSnapshot passed.

Let me check what the problem is with TestAcid...

@Apache9 Apache9 marked this pull request as ready for review January 9, 2026 07:39
@Apache9
Contributor Author

Apache9 commented Jan 9, 2026

At least the TestAcid failure is not related to the NPE during shutdown, so let's get this merged first.

Filed HBASE-29817 to fix the log flooding issue in TestAcid.

@Apache9 Apache9 requested a review from ndimiduk January 9, 2026 07:41
Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close
the FileSystem instance of a region server.
Set DistributedFileSystem cache to false when creating MiniMRCluster so
when shutting down MiniMRCluster it will not close the shared FileSystem
instance.
Also re-enable TestExportSnapshot.
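The fix relies on Hadoop's FileSystem cache semantics: `FileSystem.get(conf)` normally returns one shared instance per URI, so a `close()` by any component (such as MiniMRCluster's shutdown) closes it for everyone; disabling the cache for that lookup (via the standard `fs.hdfs.impl.disable.cache` key) hands MiniMRCluster a private instance instead. A toy model of that behavior (`ToyFileSystemCache` is a hypothetical stand-in, not Hadoop code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Hadoop code) of the FileSystem cache: get() returns one
// shared instance per URI unless caching is disabled for that lookup.
final class ToyFileSystemCache {
  private static final Map<String, ToyFileSystemCache> CACHE = new HashMap<>();
  private boolean closed = false;

  static synchronized ToyFileSystemCache get(String uri, boolean disableCache) {
    if (disableCache) {
      return new ToyFileSystemCache(); // private instance, safe to close
    }
    return CACHE.computeIfAbsent(uri, k -> new ToyFileSystemCache());
  }

  void close() { closed = true; }
  boolean isClosed() { return closed; }
}

public class Main {
  public static void main(String[] args) {
    // Cached mode: a second lookup returns the cluster-wide shared instance...
    ToyFileSystemCache shared = ToyFileSystemCache.get("hdfs://mini", false);
    ToyFileSystemCache mrView = ToyFileSystemCache.get("hdfs://mini", false);
    System.out.println("same instance: " + (shared == mrView));
    mrView.close(); // ...so closing "its" handle closes everyone's
    System.out.println("shared closed: " + shared.isClosed());

    // With caching disabled, closing the private copy leaves the shared one open.
    ToyFileSystemCache shared2 = ToyFileSystemCache.get("hdfs://other", false);
    ToyFileSystemCache priv = ToyFileSystemCache.get("hdfs://other", true);
    priv.close();
    System.out.println("uncached close affects shared: " + shared2.isClosed());
  }
}
```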
Member

@ndimiduk ndimiduk left a comment

Really nice find! This issue has plagued us for years. I hope you've got it 🤞

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 13s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 3m 34s master passed
+1 💚 compile 6m 31s master passed
+1 💚 checkstyle 1m 38s master passed
+1 💚 spotbugs 2m 59s master passed
+1 💚 spotless 0m 56s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 24s the patch passed
+1 💚 compile 6m 31s the patch passed
+1 💚 javac 5m 20s hbase-server in the patch passed.
+1 💚 javac 0m 45s hbase-mapreduce in the patch passed.
+1 💚 javac 0m 26s hbase-testing-util generated 0 new + 19 unchanged - 1 fixed = 19 total (was 20)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 38s the patch passed
+1 💚 spotbugs 3m 13s the patch passed
+1 💚 hadoopcheck 13m 15s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 0m 51s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
54m 36s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7604/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7604
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 865bce3e1350 6.14.0-1018-aws #18~24.04.1-Ubuntu SMP Mon Nov 24 19:46:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / c217dbb
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server hbase-mapreduce hbase-testing-util U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7604/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor Author

Apache9 commented Jan 9, 2026

@ndimiduk This is the failure message for TestNettyTlsIPC...

java.lang.IllegalStateException: GlobalOpenTelemetry.set has already been called. GlobalOpenTelemetry.set must be called only once before any calls to GlobalOpenTelemetry.get. If you are using the OpenTelemetrySdk, use OpenTelemetrySdkBuilder.buildAndRegisterGlobal instead. Previous invocation set to cause of this exception.
	at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:107)
	at io.opentelemetry.sdk.testing.junit4.OpenTelemetryRule.before(OpenTelemetryRule.java:176)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.Throwable
	at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:115)
	at io.opentelemetry.api.GlobalOpenTelemetry.get(GlobalOpenTelemetry.java:85)
	at io.opentelemetry.api.GlobalOpenTelemetry.getPropagators(GlobalOpenTelemetry.java:217)
	at org.apache.hadoop.hbase.ipc.IPCUtil.buildRequestHeader(IPCUtil.java:124)
	at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(NettyRpcDuplexHandler.java:84)
	at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(NettyRpcDuplexHandler.java:116)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:875)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:984)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:868)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:863)
	at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:99)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:398)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:368)
	at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.userEventTriggered(NettyRpcDuplexHandler.java:198)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:398)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:368)
	at org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:388)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:398)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:368)
	at org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.userEventTriggered(ChannelInboundHandlerAdapter.java:117)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.userEventTriggered(ByteToMessageDecoder.java:388)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:398)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:368)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1375)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:396)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:376)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:862)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.established(NettyRpcConnection.java:172)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.succeed(NettyRpcConnection.java:401)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.lambda$operationComplete$0(NettyRpcConnection.java:428)
	at org.apache.hadoop.hbase.util.NettyFutureUtils.lambda$addListener$0(NettyFutureUtils.java:56)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:603)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:596)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:572)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:505)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:649)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:638)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:118)
	at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.setHandshakeSuccess(SslHandler.java:1987)
	at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.wrapNonAppData(SslHandler.java:1016)
	at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1555)
	at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1377)
	at org.apache.hbase.thirdparty.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1428)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)

Seems like something related to the initialization of OpenTelemetry?

@ndimiduk
Member

ndimiduk commented Jan 9, 2026

A bug in the lifecycle of the otel JUnit extension? Or in how we've configured it, at least?

@Apache9
Contributor Author

Apache9 commented Jan 9, 2026

I think there could be a race where, after you reset GlobalOpenTelemetry but before you set it again, another thread calls get and sets GlobalOpenTelemetry to a non-null value...

Maybe we should do some retries here?

@ndimiduk
Member

ndimiduk commented Jan 9, 2026

I think it's the test Rule that manages the global instance. Are parallel test executions stepping on each other's static state?

@Apache9
Contributor Author

Apache9 commented Jan 9, 2026

I think it's the test Rule that manages the global instance. Are parallel test executions stepping on each other's static state?

  static RequestHeader buildRequestHeader(Call call, CellBlockMeta cellBlockMeta) {
    RequestHeader.Builder builder = RequestHeader.newBuilder();
    builder.setCallId(call.id);
    RPCTInfo.Builder traceBuilder = RPCTInfo.newBuilder();
    GlobalOpenTelemetry.getPropagators().getTextMapPropagator().inject(Context.current(),
      traceBuilder, (carrier, key, value) -> carrier.putHeaders(key, value));
    builder.setTraceInfo(traceBuilder.build());
    builder.setMethodName(call.md.getName());
    builder.setRequestParam(call.param != null);
    if (cellBlockMeta != null) {
      builder.setCellBlockMeta(cellBlockMeta);
    }
    // Only pass priority if there is one set.
    if (call.priority != HConstants.PRIORITY_UNSET) {
      builder.setPriority(call.priority);
    }
    if (call.attributes != null && !call.attributes.isEmpty()) {
      HBaseProtos.NameBytesPair.Builder attributeBuilder = HBaseProtos.NameBytesPair.newBuilder();
      for (Map.Entry<String, byte[]> attribute : call.attributes.entrySet()) {
        attributeBuilder.setName(attribute.getKey());
        attributeBuilder.setValue(UnsafeByteOperations.unsafeWrap(attribute.getValue()));
        builder.addAttribute(attributeBuilder.build());
      }
    }
    builder.setTimeout(call.timeout);

    return builder.build();
  }

GlobalOpenTelemetry.getPropagators() may lead to a GlobalOpenTelemetry.set call; see this part of the stack trace above:

Caused by: java.lang.Throwable
	at io.opentelemetry.api.GlobalOpenTelemetry.set(GlobalOpenTelemetry.java:115)
	at io.opentelemetry.api.GlobalOpenTelemetry.get(GlobalOpenTelemetry.java:85)
	at io.opentelemetry.api.GlobalOpenTelemetry.getPropagators(GlobalOpenTelemetry.java:217)
	at org.apache.hadoop.hbase.ipc.IPCUtil.buildRequestHeader(IPCUtil.java:124)

There could be other threads still running after a test is marked as finished, and these threads may race with the main test thread. I guess this is the problem?
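The set-once behavior behind that race can be modeled in a few lines (`SetOnceGlobal` is a toy stand-in, not the real OpenTelemetry API): get() lazily installs a default when nothing has been set yet, so a lingering background thread calling get() first (e.g. via getPropagators()) makes the test Rule's later explicit set() throw.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model (not the real OpenTelemetry API) of GlobalOpenTelemetry's
// set-once semantics: get() installs a default when nothing is set yet,
// and any later explicit set() then fails.
final class SetOnceGlobal {
  private static final AtomicReference<String> INSTANCE = new AtomicReference<>();

  static String get() {
    INSTANCE.compareAndSet(null, "noop-default"); // lazy install on first get()
    return INSTANCE.get();
  }

  static void set(String value) {
    if (!INSTANCE.compareAndSet(null, value)) {
      throw new IllegalStateException("set has already been called");
    }
  }
}

public class Main {
  public static void main(String[] args) {
    // A lingering RPC thread calls get() first...
    System.out.println("background get(): " + SetOnceGlobal.get());
    // ...so the test Rule's subsequent set() fails.
    try {
      SetOnceGlobal.set("test-sdk");
      System.out.println("set succeeded");
    } catch (IllegalStateException e) {
      System.out.println("set failed: " + e.getMessage());
    }
  }
}
```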

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 3m 29s master passed
+1 💚 compile 1m 41s master passed
+1 💚 javadoc 0m 57s master passed
+1 💚 shadedjars 6m 19s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 13s the patch passed
+1 💚 compile 1m 39s the patch passed
+1 💚 javac 1m 39s the patch passed
+1 💚 javadoc 0m 55s the patch passed
+1 💚 shadedjars 6m 15s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 235m 31s hbase-server in the patch passed.
+1 💚 unit 24m 8s hbase-mapreduce in the patch passed.
+1 💚 unit 2m 11s hbase-testing-util in the patch passed.
292m 57s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7604/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7604
Optional Tests javac javadoc unit compile shadedjars
uname Linux 49c5713b82f7 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / c217dbb
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7604/5/testReport/
Max. process+thread count 4263 (vs. ulimit of 30000)
modules C: hbase-server hbase-mapreduce hbase-testing-util U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7604/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache9 Apache9 merged commit d11ee6a into apache:master Jan 10, 2026
1 check passed
Apache9 added a commit that referenced this pull request Jan 10, 2026
Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close
the FileSystem instance of a region server.
Set DistributedFileSystem cache to false when creating MiniMRCluster so
when shutting down MiniMRCluster it will not close the shared FileSystem
instance.
Also re-enable TestExportSnapshot.

Signed-off-by: Nick Dimiduk <[email protected]>
(cherry picked from commit d11ee6a)
Apache9 added a commit to Apache9/hbase that referenced this pull request Jan 10, 2026
…che#7604)

Remove the 'shutdownHook' in SingleProcessHBaseCluster where we close
the FileSystem instance of a region server.
Set DistributedFileSystem cache to false when creating MiniMRCluster so
when shutting down MiniMRCluster it will not close the shared FileSystem
instance.
Also re-enable TestExportSnapshot.

Signed-off-by: Nick Dimiduk <[email protected]>
(cherry picked from commit d11ee6a)