HIVE-29477: Introduce TezSession interface and a new implementation for unmanaged, external Tez sessions#6343

Open
abstractdog wants to merge 5 commits into apache:master from abstractdog:HIVE-29477


@abstractdog abstractdog commented Mar 2, 2026

What changes were proposed in this pull request?

  1. HiveConf: add options for external sessions discovered from ZooKeeper
  2. TezSession: new interface implemented by classes like TezSessionState and TezExternalSessionState
  3. TezExternalSessionState: the implementation of the new, unmanaged Tez session, where openInternal simply acquires an ApplicationId instead of submitting a YARN application
  4. ExternalSessionsRegistry: interface for a Tez session registry that can be implemented for specific frameworks and environments (currently ZookeeperExternalSessionsRegistryClient); the interface follows a simple contract: get and return a Tez session for Hive
  5. AbstractTriggerValidator: now a top-level class instead of TezSessionPoolSession.AbstractTriggerValidator
  6. TezSessionState -> TezSession: changed in many places in the code, because most of the Hive code does not depend on the underlying implementation, i.e. on whether the sessions are managed or unmanaged
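
For readers new to the idea, the get + return contract from item 4 and the "acquire instead of submit" behaviour from item 3 can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and for illustration only: the names SessionRegistry, InMemoryRegistry and ExternalSession, the method names, and the placeholder application id are assumptions, not the interfaces or signatures introduced by this PR, and the in-memory queue merely stands in for ZooKeeper-based discovery.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ExternalSessionContractSketch {

  /** Hypothetical registry contract: hand out an application id and take it back. */
  interface SessionRegistry {
    String getSession();

    void returnSession(String applicationId);
  }

  /** In-memory stand-in; a real registry would discover the ids elsewhere, e.g. in ZooKeeper. */
  static final class InMemoryRegistry implements SessionRegistry {
    private final Deque<String> available = new ArrayDeque<>();

    InMemoryRegistry(String... applicationIds) {
      for (String id : applicationIds) {
        available.add(id);
      }
    }

    @Override
    public synchronized String getSession() {
      String id = available.poll();
      if (id == null) {
        throw new IllegalStateException("No external session available");
      }
      return id;
    }

    @Override
    public synchronized void returnSession(String applicationId) {
      available.add(applicationId);
    }

    synchronized int availableCount() {
      return available.size();
    }
  }

  /** Hypothetical unmanaged session: open() only acquires an id, it never launches anything. */
  static final class ExternalSession {
    private final SessionRegistry registry;
    private String applicationId;

    ExternalSession(SessionRegistry registry) {
      this.registry = registry;
    }

    void open() {
      // Acquire an already running application instead of submitting a new one.
      applicationId = registry.getSession();
    }

    void close() {
      // Hand the application back so that another query can reuse it.
      if (applicationId != null) {
        registry.returnSession(applicationId);
        applicationId = null;
      }
    }

    String getApplicationId() {
      return applicationId;
    }
  }

  public static void main(String[] args) {
    // The application id below is a made-up placeholder.
    InMemoryRegistry registry = new InMemoryRegistry("application_0000000000000_0001");
    ExternalSession session = new ExternalSession(registry);
    session.open();
    System.out.println("acquired " + session.getApplicationId());
    session.close();
    System.out.println("available after close: " + registry.availableCount());
  }
}
```

The point of the sketch is only that callers see the same open/close lifecycle regardless of whether a session is managed or external; how the PR's registry handles retries, locking and discovery is not modelled here.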

Why are the changes needed?

Enable HiveServer2 to submit DAGs to external sessions (sessions not managed by HiveServer2 and YARN).

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Unit tests added; e2e tests performed with the Tez AM Docker image from the currently in-progress TEZ-4682.

*/
private final Map<String, FunctionInfo> currentFunctionsInUse = new HashMap<>();

private static ExternalSessionsRegistry externalSessions = null;
Contributor

This can be made a local variable in the start() method.

Contributor Author

We don't even need a reference to this registry here; removing it altogether.

private final Object lock = new Object();
private final int maxAttempts;

private PathChildrenCache cache;
Contributor

PathChildrenCache is deprecated. Can you please check whether we can use the CuratorCache APIs instead?

@abstractdog
Contributor Author

@Aggarwal-Raghav: thanks for your comments so far. Can you please check whether this patch works with your current testing scenario for TEZ-4682? I'm specifically interested in whether TEZ-4686 goes away if Hive is configured as follows:

hive.server2.use.external.sessions=true
hive.server2.use.external.sessions.namespace=...
hive.server2.tez.external.sessions.registry.class=org.apache.hadoop.hive.ql.exec.tez.ZookeeperExternalSessionsRegistryClient
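
A setting that holds a class name, like the registry.class property above, is typically consumed by reflective instantiation. The following is a minimal sketch of that general pattern using plain java.util.Properties; it is not Hive's actual loading code, and the Registry interface, DummyRegistry class and load method are hypothetical names introduced only for this illustration. Only the property key is taken from the lines above.

```java
import java.util.Properties;

public class RegistryLoadingSketch {

  /** Hypothetical registry contract, not the PR's actual interface. */
  public interface Registry {
    String describe();
  }

  /** Hypothetical implementation whose class name is placed in the configuration. */
  public static final class DummyRegistry implements Registry {
    public DummyRegistry() {
    }

    @Override
    public String describe() {
      return "dummy registry";
    }
  }

  /** Property key as it appears in this conversation. */
  static final String REGISTRY_CLASS_KEY = "hive.server2.tez.external.sessions.registry.class";

  /** Looks up the configured class name and instantiates it via its no-arg constructor. */
  static Registry load(Properties conf) throws ReflectiveOperationException {
    String className = conf.getProperty(REGISTRY_CLASS_KEY);
    if (className == null) {
      throw new IllegalArgumentException(REGISTRY_CLASS_KEY + " is not set");
    }
    // asSubclass fails fast with a ClassCastException if the class is not a Registry.
    Class<? extends Registry> clazz = Class.forName(className).asSubclass(Registry.class);
    return clazz.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    Properties conf = new Properties();
    conf.setProperty(REGISTRY_CLASS_KEY, DummyRegistry.class.getName());
    System.out.println(load(conf).describe());
  }
}
```

Keeping the implementation behind a class-name setting is what would allow other environments to plug in a registry that is not ZooKeeper-based without touching the calling code.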

@Aggarwal-Raghav
Contributor

Aggarwal-Raghav commented Mar 7, 2026

Hi @abstractdog, I tested this patch on my local setup (Hadoop, Hive, the TEZ-4682 branch, and ZooKeeper installed via brew) and it's working 🚀. I ran a basic insert command.

Attaching the HS2 logs and tez_am docker logs

tez-am-working.log
working_hs2.log

A few things to note:

  1. Add the following to the /etc/hosts file. This is necessary for Docker to communicate with HDFS running on localhost:
    127.0.0.1 host.docker.internal
  2. The tez-am Docker image needs the hive-exec jar, i.e. resource localization. For now I added it via the plugin directory, i.e. -v "/Users/raghav/Desktop/plugin:/opt/tez/plugins" \, and the plugin directory contains only the hive-exec jar. Otherwise the DAG fails with:
Vertex vertex_1769280834537_0000_1_01 [Reducer 2] killed/failed due to: INIT_FAILURE] ....
vertex=vertex_1769280834537_0000_1_00 [Map 1], org.apache.tez.dag.api.TezUncheckedException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.ql.io.CombineHiveInputFormat not found
  1. core-site.xml
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://host.docker.internal:9000</value>
    </property>
  1. hdfs-site.xml (connecting the Docker AM to the datanode was really problematic and took a lot of time 😅)
    <property>
      <name>dfs.datanode.use.datanode.hostname</name>
      <value>true</value>
    </property>

    <property>
      <name>dfs.datanode.address</name>
      <value>host.docker.internal:9866</value>
    </property>

    <property>
      <name>dfs.datanode.hostname</name>
      <value>host.docker.internal</value>
    </property>
  1. In the tez-am Docker image's tez-site.xml (I will update the TEZ-4682 PR for this as well)
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
  1. Some --add-opens in tez-entrypoint.sh need to be updated. Basically, the Hive project's --add-opens are also required. I will update the PR for TEZ-4682.
  2. hive-site.xml
  <property>
      <name>hive.server2.use.external.sessions</name>
      <value>true</value>
  </property>

  <property>
    <name>hive.server2.tez.external.sessions.namespace</name>
    <value>/tez-external-sessions</value>
  </property>

  <property>
      <name>hive.server2.tez.external.sessions.registry.class</name>
      <value>org.apache.hadoop.hive.ql.exec.tez.ZookeeperExternalSessionsRegistryClient</value>
  </property>

  <property>
    <name>hive.zookeeper.quorum</name>
    <value>localhost:2181</value>
  </property>

NOTE/IMPORTANT: I observed flaky behaviour with tez-conf.pb in the tez-staging directory in HDFS: it threw a "No file found" error. I'm not sure why it occurred, but it resolved itself after some time and I haven't hit it again. I will post here if I face it again.

Screenshot 2026-03-08 at 2 38 12 AM

@abstractdog
Contributor Author

hive.server2.tez.external.sessions.namespace

@Aggarwal-Raghav: this is awesome, thanks for testing! Let me share the action items I can think of to make this and TEZ-4682 happen:

  1. /etc/hosts workaround: these are external setup steps. I wish we could get rid of them, but they may be crucial for the first iteration. I believe the Hive Docker page can cover this: https://hive.apache.org/docs/latest/admin/setting-up-hive-with-docker/

  2. tez-plugins folder: that's awesome and properly documented on the TEZ-4682 side (if that's not enough, we can make it work fully out of the box in the scope of HIVE-29419)

5/6: core-site.xml + hdfs-site.xml changes: these are must-have items on the Hive side. However, it would be better to have the HDFS setup and these XML configs in the same place; otherwise the values might seem fragile, in the sense that no one knows under what circumstances host.docker.internal:9866 and hdfs://host.docker.internal:9000 are valid and working. This might be addressed by HIVE-29493.

  1. Hive-related add-opens in the Tez config: in general, Tez should not contain Hive-related stuff, given that Tez doesn't depend on Hive. However, in the case of the add-opens args, I feel it might be beneficial to have them there as long as they don't refer to anything "hive", only Java packages.

  2. hive-site.xml config: definitely something to be included in this PR or a related one

@abstractdog
Contributor Author

@ayushtkn: I'm kindly asking you to review this huge thing :)
Please consider @Aggarwal-Raghav's manual testing as described here: #6343 (comment)
Basically, Hive was able to submit a DAG to an external AM (given that ZooKeeper and HDFS were running on that machine), which was run from the image created in the scope of TEZ-4682.

@abstractdog abstractdog requested a review from ayushtkn March 9, 2026 13:42