HIVE-29477: Introduce TezSession interface and a new implementation for unmanaged, external Tez sessions #6343
abstractdog wants to merge 5 commits into apache:master
     */
    private final Map<String, FunctionInfo> currentFunctionsInUse = new HashMap<>();

    private static ExternalSessionsRegistry externalSessions = null;
This can be made a local variable in the start() method.
It doesn't even need a reference to this registry here; removing it altogether.
    private final Object lock = new Object();
    private final int maxAttempts;

    private PathChildrenCache cache;
PathChildrenCache is deprecated. Can you please check whether we can use the CuratorCache APIs instead?
@Aggarwal-Raghav: thanks for your comments so far. Can you please check whether this patch works together with your current testing scenario for TEZ-4682? I'm specifically interested in whether TEZ-4686 goes away if Hive is configured accordingly:
Hi @abstractdog, I tested this patch on my local setup (Hadoop, Hive, the TEZ-4682 branch, and ZooKeeper installed via brew) and it's working 🚀. I ran a basic insert command. Attaching the HS2 logs and tez_am docker logs: tez-am-working.log. A few things to note:
NOTE/IMPORTANT: There was a flaky behaviour observed with
@Aggarwal-Raghav : this is awesome, thanks for testing! let me share the action items I can think of here to make this and TEZ-4682 happen:
5/6: core-site.xml + hdfs-site.changes: these are must-have items on the Hive side. However, it would be better if I could have the HDFS setup and these XML configs in the same place; otherwise, the values might seem a bit vulnerable in the sense that no one knows under what circumstances
@ayushtkn: I'm kindly asking you to review this huge thing :)
What changes were proposed in this pull request?
- HiveConf: add options for external sessions discovered from ZooKeeper
- TezSession: new interface implemented by classes such as TezSessionState and TezExternalSessionState
- TezExternalSessionState: the implementation of the new, unmanaged Tez session, where openInternal simply acquires an ApplicationId instead of submitting a YARN application
- ExternalSessionsRegistry: interface for a Tez session registry that can be implemented for specific frameworks and environments, currently ZookeeperExternalSessionsRegistryClient; this interface follows a simple contract to get and return a Tez session for Hive
- AbstractTriggerValidator: top-level class instead of TezSessionPoolSession.AbstractTriggerValidator
- TezSessionState -> TezSession: changed in many places in the code, because most of the Hive code does not depend on the underlying implementation, that is, on whether the sessions are managed or unmanaged
Why are the changes needed?
Enable HiveServer2 to submit DAGs to external sessions (managed by neither HiveServer2 nor YARN).
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Unit tests were added, and e2e tests were performed with the Tez AM Docker image from the currently in-progress TEZ-4682.
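For readers unfamiliar with the "get + return" contract mentioned for ExternalSessionsRegistry above, here is a minimal in-memory sketch of the idea. It is not the code from this patch: the names SessionRegistrySketch, InMemorySessionRegistry, getSession, returnSession and availableCount are hypothetical, sessions are represented as plain application ID strings, and the real registry discovers sessions from ZooKeeper rather than taking a fixed list.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical contract for illustration only; the interface in the patch may differ.
interface SessionRegistrySketch {
  // Claims one externally running session, waiting up to timeoutMs for one to be free.
  String getSession(long timeoutMs);

  // Hands a previously claimed session back so another query can use it.
  void returnSession(String applicationId);
}

// In-memory stand-in for a registry; a real one would watch ZooKeeper for AMs.
final class InMemorySessionRegistry implements SessionRegistrySketch {
  private final BlockingQueue<String> available = new LinkedBlockingQueue<>();

  InMemorySessionRegistry(Collection<String> applicationIds) {
    available.addAll(applicationIds);
  }

  @Override
  public String getSession(long timeoutMs) {
    try {
      String id = available.poll(timeoutMs, TimeUnit.MILLISECONDS);
      if (id == null) {
        throw new IllegalStateException("No external session available");
      }
      return id;
    } catch (InterruptedException e) {
      // Preserve the interrupt flag and surface the failure to the caller.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for a session", e);
    }
  }

  @Override
  public void returnSession(String applicationId) {
    available.add(applicationId);
  }

  int availableCount() {
    return available.size();
  }
}

final class RegistrySketchDemo {
  public static void main(String[] args) {
    InMemorySessionRegistry registry =
        new InMemorySessionRegistry(List.of("application_1_0001"));
    String id = registry.getSession(100);
    try {
      System.out.println("Submitting DAG to " + id);
    } finally {
      // Always return the session, even if the submission fails.
      registry.returnSession(id);
    }
  }
}
```

The try/finally in the demo reflects the main design point of such a contract: the caller never creates or destroys the session, it only borrows it and must hand it back.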