HIVE-29458: Iceberg: Sort expressions should not be added for distrib…#6325
HIVE-29458: Iceberg: Sort expressions should not be added for distrib…#6325kokila-19 wants to merge 1 commit intoapache:masterfrom
Conversation
31fa9bf to
3a90fde
Compare
09053d0 to
a67523d
Compare
…ution. Partition transforms are added for clustering (distribution and sorting)
|
|
Sonar reported 3 new issues, can you check? |
| private int maxPartsPerNode; // maximum dynamic partitions created per mapper/reducer | ||
| private Pattern whiteListPattern; | ||
| private boolean hasCustomSortExpr = false; | ||
| private transient List<Function<List<ExprNodeDesc>, ExprNodeDesc>> customPartitionExpressions; |
There was a problem hiding this comment.
DynamicPartitionCtx is Serializable. customPartitionExpressions is transient. A transient field is not serialized, so after deserialization customPartitionExpressions will be null and getCustomPartitionExpressions() will return an empty list. Is it intentional?
|
The new test covers partition transform + Z-order.
|
|
|
||
| public void addCustomPartitionExpressions(List<Function<List<ExprNodeDesc>, ExprNodeDesc>> customPartitionExpressions) { | ||
| if (!org.apache.commons.collections.CollectionUtils.isEmpty(customPartitionExpressions)) { | ||
| this.hasCustomSortExpr = true; |
There was a problem hiding this comment.
The flag name hasCustomSortExpr is misleading because it suggests only sort expressions, but it also covers partition transforms. Should we rename it to something like hasCustomPartitionOrSortExpr to better match its behavior?



…ution. Partition transforms are added for clustering (distribution and sorting)
What changes were proposed in this pull request?
DynamicPartitionCtx: Introduced customPartitionExpressions separate from customSortExpressions.
HiveIcebergStorageHandler: Partition transforms go to customPartitionExpressions; write sort order goes to customSortExpressions.
SortedDynPartitionOptimizer: Partition expressions are used for both distribution and sort; sort expressions only for sort keys. Include both in the Select output. Update allStaticPartitions and nullOrderStr accordingly.
Why are the changes needed?
Previously all custom expressions (partition transforms and sort-only) were used for distribution. Sort-only expressions (e.g. z-order) should not drive distribution, only partition transforms should.
Does this PR introduce any user-facing change?
Yes. Z-order and other sort-only expressions are no longer used for MapReduce partitioning. Only partition transforms drive distribution and clustering.
How was this patch tested?
qtests: iceberg_create_locally_zordered_table.q and iceberg_alter_locally_zordered_table.q.