perf(physical-plan): optimize byte view append #21586
kumarUjjawal wants to merge 2 commits into apache:main from feat/vectorized_append
Conversation
```rust
}

let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));
```
This could be moved to line 202 so it reuses the iteration over `rows`.
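A minimal sketch of the fused pass the reviewer suggests, assuming Arrow's standard byte-view layout (the low 32 bits of the `u128` view hold the value length); the function and field names here are illustrative, not the builder's actual ones:

```rust
// Hypothetical sketch: one loop over `rows` both copies the views and
// records which rows still need their out-of-line bytes copied, instead
// of iterating over `rows` twice.
fn append_views(views: &mut Vec<u128>, source_views: &[u128], rows: &[usize]) -> Vec<usize> {
    let mut pending = Vec::with_capacity(rows.len() / 2);
    for &row in rows {
        let view = source_views[row];
        views.push(view);
        // Low 32 bits of the view are the length; values longer than
        // 12 bytes live in an external buffer and must be copied later.
        if (view as u32) > 12 {
            pending.push(row);
        }
    }
    pending
}
```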
```rust
let start_idx = self.views.len();
self.views.extend(rows.iter().map(|&row| source_views[row]));

let mut pending = Vec::with_capacity(rows.len());
```
This will most probably over-allocate, because only the rows whose values are longer than 12 bytes will be appended. Maybe use half of the row count as the initial capacity and let it grow if needed?
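For context, the 12-byte threshold in this comment comes from Arrow's byte-view encoding; a small sketch, assuming the standard `u128` view layout with the length in the low 32 bits:

```rust
// Sketch of why only some rows reach `pending`: a byte view whose value
// is at most 12 bytes stores the data inline in the u128 itself, so no
// buffer copy is needed; longer values reference an external buffer.
fn is_inline(view: u128) -> bool {
    (view as u32) <= 12 // low 32 bits of the view hold the length
}
```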
```rust
    rows: &[usize],
) {
    let source_views = array.views();
    let mut pending = Vec::with_capacity(rows.len());
```
Same - consider using a smaller initial capacity.
I think half of `rows.len()` would be good, as you said.
run benchmarks
🤖 Benchmark running (GKE): comparing feat/vectorized_append (bd0a959) to 98d280f (merge-base) using clickbench_partitioned

🤖 Benchmark running (GKE): comparing feat/vectorized_append (bd0a959) to 98d280f (merge-base) using tpch

🤖 Benchmark running (GKE): comparing feat/vectorized_append (bd0a959) to 98d280f (merge-base) using tpcds
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs branch
Which issue does this PR close?
Rationale for this change

`ByteViewGroupValueBuilder::vectorized_append` did extra per-value work. It rebuilt views and copied long values one by one even when the input already had reusable byte views.

What changes are included in this PR?

- `vectorized_append`: use `extend` on the no-null fast path
- `take_n` after vectorized append

Are these changes tested?
Yes
Are there any user-facing changes?
No
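The no-null fast path mentioned in the changes list can be sketched as follows; this is a hedged illustration, not the PR's actual code, and the names are placeholders:

```rust
// Hypothetical sketch of the no-null fast path: with no null checks
// needed, the selected source views are appended in one bulk `extend`
// call instead of a per-value loop that rebuilds each view.
fn append_no_nulls(views: &mut Vec<u128>, source_views: &[u128], rows: &[usize]) {
    views.extend(rows.iter().map(|&row| source_views[row]));
}
```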