feat[expr]: N-ary CASE WHEN expression#6197

Open
lukekim wants to merge 11 commits into vortex-data:develop from spiceai:develop
@lukekim lukekim commented Jan 29, 2026

Adds support for the "CASE WHEN" SQL expression to the Vortex expression system, including its conversion from DataFusion, benchmarking, and pushdown logic. The main focus is on enabling CASE WHEN expressions to be parsed, converted, and benchmarked, while ensuring only supported forms are handled.

Support for CASE WHEN expressions:

  • Added a new module case_when to vortex-array's expression system and re-exported its functions, enabling construction and evaluation of CASE WHEN and nested CASE WHEN expressions. (vortex-array/src/expr/exprs/mod.rs) [1] [2]
  • Registered the new CaseWhen expression in the ExprSession so it can be used in expression evaluation. (vortex-array/src/expr/session.rs) [1] [2]

DataFusion integration and conversion:

  • Implemented conversion from DataFusion's CaseExpr to Vortex's nested case_when expressions, with validation to only support the "searched CASE" form (not "simple CASE"). (vortex-datafusion/src/convert/exprs.rs) [1] [2] [3]
  • Updated the pushdown logic to recognize and validate CASE WHEN expressions, including recursive checks for convertible sub-expressions and else clauses. (vortex-datafusion/src/convert/exprs.rs) [1] [2]
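The conversion described above can be illustrated with a minimal, self-contained sketch. It does not use the real DataFusion or Vortex types: `Expr`, `CaseNode`, and `convert_case` are hypothetical stand-ins, and the only behaviour taken from the description is that the simple form is rejected and the searched form becomes nested CASE WHEN nodes.

```rust
/// Toy expression tree; a stand-in, not the Vortex or DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    /// Binary form: `CASE WHEN cond THEN then [ELSE otherwise] END`.
    CaseWhen {
        cond: Box<Expr>,
        then: Box<Expr>,
        otherwise: Option<Box<Expr>>,
    },
}

/// Toy mirror of a SQL CASE node. `operand` is `Some` for the simple form
/// (`CASE x WHEN 1 THEN ...`) and `None` for the searched form.
struct CaseNode {
    operand: Option<Expr>,
    when_then: Vec<(Expr, Expr)>,
    else_expr: Option<Expr>,
}

/// Converts a searched CASE into nested binary CASE WHEN nodes.
fn convert_case(node: CaseNode) -> Result<Expr, String> {
    if node.operand.is_some() {
        return Err("simple CASE is not supported".to_string());
    }
    if node.when_then.is_empty() {
        return Err("CASE needs at least one WHEN/THEN pair".to_string());
    }
    // Fold from the last pair backwards so the first WHEN ends up outermost
    // and is therefore checked first.
    let mut acc: Option<Expr> = node.else_expr;
    for (cond, then) in node.when_then.into_iter().rev() {
        acc = Some(Expr::CaseWhen {
            cond: Box::new(cond),
            then: Box::new(then),
            otherwise: acc.map(Box::new),
        });
    }
    Ok(acc.expect("at least one pair was checked above"))
}
```

Folding from the right keeps SQL's first-match-wins ordering: each later WHEN lives in the ELSE branch of the one before it.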

Benchmarks and protocol updates:

  • Added a new benchmark suite for CASE WHEN expressions, covering simple, nested, all-true, and all-false scenarios with varying array sizes. (vortex-array/benches/expr/case_when_bench.rs, vortex-array/Cargo.toml) [1] [2]
  • Extended the protocol buffer definitions to include options for CASE WHEN expressions, specifying the number of when/then pairs and presence of an else clause. (vortex-proto/proto/expr.proto)
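As a rough sketch of how such options might map onto an expression's children, the snippet below assumes the children are stored interleaved as `[when_0, then_0, when_1, then_1, ..., else?]`. That layout is an assumption, as are the `has_else` field name and the helper methods; only `num_when_then_pairs` appears in this PR's review excerpt.

```rust
/// Hypothetical mirror of the CASE WHEN options; not the generated proto type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CaseWhenOptions {
    num_when_then_pairs: u32,
    has_else: bool, // assumed field name
}

impl CaseWhenOptions {
    /// Total number of child expressions under the assumed interleaved layout.
    fn num_children(&self) -> usize {
        2 * self.num_when_then_pairs as usize + usize::from(self.has_else)
    }

    /// Child index of the `pair`-th WHEN condition, if it exists.
    fn when_index(&self, pair: u32) -> Option<usize> {
        (pair < self.num_when_then_pairs).then(|| 2 * pair as usize)
    }

    /// Child index of the `pair`-th THEN value, if it exists.
    fn then_index(&self, pair: u32) -> Option<usize> {
        self.when_index(pair).map(|i| i + 1)
    }

    /// Child index of the ELSE value, which sits last when present.
    fn else_index(&self) -> Option<usize> {
        self.has_else
            .then(|| 2 * self.num_when_then_pairs as usize)
    }
}
```

Storing only the pair count and an else flag is enough to recover every child's role from its position, so the children themselves need no per-child tagging.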

Bench:

Timer precision: 16 ns
expr_case_when                    fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ case_when_all_false                          │               │               │               │         │
│  ├─ 1000                        3.183 µs      │ 1.572 ms      │ 3.295 µs      │ 19.01 µs      │ 100     │ 100
│  ├─ 10000                       4.047 µs      │ 5.311 µs      │ 4.175 µs      │ 4.192 µs      │ 100     │ 100
│  ╰─ 100000                      12.36 µs      │ 17.1 µs       │ 12.52 µs      │ 12.6 µs       │ 100     │ 100
├─ case_when_all_true                           │               │               │               │         │
│  ├─ 1000                        3.167 µs      │ 4.047 µs      │ 3.311 µs      │ 3.324 µs      │ 100     │ 100
│  ├─ 10000                       4.095 µs      │ 7.407 µs      │ 4.191 µs      │ 4.234 µs      │ 100     │ 100
│  ╰─ 100000                      12.36 µs      │ 14.43 µs      │ 12.49 µs      │ 12.52 µs      │ 100     │ 100
├─ case_when_nested_3_conditions                │               │               │               │         │
│  ├─ 1000                        13.53 µs      │ 159.4 µs      │ 13.75 µs      │ 15.28 µs      │ 100     │ 100
│  ├─ 10000                       18.41 µs      │ 21.19 µs      │ 18.75 µs      │ 18.78 µs      │ 100     │ 100
│  ╰─ 100000                      203.6 µs      │ 424.2 µs      │ 236.5 µs      │ 252.2 µs      │ 100     │ 100
╰─ case_when_simple                             │               │               │               │         │
   ├─ 1000                        4.591 µs      │ 6.991 µs      │ 4.735 µs      │ 4.764 µs      │ 100     │ 100
   ├─ 10000                       6.415 µs      │ 9.471 µs      │ 6.527 µs      │ 6.567 µs      │ 100     │ 100
   ╰─ 100000                      147.5 µs      │ 184.5 µs      │ 153.3 µs      │ 153.5 µs      │ 100     │ 100

…13)

* feat: implement binary CASE WHEN expression with support for nested conditions
@AdamGS AdamGS added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jan 29, 2026
@joseph-isaacs joseph-isaacs added action/benchmark Trigger full benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Jan 30, 2026
joseph-isaacs commented Jan 30, 2026

Sorry, we just merged a breaking change: #6081. We will be updating the PR very soon with a migration.

@joseph-isaacs joseph-isaacs left a comment

One little one otherwise looks good.

Thanks for this

@joseph-isaacs

Looks like the merge was a little off

@joseph-isaacs joseph-isaacs added the changelog/feature A new feature label Feb 5, 2026
@joseph-isaacs joseph-isaacs changed the title feat: Binary CASE WHEN expression with support for nested conditions … feat[expr]: Binary CASE WHEN expression with support for nested conditions … Feb 5, 2026
@joseph-isaacs joseph-isaacs changed the title feat[expr]: Binary CASE WHEN expression with support for nested conditions … feat[expr]: Binary CASE WHEN expression Feb 5, 2026
* N-ary

* fix: update threshold and value literals in n-ary CASE WHEN benchmark

* refactor: streamline context creation in benchmarks and improve test readability
Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>
@joseph-isaacs joseph-isaacs left a comment

Thanks for this. Two tiny changes and we can go.

@joseph-isaacs joseph-isaacs changed the title feat[expr]: Binary CASE WHEN expression feat[expr]: N-ary CASE WHEN expression Feb 5, 2026
codspeed-hq bot commented Feb 5, 2026

Merging this PR will improve performance by 33.5%

⚡ 1 improved benchmark
✅ 1137 untouched benchmarks
🆕 18 new benchmarks
⏩ 1265 skipped benchmarks (see footnote 1)

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_into_canonical[(1000, 10)] | 87.2 µs | 65.3 µs | +33.5% |
| 🆕 Simulation | case_when_all_false[100000] | N/A | 773.6 µs | N/A |
| 🆕 Simulation | case_when_nary_100_conditions[1000] | N/A | 4.5 ms | N/A |
| 🆕 Simulation | case_when_all_true[100000] | N/A | 773.9 µs | N/A |
| 🆕 Simulation | case_when_all_true[1000] | N/A | 115.3 µs | N/A |
| 🆕 Simulation | case_when_nary_100_conditions[100000] | N/A | 72.5 ms | N/A |
| 🆕 Simulation | case_when_all_true[10000] | N/A | 175.5 µs | N/A |
| 🆕 Simulation | case_when_all_false[1000] | N/A | 114.9 µs | N/A |
| 🆕 Simulation | case_when_nary_10_conditions[10000] | N/A | 1.2 ms | N/A |
| 🆕 Simulation | case_when_nary_3_conditions[100000] | N/A | 2.9 ms | N/A |
| 🆕 Simulation | case_when_all_false[10000] | N/A | 176.4 µs | N/A |
| 🆕 Simulation | case_when_nary_3_conditions[1000] | N/A | 229.9 µs | N/A |
| 🆕 Simulation | case_when_nary_10_conditions[100000] | N/A | 7.9 ms | N/A |
| 🆕 Simulation | case_when_nary_100_conditions[10000] | N/A | 10.9 ms | N/A |
| 🆕 Simulation | case_when_nary_10_conditions[1000] | N/A | 541.6 µs | N/A |
| 🆕 Simulation | case_when_nary_3_conditions[10000] | N/A | 483.1 µs | N/A |
| 🆕 Simulation | case_when_simple[1000] | N/A | 138.8 µs | N/A |
| 🆕 Simulation | case_when_simple[100000] | N/A | 1.5 ms | N/A |
| 🆕 Simulation | case_when_simple[10000] | N/A | 266.6 µs | N/A |

Comparing spiceai:develop (949a446) with develop (8cb79d9)

Footnotes

  1. 1265 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment on lines +149 to +154
if options.num_when_then_pairs == 0 {
    vortex_bail!("CaseWhen must have at least one WHEN/THEN pair");
}

// The return dtype is based on the first THEN expression (index 1)
let then_dtype = &arg_dtypes[1];

This looks wrong.

I would think the dtype is the union of the nullability of the THEN expressions with the nullability of the ELSE (or nullable if there is no ELSE).
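A minimal sketch of the rule suggested in this comment, assuming each branch's nullability is already known as a plain `bool`. The function `result_nullable` and its signature are hypothetical and do not correspond to any Vortex API:

```rust
/// Computes whether a CASE WHEN result should be nullable.
///
/// `then_nullable` holds one flag per THEN branch; `else_nullable` is `None`
/// when the expression has no ELSE clause. Hypothetical helper for illustration.
fn result_nullable(then_nullable: &[bool], else_nullable: Option<bool>) -> Result<bool, String> {
    if then_nullable.is_empty() {
        return Err("CaseWhen must have at least one WHEN/THEN pair".to_string());
    }
    // Any nullable THEN branch makes the result nullable.
    let any_then = then_nullable.iter().any(|&n| n);
    // With no ELSE, rows that match no WHEN produce NULL, so treat it as nullable.
    let else_part = else_nullable.unwrap_or(true);
    Ok(any_then || else_part)
}
```

Under this rule the result is non-nullable only when every THEN branch and an explicit ELSE are all non-nullable, which is stricter than reading the dtype from the first THEN alone.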

Labels

action/benchmark Trigger full benchmarks to run on this PR changelog/feature A new feature

3 participants