GH-49410: [C++] Fix if_else null-scalar fast paths for sliced BaseBinary arrays#49443
GH-49410: [C++] Fix if_else null-scalar fast paths for sliced BaseBinary arrays#49443Ebraam-Ashraf wants to merge 4 commits intoapache:mainfrom
Conversation
|
|
|
hi @kou |
| {ArrayFromJSON(int64(), "[-1]"), ArrayFromJSON(int32(), "[0]")})); | ||
| } | ||
|
|
||
| TEST_F(TestIfElseKernel, IfElseBaseBinarySlicedChunk) { |
There was a problem hiding this comment.
Could you move this TestIfEleseKernel test to the location where other TestIfEleseKernel tests exist instead of adding it after the TestChooseKernel test?
Can we use better name than IfElseBaseBinarySlicedChunk like other existing test names?
| } | ||
|
|
||
| TEST_F(TestIfElseKernel, IfElseBaseBinarySlicedChunk) { | ||
| for (auto type : {utf8(), binary(), large_utf8(), large_binary()}) { |
There was a problem hiding this comment.
Do we need to test all these types?
If so, TYPED_TEST(TestIfElseBaseBinary, ...) may be better than TEST_F(TestifEleseKernel, ...).
| auto cond_asa = ArrayFromJSON(boolean(), "[true, false, false]"); | ||
| ASSERT_OK_AND_ASSIGN(auto result_asa, | ||
| CallFunction("if_else", {cond_asa, MakeNullScalar(type), chunk1})); | ||
| ASSERT_OK(result_asa.make_array()->ValidateFull()); | ||
| AssertArraysEqual(*ArrayFromJSON(type, R"([null, "x", "x"])"), | ||
| *result_asa.make_array(), true); | ||
|
|
||
| auto cond_aas = ArrayFromJSON(boolean(), "[false, true, true]"); | ||
| ASSERT_OK_AND_ASSIGN(auto result_aas, | ||
| CallFunction("if_else", {cond_aas, chunk1, MakeNullScalar(type)})); | ||
| ASSERT_OK(result_aas.make_array()->ValidateFull()); | ||
| AssertArraysEqual(*ArrayFromJSON(type, R"([null, "x", "x"])"), | ||
| *result_aas.make_array(), true); | ||
|
|
||
| auto arr1 = std::make_shared<ChunkedArray>(ArrayVector{chunk0, chunk1}); | ||
| auto mask = *CallFunction("is_null", {arr1}); | ||
| ASSERT_OK_AND_ASSIGN(auto arr2_datum, | ||
| CallFunction("if_else", {Datum(true), *Concatenate(arr1->chunks()), arr1})); | ||
| ASSERT_OK(arr2_datum.chunked_array()->ValidateFull()); | ||
| ASSERT_OK_AND_ASSIGN(auto arr3_datum, | ||
| CallFunction("if_else", {mask, MakeNullScalar(type), arr2_datum})); | ||
| ASSERT_OK(arr3_datum.chunked_array()->ValidateFull()); | ||
| AssertDatumsEqual(Datum(arr1), arr3_datum); |
There was a problem hiding this comment.
We can reproduce this problem without chunked array, right?
If so, we don't need to test chunked array.
|
thanks for the feedback @kou I switched from TEST_F(TestIfElseKernel, ...) to TYPED_TEST(TestIfElseBaseBinary, IfElseBaseBinarySliced) |
kou
left a comment
There was a problem hiding this comment.
Thanks.
Could you push a fix for this problem to this branch?
| auto full_arr = ArrayFromJSON(type, R"([null, "x", "x", null, "x", "x"])"); | ||
| auto sliced = full_arr->Slice(3); |
There was a problem hiding this comment.
Can we simplify test data?
| auto full_arr = ArrayFromJSON(type, R"([null, "x", "x", null, "x", "x"])"); | |
| auto sliced = full_arr->Slice(3); | |
| auto full_arr = ArrayFromJSON(type, R"(["not used", null, "x", "x"])"); | |
| auto sliced = full_arr->Slice(1); |
|
Tests before fix What the fix does Replaced the single memcpy for the offsets buffer with a loop that subtracts offsets[0] as a base from each value Then changed the data copy to start from data + base instead of data The same two changes are applied symmetrically in both the ASA and AAS paths. Tests after fix |
|
Could you fix the lint failure and update the PR title and description? |
|
thanks @kou |
Rationale for this change
if_elsewith a null scalar and a slicedBaseBinaryarray (offset != 0) produces invalid output. The ASA and AAS shortcut paths inscalar_if_else.cccopy offsets without adjusting for the slice offset, and copy data from byte 0 instead ofdata + offsets[0].What changes are included in this PR?
Fix (
scalar_if_else.cc): In both the ASA and AAS fast paths:memcpyfor the offsets buffer with a loop that normalizes each offset by subtractingoffsets[0]as a base, so output offsets always start at 0.data + baseinstead ofdata, so only the bytes belonging to the slice are copied.Regression test (
scalar_if_else_test.cc): AddedTYPED_TEST(TestIfElseBaseBinary, IfElseBaseBinarySliced)coveringutf8,binary,large_utf8, andlarge_binarytypes, testing both the ASA path and AAS path with a sliced array.Are these changes tested?
Yes. The new
IfElseBaseBinarySlicedtyped test reproduces the bug and passes after the fix. All 416 existing tests continue to pass.This PR contains a "Critical Fix". The bug causes incorrect/invalid data to be produced when
if_elseis called with a null scalar and a slicedBaseBinaryarray.