@boneanxs

Rationale for this change

WriteArrowSerialize could unconditionally read values from the Arrow array, even for null rows. Because a caller may provide a zero-sized dummy buffer for an all-null array, this caused an ASAN heap-buffer-overflow.

What changes are included in this PR?

Check up front that the array is not entirely null before serializing it.
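
To illustrate the guard described above, here is a minimal, self-contained sketch. It does not use the Arrow or Parquet APIs: `ToyInt32Array`, `ToySerialize`, and `ToyWriteSerialize` are hypothetical stand-ins invented for this example, and the int32-to-int64 widening is only an assumed placeholder for whatever the real serializer does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an Arrow primitive array; NOT the real Arrow API.
struct ToyInt32Array {
  int64_t length = 0;
  int64_t null_count = 0;
  std::vector<int32_t> values;  // may be empty when every row is null
};

// Stand-in for the serializer: widens each int32 to int64.
// It reads values[i] for every row, so it is only safe to call
// when `values` really holds `length` entries.
inline void ToySerialize(const ToyInt32Array& array, std::vector<int64_t>* out) {
  for (int64_t i = 0; i < array.length; ++i) {
    (*out)[static_cast<std::size_t>(i)] =
        static_cast<int64_t>(array.values[static_cast<std::size_t>(i)]);
  }
}

// Returns true if the value buffer was actually read.
inline bool ToyWriteSerialize(const ToyInt32Array& array, std::vector<int64_t>* out) {
  out->assign(static_cast<std::size_t>(array.length), int64_t{0});
  // The value buffer could be empty if all values are null, so skip it entirely.
  if (array.null_count != array.length) {
    ToySerialize(array, out);
    return true;
  }
  return false;
}
```

With an all-null array of length 100 and an empty `values` vector, `ToyWriteSerialize` returns false without touching `values`; without the guard, the loop would read past the end of the empty buffer.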

Are these changes tested?

Added tests.

Are there any user-facing changes?

No

@boneanxs boneanxs requested a review from wgtmac as a code owner December 30, 2025 04:25
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@boneanxs
Author

boneanxs commented Jan 5, 2026

@wgtmac Hi, could you please help review this, thanks!

// Set all bits to 0 (null)
::arrow::bit_util::SetBitsTo(null_bitmap->mutable_data(), 0, 100, false);

std::shared_ptr<::arrow::Buffer> data_buffer = nullptr;
Member

Please correct me if I'm wrong. I think the Arrow spec is vague on whether the value buffer can be null if all values are null. A null value buffer also escapes the Array::Validate check, which skips null buffers:

if (buffer == nullptr) {
  continue;
}

If this violates the spec, would it be better to fix Array::Validate() and call it before calling functor.Serialize()?

Member

@wgtmac wgtmac Jan 7, 2026

cc @pitrou, as this is somewhat related to #48560, though we don't have a fuzz writer yet.

Author

> Arrow spec is vague on whether the value buffer can be null

Yes, that confuses me too. I think we don't support a null value buffer but do accept an empty value buffer when everything in the batch is null? It seems we have been avoiding null value buffers: https://github.com/apache/arrow/pull/2243/changes

Member

IIRC it's deliberate that a null buffer pointer is accepted there. I would rather not have this but it could break compatibility with existing usage.

In any case, feel free to open a separate issue about it.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 7, 2026
SerializeFunctor<ParquetType, ArrowType> functor;
RETURN_NOT_OK(functor.Serialize(checked_cast<const ArrayType&>(array), ctx, buffer));
// The value buffer could be empty if all values are nulls.
if (array.null_count() != array.length()) {
Member

For an invalid Arrow array, the value buffer can still be nullptr when array.null_count() == array.length(), which crashes the following call.

Member

But we don't care about invalid Arrow arrays here, do we?

Comment on lines +2426 to +2428
ASSERT_OK_AND_ASSIGN(null_bitmap, ::arrow::AllocateBitmap(100));
// Set all bits to 0 (null)
::arrow::bit_util::SetBitsTo(null_bitmap->mutable_data(), 0, 100, false);
Member

You can just use AllocateEmptyBitmap which will zero-initialize the bitmap.
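
The point of this suggestion can be sketched without Arrow: storage that is zero-filled at allocation already marks every row as null, so a separate "set all bits to 0" pass is redundant. The helpers below (`ToyBytesForBits`, `ToyAllocateEmptyBitmap`, `ToyGetBit`) are hypothetical stand-ins written for this example, not Arrow functions; in the PR's test the equivalent change would presumably be a call to `::arrow::AllocateEmptyBitmap(100)` in place of `AllocateBitmap` plus `SetBitsTo`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bytes needed to hold `bits` validity bits.
inline int64_t ToyBytesForBits(int64_t bits) { return (bits + 7) / 8; }

// Stand-in for an "allocate empty bitmap" helper: the storage is zero-filled
// at construction, so every bit already reads as 0 (null).
inline std::vector<uint8_t> ToyAllocateEmptyBitmap(int64_t bits) {
  return std::vector<uint8_t>(static_cast<std::size_t>(ToyBytesForBits(bits)),
                              uint8_t{0});
}

// Reads bit i, least-significant bit first within each byte.
inline bool ToyGetBit(const std::vector<uint8_t>& bitmap, int64_t i) {
  return ((bitmap[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1) != 0;
}
```

For 100 rows this allocates 13 bytes, and every one of the 100 validity bits reads as false with no extra initialization step.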


3 participants