Skip to content

Conversation

@Yicong-Huang
Copy link
Contributor

@Yicong-Huang Yicong-Huang commented Jan 27, 2026

What's Changed

Fix BaseVariableWidthVector/BaseLargeVariableWidthVector IPC serialization when valueCount is 0.

Problem

When valueCount == 0, setReaderAndWriterIndex() was setting offsetBuffer.writerIndex(0), which means readableBytes() == 0. IPC serializer uses readableBytes() to determine buffer size, so 0 bytes were written to the IPC stream. This crashes IPC readers in other libraries because Arrow spec requires offset buffer to have at least one entry [0].

This is a follow-up to #967 which fixed the same issue in ListVector/LargeListVector.

Fix

Simplify setReaderAndWriterIndex() to always use (valueCount + 1) * OFFSET_WIDTH for offset buffer's writerIndex. When valueCount == 0, this correctly sets writerIndex to OFFSET_WIDTH, ensuring offset[0] is included in serialization.

Testing

Added tests for empty VarCharVector and LargeVarCharVector verifying offset buffer has correct readableBytes().

related to #343

@github-actions

This comment has been minimized.

@Yicong-Huang
Copy link
Contributor Author

cc @viirya @jbonofre

@lidavidm lidavidm added the bug-fix PRs that fix a big. label Jan 27, 2026
Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the similar fix as for valueBuffer afair. It sounds good to me.

@jbonofre jbonofre added this to the 19.0.0 milestone Jan 27, 2026
@jbonofre
Copy link
Member

@Yicong-Huang sorry to be a pain, can you please rebase ? Thanks !

@Yicong-Huang
Copy link
Contributor Author

Yicong-Huang commented Jan 27, 2026

@Yicong-Huang sorry to be a pain, can you please rebase ? Thanks !

Thanks @jbonofre . I did rebase but I think the root error in this change is BaseVariableWidthVector's offsetBuffer is allocated to be 0 capacity when it is empty. Writing the offset 0 to it would be invalid. It is also not easy to increase the allocation as all tests are written against that. Could you advise on this case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PRs that fix a big.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants