
Fix Lance writer to emit Arrow FixedSizeList for array columns to enable native vector search #2707

Open
leekeiabstraction wants to merge 2 commits into apache:main from leekeiabstraction:lance-writer-emit-arrow-fixedSizeList

@leekeiabstraction
Contributor

Purpose

Linked issue: close #2706

Fix Lance writer to emit Arrow FixedSizeList for array columns to enable native vector search.

Querying with pylance's native vector search fails because pylance expects the embedding column to be an Arrow FixedSizeList rather than a variable-size List. This PR fixes the conversion from Fluss's Arrow data to Lance's Arrow data: when the table property <column>.arrow.fixed-size-list.size is defined, the writer emits a FixedSizeList of that size for the column. This mirrors how Lance's Spark SQL integration handles it, ref: https://lance.org/integrations/spark/operations/ddl/create-table/#creating-large-string-columns
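To make the per-column rule concrete, here is a minimal sketch in Python of how the property could be resolved and enforced. It is not the Fluss writer code (that lives on the Java side); the helpers fixed_size_for, target_list_type and check_row are hypothetical, and rejecting rows whose length differs from the declared size is an assumption about reasonable behavior, not something this PR states.

```python
from typing import Mapping, Optional, Sequence

# Per-column property suffix named in this PR; the full key is
# "<column>.arrow.fixed-size-list.size". Helper names below are hypothetical.
FIXED_SIZE_SUFFIX = ".arrow.fixed-size-list.size"


def fixed_size_for(column: str, props: Mapping[str, str]) -> Optional[int]:
    """Declared list size for `column`, or None when the property is absent."""
    raw = props.get(column + FIXED_SIZE_SUFFIX)
    if raw is None:
        return None  # no property: the column stays a variable-size List
    size = int(raw)
    if size <= 0:
        raise ValueError(f"{column}: fixed-size-list size must be positive, got {size}")
    return size


def target_list_type(column: str, props: Mapping[str, str]) -> str:
    """Describe which Arrow list type the writer would emit for `column`."""
    size = fixed_size_for(column, props)
    return "List" if size is None else f"FixedSizeList[{size}]"


def check_row(column: str, values: Optional[Sequence[float]], size: int) -> None:
    """Assumed validation: a non-null row must have exactly `size` elements.

    A null row (None) is allowed, since Arrow FixedSizeList slots can be null.
    """
    if values is not None and len(values) != size:
        raise ValueError(
            f"{column}: expected {size} elements for FixedSizeList, got {len(values)}"
        )
```

For example, with props = {"embedding.arrow.fixed-size-list.size": "768"}, target_list_type("embedding", props) returns "FixedSizeList[768]", while any column without the property keeps the List type. Checking row lengths up front matters because a FixedSizeList cannot represent ragged rows.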

Tests

  • Added a unit test
  • Tested manually that pylance successfully performs a vector search on the tiered data


Development

Successfully merging this pull request may close these issues.

Lance writer should emit Arrow FixedSizeList for array columns to enable native vector search