fix: avoid versioned describe table for namespace opens #7250

Open
FANNG1 wants to merge 2 commits into lance-format:main from FANNG1:fix/namespace-describe-with-version


FANNG1 commented Jun 12, 2026

What

Avoid passing the requested dataset version to namespace describe_table / describeTable when opening a dataset through a namespace client.

The namespace describe call is only used to resolve the table location and storage options. The requested dataset version is still applied by the lower-level dataset open path after namespace resolution.
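The two-step flow described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `TableDescription`, `InMemoryNamespace`, `open_dataset`, and `open_through_namespace` are hypothetical stand-ins, and the returned dataset is a plain dict so the sketch runs without Lance installed.

```python
from dataclasses import dataclass, field


@dataclass
class TableDescription:
    # Hypothetical shape of what a namespace describe call returns.
    location: str
    storage_options: dict = field(default_factory=dict)


class InMemoryNamespace:
    """Toy namespace client standing in for a real implementation."""

    def __init__(self, tables):
        self._tables = tables

    def describe_table(self, table_id):
        # Resolves only the current location and storage options.
        return self._tables[tuple(table_id)]


def open_dataset(location, storage_options, version=None):
    # Stand-in for the lower-level open path; a real implementation
    # would check out `version` at `location`.
    return {"uri": location, "storage_options": storage_options, "version": version}


def open_through_namespace(namespace, table_id, version=None):
    # Step 1: resolve location and storage options. No version is sent.
    description = namespace.describe_table(table_id)
    # Step 2: the requested version is applied only when opening.
    return open_dataset(
        description.location, description.storage_options, version=version
    )
```

Under these assumptions, calling `open_through_namespace(ns, ["db", "events"], version=3)` yields a dataset whose `version` is 3, even though the namespace never saw that value.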

Why

Some namespace implementations do not support describing a table from a specific version. Passing version to describe_table can fail even though opening the dataset at that version would work once the location and storage options are resolved.

Fixes lance-format/lance-spark#609.

Testing

  • JAVA_HOME=/opt/homebrew/Cellar/openjdk@17/17.0.18/libexec/openjdk.jdk/Contents/Home mvn test -Dtest=DirectoryNamespaceTest#testOpenSpecificVersionDoesNotPassVersionToDescribeTable from java/
  • UV_HTTP_TIMEOUT=120 uv run pytest python/tests/test_namespace_dir.py::test_dataset_namespace_open_does_not_pass_version_to_describe_table from python/
  • JAVA_HOME=/opt/homebrew/Cellar/openjdk@17/17.0.18/libexec/openjdk.jdk/Contents/Home ./mvnw spotless:check from java/
  • UV_HTTP_TIMEOUT=120 uv run ruff format --check --diff python from python/
  • UV_HTTP_TIMEOUT=120 uv run ruff check python from python/
  • git diff --check
  • UV_HTTP_TIMEOUT=120 uv run make lint from python/ (fails in pyright because local optional imports tensorflow and torch are not installed; ruff format --check and ruff check passed)

github-actions bot added the A-python (Python bindings), A-java (Java bindings + JNI), and bug (Something isn't working) labels Jun 12, 2026

yanghua (Collaborator) left a comment

Can we also add a test for this change?

Comment on lines 137 to 139
version : optional, int | str
If specified, load a specific version of the Lance dataset. Else, loads the
latest version. A version number (`int`) or a tag (`str`) can be provided.

yanghua (Collaborator):

Can we also update here?

FANNG1 (Author):

The public behavior of version does not change here: it still opens the requested dataset version.

This change only adjusts the internal namespace resolution flow. We now use describe_table to resolve table metadata/location/storage options, and apply version later in the lower-level dataset open path. I'm not sure this internal detail should be exposed in the public parameter docs.

Do you think the current version doc is misleading for users, or were you suggesting documenting this namespace-specific implementation detail?

yanghua (Collaborator):

"and apply version later in the lower-level dataset open path"

Can you share more details about this?

FANNG1 (Author):

Currently some namespace implementations can support describing a specific table version in describe_table, but not all implementations do. For example, REST-backed namespace implementations may only support resolving the current table metadata/location/storage options from describe_table. This is the background of lance-format/lance-spark#609.

This PR avoids requiring every namespace implementation to support versioned describe_table for the dataset-open path. We only use describe_table to resolve the table location and storage options. The requested dataset version is still passed to the actual dataset open path afterward.

So this keeps compatibility with namespace implementations that support only latest-table describe, while preserving the user-facing behavior of version: the opened dataset is still the requested version.
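To make the compatibility argument concrete, here is a small sketch of a namespace that only supports latest-table describe. Everything in it is hypothetical: `LatestOnlyNamespace` and both resolver functions are illustrative names, and the error type is an assumption about how such a namespace might reject a versioned request.

```python
class LatestOnlyNamespace:
    """Hypothetical namespace that can only describe the current table state."""

    def __init__(self, location):
        self._location = location

    def describe_table(self, table_id, version=None):
        if version is not None:
            # Assumed failure mode for a versioned describe request.
            raise NotImplementedError("versioned describe is not supported")
        return {"location": self._location, "storage_options": {}}


def resolve_with_version(namespace, table_id, version):
    # Behaviour before this change: the version is forwarded to describe.
    return namespace.describe_table(table_id, version=version)


def resolve_without_version(namespace, table_id, version):
    # Behaviour after this change: describe sees no version; the version
    # is carried forward for the later open step instead.
    described = namespace.describe_table(table_id)
    return {**described, "version": version}
```

With this toy namespace, `resolve_with_version(ns, ["t"], 2)` raises, while `resolve_without_version(ns, ["t"], 2)` returns the location together with the version to open.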

FANNG1 (Author) commented Jun 15, 2026

Added Java and Python regression tests that verify namespace describe_table / describeTable does not receive the requested dataset version, while the actual dataset open still returns the requested version.
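The shape of such a regression test might look like the sketch below. It is not the test added in this PR: `open_through_namespace` is a hypothetical stand-in for the code under test, and the namespace is a `unittest.mock.Mock` rather than a real namespace client.

```python
from unittest.mock import Mock


def open_through_namespace(namespace, table_id, version=None):
    # Hypothetical function under test: describe without a version,
    # then apply the version when "opening".
    described = namespace.describe_table(table_id)
    return {"uri": described["location"], "version": version}


def test_open_does_not_pass_version_to_describe_table():
    namespace = Mock()
    namespace.describe_table.return_value = {
        "location": "memory://t.lance",
        "storage_options": {},
    }

    dataset = open_through_namespace(namespace, ["t"], version=5)

    # The describe call must receive only the table id, no version.
    namespace.describe_table.assert_called_once_with(["t"])
    # The opened dataset must still be the requested version.
    assert dataset["version"] == 5
```

The mock records every call, so the assertion fails if a `version` argument is ever forwarded to `describe_table`.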

Development

Successfully merging this pull request may close these issues.

Lance Spark writes fail with REST namespaces that don't support describing a specific table version
