
perf(fts): push prefilter through scalar index on flat FTS #7283

Open
LuQQiu wants to merge 4 commits into lance-format:main from LuQQiu:lu/fts_filtered_read


@LuQQiu LuQQiu commented Jun 15, 2026

Contributor

Summary

  • Flat FTS (no inverted index on the text column) used scan_fragments + a manual LanceFilterExec to apply the prefilter, bypassing any scalar index on the filter column.
  • Route the scan through filtered_read instead — same pattern as the brute-force KNN path at scanner.rs:3839-3848 — so the pushable part of the prefilter is evaluated inside FilteredReadExec (using a scalar index when one exists) and only the unpushable refine_expr is reapplied on top.
  • Requires prefilter(true) on the scanner. The postfilter branch passes an empty ExprFilterPlan::default() down the FTS path and is unaffected.
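
The difference between the two routes can be illustrated with a toy model. This is a minimal sketch using only the standard library, not Lance code: `Row`, `sample_rows`, `build_id_index`, `scan_then_filter`, and `filtered_read` are hypothetical names, the `BTreeMap` stands in for a BTree scalar index on `id`, and the `refine` closure plays the role of the unpushable `refine_expr`.

```rust
use std::collections::BTreeMap;

/// One row of a toy table (hypothetical, not a Lance type).
#[derive(Debug, Clone)]
struct Row {
    row_id: u64,
    id: i32,
    text: String,
}

/// Small fixed table; in this toy model `row_id` equals the row's position.
fn sample_rows() -> Vec<Row> {
    let data = [
        (1, "lance is fast"),
        (2, "lance columnar format"),
        (1, "full text search with lance"),
        (1, "vector search"),
        (3, "lance"),
    ];
    data.iter()
        .enumerate()
        .map(|(pos, (id, text))| Row { row_id: pos as u64, id: *id, text: text.to_string() })
        .collect()
}

/// Stand-in for a BTree scalar index on `id`: value -> row ids.
fn build_id_index(rows: &[Row]) -> BTreeMap<i32, Vec<u64>> {
    let mut index: BTreeMap<i32, Vec<u64>> = BTreeMap::new();
    for row in rows {
        index.entry(row.id).or_default().push(row.row_id);
    }
    index
}

/// Flat match: true if any whitespace-separated token equals `term`
/// (ASCII case-insensitive).
fn flat_match(text: &str, term: &str) -> bool {
    text.split_whitespace().any(|tok| tok.eq_ignore_ascii_case(term))
}

/// Old shape: read every row, apply the filter after the scan, then match.
/// Returns (matching row ids, number of rows read).
fn scan_then_filter(rows: &[Row], id: i32, term: &str) -> (Vec<u64>, usize) {
    let mut read = 0;
    let mut hits = Vec::new();
    for row in rows {
        read += 1;
        if row.id == id && flat_match(&row.text, term) {
            hits.push(row.row_id);
        }
    }
    (hits, read)
}

/// New shape: the pushable `id = ?` part is answered by the index, so only
/// candidate rows are read; `refine` is reapplied on top before matching.
fn filtered_read<F>(
    rows: &[Row],
    index: &BTreeMap<i32, Vec<u64>>,
    id: i32,
    refine: F,
    term: &str,
) -> (Vec<u64>, usize)
where
    F: Fn(&Row) -> bool,
{
    let mut read = 0;
    let mut hits = Vec::new();
    if let Some(candidates) = index.get(&id) {
        for &row_id in candidates {
            let row = &rows[row_id as usize]; // toy assumption: row_id == position
            read += 1;
            if refine(row) && flat_match(&row.text, term) {
                hits.push(row_id);
            }
        }
    }
    (hits, read)
}
```

Both functions return the same matches for the same predicate; the second one simply reads fewer rows because the index narrows the candidates before the text column is touched.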

Plan shape

Before (with WHERE id = 1 and a BTree on id, no FTS index on text):

FlatMatchQueryExec
  └── LanceFilterExec(id = 1)            # post-scan filter, BTree unused
        └── LanceScan(columns=[text, id], with_row_id)

After:

FlatMatchQueryExec
  └── FilteredReadExec                   # full_filter=id = Int32(1), BTree used
        columns=[text], with_row_id
        # refine_expr (if any) reapplied as LanceFilterExec on top

Test plan

  • New test_fts_without_index_uses_scalar_index_for_prefilter in dataset_index.rs: BTree on id, no FTS index, flat FTS + id = 1 prefilter with .prefilter(true). Asserts via analyze_plan that LanceRead shows full_filter=id = Int32(1), no LanceScan: in the plan, and the result set is correct (2 rows).
  • Full FTS test suite passes (123 tests).
  • cargo fmt --all
  • cargo clippy --all --tests --benches -- -D warnings
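
The plan-shape assertion described in the first bullet amounts to a string check on the analyzed plan. A minimal sketch of that check follows, assuming the plan is available as plain text; `prefilter_pushed_down` is a hypothetical helper, and real `analyze_plan` output carries more detail than the fragments quoted in this description.

```rust
/// Returns true when the plan text shows the filter on a `LanceRead` node
/// and contains no plain `LanceScan:` node.
/// Hypothetical helper for illustration; not part of the Lance test suite.
fn prefilter_pushed_down(plan: &str, expected_filter: &str) -> bool {
    let needle = format!("full_filter={expected_filter}");
    let has_pushed_filter = plan
        .lines()
        .any(|line| line.contains("LanceRead") && line.contains(&needle));
    let has_plain_scan = plan.contains("LanceScan:");
    has_pushed_filter && !has_plain_scan
}
```

A test would call it with the filter rendering it expects, for example `prefilter_pushed_down(&plan, "id = Int32(1)")`, and separately assert on the returned rows.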

Flat FTS (no inverted index on the text column) used `scan_fragments` +
a manual `LanceFilterExec` to apply the prefilter, bypassing any scalar
index on the filter column. Route the scan through `filtered_read`
instead, matching the brute-force KNN path, so the pushable part of the
prefilter is evaluated inside `FilteredReadExec` (using a scalar index
when one exists) and only the unpushable `refine_expr` is reapplied on
top.

Requires `prefilter(true)` on the scanner — the postfilter branch still
sends an empty filter plan down and is unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

LuQQiu commented Jun 15, 2026

Contributor Author

Duplicate; reopening against lancedb/lance via API.

@LuQQiu LuQQiu closed this Jun 15, 2026
@LuQQiu LuQQiu reopened this Jun 15, 2026
LuQQiu and others added 2 commits June 15, 2026 20:34
The flat-FTS scan now goes through `FilteredReadExec`, so the golden
plans in `test_plans` need to reflect both shapes:

- No prefilter: legacy emits `LanceScan` with `ordered=true` (vs. the
  old hardcoded `ordered=false`); v2 emits `LanceRead` with empty
  filters. Functionally equivalent — output still feeds an outer
  SortExec by score.
- With prefilter on an indexed column: the BTree now pushes into the
  unindexed-fragment scan. Legacy uses the `MaterializeIndex` shape, v2
  uses `LanceRead` with `full_filter` set — same pushdown the indexed
  `MatchQuery` side already had.
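
The four golden-plan cases above can be restated as a small lookup. This is only a summary of the commit message in code form: `Reader` and `expected_flat_scan_node` are hypothetical names, and the returned strings are shorthand labels, not literal plan output.

```rust
/// Which read path produces the plan (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reader {
    Legacy,
    V2,
}

/// Shorthand for the node expected on the flat-FTS (unindexed fragment) side,
/// keyed by reader and by whether a prefilter on an indexed column is present.
fn expected_flat_scan_node(reader: Reader, indexed_prefilter: bool) -> &'static str {
    match (reader, indexed_prefilter) {
        (Reader::Legacy, false) => "LanceScan (ordered=true)",
        (Reader::V2, false) => "LanceRead (empty filters)",
        (Reader::Legacy, true) => "MaterializeIndex",
        (Reader::V2, true) => "LanceRead (full_filter set)",
    }
}
```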

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codecov Bot commented Jun 16, 2026


Codecov Report

❌ Patch coverage is 70.00000% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/scanner.rs | 70.00% | 2 Missing and 4 partials ⚠️ |


…redRead

The flat-FTS path now reads through `FilteredReadExec` (`LanceRead`), so
the unindexed-fragment branch shows `LanceRead` instead of `LanceScan`
and the BTree on `id` pushes into the flat scan too. Update the
assertion to reflect the new pushdown.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions github-actions Bot added the A-python Python bindings label Jun 16, 2026