feat: pluggable embedding provider registry by dcfocus · Pull Request #87 · lance-format/lance-context

dcfocus · 2026-06-12T15:49:03Z

Summary

Adds an EmbeddingProvider protocol and registry so that Context auto-embeds text at write time and auto-embeds string queries at search time, eliminating the per-caller embedding glue described in #85 (Proposal: pluggable embedding provider registry (OpenAI / Jina / local) for auto-embedding on ingest)
Ships two built-in providers (OpenAIProvider, SentenceTransformersProvider) behind optional extras; accepts any object satisfying the protocol for custom backends
Manual embedding= always takes precedence; existing callers are unaffected

API

# built-in provider via dict shorthand
ctx = Context.create(
    "context.lance",
    embedding={"provider": "openai", "model": "text-embedding-3-small"},
)

# or pass a provider instance (including custom)
from lance_context import EmbeddingProvider

class MyProvider:
    @property
    def dims(self) -> int: return 768
    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...

ctx = Context.create("context.lance", embedding_provider=MyProvider())

ctx.add("user", "Where should I travel in spring?")  # auto-embedded
results = ctx.search("spring travel")                # query auto-embedded
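To make the custom-provider path concrete without installing anything, here is a minimal, self-contained sketch of a structural protocol with the two members named above (`dims`, `embed_texts`). `EmbeddingProviderSketch` and `HashProvider` are hypothetical names used only for illustration. The sketch does not import `lance_context`, and the real `EmbeddingProvider` may differ in details.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingProviderSketch(Protocol):
    """Local stand-in for the protocol shape described in this PR (assumption)."""

    @property
    def dims(self) -> int: ...

    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...


class HashProvider:
    """Deterministic toy provider. It derives floats from a SHA-256 digest,
    so the vectors carry no semantic meaning; it is useful only as a test stub."""

    def __init__(self, dims: int = 8) -> None:
        self._dims = dims

    @property
    def dims(self) -> int:
        return self._dims

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Cycle through the 32 digest bytes and scale each into [0, 1].
            vectors.append(
                [digest[i % len(digest)] / 255.0 for i in range(self._dims)]
            )
        return vectors


provider = HashProvider(dims=4)
# Structural check: no inheritance from the protocol class is required.
print(isinstance(provider, EmbeddingProviderSketch))
print(len(provider.embed_texts(["spring travel"])[0]))
```

Because the protocol is structural, a class like `HashProvider` qualifies just by having the right members. Note that a runtime `isinstance` check against a protocol only verifies that the members exist, not their signatures or return types.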

Changes

File	What
`lance_context/embeddings.py`	`EmbeddingProvider` protocol, `OpenAIProvider`, `SentenceTransformersProvider`, `_build_provider` registry
`lance_context/api.py`	`embedding_provider` kwarg on `__init__`/`create`; auto-embed in `add`, `upsert`, `add_many`, `search`; provider propagated through `fork`
`lance_context/__init__.py`	exports `EmbeddingProvider`
`pyproject.toml`	`[openai]` and `[sentence-transformers]` optional extras
`tests/test_embeddings.py`	15 new tests (stub provider, no external deps)
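The dict shorthand implies some name-to-factory lookup. The following is a rough sketch of how such a registry could resolve either a dict spec or a ready-made instance. It is not the actual `_build_provider` code, and `register`, `build_provider`, and `StubProvider` are hypothetical names introduced here.

```python
from typing import Any, Callable

# Hypothetical registry mapping a provider name to a factory callable.
# The real _build_provider in lance_context/embeddings.py may be organized differently.
_FACTORIES: dict[str, Callable[..., Any]] = {}


def register(name: str, factory: Callable[..., Any]) -> None:
    """Associate a provider name with a factory."""
    _FACTORIES[name] = factory


def build_provider(spec: Any) -> Any:
    """Resolve a dict spec or a provider instance into a provider object."""
    if isinstance(spec, dict):
        options = dict(spec)  # copy so the caller's dict is not mutated
        name = options.pop("provider", None)
        if name not in _FACTORIES:
            raise ValueError(f"unknown embedding provider: {name!r}")
        # Remaining keys (e.g. "model") are passed to the factory as kwargs.
        return _FACTORIES[name](**options)
    if hasattr(spec, "dims") and hasattr(spec, "embed_texts"):
        return spec  # already satisfies the protocol shape; pass through
    raise TypeError("expected a dict spec or an object with dims and embed_texts")


class StubProvider:
    """Illustrative provider returning zero vectors."""

    def __init__(self, model: str = "stub-model", dims: int = 4) -> None:
        self.model = model
        self.dims = dims

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self.dims for _ in texts]


register("stub", StubProvider)
provider = build_provider({"provider": "stub", "model": "tiny"})
print(provider.model, provider.dims)
```

One design point worth noting in a scheme like this: deferring the factory call until the spec is resolved is what lets heavy dependencies stay behind optional extras, since nothing is imported until a provider is actually requested.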

Notes

retrieve() is intentionally not wired to the provider: its text arg feeds BM25, not the vector index. If hybrid retrieval with auto-embedding is wanted in a follow-up, the separate vector= kwarg is already the right surface for it
add_many() sends all un-embedded text records to the provider in one call, so batching is preserved end-to-end
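The batching and precedence rules can be illustrated with a small standalone sketch. The record shape (dicts with `text` and an optional `embedding` key) and the names `fill_embeddings` and `CountingProvider` are assumptions made for this example, not the library's internal representation.

```python
from typing import Any, Optional


class CountingProvider:
    """Illustrative provider that records how many times it is called."""

    def __init__(self, dims: int = 3) -> None:
        self.dims = dims
        self.calls = 0

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        self.calls += 1
        # Toy vectors: every component is the text length.
        return [[float(len(t))] * self.dims for t in texts]


def fill_embeddings(
    records: list[dict[str, Any]], provider: Optional[CountingProvider]
) -> list[dict[str, Any]]:
    """Return copies of records with missing embeddings filled in one provider call."""
    out = [dict(r) for r in records]
    # Only records that have text and no manual embedding need the provider.
    pending = [
        i
        for i, r in enumerate(out)
        if r.get("embedding") is None and r.get("text") is not None
    ]
    if provider is None or not pending:
        return out
    vectors = provider.embed_texts([out[i]["text"] for i in pending])
    for i, vec in zip(pending, vectors):
        out[i]["embedding"] = vec
    return out


provider = CountingProvider()
records = [
    {"text": "hello"},
    {"text": "manual", "embedding": [9.0, 9.0, 9.0]},  # manual embedding wins
    {"text": "hi"},
]
filled = fill_embeddings(records, provider)
print(provider.calls)
print(filled[1]["embedding"])
```

Collecting the pending indices first keeps the provider call count at one per batch regardless of how manual and automatic records are interleaved.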

Test plan

python/tests/test_embeddings.py — 15 unit tests, zero external deps
Existing test_search.py, test_add_many.py, test_delete.py, test_async.py — all pass (85 total across these suites)

Closes #85

🤖 Generated with Claude Code

Add an EmbeddingProvider protocol and registry so Context auto-embeds text at write time and string queries at search time, eliminating the need for each caller to maintain its own embedding pipeline.

Built-in providers for OpenAI and sentence-transformers ship as optional extras (lance-context[openai] / lance-context[sentence-transformers]); the registry accepts any object satisfying the EmbeddingProvider protocol for custom backends.

- EmbeddingProvider: runtime-checkable Protocol (dims, embed_texts)
- Context.create/AsyncContext.create: new embedding_provider kwarg (instance or {"provider": "openai", "model": ...} dict)
- add() / upsert(): auto-embed text payloads when no manual embedding given; manual embedding= always takes precedence
- add_many(): batch-embeds all un-embedded text records in one call
- search(): accepts a plain string query and auto-embeds it via the provider; existing vector queries are unaffected
- fork() propagates the provider to the child context
- EmbeddingProvider exported from the top-level package

Closes #85

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dcfocus and others added 2 commits June 12, 2026 15:48

style: format embedding provider Python files

175ee4e

beinan approved these changes Jun 13, 2026

dcfocus merged commit 10b3c32 into main Jun 13, 2026
5 checks passed

dcfocus merged 2 commits into main from embedding-provider-registry
