Fix LlamaIndexEmbeddingOperator returning None vectors for all chunks by bujjibabukatta · Pull Request #68424 · apache/airflow

bujjibabukatta · 2026-06-12T03:53:22Z

Problem

LlamaIndexEmbeddingOperator was returning vector: None for every chunk in its output, making the results unusable for downstream vector storage tasks.

Root cause: VectorStoreIndex._get_node_with_embedding() in llama-index-core calls node.copy() internally before attaching embedding vectors. This means embeddings are stored only on the internal copies; the original node objects in the nodes list retain embedding=None.

Minimal reproduction:

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.embeddings.mock_embed_model import MockEmbedding

docs = [Document(text="hello world")]
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(docs)
index = VectorStoreIndex(nodes, embed_model=MockEmbedding(embed_dim=8))

print(nodes[0].embedding)  # None  ← bug
print(index.vector_store.data.embedding_dict)  # {node_id: [...]}  ← vector is here, not on the node

Fix

Pre-embed the nodes using embed_model.get_text_embedding_batch() before building the index and assign the results directly to the original node objects. Since VectorStoreIndex skips re-embedding nodes that already carry a vector, this avoids redundant API calls while ensuring node.embedding is correctly set on the objects we read from later.
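
The pre-embedding step described above can be sketched roughly as follows. This is a dependency-free illustration, not the operator's actual code: `FakeNode`, `FakeEmbedModel`, and `pre_embed_nodes` are hypothetical stand-ins, and only the method name `get_text_embedding_batch` is taken from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in for a LlamaIndex text node."""
    text: str
    embedding: Optional[List[float]] = None


class FakeEmbedModel:
    """Hypothetical stand-in for an embedding model; returns dummy vectors."""

    def __init__(self, embed_dim: int = 8) -> None:
        self.embed_dim = embed_dim

    def get_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        # One deterministic vector per input text (length-based, illustration only).
        return [[float(len(t))] * self.embed_dim for t in texts]


def pre_embed_nodes(nodes: List[FakeNode], embed_model: FakeEmbedModel) -> None:
    """Embed all node texts in one batch and store each vector on the original node."""
    vectors = embed_model.get_text_embedding_batch([n.text for n in nodes])
    if len(vectors) != len(nodes):
        raise ValueError("embedding model returned a different number of vectors than nodes")
    for node, vector in zip(nodes, vectors):
        # Assign to the original object, so later reads of node.embedding see the vector.
        node.embedding = vector


nodes = [FakeNode("hello world"), FakeNode("second chunk")]
pre_embed_nodes(nodes, FakeEmbedModel(embed_dim=8))
```

With the real library, the text passed to the embedding model would presumably need to match what the index itself embeds for each node; the description does not specify that detail.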

Changes

providers/common/ai/.../operators/llamaindex_embedding.py - added pre-embedding step before VectorStoreIndex construction
providers/common/ai/tests/.../test_llamaindex_embedding.py - updated existing tests to mock get_text_embedding_batch, added regression test
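
A regression test along the lines mentioned above might look like the sketch below. It is an assumption about the shape of such a test, not the test added in this PR: `pre_embed_nodes` is a hypothetical helper, the nodes are plain `SimpleNamespace` objects, and the embedding model is a `MagicMock` whose `get_text_embedding_batch` return value is fixed.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def pre_embed_nodes(nodes, embed_model):
    """Hypothetical helper: batch-embed node texts and assign vectors to the original nodes."""
    vectors = embed_model.get_text_embedding_batch([n.text for n in nodes])
    for node, vector in zip(nodes, vectors):
        node.embedding = vector


def test_pre_embed_sets_vectors_on_original_nodes():
    nodes = [
        SimpleNamespace(text="a", embedding=None),
        SimpleNamespace(text="b", embedding=None),
    ]
    embed_model = MagicMock()
    embed_model.get_text_embedding_batch.return_value = [[0.1, 0.2], [0.3, 0.4]]

    pre_embed_nodes(nodes, embed_model)

    # The model is called exactly once, with all texts in a single batch.
    embed_model.get_text_embedding_batch.assert_called_once_with(["a", "b"])
    # No node is left with embedding=None.
    assert [n.embedding for n in nodes] == [[0.1, 0.2], [0.3, 0.4]]


test_pre_embed_sets_vectors_on_original_nodes()
```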

…turning None vectors VectorStoreIndex._get_node_with_embedding() calls node.copy() internally before attaching embeddings, so reading node.embedding from the original node list after index construction always returned None. Fix by calling embed_model.get_text_embedding_batch() before building the index and assigning the results directly to the original node objects. VectorStoreIndex then skips re-embedding nodes that already carry a vector. Closes apache#68416

bujjibabukatta requested review from gopidesupavan and kaxil as code owners June 12, 2026 03:53

boring-cyborg Bot added area:providers provider:common-ai labels Jun 12, 2026

kaxil closed this Jun 12, 2026

kaxil added the AI Spam label Jun 12, 2026

bujjibabukatta wants to merge 1 commit into apache:main from bujjibabukatta:fix/llamaindex-embedding-vector-none-68416
