Add routing support for OpenSearchDocumentStore #2624
+150
−14
Related Issues
Proposed Changes
This PR adds support for OpenSearch routing functionality to the `OpenSearchDocumentStore`. Routing allows documents to be distributed across specific shards, which improves query performance when searching for documents belonging to specific tenants or groups.
Changes implemented:
1. Write operations with routing:
- Documents can carry `_routing` in their metadata during write operations
- The `_routing` field is automatically extracted from document metadata and passed as a routing parameter to OpenSearch
- The `_routing` field is removed from the stored document metadata to comply with OpenSearch requirements (routing must be a request parameter, not a document field)
2. Delete operations with routing:
- Added a `routing` parameter to the `delete_documents()` and `delete_documents_async()` methods
- Parameter type: `routing: Optional[dict[str, str]]`
3. Implementation details:
- Updated `_prepare_bulk_write_request()` to extract `_routing` from document metadata and add it to the bulk action
- Updated `_prepare_bulk_delete_request()` to accept and apply routing parameters
How did you test it?
Unit tests (mocked):
- `test_routing_extracted_from_metadata()`: Verifies routing is correctly extracted from document metadata and added to bulk actions
- `test_routing_in_delete()`: Verifies routing parameters are correctly applied to delete operations
Integration tests:
- `TestDocumentStore.test_write_with_routing()`: Tests writing documents with routing metadata to a live OpenSearch instance
- `TestDocumentStore.test_delete_with_routing()`: Tests deleting documents with routing from a live OpenSearch instance
All tests verify that:
- `_routing` is removed from document metadata before storage
- `_routing` is correctly passed as an action-level parameter to OpenSearch
Notes for the reviewer
- `_prepare_bulk_write_request()` (lines 395-420) is where `_routing` is extracted from `doc_dict` at the top level (Haystack's `Document.to_dict()` flattens metadata fields)
- The `_prepare_bulk_delete_request()` method now accepts an optional `routing` parameter with signature: `routing: Optional[dict[str, str]] = None`
- `write_documents()` and `write_documents_async()`
- `delete_documents()` and `delete_documents_async()`
- `.get("meta", {})` handles cases where the `meta` dict may not exist after routing extraction
Checklist
I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: feat: Add routing support for OpenSearchDocumentStore
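For readers skimming the description, the routing behavior summarized under Proposed Changes can be illustrated with a small standalone sketch. This is not the code from the PR: the helper names `prepare_bulk_write_actions` and `prepare_bulk_delete_actions` are hypothetical stand-ins for `_prepare_bulk_write_request()` and `_prepare_bulk_delete_request()`, the documents are plain dicts rather than Haystack `Document` objects, and the sketch assumes that the `routing` dict maps document IDs to routing values. The action keys (`_op_type`, `_index`, `_id`, `_source`, `_routing`) follow the bulk-helper action format as I understand it and should be treated as an assumption.

```python
from typing import Any, Optional


def prepare_bulk_write_actions(
    docs: list[dict[str, Any]], index: str
) -> list[dict[str, Any]]:
    """Hypothetical sketch: lift `_routing` out of each document into the action."""
    actions = []
    for doc in docs:
        # Shallow copy so the caller's dict is not modified.
        source = dict(doc)
        # Assumption: metadata is already flattened, so `_routing` is a top-level key.
        routing = source.pop("_routing", None)
        action: dict[str, Any] = {
            "_op_type": "index",
            "_index": index,
            "_id": source["id"],
            "_source": source,  # stored body no longer contains `_routing`
        }
        if routing is not None:
            action["_routing"] = routing  # action-level routing parameter
        actions.append(action)
    return actions


def prepare_bulk_delete_actions(
    ids: list[str], index: str, routing: Optional[dict[str, str]] = None
) -> list[dict[str, Any]]:
    """Hypothetical sketch: attach a routing value to each delete action if given."""
    # Assumption: `routing` maps document ID -> routing value.
    routing = routing or {}
    actions = []
    for doc_id in ids:
        action: dict[str, Any] = {"_op_type": "delete", "_index": index, "_id": doc_id}
        if doc_id in routing:
            action["_routing"] = routing[doc_id]
        actions.append(action)
    return actions
```

In this sketch a document without `_routing` produces an action with no routing key at all, so unrouted documents behave as they did before.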