-
Notifications
You must be signed in to change notification settings - Fork 202
feat: expose refresh parameter in ElasticsearchDocumentStore #2622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose refresh parameter in ElasticsearchDocumentStore #2622
Conversation
Add configurable refresh parameter to write, delete, and update methods in ElasticsearchDocumentStore. This allows users to control when index changes become visible to search operations, enabling read-your-writes consistency without relying on time.sleep() workarounds. Methods updated: - write_documents / write_documents_async - delete_documents / delete_documents_async - delete_all_documents / delete_all_documents_async - delete_by_filter / delete_by_filter_async - update_by_filter / update_by_filter_async The refresh parameter accepts: - True: Force immediate refresh - False: No refresh (best for bulk performance) - 'wait_for': Wait for next refresh cycle (default) Closes deepset-ai#2065
anakin87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two comments.
Asking for a review from @mpangrazzi as well.
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Show resolved
Hide resolved
…_query methods (per ES docs and review)
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
…o 'wait_for' per review
...ions/elasticsearch/src/haystack_integrations/document_stores/elasticsearch/document_store.py
Outdated
Show resolved
Hide resolved
…ault to 'wait_for' per review
|
@mpangrazzi I see that we are now setting What's your opinion on this aspect? |
|
@anakin87 I see what you mean about the breaking change now. Looking at the current code, I noticed that:
So I'm thinking:
But I'm not sure if this is the best approach. @mpangrazzi, would this align with Haystack's philosophy on consistency vs. backward compatibility? Happy to adjust based on your feedback! |
|
@anakin87 @GunaPalanivel For Let's recap the values:
So we're basically using WDYT? |
|
@mpangrazzi The table makes it crystal clear - backward compatibility first, consistency second. I'll wait for @anakin87's thoughts and then update the PR with Thanks for the thorough analysis! |
|
Thanks for taking a look, @mpangrazzi. Agree with your table! |
…r (sync/async) to maintain backward compatibility per reviewer consensus.
|
@mpangrazzi @anakin87 Updated! I've set The implementation now matches your table exactly:
|
mpangrazzi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks! @anakin87 feel free to take another look if you think it's needed
anakin87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Related Issues
Part of #2605
Description
This PR exposes the
refreshparameter to relevant methods inElasticsearchDocumentStore, allowing users to control when index changes become visible to search operations.Motivation
Elasticsearch is a "near real-time" search engine where changes are not immediately visible after write/update/delete operations. Previously, users had to use
time.sleep()workarounds to ensure consistency. This change exposes the underlyingrefreshparameter, giving users explicit control over this behavior.Changes
New
RefreshTypeparameter added to the following methods:write_documents/write_documents_async"wait_for"delete_documents/delete_documents_async"wait_for"delete_all_documents/delete_all_documents_asyncTruedelete_by_filter/delete_by_filter_async"wait_for"update_by_filter/update_by_filter_async"wait_for"Parameter values:
True: Force refresh immediately after the operationFalse: Do not refresh (better performance for bulk operations)"wait_for": Wait for the next refresh cycle (default, ensures read-your-writes consistency)Test Updates
Updated integration tests to use
refresh=Trueinstead oftime.sleep(), making tests more reliable and faster.How was this tested?
refreshparameter