Skip to content

Conversation

@GunaPalanivel
Copy link
Contributor

@GunaPalanivel GunaPalanivel commented Dec 18, 2025

Related Issues

Part of #2605

Description

This PR exposes the refresh parameter to relevant methods in ElasticsearchDocumentStore, allowing users to control when index changes become visible to search operations.

Motivation

Elasticsearch is a "near real-time" search engine where changes are not immediately visible after write/update/delete operations. Previously, users had to use time.sleep() workarounds to ensure consistency. This change exposes the underlying refresh parameter, giving users explicit control over this behavior.

Changes

New RefreshType parameter added to the following methods:

Method Default Value
write_documents / write_documents_async "wait_for"
delete_documents / delete_documents_async "wait_for"
delete_all_documents / delete_all_documents_async True
delete_by_filter / delete_by_filter_async "wait_for"
update_by_filter / update_by_filter_async "wait_for"

Parameter values:

  • True: Force refresh immediately after the operation
  • False: Do not refresh (better performance for bulk operations)
  • "wait_for": Wait for the next refresh cycle (default, ensures read-your-writes consistency)

Test Updates

Updated integration tests to use refresh=True instead of time.sleep(), making tests more reliable and faster.

How was this tested?

  • Existing integration tests updated to use the new refresh parameter
  • All tests pass with the new implementation

Add configurable refresh parameter to write, delete, and update methods
in ElasticsearchDocumentStore.

This allows users to control when index changes become visible to search
operations, enabling read-your-writes consistency without relying on
time.sleep() workarounds.

Methods updated:
- write_documents / write_documents_async
- delete_documents / delete_documents_async
- delete_all_documents / delete_all_documents_async
- delete_by_filter / delete_by_filter_async
- update_by_filter / update_by_filter_async

The refresh parameter accepts:
- True: Force immediate refresh
- False: No refresh (best for bulk performance)
- 'wait_for': Wait for next refresh cycle (default)

Closes deepset-ai#2065
Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two comments.
Asking for a review from @mpangrazzi as well.

@anakin87 anakin87 requested a review from mpangrazzi December 18, 2025 16:44
@anakin87
Copy link
Member

@mpangrazzi I see that we are now setting refresh=True default for update_by_filter and delete_by_filter (sync and async). This seems consistent with most of the other methods, but it's a breaking change.

What's your opinion on this aspect?

@GunaPalanivel
Copy link
Contributor Author

@anakin87 I see what you mean about the breaking change now.

Looking at the current code, I noticed that:

  • delete_by_query and update_by_query don't currently have any refresh parameter at all
  • These methods only accept bool (not "wait_for"), as @GauravBansal29 pointed out

So I'm thinking:

  1. For delete_by_filter / update_by_filter → use refresh=True as default to match the current behavior (no breaking change since these are already blocking operations)
  2. For write_documents / delete_documents → use refresh="wait_for" as default (slight breaking change, but better consistency)

But I'm not sure if this is the best approach. @mpangrazzi, would this align with Haystack's philosophy on consistency vs. backward compatibility?

Happy to adjust based on your feedback!

@mpangrazzi
Copy link
Contributor

@anakin87 @GunaPalanivel For update_by_filter and delete_by_filter I would put False as default value for refresh (so non-breaking, as before).

Let's recap the values:

Method Before PR After PR (current) Breaking? Notes
write_documents "wait_for" "wait_for" No ✅ -
write_documents_async True "wait_for" Yes ⚠️ behaviour remains the same
delete_documents "wait_for" "wait_for" No ✅ -
delete_documents_async True "wait_for" Yes ⚠️ behaviour remains the same
delete_all_documents True True No ✅ -
delete_all_documents_async True True No ✅ -
delete_by_filter False (ES default) False No ✅ wait_for not supported here
delete_by_filter_async False (ES default) False No ✅ wait_for not supported here
update_by_filter False (ES default) False No ✅ wait_for not supported here
update_by_filter_async False (ES default) False No ✅ wait_for not supported here

So we're basically using True only for delete_all_documents (sync/async) for backward compatibility. In other cases, we're trying to be simply more consistent.

WDYT?

@GunaPalanivel
Copy link
Contributor Author

GunaPalanivel commented Dec 19, 2025

@mpangrazzi The table makes it crystal clear - backward compatibility first, consistency second.

I'll wait for @anakin87's thoughts and then update the PR with refresh=False for update_by_filter / delete_by_filter methods to keep everything non-breaking.

Thanks for the thorough analysis!

@anakin87
Copy link
Member

Thanks for taking a look, @mpangrazzi. Agree with your table!

…r (sync/async) to maintain backward compatibility per reviewer consensus.
@GunaPalanivel
Copy link
Contributor Author

@mpangrazzi @anakin87 Updated!

I've set refresh=False for update_by_filter and delete_by_filter methods (both sync/async) as recommended.

The implementation now matches your table exactly:

  • All *_by_filter methods use refresh=False (non-breaking)
  • delete_all_documents keeps refresh=True (backward compatible)
  • Write/delete operations use refresh="wait_for" (consistency)

Copy link
Contributor

@mpangrazzi mpangrazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks! @anakin87 feel free to take another look if you think it's needed

Copy link
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@anakin87 anakin87 merged commit 91f7caf into deepset-ai:main Dec 19, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:elasticsearch type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants