Skip to content

Fix possible lost signal in ControlledRealTimeReopenThread#15830

Merged
dweiss merged 5 commits intoapache:mainfrom
viliam-durina:viliam/fix-reopener
Apr 9, 2026
Merged

Fix possible lost signal in ControlledRealTimeReopenThread#15830
dweiss merged 5 commits intoapache:mainfrom
viliam-durina:viliam/fix-reopener

Conversation

@viliam-durina
Copy link
Copy Markdown
Contributor

Condition.signal/await was used to wake up the background thread to reload data. Since the background thread isn't owning the condition's lock at all times, notification can be lost. Specifically, if a new waiter arrives while doing maybeRefreshBlocking() and before it obtains the lock again, the thread will go to sleep, instead of refreshing again. Similarly, the thread won't terminate if close() is called during this window. The issue is pronounced if minTime is 0 and maxTime is high. Owning the lock all the time is not a solution, because it would unnecessarily block all waiters, even if they are not interested in the generation being loaded.

This PR replaces this notification with a semaphore. The semaphore collects permits, and the background thread tries to acquire a permit with a timeout as a way of sleeping. Releasing a permit serves as the signal. This way no notification is lost.

There's another wait-notify construct in this class, but that one is correct. The waiter owns the lock all the time, and the write to searchingGen is done while holding the same monitor.

Other changes included in this PR:

  • simplify the background thread loop (the run() method), now it's simple steps: loop until finished, in each iteration determine sleep time, if >0, sleep, else refresh.
  • move refreshStartGen to the listener class, it's used only there
  • clarify javadocs: This class is the Thread itself, not a utility. The user has to run it. And more...

@viliam-durina viliam-durina changed the title Fix thread notification in ControlledRealTimeReopenThread Fix possible lost signal in ControlledRealTimeReopenThread Mar 16, 2026
@github-actions github-actions bot added this to the 11.0.0 milestone Mar 16, 2026
@github-actions
Copy link
Copy Markdown
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 31, 2026
Copy link
Copy Markdown
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with this code but this change looks ok. Thanks for tackling the difficult concurrency stuff.

I feel tempted to make synchronization explicit (instead of method keyword) consistently. Makes reasoning a bit easier (to me). What do you think?

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Apr 8, 2026

I allowed myself to add an explicit monitor and change the parameters to an explicit Duration. I think it's clearer and the class is experimental so no problem there. Let me know if this is ok.

@github-actions github-actions bot removed the Stale label Apr 9, 2026
@viliam-durina
Copy link
Copy Markdown
Contributor Author

Not synchronizing on the instance itself is a good practice, so it's fine.
Using Duration - it's a breaking change, if that doesn't matter, it's fine.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Apr 9, 2026

This is clearly marked as an experimental class so I don't think this matters. People will have to adjust (and then at least they'll know what the time unit passed to this constructor is - the fraction-of-a-second-double is really strange, at least to me).

@dweiss dweiss merged commit f3cc239 into apache:main Apr 9, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants