Add LongConsumer progress callback to Operations.determinize for memo…#15937

Open
drempapis wants to merge 2 commits into apache:main from drempapis:feature/determinize-progress-callback
@drempapis
Problem

Lucene's Operations.determinize uses powerset construction to convert an NFA into a DFA. For patterns with combinatorial structure (e.g. abcdef*), this can cause exponential blowup in the number of DFA states, each carrying its own FrozenIntSet snapshot, HashMap entry, and backing arrays. The existing workLimit parameter bounds CPU effort but provides no memory signal, so the JVM can run out of memory long before the work limit is reached. Callers currently have no way to observe or react to memory growth during determinization.

Update

This PR adds a new overload to Lucene's determinization API:

public static Automaton determinize(Automaton a, int workLimit, LongConsumer progressCallback)

When a non-null callback is provided, Lucene periodically invokes it with an accumulated estimate of the bytes allocated since the last report. The caller can use this signal to enforce memory policies (e.g. throw a circuit-breaker exception to abort determinization).
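As an illustration of the caller side, here is a minimal sketch of a budget-enforcing callback. The `MemoryBudget` class, its limit, and the choice of `IllegalStateException` are hypothetical and not part of this PR or of Lucene; the sketch only assumes the callback receives byte deltas as described above.

```java
import java.util.function.LongConsumer;

/** Hypothetical caller-side budget; not part of Lucene or of this PR. */
final class MemoryBudget implements LongConsumer {
  private final long limitBytes;
  private long usedBytes;

  MemoryBudget(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  /** Receives the estimated bytes allocated since the previous report. */
  @Override
  public void accept(long deltaBytes) {
    usedBytes += deltaBytes;
    if (usedBytes > limitBytes) {
      // Stand-in for a real circuit-breaker exception (assumption).
      throw new IllegalStateException(
          "determinization exceeded memory budget: " + usedBytes + " > " + limitBytes + " bytes");
    }
  }

  long usedBytes() {
    return usedBytes;
  }
}
```

With the overload proposed here, a caller would pass such an object as the third argument, for example `Operations.determinize(a, workLimit, new MemoryBudget(64L << 20))`, and handle the exception where it currently handles a work-limit failure.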

The estimate is built from the allocations directly attributable to each newly discovered DFA state during powerset construction. Rather than invoking the callback on every new state (which would add overhead proportional to the state count), estimates are accumulated and reported in chunks once a configurable byte threshold is crossed. Any remainder is flushed once at the end.
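The accumulate-and-flush behaviour described above can be sketched roughly as follows. This is not the code in the PR: `ChunkedReporter`, `onNewState`, and `finish` are hypothetical names, and treating "crossed" as `>=` is an assumption.

```java
import java.util.function.LongConsumer;

/** Hypothetical sketch of chunked progress reporting; names are illustrative only. */
final class ChunkedReporter {
  private final LongConsumer callback; // may be null, meaning "no reporting"
  private final long thresholdBytes;
  private long pendingBytes;

  ChunkedReporter(LongConsumer callback, long thresholdBytes) {
    this.callback = callback;
    this.thresholdBytes = thresholdBytes;
  }

  /** Called once per newly discovered DFA state with its estimated footprint. */
  void onNewState(long estimatedBytes) {
    if (callback == null) {
      return;
    }
    pendingBytes += estimatedBytes;
    if (pendingBytes >= thresholdBytes) {
      flush();
    }
  }

  /** Called once after construction ends to report any remainder. */
  void finish() {
    if (callback != null && pendingBytes > 0) {
      flush();
    }
  }

  private void flush() {
    long toReport = pendingBytes;
    pendingBytes = 0; // reset first so a throwing callback cannot cause double counting
    callback.accept(toReport);
  }
}
```

Each report carries the delta since the previous one, so the caller sums the values it receives. A larger threshold means fewer callback invocations but a coarser signal, which lets memory overshoot the caller's limit by up to roughly one chunk before the callback can react.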