Cache frozen FieldType to skip schema validation #15886

Merged
ChrisHegarty merged 16 commits into apache:main from Tim-Brooks:only_schema_validation
Apr 7, 2026

@Tim-Brooks
Contributor

When indexing, every field in every document has its schema built via
updateDocFieldSchema and validated via assertSameSchema against the
existing FieldInfo. For the common case where a field consistently uses
the same frozen FieldType instance, this work is redundant — a frozen
type is immutable, so its schema contribution is identical every time.

This change caches the frozen FieldType on each PerField and checks for the
same object instance to detect when the type hasn't changed. When it
matches, schema building and validation are skipped entirely, and
FieldSchema only resets its docID. If a different type is encountered,
the cache is invalidated and the full validation path runs. A
deoptimize path handles multi-valued fields where a later value uses a
different type than earlier values within the same document.

Adds FieldType.isFrozen() to support the optimization, and new tests
covering the fast-path, cache invalidation, deoptimize, cross-segment
validation, and document blocks.
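
As a rough illustration of the idea described above, here is a minimal, self-contained sketch of an identity-checked cache for a frozen type. It is not the code from this PR and does not use Lucene classes: `ToyFieldType`, `ToyPerField`, `onFieldValue`, `validateSchema`, and the `fullValidations` counter are hypothetical names invented for the example, and the "schema" is reduced to a single stored flag. The deoptimize path for multi-valued fields is not modeled.

```java
/** Hypothetical stand-in for a freezable field type (not Lucene's FieldType). */
final class ToyFieldType {
  private boolean stored;
  private boolean frozen;

  ToyFieldType setStored(boolean value) {
    if (frozen) {
      throw new IllegalStateException("type is frozen and cannot be modified");
    }
    stored = value;
    return this;
  }

  /** After this call the type can no longer change. */
  ToyFieldType freeze() {
    frozen = true;
    return this;
  }

  boolean isFrozen() {
    return frozen;
  }

  boolean isStored() {
    return stored;
  }
}

/** Hypothetical per-field state that remembers the last validated frozen type. */
final class ToyPerField {
  private ToyFieldType cachedFrozenType; // null when nothing is cached
  private boolean schemaKnown;
  private boolean schemaStored;
  int fullValidations; // how many times the slow path ran

  void onFieldValue(ToyFieldType type) {
    java.util.Objects.requireNonNull(type, "type");
    // Fast path: the very same frozen instance was already validated, and a
    // frozen type cannot have changed since then, so there is nothing to redo.
    if (type == cachedFrozenType) {
      return;
    }
    // A different instance: drop the cache and run the full check.
    cachedFrozenType = null;
    validateSchema(type);
    // Only frozen types are safe to cache; a mutable one could change later.
    if (type.isFrozen()) {
      cachedFrozenType = type;
    }
  }

  private void validateSchema(ToyFieldType type) {
    fullValidations++;
    if (!schemaKnown) {
      schemaStored = type.isStored();
      schemaKnown = true;
    } else if (schemaStored != type.isStored()) {
      throw new IllegalArgumentException("field schema changed: stored flag differs");
    }
  }
}
```

In this sketch, passing one shared frozen `ToyFieldType` to `onFieldValue` repeatedly runs `validateSchema` only on the first call. An unfrozen instance is validated on every call, and switching to a different instance clears the cache before the full check runs. The comparison uses `==` rather than `equals` on purpose: reference identity is a single pointer comparison, which is what makes the fast path cheap.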

@Tim-Brooks
Contributor Author

This is motivated by scenarios where the user is mostly doing light indexing (doc values, stored fields, etc., with few indexed fields). In scenarios like this, the actual validation of the schema starts to dominate the processDocument cost. This change does not make a difference in luceneutil benchmarks, which are inverted-index heavy.

However, in macro doc-values-oriented runs in Elasticsearch it has a 10% impact on documents per second (combined with another small parent field handling change that I will follow up with).

Hopefully there is some interest in this optimization, which targets the extremely common Lucene case of freezing and reusing the same FieldType instance.

@ChrisHegarty
Contributor

This looks good to me, Tim. Can we add a changelog entry for 10.5?

@github-actions github-actions Bot added this to the 11.0.0 milestone Mar 30, 2026
@Tim-Brooks
Contributor Author

> This looks good to me, Tim. Can we add a changelog entry for 10.5?

Done.

Comment thread lucene/CHANGES.txt Outdated
@github-actions github-actions Bot modified the milestones: 11.0.0, 10.5.0 Mar 30, 2026
Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

@ChrisHegarty ChrisHegarty merged commit 9e1c6a3 into apache:main Apr 7, 2026
13 checks passed
ChrisHegarty pushed a commit that referenced this pull request Apr 7, 2026