Fix NPE when querying pattern_text field in segment with no field values #142767

Merged
parkertimmins merged 22 commits into elastic:main from parkertimmins:parker/pattern-text-empty-segment-npe
Feb 25, 2026
Conversation

@parkertimmins
Contributor

@parkertimmins parkertimmins commented Feb 20, 2026

This PR fixes 3 separate but related bugs:

  1. Empty segment NPE - This occurs when pattern_text is enabled (disableTemplating=false). When a segment has no documents containing a pattern_text field, PatternTextFallbackDocValues.from() returns null. Both valueFetcher and PatternTextIndexFieldData called methods on this null reference without checking.
  2. Disabled templating NPE - When disableTemplating is true, valueFetcher() and PatternTextIndexFieldData still called PatternTextFallbackDocValues.from() which always returned null (no template_id values), causing NPEs. The blockLoader and getValueFetcherProvider paths already handled it correctly by falling back to binary doc values or stored fields.
  3. Hex-encoded BytesRef in intervals queries - When disableTemplating is true, getValueFetcherProvider() (used by SourceIntervalsSource for intervals queries) returns a raw BytesRef. SourceIntervalsSource calls .toString() on the returned value, which for a BytesRef produces hex-encoded output (e.g., [66 6f 6f]) instead of the actual text. This caused intervals queries to never match.
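
To illustrate bug 3, here is a minimal sketch of why calling toString() on raw bytes breaks text matching. `BytesRefSketch` is a hypothetical stand-in written for this example, not Lucene's BytesRef; it only mimics the bracketed hex format quoted above and a UTF-8 decoding method.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Lucene's BytesRef, reduced to the two methods discussed above.
final class BytesRefSketch {
    final byte[] bytes;

    BytesRefSketch(String text) {
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    // Mimics the bracketed hex format quoted in the PR description, e.g. [66 6f 6f].
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Decodes the bytes as UTF-8, which is the form a text matcher needs.
    String utf8ToString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With this stand-in, `new BytesRefSketch("foo").toString()` yields `[66 6f 6f]`, while `utf8ToString()` yields `foo`, so a caller that matches on toString() output never sees the original text.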

Fixes:

  1. Check if docValues is null in valueFetcher.fetchValues() and PatternTextIndexFieldData
  2. Push the disableTemplating logic into PatternTextFallbackDocValues.from() so that it is applied in all use cases. After this change, PatternTextFallbackDocValues.from() handles both ways that pattern_text can fall back to flat binary doc values or stored fields: 1) as a whole column because disableTemplating=true, or 2) per document because a value is larger than 32kb.
  3. Convert to utf8ToString() in getValueFetcherProvider since all callers can handle strings
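
Fixes 1 and 2 can be sketched roughly as follows. This is a simplified illustration and not the code in the PR: `SegmentSketch`, `ValuesSketch`, and `FallbackDispatchSketch` are hypothetical names, maps stand in for doc values columns, and the stored-field and per-document fallback paths are left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-segment column of values.
interface ValuesSketch {
    String valueFor(int docId);
}

// Hypothetical segment: either column may be null when the segment holds no such data.
final class SegmentSketch {
    final Map<Integer, String> templated;
    final Map<Integer, String> binaryFallback;

    SegmentSketch(Map<Integer, String> templated, Map<Integer, String> binaryFallback) {
        this.templated = templated;
        this.binaryFallback = binaryFallback;
    }
}

final class FallbackDispatchSketch {
    // Single entry point: choose the backing column once per segment.
    // Returns null when the segment has no values for the field.
    static ValuesSketch from(SegmentSketch segment, boolean disableTemplating) {
        Map<Integer, String> source = disableTemplating ? segment.binaryFallback : segment.templated;
        if (source == null) {
            return null;
        }
        return docId -> source.get(docId);
    }

    // Caller side: the null check that was missing before the fix.
    static List<String> fetchValues(SegmentSketch segment, boolean disableTemplating, int docId) {
        ValuesSketch values = from(segment, disableTemplating);
        if (values == null) {
            return List.of();
        }
        String value = values.valueFor(docId);
        return value == null ? List.of() : List.of(value);
    }
}
```

Keeping the storage choice inside one `from` method means each caller only has to handle the null result, rather than repeating the templating check.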

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

parkertimmins and others added 5 commits February 20, 2026 13:31
Add tests that verify valueFetcher and fieldData return
correct values when disable_templating is true.
These tests would have caught the NPE on main
where PatternTextFallbackDocValues.from() returns null
because template_id doc values are never written when
templating is disabled. Also strengthen the existing
testFieldDataWithMissingFieldSegment to verify the
segment with data returns correct values.

Co-authored-by: Cursor <cursoragent@cursor.com>
When disableTemplating is true (basic license),
PatternTextFallbackDocValues.from() always returns null
because template_id doc values are never written. This
caused an NPE via valueFetcher and returned empty values
via fieldData for all pattern_text fields on basic license.

Add loadDocValues() to PatternTextFieldType that selects
the correct BinaryDocValues source based on the templating
and storage flags. Both valueFetcher and
PatternTextIndexFieldData now share this single dispatch
point, eliminating duplicated logic.

Co-authored-by: Cursor <cursoragent@cursor.com>
The value fetcher provider was returning raw BytesRef objects from
doc values. SourceIntervalsSource calls value.toString() which on
BytesRef produces hex output instead of text, causing intervals
queries to never match.

Co-authored-by: Cursor <cursoragent@cursor.com>
return PatternTextFallbackDocValues.from(context.reader(), this);
}

private static BinaryDocValues storedFieldAsBinaryDocValues(LeafReaderContext context, String fieldName) throws IOException {
Contributor

This feels kind of wrong to do - converting stored fields into doc values. Are you doing this to return BinaryDocValues from loadDocValues()? Can we use PatternTextFallbackDocValues instead?

Contributor Author

@parkertimmins parkertimmins Feb 20, 2026

Wrapping in doc values is just to simplify the valueFetcher logic. So PatternTextFallbackDocValues wraps the proper pattern_text, the binary fallback, and the stored fallback in a single binary doc values. loadDocValues does the same thing, but it makes the decision between the three options at the whole column level rather than on a per-doc basis. So it will have fewer branches since it doesn't require checking the main pattern_text iterator before falling back on each doc.

I think we'll want to wrap the stored field in a doc value iterator, but we might be able to push this down into PatternTextFallbackDocValues in a cleaner way. I'll give it some more thought next week.
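
For readers following the per-doc versus column-level distinction, a rough sketch of the per-document form is below. All names here (`DocValuesSketch`, `MapBackedValues`, `PerDocFallbackValues`) are hypothetical and the interface is a simplified stand-in, not Lucene's BinaryDocValues. `MapBackedValues` plays the role of any backing store, such as a stored field adapted to a doc-values-style interface.

```java
import java.util.Map;

// Hypothetical doc-values-style reader: position on a doc, then read its value.
interface DocValuesSketch {
    boolean advanceExact(int docId);
    String value();
}

// Stand-in for any backing store exposed through the reader interface.
final class MapBackedValues implements DocValuesSketch {
    private final Map<Integer, String> values;
    private String current;

    MapBackedValues(Map<Integer, String> values) {
        this.values = values;
    }

    @Override
    public boolean advanceExact(int docId) {
        current = values.get(docId);
        return current != null;
    }

    @Override
    public String value() {
        return current;
    }
}

// Per-document fallback: every doc checks the primary reader before the fallback.
final class PerDocFallbackValues implements DocValuesSketch {
    private final DocValuesSketch primary;
    private final DocValuesSketch fallback;
    private DocValuesSketch active;

    PerDocFallbackValues(DocValuesSketch primary, DocValuesSketch fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    @Override
    public boolean advanceExact(int docId) {
        if (primary.advanceExact(docId)) {
            active = primary;
            return true;
        }
        if (fallback.advanceExact(docId)) {
            active = fallback;
            return true;
        }
        active = null;
        return false;
    }

    @Override
    public String value() {
        if (active == null) {
            throw new IllegalStateException("not positioned on a doc with a value");
        }
        return active.value();
    }
}
```

A column-level choice would instead hand back one of the underlying readers directly, which skips the primary check on every document.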

Member

If possible I think we should make this decision at the column level. And I think this is possible.

Member

@martijnvg martijnvg left a comment

This can occur after a license downgrade from enterprise/trial to basic as existing indices retain disable_templating=false, and any subsequent indexing of documents without the pattern_text field will create segments without the pattern_text field which will then trigger the NPE on search.

I don't think I fully understand. IIRC the idea was that the license change would only affect new indices. So after a license downgrade everything should continue to work the same way in current indices using pattern_text, right? Meaning that new documents being indexed would use pattern text doc values. So does that not happen then?

I do see we don't fully test the downgrade scenario in PatternTextLicenseDowngradeIT. After the license downgrade, we immediately roll over. Maybe we should add a test that, after the downgrade, keeps indexing into and searching the current backing index?

@parkertimmins parkertimmins self-assigned this Feb 24, 2026
parkertimmins and others added 8 commits February 24, 2026 13:17
Verifies that intervals queries match correctly on
pattern_text fields with disable_templating=true.
Move loadDocValues and storedFieldAsBinaryDocValues from
PatternTextFieldType into PatternTextFallbackDocValues.from(),
which now handles dispatch for both templating-enabled and
templating-disabled paths. Update BytesRefsFromBinaryBlockLoader
to accept LeafReaderContext so blockLoader() can use the unified
entry point directly.
@parkertimmins
Contributor Author

@Kubik42 and @martijnvg
I went ahead and updated the description with some more details. The main idea behind this refactor is to move all the logic that selects between different backing storage into PatternTextFallbackDocValues.from. Though this perhaps adds a bit of overhead from wrapped iterators and forces stored fields into a doc values interface, I think it is a bit safer.

Before this change there were 5 locations that needed to choose between using PatternTextFallbackDocValues.from, BinaryDocValues directly, or stored fields. These were: PatternTextType.valueFetcher, PatternTextType.getValueFetcherProvider, PatternTextIndexFieldData.loadDirect, PatternTextType.blockLoader, and PatternTextFieldMapper.getSyntheticFieldLoader. With this change, the first four can just use PatternTextFallbackDocValues.from directly and only the synthetic field loader needs to handle it separately.

Member

@martijnvg martijnvg left a comment

I left one minor comment. I would also like to see the downgrade case tested better in PatternTextLicenseDowngradeIT: in particular, indexing a few docs before and after the downgrade, executing a search, and then rolling over. I'm ok with doing that in a followup PR.

Otherwise LGTM 👍

}
}

private static BinaryDocValues storedFieldAsBinaryDocValues(LeafReaderContext context, String fieldName) throws IOException {
Member

I don't like hiding stored fields behind the BinaryDocValues abstraction. I think in this case it is acceptable, because it is only for bwc and pulling this out here would increase code complexity. Maybe explain this in a comment here?

@elasticsearchmachine
Collaborator

Hi @parkertimmins, I've created a changelog YAML for you.

@parkertimmins parkertimmins enabled auto-merge (squash) February 25, 2026 17:34
@parkertimmins parkertimmins enabled auto-merge (squash) February 25, 2026 17:58
Use NoMergePolicy with a direct IndexWriter instead of
withLuceneIndex (which uses RandomIndexWriter that can
randomly merge segments), ensuring tests that require
separate segments per document are deterministic.
@parkertimmins parkertimmins merged commit 64c1723 into elastic:main Feb 25, 2026
35 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.3 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 142767

@parkertimmins parkertimmins deleted the parker/pattern-text-empty-segment-npe branch February 25, 2026 21:14
parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request Feb 25, 2026
Targeted backport of elastic#142767 to 9.3. Fixes three bugs:

1. Empty segment NPE when PatternTextCompositeValues.from()
   returns null in valueFetcher and PatternTextIndexFieldData.
2. Disabled templating NPE where valueFetcher and
   PatternTextIndexFieldData called CompositeValues.from()
   which always returns null without template_id values.
3. BytesRef hex output in getValueFetcherProvider where
   SourceIntervalsSource received raw BytesRef instead of
   String, causing intervals queries to never match.
parkertimmins added a commit that referenced this pull request Feb 26, 2026
Targeted backport of #142767 to 9.3. Fixes three bugs:

1. Empty segment NPE when PatternTextCompositeValues.from()
   returns null in valueFetcher and PatternTextIndexFieldData.
2. Disabled templating NPE where valueFetcher and
   PatternTextIndexFieldData called CompositeValues.from()
   which always returns null without template_id values.
3. BytesRef hex output in getValueFetcherProvider where
   SourceIntervalsSource received raw BytesRef instead of
   String, causing intervals queries to never match.
4 participants