Fix NPE when querying pattern_text field in segment with no field values#142767
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Add tests that verify valueFetcher and fieldData return correct values when disable_templating is true. These tests would have caught the NPE on main where PatternTextFallbackDocValues.from() returns null because template_id doc values are never written when templating is disabled. Also strengthen the existing testFieldDataWithMissingFieldSegment to verify the segment with data returns correct values. Co-authored-by: Cursor <cursoragent@cursor.com>
When disableTemplating is true (basic license), PatternTextFallbackDocValues.from() always returns null because template_id doc values are never written. This caused an NPE via valueFetcher and returned empty values via fieldData for all pattern_text fields on basic license. Add loadDocValues() to PatternTextFieldType that selects the correct BinaryDocValues source based on the templating and storage flags. Both valueFetcher and PatternTextIndexFieldData now share this single dispatch point, eliminating duplicated logic. Co-authored-by: Cursor <cursoragent@cursor.com>
The value fetcher provider was returning raw BytesRef objects from doc values. SourceIntervalsSource calls value.toString() which on BytesRef produces hex output instead of text, causing intervals queries to never match. Co-authored-by: Cursor <cursoragent@cursor.com>
| return PatternTextFallbackDocValues.from(context.reader(), this); | ||
| } | ||
|
|
||
| private static BinaryDocValues storedFieldAsBinaryDocValues(LeafReaderContext context, String fieldName) throws IOException { |
There was a problem hiding this comment.
This feels kind of wrong to do - converting stored fields into doc values. Are you doing this to return BinaryDocValues from loadDocValues()? Can we use PatternTextFallbackDocValues instead?
There was a problem hiding this comment.
Wrapping in doc values is just to simplify the valueFetcher logic. So PatternTextFallbackDocValues wraps the proper pattern_text, the binary fallback, and the stored fallback in a single binary doc values. loadDocValues does the same thing, but it makes the decision between the three options at the whole column level rather than on a per-doc basis. So it will have fewer branches since it doesn't require checking the main pattern_text iterator before falling back on each doc.
I think we'll want to wrap the stored field in a doc value iterator, but we might be able to push this down into PatternTextFallbackDocValues in a cleaner way. I'll give it some more thought next week.
There was a problem hiding this comment.
If possible I think we should make this decision at the column level. And I think this is possible.
martijnvg
left a comment
There was a problem hiding this comment.
This can occur after a license downgrade from enterprise/trial to basic as existing indices retain disable_templating=false, and any subsequent indexing of documents without the pattern_text field will create segments without the pattern_text field which will then trigger the NPE on search.
I don't think I fully understand. IIRC the idea was the the license change would only affect new indices. So after license downgrade everything should remain to work in the same way in current indices using pattern_text, right? Meaning that new documents being indexed would use pattern text doc values. So does that not happen then?
I do see we don't full test the downgrade scenario in PatternTextLicenseDowngradeIT. After license downgrade, we immediately rollover. Maybe we should add a test that after downgrade keeps index and searching the current backing index?
| return PatternTextFallbackDocValues.from(context.reader(), this); | ||
| } | ||
|
|
||
| private static BinaryDocValues storedFieldAsBinaryDocValues(LeafReaderContext context, String fieldName) throws IOException { |
There was a problem hiding this comment.
If possible I think we should make this decision at the column level. And I think this is possible.
Verifies that intervals queries match correctly on pattern_text fields with disable_templating=true.
Move loadDocValues and storedFieldAsBinaryDocValues from PatternTextFieldType into PatternTextFallbackDocValues.from(), which now handles dispatch for both templating-enabled and templating-disabled paths. Update BytesRefsFromBinaryBlockLoader to accept LeafReaderContext so blockLoader() can use the unified entry point directly.
|
@Kubik42 and @martijnvg Before this change there were 5 locations that needed to choose between using |
This reverts commit 9019b19.
martijnvg
left a comment
There was a problem hiding this comment.
I left one minor comment. I also would to see that in PatternTextLicenseDowngradeIT, we better test the downgrade case better. In particular indexing a few docs before and after downgrade. Also executing a search. Then rollover. I'm ok with doing that in a followup PR.
Otherwise LGTM 👍
| } | ||
| } | ||
|
|
||
| private static BinaryDocValues storedFieldAsBinaryDocValues(LeafReaderContext context, String fieldName) throws IOException { |
There was a problem hiding this comment.
I don't like hiding stored fields behind the BinaryDocValues abstraction. I think in this case it is acceptable, because it is only for bwc and pulling this out here would increase code complexity. Maybe explain this in a comment here?
|
Hi @parkertimmins, I've created a changelog YAML for you. |
Use NoMergePolicy with a direct IndexWriter instead of withLuceneIndex (which uses RandomIndexWriter that can randomly merge segments), ensuring tests that require separate segments per document are deterministic.
💔 Backport failed
You can use sqren/backport to manually backport by running |
Targeted backport of elastic#142767 to 9.3. Fixes three bugs: 1. Empty segment NPE when PatternTextCompositeValues.from() returns null in valueFetcher and PatternTextIndexFieldData. 2. Disabled templating NPE where valueFetcher and PatternTextIndexFieldData called CompositeValues.from() which always returns null without template_id values. 3. BytesRef hex output in getValueFetcherProvider where SourceIntervalsSource received raw BytesRef instead of String, causing intervals queries to never match.
Targeted backport of #142767 to 9.3. Fixes three bugs: 1. Empty segment NPE when PatternTextCompositeValues.from() returns null in valueFetcher and PatternTextIndexFieldData. 2. Disabled templating NPE where valueFetcher and PatternTextIndexFieldData called CompositeValues.from() which always returns null without template_id values. 3. BytesRef hex output in getValueFetcherProvider where SourceIntervalsSource received raw BytesRef instead of String, causing intervals queries to never match.
This PR fixes 3 separate but related related bugs:
Fixes: