Skip to content

Comments

Store keyword fields that trip ignore_above in binary doc values#139093

Merged
Kubik42 merged 8 commits intoelastic:mainfrom
Kubik42:binary-doc-values-index-versions-check
Dec 10, 2025
Merged

Store keyword fields that trip ignore_above in binary doc values#139093
Kubik42 merged 8 commits intoelastic:mainfrom
Kubik42:binary-doc-values-index-versions-check

Conversation

@Kubik42
Copy link
Contributor

@Kubik42 Kubik42 commented Dec 4, 2025

Reverted the revert.

This PR contains the following additional changes:

  • all of the binary doc values code is now gated behind an IndexVersion check. This is to avoid old indices being switched over to using binary doc values while some of their ignored values are still stored in stored fields
  • adds KeywordRollingUpgradeIT - a BWC test containing a combination of values that do and do not exceed ignore_above

@Kubik42 Kubik42 added Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Dec 4, 2025
@Kubik42 Kubik42 force-pushed the binary-doc-values-index-versions-check branch 3 times, most recently from f13ce69 to 0f210d9 Compare December 5, 2025 04:42
@Kubik42 Kubik42 marked this pull request as ready for review December 5, 2025 05:56
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

allValues.addAll(FIELD_VALUES_2);
search(allValues);
} else {
// skip - the cluster is upgrading, we don't really need to do anything here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also index and search after all nodes have been upgraded?

Copy link
Contributor Author

@Kubik42 Kubik42 Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what the else if (isUpgradedCluster()) block is responsible for? In this test method:

  • when the cluster is in the old cluster state (nothing has been upgraded), I create an index, index some documents, then run a search query
  • when the cluster is in upgraded state (all nodes have been upgraded), I index some more documents, then run another search query and expect to find results from before and post the upgrade
  • when the cluster is in mixed state (some nodes have been upgraded), I don't do anything

Kubik42 and others added 5 commits December 5, 2025 19:29
# Conflicts:
#	x-pack/plugin/logsdb/qa/legacy-rolling-upgrade/src/javaRestTest/java/org/elasticsearch/upgrades/KeywordRollingUpgradeIT.java
@Kubik42 Kubik42 force-pushed the binary-doc-values-index-versions-check branch from dc8ee86 to 19be8df Compare December 6, 2025 01:18
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some reuse possible with MultiValuedBinaryDocValuesField and BinaryDocValuesSyntheticFieldLoaderLayer with the high cardinality keyword field PR. I think that PR can reuse what this PR adds when this gets merged.

LGTM

* A custom implementation of {@link org.apache.lucene.index.BinaryDocValues} that uses a {@link Set} to maintain a collection of unique
* binary doc values for fields with multiple values per document.
*/
private static final class MultiValuedBinaryDocValuesField extends CustomDocValuesField {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if this implementation can be stream lined in the keyword high cardinality PR, so that we can endup just having one implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did merge the two implementations back before this was reverted, so probably something similar to that?

@Kubik42 Kubik42 enabled auto-merge (squash) December 9, 2025 22:35
@Kubik42 Kubik42 disabled auto-merge December 10, 2025 02:32
@Kubik42 Kubik42 merged commit fa0d83a into elastic:main Dec 10, 2025
33 of 34 checks passed
@Kubik42 Kubik42 deleted the binary-doc-values-index-versions-check branch December 10, 2025 02:33
jordan-powers added a commit that referenced this pull request Dec 18, 2025
In #139093 we updated the NativeArrayIntegrationTestCase to extract the 
single value out of arrays of length 1. This prevents the creation of the
.offsets field (as it's not need for single-valued fields), which then
fails the later assertion in the test which checks for the presence of
this .offsets field.

This PR fixes the test failures by reverting the logic to collapse the
arrays, instead maintaining single-valued arrays as-is.
jordan-powers added a commit to jordan-powers/elasticsearch that referenced this pull request Jan 29, 2026
In elastic#139093 we updated the NativeArrayIntegrationTestCase to extract the
single value out of arrays of length 1. This prevents the creation of the
.offsets field (as it's not need for single-valued fields), which then
fails the later assertion in the test which checks for the presence of
this .offsets field.

This PR fixes the test failures by reverting the logic to collapse the
arrays, instead maintaining single-valued arrays as-is.

(cherry picked from commit 5f24987)

# Conflicts:
#	muted-tests.yml
jordan-powers added a commit that referenced this pull request Jan 30, 2026
In #139093 we updated the NativeArrayIntegrationTestCase to extract the
single value out of arrays of length 1. This prevents the creation of the
.offsets field (as it's not need for single-valued fields), which then
fails the later assertion in the test which checks for the presence of
this .offsets field.

This PR fixes the test failures by reverting the logic to collapse the
arrays, instead maintaining single-valued arrays as-is.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants