Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers #143999
Merged
elasticsearchmachine merged 5 commits into elastic:main on Mar 11, 2026
The method ESIntegTestCase.getLiveDocs verifies that primary and replica have the same set of documents. This method must be adapted to account for sequence numbers that can be merged away on the shard if IndexSettings.DISABLE_SEQUENCE_NUMBERS is set.

This method was previously adjusted for synthetic id and synthetic sources to rely on the Engine's changes snapshot API to retrieve Lucene documents. At that time, LuceneChangesSnapshot and LuceneSyntheticSourceChangesSnapshot were changed to accommodate a missing id/source. It was already a bit ugly, but now that _seq_no is also pruned, it would require even larger changes in those Lucene*ChangesSnapshot classes only for testing, since _seq_no values are loaded at the lower level in Lucene*ChangesSnapshot.

So I changed ESIntegTestCase to no longer use the changes snapshot API, reverted the changes in the Lucene*ChangesSnapshot classes, and now simply bulk load documents from the reader directly.

Relates elastic#136305
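To make the idea concrete, here is a minimal, self-contained sketch of the two steps the description mentions: collecting live documents by walking a segment directly, and comparing two shard copies when `_seq_no` may have been pruned on one of them. Everything in it is hypothetical: `LiveDocsSketch`, `DocInfo`, `liveDocs` and `sameDocs` are illustrative names, not the code in this PR and not Elasticsearch or Lucene APIs. The rule that `_seq_no` is compared only where both copies still have it is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical, simplified model: none of these names are Elasticsearch or Lucene APIs.
final class LiveDocsSketch {

    /** One document as read from a shard copy; seqNo is empty when _seq_no was pruned. */
    record DocInfo(String id, OptionalLong seqNo, long version) {}

    /** Walks every doc slot of a segment and keeps the docs whose live bit is set. */
    static List<DocInfo> liveDocs(List<DocInfo> segmentDocs, BitSet liveBits) {
        List<DocInfo> live = new ArrayList<>();
        for (int docId = 0; docId < segmentDocs.size(); docId++) {
            if (liveBits.get(docId)) {
                live.add(segmentDocs.get(docId));
            }
        }
        return live;
    }

    /**
     * Assumption for this sketch: two copies match when they hold the same ids with
     * the same versions, and _seq_no is compared only where both copies still have it.
     */
    static boolean sameDocs(List<DocInfo> primary, List<DocInfo> replica) {
        Map<String, DocInfo> byId = new HashMap<>();
        for (DocInfo doc : primary) {
            byId.put(doc.id(), doc);
        }
        if (byId.size() != primary.size() || primary.size() != replica.size()) {
            return false; // duplicate ids on the primary, or differing doc counts
        }
        for (DocInfo doc : replica) {
            DocInfo other = byId.remove(doc.id());
            if (other == null || other.version() != doc.version()) {
                return false; // unknown or repeated id, or version mismatch
            }
            if (other.seqNo().isPresent() && doc.seqNo().isPresent()
                && other.seqNo().getAsLong() != doc.seqNo().getAsLong()) {
                return false; // both copies kept _seq_no and the values disagree
            }
        }
        return byId.isEmpty();
    }

    public static void main(String[] args) {
        List<DocInfo> segment = List.of(
            new DocInfo("a", OptionalLong.of(0L), 1L),
            new DocInfo("b", OptionalLong.of(1L), 1L),
            new DocInfo("c", OptionalLong.of(2L), 1L));
        BitSet liveBits = new BitSet(segment.size());
        liveBits.set(0);
        liveBits.set(2); // doc "b" is treated as deleted
        List<DocInfo> primary = liveDocs(segment, liveBits);
        // The replica holds the same docs, but its _seq_no values were pruned.
        List<DocInfo> replica = List.of(
            new DocInfo("c", OptionalLong.empty(), 1L),
            new DocInfo("a", OptionalLong.empty(), 1L));
        System.out.println("copies match: " + sameDocs(primary, replica));
    }
}
```

The comparison keys on the document id and treats a missing `_seq_no` as "nothing to compare", so a copy that has already merged its sequence numbers away does not cause a spurious mismatch in this model.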
Collaborator
Pinging @elastic/es-distributed (Team:Distributed)
romseygeek (Contributor) approved these changes on Mar 11, 2026 and left a comment:
+65 -151
My favourite kind of commit. LGTM!
tlrx added a commit to tlrx/elasticsearch that referenced this pull request on Mar 11, 2026
This commit adds tests to verify that CCR works correctly with pruned sequence numbers. The test is inspired by SeqNoPruningIT. Note: made by Cursor, adjusted by me. Also requires elastic#143999 to pass. Relates elastic#136305
Member
Author
Thanks Francisco and Alan!
szybia added a commit to szybia/elasticsearch that referenced this pull request on Mar 11, 2026
…elocations

* upstream/main: (54 commits)
  [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997)
  ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970)
  [ESQL] Push stats to external source via metadata (elastic#143940)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051
  Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912)
  Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022
  Ensure we use float values for rolling upgrade float vectors (elastic#144032)
  Remove sensitive info from reindex task description (elastic#143635)
  Fix HistogramUnionState.equals (elastic#143990)
  Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776)
  Engine/Store DistributedArchitectureGuide doc (elastic#143818)
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034
  Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987)
  allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023)
  [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941)
  Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999)
  Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747)
  TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002)
  Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873)
  ...
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request on Mar 23, 2026
…nce numbers (elastic#143999)