Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers #143999

Merged: elasticsearchmachine merged 5 commits into elastic:main from tlrx:2026/03/11-adjust-test-framework-seq-no-disabled on Mar 11, 2026
Member

@tlrx tlrx commented Mar 11, 2026

The method ESIntegTestCase.getLiveDocs verifies that the primary and its replicas have the same set of documents. It must be adapted to account for sequence numbers that can be merged away on the shard when IndexSettings.DISABLE_SEQUENCE_NUMBERS is set.

This method was previously adjusted for synthetic id and synthetic source to rely on the Engine's changes snapshot API to retrieve Lucene documents. At that time, LuceneChangesSnapshot and LuceneSyntheticSourceChangesSnapshot were changed to accommodate a missing id/source. That was already a bit ugly, and now that _seq_no is also pruned it would require even larger changes in those Lucene*ChangesSnapshot classes purely for testing, since _seq_no values are loaded at a lower level in Lucene*ChangesSnapshot.

So I changed ESIntegTestCase to no longer use the changes snapshot API, reverted the changes in the Lucene*ChangesSnapshot classes, and now simply bulk load documents from the reader directly.
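
To illustrate the kind of check this implies, here is a minimal, self-contained sketch of comparing primary and replica documents when a sequence number may have been pruned on either copy. It is not the code from this PR: `LiveDoc`, `sameLiveDocs`, and the rule "compare `_seq_no` only when both copies still have it" are hypothetical and chosen only for illustration. It uses plain Java and no Elasticsearch or Lucene classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

public class LiveDocsSketch {

    // Hypothetical stand-in for a document read from a shard.
    // seqNo is empty when the sequence number was merged away (pruned).
    record LiveDoc(String id, OptionalLong seqNo, long version) {}

    // Returns true when both copies hold the same ids and versions.
    // Assumption: a sequence number is compared only if present on both sides.
    static boolean sameLiveDocs(List<LiveDoc> primary, List<LiveDoc> replica) {
        if (primary.size() != replica.size()) {
            return false;
        }
        Map<String, LiveDoc> byId = new HashMap<>();
        for (LiveDoc doc : primary) {
            byId.put(doc.id(), doc);
        }
        for (LiveDoc doc : replica) {
            LiveDoc other = byId.remove(doc.id());
            if (other == null || other.version() != doc.version()) {
                return false;
            }
            boolean bothHaveSeqNo = other.seqNo().isPresent() && doc.seqNo().isPresent();
            if (bothHaveSeqNo && other.seqNo().getAsLong() != doc.seqNo().getAsLong()) {
                return false;
            }
        }
        // Anything left over exists on the primary but not on the replica.
        return byId.isEmpty();
    }

    public static void main(String[] args) {
        List<LiveDoc> primary = List.of(
            new LiveDoc("a", OptionalLong.of(0), 1),
            new LiveDoc("b", OptionalLong.of(1), 1));
        // The replica has already pruned the sequence number of "a".
        List<LiveDoc> replica = List.of(
            new LiveDoc("a", OptionalLong.empty(), 1),
            new LiveDoc("b", OptionalLong.of(1), 1));
        System.out.println(sameLiveDocs(primary, replica));
    }
}
```

Whether a real check should treat a pruned `_seq_no` as a wildcard, as this sketch does, or normalize it on both sides before comparing is a design choice that this description does not settle.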

Relates #136305

@tlrx tlrx added the >test, :Distributed/Engine, and v9.4.0 labels Mar 11, 2026
@elasticsearchmachine elasticsearchmachine added the Team:Distributed label Mar 11, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Contributor

@romseygeek romseygeek left a comment

+65 -151

My favourite kind of commit. LGTM!

Contributor

@fcofdez fcofdez left a comment

LGTM

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 11, 2026
This commit adds tests to verify that CCR works correctly with pruned
sequence numbers. The test is inspired by SeqNoPruningIT.

Note: made by Cursor, adjusted by me. Also requires elastic#143999 to pass.

Relates elastic#136305
@tlrx tlrx added the auto-merge-without-approval label Mar 11, 2026
@elasticsearchmachine elasticsearchmachine merged commit 51601c6 into elastic:main Mar 11, 2026
36 checks passed
@tlrx tlrx deleted the 2026/03/11-adjust-test-framework-seq-no-disabled branch March 11, 2026 15:03
Member Author

tlrx commented Mar 11, 2026

Thanks Francisco and Alan!

szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 11, 2026
…elocations

* upstream/main: (54 commits)
  [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997)
  ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970)
  [ESQL] Push stats to external source via metadata (elastic#143940)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051
  Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912)
  Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022
  Ensure we use float values for rolling upgrade float vectors (elastic#144032)
  Remove sensitive info from reindex task description (elastic#143635)
  Fix HistogramUnionState.equals (elastic#143990)
  Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776)
  Engine/Store DistributedArchitectureGuide doc (elastic#143818)
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034
  Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987)
  allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023)
  [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941)
  Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999)
  Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747)
  TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002)
  Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873)
  ...
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…nce numbers (elastic#143999)

Relates elastic#136305