Prune sequence numbers during merges#143583

Merged
tlrx merged 17 commits into elastic:main from tlrx:2026/03/04-prune-seq-no
Mar 5, 2026

@tlrx
Member

@tlrx tlrx commented Mar 4, 2026

This commit changes the RecoverySourcePruneMergePolicy to also prune the sequence number field _seq_no during merges when the field is no longer needed for recoveries. Pruning of _seq_no is controlled by the IndexSettings#DISABLE_SEQUENCE_NUMBERS index setting.

Only unit tests have been modified. I'll follow up with a couple of integration tests.

Related to #136305
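
For readers unfamiliar with the idea, here is a minimal sketch of what pruning during a merge amounts to. It is a toy model, not the code in this PR: a segment is reduced to an array of per-document `_seq_no` values, and the retention rule (`seqNo >= minRetainedSeqNo`) and all names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.OptionalLong;

// Toy model, not Elasticsearch or Lucene code: a "segment" is just an array of
// per-document _seq_no values, and a merge copies them into a new array.
final class SeqNoPruningSketch {

    /**
     * Simulates merging one segment. Documents whose sequence number is assumed
     * to still be needed for recovery (seqNo >= minRetainedSeqNo, a hypothetical
     * threshold) keep their value; every other document loses it.
     */
    static OptionalLong[] mergeAndPrune(long[] seqNos, long minRetainedSeqNo) {
        OptionalLong[] merged = new OptionalLong[seqNos.length];
        for (int doc = 0; doc < seqNos.length; doc++) {
            merged[doc] = seqNos[doc] >= minRetainedSeqNo
                ? OptionalLong.of(seqNos[doc])   // still needed: copy the value
                : OptionalLong.empty();          // no longer needed: drop it
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] segment = {0, 1, 2, 3, 4, 5};
        // Sequence numbers below 3 are assumed to be no longer needed.
        System.out.println(Arrays.toString(mergeAndPrune(segment, 3)));
    }
}
```

Note that the threshold can fall in the middle of a segment, so a merged segment may hold both pruned and retained documents.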

@tlrx tlrx added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v9.4.0 labels Mar 4, 2026
@tlrx tlrx requested review from burqen, fcofdez and romseygeek March 4, 2026 14:17
@elasticsearchmachine
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@tlrx tlrx marked this pull request as ready for review March 4, 2026 14:17
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 4, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

}
return new PointsReader() {
@Override
public PointValues getValues(String field) throws IOException {
Contributor

I don't think this is going to work for partial segments? E.g., where the sequence number to trim at occurs halfway through a segment, we drop all values up to that point for the doc_values.

Contributor

I think I would be tempted to say that we only trim sequence numbers if they are being stored as a DV field with a skipper, and if you insist on using points then you're on your own... Although maybe that doesn't work if we want to be able to apply this to entire indexes on rollover.

Member Author

Oh good catch, I made it this way with the intention to look at points later but had other issues and totally forgot to get back to it.

From your comment and what I can see in Lucene, there is no existing solution to trim points only for some documents in Lucene. So trimming sequence numbers only in DOC_VALUES_ONLY makes sense.

I can add some setting validation to enforce that IndexSettings#DISABLE_SEQUENCE_NUMBERS can only be enabled if IndexSettings#SEQ_NO_INDEX_OPTIONS_SETTING is set to DOC_VALUES_ONLY.

Regular indices use POINTS_AND_DOC_VALUES by default so both settings would need to be set to allow seq no trimming. I think that's OK.

Member Author

As discussed via another channel, I removed the filtered points reader (80e32c2) and added setting validation in 4f1da52.
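
The validation rule described in this thread can be sketched roughly as follows. This is only an illustration: the enum, method, and error message are hypothetical stand-ins, and the actual check added in 4f1da52 may be structured differently.

```java
// Hypothetical stand-ins for the two index settings discussed above; the real
// setting classes, keys, and error messages in Elasticsearch may differ.
final class SeqNoSettingsValidationSketch {

    enum SeqNoIndexOptions { POINTS_AND_DOC_VALUES, DOC_VALUES_ONLY }

    /**
     * Rejects the combination in which sequence number pruning is enabled while
     * _seq_no is still indexed with points, since points cannot be trimmed for
     * only some documents of a segment.
     */
    static void validate(boolean disableSequenceNumbers, SeqNoIndexOptions options) {
        if (disableSequenceNumbers && options != SeqNoIndexOptions.DOC_VALUES_ONLY) {
            throw new IllegalArgumentException(
                "sequence numbers can only be disabled when seq_no index options are DOC_VALUES_ONLY, got " + options
            );
        }
    }

    public static void main(String[] args) {
        validate(true, SeqNoIndexOptions.DOC_VALUES_ONLY);        // accepted
        validate(false, SeqNoIndexOptions.POINTS_AND_DOC_VALUES); // accepted: pruning is off
        try {
            validate(true, SeqNoIndexOptions.POINTS_AND_DOC_VALUES);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```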

issues: []
pr: 143583
summary: Prune sequence numbers during merges
type: enhancement
Contributor

Do we want changelog entries for individual PRs if they're hidden behind feature flags? For other projects I've done we've waited until the feature flag gets removed.

Member Author

It was added automatically after I set the >enhancement label.
I'll remove this.

@tlrx tlrx requested a review from romseygeek March 5, 2026 10:20
Contributor

@romseygeek romseygeek left a comment

LGTM!

Contributor

@fcofdez fcofdez left a comment

Looks good! Can we add at least an integration test that asserts that we're indeed pruning the sequence numbers?

engineConfig.getIndexSettings().getMode() == IndexMode.TIME_SERIES,
() -> softDeletesPolicy.getRetentionQuery(engineConfig.getIndexSettings().seqNoIndexOptions()),
pruneSeqNo,
() -> softDeletesPolicy.getRetentionQuery(seqNoIndexOptions),
Contributor

Maybe in a follow up we should check if we could execute the query once for both policies so we ensure that we prune the same doc ids?

Member Author

I can take a look indeed.
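
One way to picture the suggested follow-up is to evaluate the retention query once and hand the same set of doc ids to both pruning steps. The sketch below is a toy model under that assumption; none of the names come from the PR, and the arrays merely stand in for the per-document recovery source and `_seq_no` values.

```java
import java.util.BitSet;
import java.util.function.LongPredicate;

// Toy model, not Elasticsearch or Lucene code: the retention decision is made
// once per segment and then shared by both pruning steps.
final class SharedRetentionSketch {

    /** Evaluates the (hypothetical) retention predicate once per document. */
    static BitSet retainedDocs(long[] seqNos, LongPredicate retentionQuery) {
        BitSet retained = new BitSet(seqNos.length);
        for (int doc = 0; doc < seqNos.length; doc++) {
            if (retentionQuery.test(seqNos[doc])) {
                retained.set(doc);
            }
        }
        return retained;
    }

    /** Keeps the recovery source only for retained docs; null marks a pruned value. */
    static String[] pruneRecoverySource(String[] sources, BitSet retained) {
        String[] out = new String[sources.length];
        for (int doc = 0; doc < sources.length; doc++) {
            out[doc] = retained.get(doc) ? sources[doc] : null;
        }
        return out;
    }

    /** Keeps the sequence number only for retained docs; null marks a pruned value. */
    static Long[] pruneSeqNos(long[] seqNos, BitSet retained) {
        Long[] out = new Long[seqNos.length];
        for (int doc = 0; doc < seqNos.length; doc++) {
            out[doc] = retained.get(doc) ? Long.valueOf(seqNos[doc]) : null;
        }
        return out;
    }
}
```

Because both methods read the same `BitSet`, they cannot disagree about which doc ids are pruned, which is the property the comment above asks for.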

@tlrx
Member Author

tlrx commented Mar 5, 2026

> Can we add at least an integration test that asserts that we're indeed pruning the sequence numbers?

I mentioned in the description that I'd like to do this in a follow-up, because it requires GET and searches to return sentinel values first.

Contributor

@fcofdez fcofdez left a comment

LGTM

@romseygeek
Contributor

> I mentioned in the description that I'd like to do this in a follow-up, because it requires GET and searches to return sentinel values first.

We'll return -2 from everything, whether or not they've actually been pruned, so it will probably need to do some introspection of the index itself.

@tlrx tlrx merged commit a368ee6 into elastic:main Mar 5, 2026
35 checks passed
@tlrx
Member Author

tlrx commented Mar 5, 2026

Thanks Francisco and Alan!

@tlrx tlrx deleted the 2026/03/04-prune-seq-no branch March 5, 2026 20:12
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026
tlrx added a commit that referenced this pull request Mar 9, 2026
…#143690)

This commit adds an integration test that checks that sequence
numbers are correctly pruned after segments are merged (in the
test, force-merged) and triggers a peer recovery afterwards.

Relates #143583

Labels

:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue Team:Distributed Meta label for distributed team. v9.4.0

4 participants