Prune sequence numbers during merges#143583
Conversation
|
Hi @tlrx, I've created a changelog YAML for you. |
|
Pinging @elastic/es-distributed (Team:Distributed) |
| } | ||
| return new PointsReader() { | ||
| @Override | ||
| public PointValues getValues(String field) throws IOException { |
There was a problem hiding this comment.
I don't think this is going to work for partial segments? eg where the sequence number to trim at occurs half-way through a segment, we drop all values up to that point for the doc_values.
There was a problem hiding this comment.
I think I would be tempted to say that we only trim sequence numbers if they are being stored as a DV field with a skipper, and if you insist on using points then you're on your own... Although maybe that doesn't work if we want to be able to apply this to entire indexes on rollover.
There was a problem hiding this comment.
Oh good catch, I made it this way with the intention to look at points later but had other issues and totally forgot to get back to it.
From your comment and what I can see in Lucene, there is no existing solution to trim points only for some documents in Lucene. So trimming sequence numbers only in DOC_VALUES_ONLY makes sense.
I can add some setting validation to enforce that IndexSettings#DISABLE_SEQUENCE_NUMBERS can only be enabled if IndexSettings#SEQ_NO_INDEX_OPTIONS_SETTING is set to DOC_VALUES_ONLY.
Regular indices use POINTS_AND_DOC_VALUES by default so both settings would need to be set to allow seq no trimming. I think that's OK.
docs/changelog/143583.yaml
Outdated
| issues: [] | ||
| pr: 143583 | ||
| summary: Prune sequence numbers during merges | ||
| type: enhancement |
There was a problem hiding this comment.
Do we want changelog entries for individual PRs if they're hidden behind feature flags? For other projects I've done we've waited until the feature flag gets removed.
There was a problem hiding this comment.
It was added automatically after I set the >enhancement label.
I'll remove this.
fcofdez
left a comment
There was a problem hiding this comment.
Looks good! Can we add at least an integration test that asserts that we're indeed pruning the sequence numbers?
| engineConfig.getIndexSettings().getMode() == IndexMode.TIME_SERIES, | ||
| () -> softDeletesPolicy.getRetentionQuery(engineConfig.getIndexSettings().seqNoIndexOptions()), | ||
| pruneSeqNo, | ||
| () -> softDeletesPolicy.getRetentionQuery(seqNoIndexOptions), |
There was a problem hiding this comment.
Maybe in a follow up we should check if we could execute the query once for both policies so we ensure that we prune the same doc ids?
I mentioned in the description that I'd like to do this in a follow up, because having GET and searches returning sentinel values are required. |
We'll return -2 from everything, whether or not they've actually been pruned, so it will probably need to do some introspection of the index itself. |
|
Thanks Francisco and Alan! |
This commit changes the RecoverySourcePruneMergePolicy to also prune the sequence number field _seq_no during merges when the field is not necessary anymore for recoveries. The activation of _seq_no is controlled by the IndexSettings#DISABLE_SEQUENCE_NUMBERS index setting. Related to elastic#136305
This commit changes the RecoverySourcePruneMergePolicy to also prune the sequence number field _seq_no during merges when the field is not necessary anymore for recoveries. The activation of _seq_no is controlled by the IndexSettings#DISABLE_SEQUENCE_NUMBERS index setting. Related to elastic#136305
This commit changes the
RecoverySourcePruneMergePolicyto also prune the sequence number field_seq_noduring merges when the field is not necessary anymore for recoveries. The activation of_seq_nois controlled by theIndexSettings#DISABLE_SEQUENCE_NUMBERSindex setting.Only unit tests have been modified. I'll follow up with a couple of integration tests.
Related to #136305