Add slicing mode and fix operation-level reindex metrics #143415

Merged
samxbr merged 16 commits into elastic:main from samxbr:reindex/metrics
Mar 6, 2026

Contributor

@samxbr samxbr commented Mar 2, 2026

Adds an es_reindex_slicing_mode dimension to the existing reindex metrics to capture which type of slicing is in use. Also fixes the reindex metrics to be operation-level: each reindex API call counts as a single reindex metric, so a reindex with multiple slices emits one metric instead of one per slice.

Closes #138258
Closes https://github.com/elastic/elasticsearch-team/issues/2375
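
As a rough illustration of the operation-level behaviour described above (not the code in this PR), the sketch below counts down the slices of one reindex call and records a single metric when the last slice finishes. `RecordingSink`, `ReindexOperation`, and the slicing-mode value `"fixed"` are hypothetical names chosen for the example; only the attribute name `es_reindex_slicing_mode` comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a metrics sink; not an Elasticsearch class.
final class RecordingSink {
    final List<Map<String, Object>> recorded = new ArrayList<>();

    synchronized void incrementCompleted(Map<String, Object> attributes) {
        recorded.add(attributes);
    }
}

// Hypothetical tracker for one reindex API call that may fan out into slices.
final class ReindexOperation {
    private final RecordingSink sink;
    private final String slicingMode;
    private final AtomicInteger remainingSlices;

    // sliceCount is assumed to be at least 1 (an unsliced reindex passes 1).
    ReindexOperation(RecordingSink sink, String slicingMode, int sliceCount) {
        this.sink = sink;
        this.slicingMode = slicingMode;
        this.remainingSlices = new AtomicInteger(sliceCount);
    }

    // Called once per finished slice; only the last slice records the metric.
    void onSliceFinished() {
        if (remainingSlices.decrementAndGet() == 0) {
            sink.incrementCompleted(Map.of("es_reindex_slicing_mode", slicingMode));
        }
    }
}

final class OperationLevelMetricsDemo {
    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        // "fixed" is an assumed label for a reindex with an explicit slice count.
        ReindexOperation op = new ReindexOperation(sink, "fixed", 4);
        for (int i = 0; i < 4; i++) {
            op.onSliceFinished();
        }
        System.out.println(sink.recorded.size() + " metric(s): " + sink.recorded);
    }
}
```

The countdown keeps the per-slice callbacks cheap and makes the "one metric per API call" rule independent of how many slices the call used.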

@samxbr samxbr added :Distributed/Reindex Issues relating to reindex that are not caused by issues further down >non-issue >bug and removed >non-issue labels Mar 2, 2026
@samxbr samxbr changed the title Add slicing_mode and fix operation-level reindex metrics Add slicing mode and fix operation-level reindex metrics Mar 2, 2026
@samxbr samxbr marked this pull request as ready for review March 3, 2026 00:24
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Mar 3, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@samxbr samxbr requested a review from PeteGillinElastic March 3, 2026 18:16
Member

@PeteGillinElastic PeteGillinElastic left a comment

Thanks Sam. This LGTM as it stands. But you have a merge conflict, and if that's with Szymon's change to implement relocation with slicing then I suspect that there might be some non-trivial work to be done in Reindexer to resolve it. (You already have a bunch of — unavoidably — fiddly stuff with multiply nested callbacks, and I have a nasty feeling that stuff has changed under you.) I think it's better if we hold off on approval until that's done.

Once you've dealt with the conflict, it might also be a good idea to ping Szymon and get him to look specifically at the changes to the callbacks in Reindexer as he probably knows that better than me.

Contributor Author

samxbr commented Mar 4, 2026

Thanks Sam. This LGTM as it stands. But you have a merge conflict, and if that's with Szymon's change to implement relocation with slicing then I suspect that there might be some non-trivial work to be done in Reindexer to resolve it. (You already have a bunch of — unavoidably — fiddly stuff with multiply nested callbacks, and I have a nasty feeling that stuff has changed under you.) I think it's better if we hold off on approval until that's done.

Once you've dealt with the conflict, it might also be a good idea to ping Szymon and get him to look specifically at the changes to the callbacks in Reindexer as he probably knows that better than me.

Yeah, I did expect some conflict with Szymon's PR. Asking @szybia to have a look as well.

@samxbr samxbr requested a review from szybia March 4, 2026 23:20
Member

@PeteGillinElastic PeteGillinElastic left a comment

This LGTM, but I'd also like @szybia to weigh in.

Contributor

@szybia szybia left a comment

initial review, haven't scanned tests yet 🚀

Contributor

@szybia szybia left a comment

looks great! thinking last batch of comments

public static final String ATTRIBUTE_VALUE_SOURCE_LOCAL = "local";
public static final String ATTRIBUTE_VALUE_SOURCE_REMOTE = "remote";

public static final String ATTRIBUTE_NAME_SLICING_MODE = "es_reindex_slicing_mode";
Contributor

This strikes me as inconsistent with the other attribute names; it looks more like a metric name. Thoughts?

Suggested change
- public static final String ATTRIBUTE_NAME_SLICING_MODE = "es_reindex_slicing_mode";
+ public static final String ATTRIBUTE_NAME_SLICING_MODE = "slicing_mode";

Contributor Author

Metric attribute names are validated in MetricValidator; es(_<segment>)+ is the recommended naming convention according to NAMING.md.

Contributor Author

The validation and convention were introduced after the earlier reindex metrics; ideally we should migrate the other metrics to the new convention later too.

Member

Yeah, the es_ prefix is required now, as I discovered the hard way :-). The previous ones have been grandfathered in as exceptions in the validator, but we shouldn't add more.

We could probably drop the reindex though, as this thing is only used as an attribute on metrics with reindex in their names, i.e. es_slicing_mode?

Contributor Author

Always use es_ as the root segment to easily discover ES attributes. Follow with a module name, team or area of code, e.g. snapshot, repositories, indices, threadpool using existing terminology (whether singular and plural)

So according to their convention, es_reindex_slicing_mode feels a little bit better with a module/area of code prefix

Contributor

Always use es_ as the root

where is this written? (curious)

Member

Thanks for that quote Sam, I agree that including reindex makes sense. I wasn't going off any documentation, I was going off the exception that got thrown when I didn't have the es_ prefix and looking at the regex that was being enforced by that validation code.
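
For readers unfamiliar with this kind of check, here is a minimal sketch of how an `es(_<segment>)+` rule with grandfathered exceptions might be enforced. It is not the actual MetricValidator code: the class name `AttributeNameCheck`, the assumed segment alphabet (a lowercase letter followed by lowercase letters or digits), and the legacy name `legacy_attribute` are all hypothetical.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration; not the real MetricValidator.
final class AttributeNameCheck {
    // Assumption: a segment is a lowercase letter followed by lowercase letters or digits.
    private static final Pattern CONVENTION = Pattern.compile("es(_[a-z][a-z0-9]*)+");

    // Hypothetical legacy name that predates the convention and is kept as an exception.
    private static final Set<String> GRANDFATHERED = Set.of("legacy_attribute");

    static boolean isAllowed(String name) {
        return GRANDFATHERED.contains(name) || CONVENTION.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = { "es_reindex_slicing_mode", "es_slicing_mode", "slicing_mode", "legacy_attribute" };
        for (String name : names) {
            System.out.println(name + " -> " + isAllowed(name));
        }
    }
}
```

Under these assumptions both `es_reindex_slicing_mode` and `es_slicing_mode` satisfy the pattern, so the choice between them rests on the module-segment recommendation quoted above rather than on the regex.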

Contributor

@szybia szybia left a comment

looks great! ty

@samxbr samxbr merged commit 1a56af0 into elastic:main Mar 6, 2026
35 checks passed
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026

Labels

>bug :Distributed/Reindex Issues relating to reindex that are not caused by issues further down Team:Distributed Meta label for distributed team. v9.4.0

Development

Successfully merging this pull request may close these issues.

Reindex metrics can be inaccurate with slicing enabled

4 participants