Add metric for shard snapshot queue time#143658
elasticsearchmachine merged 9 commits into elastic:main
Add a new histogram metric `es.repositories.snapshots.shards.queue_time.histogram` that reports how long each shard snapshot spent waiting in queues before its actual operation began. The creation time is stored as a negated value in the existing `startTimeMillis` field of `IndexShardSnapshotStatus`, which is then overwritten by `moveToStarted`. This lets `startTimeMillis > 0` reliably indicate that the snapshot has started, replacing the previous `!= 0` check. Queue time is computed when the shard is dequeued in `BlobStoreRepository.doSnapshotShard`. Closes ES-14292
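The sign-encoding idea described above can be illustrated with a minimal sketch. `SignEncodedStatus` is a hypothetical stand-in, not the real `IndexShardSnapshotStatus`, and its `moveToStarted(long)` signature is an assumption made only to keep the example small; it also assumes wall-clock times are positive.

```java
// Hypothetical stand-in for IndexShardSnapshotStatus; only the time handling is modelled.
final class SignEncodedStatus {
    // Zero or negative: created but not yet started (holds -creationTimeMillis).
    // Positive: the actual start time written by moveToStarted.
    private long startTimeMillis;

    SignEncodedStatus(long creationTimeMillis) {
        this.startTimeMillis = -creationTimeMillis;
    }

    /** Overwrites the encoded creation time with the start time and returns the queue time in millis. */
    long moveToStarted(long nowMillis) {
        if (startTimeMillis > 0) {
            throw new IllegalStateException("already started");
        }
        long creationTimeMillis = -startTimeMillis; // decode before overwriting
        startTimeMillis = nowMillis;
        return nowMillis - creationTimeMillis;
    }

    /** A strictly positive value means moveToStarted has run. */
    boolean isStarted() {
        return startTimeMillis > 0;
    }
}
```

The trade-off is that one field carries two meanings depending on its sign, which is what the review discussion below turns on.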
Pinging @elastic/es-distributed (Team:Distributed)
/**
 * @param creationTimeMillis the time this status was created, used to compute queue time until the snapshot starts.
 *                           Stored as a negative value in {@code startTimeMillis} so that {@code startTimeMillis > 0}
 *                           reliably indicates that {@link #moveToStarted} has been called.
 */
public static IndexShardSnapshotStatus newInitializing(ShardGeneration generation, long creationTimeMillis) {
    return new IndexShardSnapshotStatus(Stage.INIT, -creationTimeMillis, 0L, 0, 0, 0, 0, 0, 0, null, generation, "initializing");
This avoids adding a new field just to track the queue time. Please let me know if you'd prefer a separate field.
I think we should add a new field; it's preferable from a maintainability perspective.
startTimeMillis,
startTimeMillis < 0 ? 0 : startTimeMillis,
The Copy is visible to users and can go across the wire, so keeping the field non-negative there seems right. The negative encoding is purely an implementation detail.
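As a rough illustration of the clamping in the diff line above, the sketch below maps an internal, possibly negative start time to a user-visible value. `CopyClampSketch` and `StatusCopy` are hypothetical names; the real `IndexShardSnapshotStatus.Copy` has more fields and a different constructor.

```java
final class CopyClampSketch {
    /** Hypothetical user-visible view of a shard status; only the start time is modelled. */
    static final class StatusCopy {
        final long startTimeMillis;

        StatusCopy(long startTimeMillis) {
            this.startTimeMillis = startTimeMillis;
        }
    }

    /** Hides the internal negative encoding: a not-yet-started status reports 0. */
    static StatusCopy asCopy(long internalStartTimeMillis) {
        return new StatusCopy(internalStartTimeMillis < 0 ? 0 : internalStartTimeMillis);
    }
}
```

With this mapping, a consumer of the copy never sees a negative timestamp, whatever the internal representation is.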
nicktindall left a comment
Sorry, I think we should have a separate field. I assume that has BWC implications? Still, I think it's worthwhile for readability, and perhaps it will become useful for more one day.
Yeah, no problem. I updated with a separate field in 2951f9c
nicktindall left a comment
This LGTM. Is zero a good default to use for creationTimeMillis in newDone? I know we won't use the creation time value for any metrics, but I don't know what newDone statuses are used for (just wondering if startTime might make more sense?)
Add a new histogram metric `es.repositories.snapshots.shards.queue_time.histogram` that reports how long each shard snapshot spent waiting in queues before its actual operation began. The creation time is stored as a new field. Queue time is computed when the shard is dequeued in `BlobStoreRepository.doSnapshotShard`. This new field is an internal stat and not exposed to end users, i.e. not available in `IndexShardSnapshotStatus.Copy`. Closes ES-14292
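A minimal sketch of the separate-field approach follows. All class and method names here are hypothetical, the histogram is represented by a plain `LongConsumer` rather than any real Elasticsearch telemetry type, and clamping negative durations to zero is an assumption of this sketch, not something the PR states.

```java
import java.util.function.LongConsumer;

// Hypothetical stand-in for IndexShardSnapshotStatus with a dedicated creation-time field.
final class ShardStatusWithCreationTime {
    private final long creationTimeMillis; // internal only, not part of any user-visible copy
    private long startTimeMillis;          // 0 until the snapshot starts

    ShardStatusWithCreationTime(long creationTimeMillis) {
        this.creationTimeMillis = creationTimeMillis;
    }

    /** Records the start time and returns the time spent queued, in millis. */
    long moveToStarted(long nowMillis) {
        startTimeMillis = nowMillis;
        // Assumption: clamp to zero in case the wall clock moved backwards.
        return Math.max(0L, nowMillis - creationTimeMillis);
    }

    long startTimeMillis() {
        return startTimeMillis;
    }
}

final class QueueTimeRecorder {
    /** Called when a shard snapshot is dequeued; reports the queue time to the histogram. */
    static void onDequeue(ShardStatusWithCreationTime status, long nowMillis, LongConsumer histogram) {
        histogram.accept(status.moveToStarted(nowMillis));
    }
}
```

Compared with sign-encoding, each field keeps a single meaning, at the cost of one extra `long` per status object.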