[ES-1292925] Fix metrics with reusable counter resets #107
Signed-off-by: Yi Jin <[email protected]>
    var it adjustableSeriesIterator
    if m.isCounter {
        it = &counterErrAdjustSeriesIterator{Iterator: r.Iterator(nil)}
    } else {
        it = &noopAdjustableSeriesIterator{Iterator: r.Iterator(nil)}
    }
You are applying counter adjustments to the raw data samples before deduplication. This is more complicated than applying the adjustments to the single time series after deduplication. The adjustments will interfere with the quorum-based deduplication logic, and I'm concerned this may introduce other edge cases.
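To make the "adjust after deduplication" ordering concrete, here is a minimal sketch. It assumes deduplication has already produced a single series, and it uses a simple stand-in rule (add the last pre-reset value as an offset). The `sample` type and `adjustCounter` function are hypothetical names for illustration; this is not the logic of `counterErrAdjustSeriesIterator`.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair, not a Thanos type.
type sample struct {
	t int64
	v float64
}

// adjustCounter returns a copy of a single, already-deduplicated series in
// which every drop in value is compensated, so the output never decreases.
// The compensation rule here is an assumption chosen for illustration.
func adjustCounter(in []sample) []sample {
	out := make([]sample, 0, len(in))
	var offset float64 // sum of the values seen just before each drop
	for i, s := range in {
		if i > 0 && s.v < in[i-1].v {
			offset += in[i-1].v
		}
		out = append(out, sample{t: s.t, v: s.v + offset})
	}
	return out
}

func main() {
	// Pretend this slice is the output of the deduplication step.
	deduped := []sample{{1000, 10}, {2000, 0}, {3000, 1}, {4000, 2}}
	for _, s := range adjustCounter(deduped) {
		fmt.Println(s.t, s.v)
	}
}
```

Because the adjustment only ever sees one series, it cannot change which replica's samples the deduplication step picks.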
Can you add another test case in which one raw time series has a large gap containing a reset, while the other two have complete data?
replica 0: [[1000, 10], [10000, 8], [11000, 10]]
replica 1: [[1000, 10], [2000, 0], [3000, 1], [4000, 2], [5000, 3], [6000, 4], [7000, 5], [8000, 6], [9000, 7], [10000, 8], [11000, 10]]
replica 2: [[1000, 10], [2000, 0], [3000, 1], [4000, 2], [5000, 3], [6000, 4], [7000, 5], [8000, 6], [9000, 7], [10000, 8], [11000, 10]]
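The requested data can be written down as Go fixtures. The sketch below is not the PR's test code; `sample`, `firstReset`, and `complete` are hypothetical helpers. It only shows why the case is interesting: the replica with the gap first observes a drop at a much later timestamp than the complete replicas do.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair, not a Thanos type.
type sample struct {
	t int64
	v float64
}

// firstReset returns the timestamp of the first sample whose value is lower
// than its predecessor's, and false if the series never decreases.
func firstReset(series []sample) (int64, bool) {
	for i := 1; i < len(series); i++ {
		if series[i].v < series[i-1].v {
			return series[i].t, true
		}
	}
	return 0, false
}

// complete builds the full series reported by replicas 1 and 2.
func complete() []sample {
	s := []sample{{1000, 10}}
	for i := 0; i <= 8; i++ {
		s = append(s, sample{t: int64(2000 + 1000*i), v: float64(i)})
	}
	return append(s, sample{11000, 10})
}

func main() {
	// Replica 0 is missing every sample between t=1000 and t=10000.
	replica0 := []sample{{1000, 10}, {10000, 8}, {11000, 10}}

	ts0, _ := firstReset(replica0)
	ts1, _ := firstReset(complete())
	fmt.Println("replica 0 first sees a drop at", ts0)
	fmt.Println("replicas 1 and 2 first see a drop at", ts1)
}
```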
Added new test cases. The reason I've done it this way is to reuse counterErrAdjustSeriesIterator, which requires adjustAtValue() to be called somewhere; passing a merged time series to the original newDedupSeries doesn't work.
        // feed the merged series into dedup series which apply counter adjustment
        return NewMergedSeries(s.lset, repl, s.f)
    }
    if s.deduplicationFunc == AlgorithmChain {
FYI @yuchen-db, I've also added the Prometheus implementation, but, in case you're wondering, it fails a number of unit tests.
e012181 to 33ec6e3
33ec6e3 to ccb205c
Signed-off-by: Yi Jin <[email protected]>
0b0fcfe to 7c98625
pkg/query/querier.go
Outdated
      partialResponseStrategy := storepb.PartialResponseStrategy_ABORT
      if opts.GroupReplicaPartialResponseStrategy {
    -     level.Debug(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
    +     level.Info(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
This will be very chatty. I intentionally changed it to Debug previously.
Every query will print this log.
I found that I don't actually need this log line; it is already logged here:
level.Info(logger).Log("msg", "databricks querier features", "opts", fmt.Sprintf("%+v", opts))
pkg/query/querier.go
Outdated
    } else if partialResponse {
        partialResponseStrategy = storepb.PartialResponseStrategy_WARN
    }
    level.Info(logger).Log("msg", "Deduplication algorithm applied", "func", opts.DeduplicationFunc)
Every query will print this log.
pkg/query/querier.go
Outdated
      partialResponseStrategy := storepb.PartialResponseStrategy_ABORT
      if opts.GroupReplicaPartialResponseStrategy {
    -     level.Debug(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
    +     level.Info(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
Every query will print this log.
Signed-off-by: Yi Jin <[email protected]>
7c98625 to ab31d53
This PR tries to fix the case where a reusable counter metric resets:
Added a number of unit tests and made sure they pass when counter functions are applied.
Also unit tested that, when no counter function is applied, the original time series is returned.
Integration tests are pending; sent for early review and feedback.
Integration tests work as expected:
Changes
Verification