-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add separate history for expired failed probe results #517
Conversation
Signed-off-by: Jake Utley <[email protected]>
Signed-off-by: Jake Utley <[email protected]>
2cfe522
to
eae7c47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards compatibility isn't a concern here, as this is all UI.
Per my comments in #383, I think one combined table would be best.
main.go
Outdated
timeoutOffset = kingpin.Flag("timeout-offset", "Offset to subtract from timeout in seconds.").Default("0.5").Float64() | ||
configCheck = kingpin.Flag("config.check", "If true validate the config file and then exit.").Default().Bool() | ||
historyLimit = kingpin.Flag("history.limit", "The maximum amount of items to keep in the history.").Default("100").Uint() | ||
historyPreservedFailedLimit = kingpin.Flag("history.preserved-failed-limit", "The maximum amount of failed items to preserve after expiration.").Default("5").Uint() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need an extra setting here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the extra flag it's less obvious to me how this should be implemented. Should the history.limit
flag be the cardinality of standard history and the cardinality of preserved failed history? Should that flag be the cardinality of the combined history? Should the preserved failed history be a constant size separate from the history.limit
flag?
I'm open to anything, though I'll admit that without the extra flag, the semantics feel confusing to me: "I set the history.limit
flag to 10, why do I have 20 items?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have it be the limit of each.
…in List() Signed-off-by: Jake Utley <[email protected]>
6f24b6a
to
2dc250f
Compare
Signed-off-by: Jake Utley <[email protected]>
1d25538
to
c164d6b
Compare
mu sync.Mutex | ||
nextId int64 | ||
results []*result | ||
preservedFailedResults []*result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failedResults is enough
@@ -59,14 +68,19 @@ func (rh *resultHistory) List() []*result { | |||
rh.mu.Lock() | |||
defer rh.mu.Unlock() | |||
|
|||
return rh.results[:] | |||
return append(rh.preservedFailedResults[:], rh.results...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to de-dupe here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not true, actually. Maybe there is confusion about how I implemented this. I am adding results to the preservedFailedResults
slice only after they expire from the main results
slice. As a result, no result will be in both at the same time.
If we put failed results into this slice immediately, then I think it would make more sense to have one list for successes and one list for failures, then merge the two on List
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you're doing now. A brief comment would help.
Signed-off-by: Jake Utley <[email protected]>
9e5093d
to
6459e28
Compare
Signed-off-by: Jake Utley <[email protected]>
Thanks! |
Hello, Sorry, I feel a bit stupid but I can't find where I can access this separate history ... |
@lbrichet It is below the main history section in the blackbox-exporter web ui. |
Doh ... thank you ! |
This PR adds a new section to the history to hold onto failed probes after they expire from the main history. This allows us to check what debug logs for failed probes in environments with a high rate of successful probes.
I implemented this to be strictly additive for the sake of backwards compatibility. The Web UI has a separate section for these preserved failures.
Resolves #350