Add separate history for expired failed probe results #517

jutley · 2019-09-04T20:52:56Z

This PR adds a new section to the history that holds onto failed probes after they expire from the main history. This lets us inspect the debug logs of failed probes in environments with a high rate of successful probes.

I implemented this to be strictly additive for the sake of backwards compatibility. The Web UI has a separate section for these preserved failures.

Resolves #350

Signed-off-by: Jake Utley <[email protected]>

brian-brazil

Backwards compatibility isn't a concern here, as this is all UI.

Per my comments in #383, I think one combined table would be best.

brian-brazil · 2019-09-05T12:46:56Z

main.go

+	timeoutOffset               = kingpin.Flag("timeout-offset", "Offset to subtract from timeout in seconds.").Default("0.5").Float64()
+	configCheck                 = kingpin.Flag("config.check", "If true validate the config file and then exit.").Default().Bool()
+	historyLimit                = kingpin.Flag("history.limit", "The maximum amount of items to keep in the history.").Default("100").Uint()
+	historyPreservedFailedLimit = kingpin.Flag("history.preserved-failed-limit", "The maximum amount of failed items to preserve after expiration.").Default("5").Uint()


I don't think we need an extra setting here.

Without the extra flag it's less obvious to me how this should be implemented. Should the history.limit flag be the cardinality of standard history and the cardinality of preserved failed history? Should that flag be the cardinality of the combined history? Should the preserved failed history be a constant size separate from the history.limit flag?

I'm open to anything, though I'll admit that without the extra flag, the semantics feel confusing to me: "I set the history.limit flag to 10, why do I have 20 items?"

I'd have it be the limit of each.

brian-brazil · 2019-09-06T14:00:02Z

history.go

+	mu                     sync.Mutex
+	nextId                 int64
+	results                []*result
+	preservedFailedResults []*result


failedResults is enough

brian-brazil · 2019-09-06T14:01:35Z

history.go

@@ -59,14 +68,19 @@ func (rh *resultHistory) List() []*result {
 	rh.mu.Lock()
 	defer rh.mu.Unlock()

-	return rh.results[:]
+	return append(rh.preservedFailedResults[:], rh.results...)


You need to de-dupe here

Not true, actually. Maybe there is confusion about how I implemented this. I am adding results to the preservedFailedResults slice only after they expire from the main results slice. As a result, no result will be in both at the same time.

If we put failed results into this slice immediately, then I think it would make more sense to have one list for successes and one list for failures, then merge the two on List.

Ah, I see what you're doing now. A brief comment would help.
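
For readers following the thread, here is a minimal sketch of the behaviour described above: a failed result is copied into the preserved slice only when it expires from the main slice, so no result is ever in both, and `List()` can concatenate the two without de-duplication. This is not the merged code. The `Add` method, the `maxResults` field, the `success` field, and the use of the same limit for both slices (per "I'd have it be the limit of each") are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a simplified, hypothetical stand-in for a probe result.
type result struct {
	id      int64
	success bool
}

// resultHistory keeps up to maxResults recent results and, separately,
// up to maxResults failed results that have already expired from results.
type resultHistory struct {
	mu                     sync.Mutex
	nextID                 int64
	results                []*result
	preservedFailedResults []*result
	maxResults             int // assumed: one limit applied to each slice
}

// Add appends a result. When the main slice overflows, the oldest entry
// is dropped from it; if that entry was a failure, it moves to
// preservedFailedResults, which is trimmed to the same limit.
func (rh *resultHistory) Add(success bool) {
	rh.mu.Lock()
	defer rh.mu.Unlock()

	rh.results = append(rh.results, &result{id: rh.nextID, success: success})
	rh.nextID++

	if len(rh.results) > rh.maxResults {
		expired := rh.results[0]
		rh.results = rh.results[1:]
		if !expired.success {
			rh.preservedFailedResults = append(rh.preservedFailedResults, expired)
			if len(rh.preservedFailedResults) > rh.maxResults {
				rh.preservedFailedResults = rh.preservedFailedResults[1:]
			}
		}
	}
}

// List returns the preserved failures followed by the current results.
// A result lives in exactly one slice at a time, so no de-dupe is needed.
func (rh *resultHistory) List() []*result {
	rh.mu.Lock()
	defer rh.mu.Unlock()

	out := make([]*result, 0, len(rh.preservedFailedResults)+len(rh.results))
	out = append(out, rh.preservedFailedResults...)
	return append(out, rh.results...)
}

func main() {
	rh := &resultHistory{maxResults: 2}
	for _, ok := range []bool{false, true, false, true, true} {
		rh.Add(ok)
	}
	for _, r := range rh.List() {
		fmt.Println(r.id, r.success)
	}
}
```

The sketch copies into a fresh slice in `List()` rather than appending directly onto `preservedFailedResults`, so the returned slice never shares spare capacity with the internal one. That is a design choice of this example, not something taken from the PR.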

brian-brazil · 2019-09-10T11:48:56Z

Thanks!

tibuski · 2021-01-22T08:58:22Z

Hello,

Sorry, I feel a bit stupid but I can't find where I can access this separate history ...

jutley · 2021-01-23T15:38:15Z

@lbrichet It is below the main history section in the blackbox-exporter web ui.

tibuski · 2021-01-23T15:45:21Z

Doh ... thank you !

jutley added 2 commits September 4, 2019 13:54

Fix style for make

fe52c8a

Signed-off-by: Jake Utley <[email protected]>

Fix bug

eae7c47

Signed-off-by: Jake Utley <[email protected]>

jutley force-pushed the add-preserved-history branch from 2cfe522 to eae7c47 Compare September 4, 2019 20:54

brian-brazil reviewed Sep 5, 2019

Remove maxPreservedFailedResults, merge all results into common view …

2dc250f

…in List() Signed-off-by: Jake Utley <[email protected]>

jutley force-pushed the add-preserved-history branch from 6f24b6a to 2dc250f Compare September 5, 2019 21:38

Remove newline to re-match origin

c164d6b

Signed-off-by: Jake Utley <[email protected]>

jutley force-pushed the add-preserved-history branch from 1d25538 to c164d6b Compare September 5, 2019 21:43

brian-brazil reviewed Sep 6, 2019

Add some comments about failed history logic.

6459e28

Signed-off-by: Jake Utley <[email protected]>

jutley force-pushed the add-preserved-history branch from 9e5093d to 6459e28 Compare September 9, 2019 23:32

Fix whitespacing

3e95d12

Signed-off-by: Jake Utley <[email protected]>

brian-brazil merged commit d0c9b46 into prometheus:master Sep 10, 2019

brian-brazil mentioned this pull request Oct 30, 2019

adding probe error log support #383

Closed
