Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@tomaka
Copy link
Contributor

@tomaka tomaka commented Nov 11, 2020

Note that this will trigger for all validators right now, as they have stable connections to their sentries.

These alerts assume that there exists a background noise of connections.
In practice, all of our nodes except for validators have an increase of 7000 to 24000 per 20 minutes, so quite far away from the 0 where the alert would trigger.

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 11, 2020
@tomaka tomaka requested a review from mxinden November 11, 2020 14:00
message: 'The node {{ $labels.instance }} has less than 3 peers for more
than 15 minutes'
- alert: NoIncomingConnection
expr: increase(polkadot_sub_libp2p_incoming_connections_total[20m]) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a for would this not fire each time a new node starts as the first value would be 0 and there are no other values in the past 20m in the timeseries?

Copy link
Contributor Author

@tomaka tomaka Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this with my own server, where my node was offline for a few hours:

image

Instead of 20m I put 6h. In the period before I started the node again, the values were simply missing, and I assume missing values wouldn't trigger the alert.

Interestingly, however, between 1am (when my node crashed) and 7am, values were still present and decreasing linearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, however, between 1am (when my node crashed) and 7am, values were still present and decreasing linearly.

Just to make sure there is no confusion, the increase rate decreased.

message: 'The node {{ $labels.instance }} has less than 3 peers for more
than 15 minutes'
- alert: NoIncomingConnection
expr: increase(polkadot_sub_libp2p_incoming_connections_total[20m]) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, however, between 1am (when my node crashed) and 7am, values were still present and decreasing linearly.

Just to make sure there is no confusion, the increase rate decreased.

@tomaka
Copy link
Contributor Author

tomaka commented Jan 15, 2021

gnunicorn added the Z3-stale label 8 days ago

For what it's worth, this PR is ready to be merged (assuming it gets a second approval), but we're reluctant in doing it in order to not spam the devops with alerts.

@bkchr
Copy link
Member

bkchr commented Jan 15, 2021

gnunicorn added the Z3-stale label 8 days ago

For what it's worth, this PR is ready to be merged (assuming it gets a second approval), but we're reluctant in doing it in order to not spam the devops with alerts.

Approved it :D

@tomaka
Copy link
Contributor Author

tomaka commented Feb 22, 2021

Since Max's approval no longer counts, I need a third approval 😬

@bkchr bkchr requested a review from s3krit February 22, 2021 12:13
Copy link
Contributor

@s3krit s3krit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks good to me 👍
I'm assuming your initial comment regarding validators + sentries is also no longer relevant?

@tomaka
Copy link
Contributor Author

tomaka commented Feb 22, 2021

Well, right now it would trigger alerts on our (Parity) nodes, since the change hasn't been done for Polkadot nodes yet as far as I know.

Since #8079 removes sentry nodes altogether, we can merge this close to the next release.

@tomaka
Copy link
Contributor Author

tomaka commented Feb 23, 2021

Apparently our devops have finished the work.

@tomaka tomaka requested a review from a team as a code owner February 23, 2021 08:25
@tomaka
Copy link
Contributor Author

tomaka commented Feb 23, 2021

bot merge

@ghost
Copy link

ghost commented Feb 23, 2021

Waiting for commit status.

@ghost
Copy link

ghost commented Feb 23, 2021

Checks failed; merge aborted.

@tomaka tomaka merged commit e6ac7e7 into paritytech:master Feb 23, 2021
@tomaka tomaka deleted the alert-no-inc branch February 23, 2021 09:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants