Fix connected resource counts after keepalive errors#47931

Merged
espadolini merged 2 commits into master from espadolini/keepalive-err-metrics
Oct 25, 2024

Conversation

@espadolini
Contributor

The inventory controller updates Prometheus metrics through its onConnectFunc and onDisconnectFunc callbacks, which are supposed to be called whenever a handle starts or stops heartbeating for a resource. After three consecutive KeepAlive failures, the app, kube cluster or database that could not be kept alive was removed from the handle's responsibility, but the corresponding metric was not updated. This PR fixes that by calling onDisconnectFunc at that point. In addition, onDisconnectFunc now takes a count: at disconnection time we remove all the heartbeats for the disconnected agent at once (of which there could be many), and it's faster to update the metric a single time.
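For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the behavior described above. It is not the actual inventory controller code: the `gauge`, `handle`, `addApp`, `keepAliveApp` and `close` names, the `maxKeepAliveErrs` constant, and the `"app"` label are all hypothetical, and the gauge is a plain map standing in for a Prometheus gauge vector. It only illustrates two ideas from the description: calling the disconnect callback when a resource is dropped after three consecutive keepalive errors, and passing a count so a full disconnect updates the metric once.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxKeepAliveErrs mirrors the "three consecutive failures" mentioned in the
// description; the constant name itself is an assumption.
const maxKeepAliveErrs = 3

// errKeepAlive is a placeholder error used to simulate a failed keepalive.
var errKeepAlive = errors.New("keepalive failed")

// gauge is a stand-in for a Prometheus gauge vector keyed by resource kind.
type gauge struct {
	mu     sync.Mutex
	counts map[string]int
}

func newGauge() *gauge { return &gauge{counts: make(map[string]int)} }

func (g *gauge) add(kind string, delta int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.counts[kind] += delta
}

func (g *gauge) get(kind string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.counts[kind]
}

// handle is a hypothetical, heavily simplified agent handle. It maps each
// heartbeated app to its number of consecutive keepalive errors.
type handle struct {
	apps         map[string]int
	onConnect    func(kind string)
	onDisconnect func(kind string, amount int) // takes a count, as in the PR
}

func newHandle(g *gauge) *handle {
	return &handle{
		apps:         make(map[string]int),
		onConnect:    func(kind string) { g.add(kind, 1) },
		onDisconnect: func(kind string, amount int) { g.add(kind, -amount) },
	}
}

// addApp starts heartbeating for an app and bumps the metric once.
func (h *handle) addApp(name string) {
	if _, ok := h.apps[name]; ok {
		return
	}
	h.apps[name] = 0
	h.onConnect("app")
}

// keepAliveApp records the result of one keepalive attempt. After
// maxKeepAliveErrs consecutive errors the app is dropped from the handle
// and the metric is decremented, which is the step the PR adds.
func (h *handle) keepAliveApp(name string, err error) {
	if _, ok := h.apps[name]; !ok {
		return
	}
	if err == nil {
		h.apps[name] = 0
		return
	}
	h.apps[name]++
	if h.apps[name] >= maxKeepAliveErrs {
		delete(h.apps, name)
		h.onDisconnect("app", 1)
	}
}

// close drops every remaining app with a single metric update.
func (h *handle) close() {
	if n := len(h.apps); n > 0 {
		h.onDisconnect("app", n)
	}
	h.apps = make(map[string]int)
}

func main() {
	g := newGauge()
	h := newHandle(g)
	for _, name := range []string{"a", "b", "c"} {
		h.addApp(name)
	}
	for i := 0; i < maxKeepAliveErrs; i++ {
		h.keepAliveApp("a", errKeepAlive)
	}
	fmt.Println("after keepalive errors:", g.get("app"))
	h.close()
	fmt.Println("after disconnect:", g.get("app"))
}
```

In this sketch, skipping the `onDisconnect` call inside `keepAliveApp` would leave the gauge one too high after the agent disconnects, which matches the overshoot described in the changelog entry.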

changelog: fixed teleport_connected_resource metric overshooting after keepalive errors

@aws-amplify-us-west-2

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-47931.d212ksyjt6y4yg.amplifyapp.com

@aws-amplify-us-west-2

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-47931.d3pp5qlev8mo18.amplifyapp.com

@espadolini espadolini added this pull request to the merge queue Oct 25, 2024
Merged via the queue into master with commit 91bb17a Oct 25, 2024
@espadolini espadolini deleted the espadolini/keepalive-err-metrics branch October 25, 2024 15:28
@public-teleport-github-review-bot

@espadolini See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v14 | Failed |
| branch/v15 | Failed |
| branch/v16 | Failed |

espadolini added a commit that referenced this pull request Oct 25, 2024
* Fix connected resource counts after keepalive errors

* Log server_id when cleaning up resources
github-merge-queue Bot pushed a commit that referenced this pull request Oct 25, 2024
* Fix connected resource counts after keepalive errors

* Log server_id when cleaning up resources
4 participants