Skip to content

Report uptime as metrics for spire server and agent with config#2032

Merged
amartinezfayo merged 8 commits intospiffe:masterfrom
hixichen:cxi/uptime-as-metrics
Jan 21, 2021
Merged

Report uptime as metrics for spire server and agent with config#2032
amartinezfayo merged 8 commits intospiffe:masterfrom
hixichen:cxi/uptime-as-metrics

Conversation

@hixichen
Copy link
Contributor

@hixichen hixichen commented Jan 4, 2021

Pull Request check list

  • Commit conforms to CONTRIBUTING.md?
  • Proper tests/regressions included?
  • Documentation updated?

Affected functionality

  • Called during bootstrap.
  • Add uptimeReportInterval into agent and server config.

Description of change

Add a uptime metrics for spire-agent and spire-server.
The goal here is to t indicate both Spire agent and Spire server are healthy.
And, uptime metric will help to determine if the process is periodically crashing.

This is different compared to the existing uptime in the debug API.

Test locally:

for run in {1..50}; do time make test; done

Which issue this PR fixes

fixes #1231

@hixichen hixichen changed the title WIP: Report update time as metrics for spire server and agent with config Report update time as metrics for spire server and agent with config Jan 6, 2021
@amartinezfayo
Copy link
Member

Thank you @hixichen for this contribution.
Before going into the details of the code changes, I would like to make a note that the introduction of an uptime metric has been previously discussed in #1208, where the version metric was added and adding an uptime metric was considered to be useful for the future (captured in #1231).
I just want to make sure that the motivation for this change is driven by the need of knowing the uptime specifically, and that the version metric that is emitted when the server/agent starts is not enough for your needs.
Knowing a little more about the use case and the motivation for this in your context would be helpful.
Thanks again!

@hixichen hixichen changed the title Report update time as metrics for spire server and agent with config Report uptime as metrics for spire server and agent with config Jan 6, 2021
@hixichen
Copy link
Contributor Author

hixichen commented Jan 15, 2021

Thank you @hixichen for this contribution.
Before going into the details of the code changes, I would like to make a note that the introduction of an uptime metric has been previously discussed in #1208, where the version metric was added and adding an uptime metric was considered to be useful for the future (captured in #1231).
I just want to make sure that the motivation for this change is driven by the need of knowing the uptime specifically, and that the version metric that is emitted when the server/agent starts is not enough for your needs.
Knowing a little more about the use case and the motivation for this in your context would be helpful.
Thanks again!

Hi,Agustín:
Great thanks for your reply.
We are trying to use uptime to track long-term uptime since the version metric can be purged after some period of time.
Feel free to let me know if you have concerns on the code.

Signed-off-by: CHEN XI <cxi@uber.com>
Signed-off-by: CHEN XI <cxi@uber.com>
Signed-off-by: CHEN XI <cxi@uber.com>
@hixichen hixichen force-pushed the cxi/uptime-as-metrics branch from e42a8bf to 2e14964 Compare January 15, 2021 17:39
Signed-off-by: CHEN XI <cxi@uber.com>
@hixichen
Copy link
Contributor Author

@amartinezfayo let me know if you have other comments.

thanks.

Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hixichen for providing the details. I think that it makes sense to add this metric to cover that case.
My main concern is around the new agent/server config settings added. Please see my comments about that.

Signed-off-by: CHEN XI <cxi@uber.com>
Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few nitpicks. We should be ready to go after that.
Thank you @hixichen for your patience!

Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hixichen!

@amartinezfayo amartinezfayo merged commit 41b38d5 into spiffe:master Jan 21, 2021
@hixichen
Copy link
Contributor Author

Yay!
thanks. @amartinezfayo @amoore877

@hixichen hixichen deleted the cxi/uptime-as-metrics branch January 21, 2021 04:32
@azdagron azdagron added this to the 0.12.2 milestone Mar 23, 2021
azdagron pushed a commit to azdagron/spire that referenced this pull request Mar 30, 2021
…fe#2032)

* Report update time as metrics for spire server and agent

Signed-off-by: CHEN XI <cxi@uber.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide an uptime metric for SPIRE Server and SPIRE Agent

5 participants