Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce metrics #121

Merged
merged 1 commit into from
Jun 14, 2022
Merged

Introduce metrics #121

merged 1 commit into from
Jun 14, 2022

Conversation

zqzten
Copy link
Contributor

@zqzten zqzten commented May 24, 2022

This PR introduce metrics to kine.

It'll expose below metrics:

  • SQL Total & Errors: Counter metrics which hold the total number of SQL operations and failed SQL operations. This is helpful for quickly tracking and alerting local SQL errors which have a variant of causes (network, db, etc.) thus not easy to monitor.
  • SQL Time: Histogram metric which keeps track of the duration of SQL operations. This duration is measured locally and can be used as a supplement to the SQL exec time measurement of db.
  • Compact Errors: Counter metrics which hold the total number of compact errors. Helpful for alerting when continuous compact errors occur which can lead to uncontrollable growth of db size.
  • DBStats metrics of sql.DB used by kine.
  • Process and Go runtime metrics when running stand-alone.

If kine is running stand-alone, we can use the metrics-bind-address flag to configure the metrics http server address or disable the server.
If kine is used as a library, we can set MetricsRegisterer of endpoint.Config to our own prometheus registerer to let kine register its metrics to it.

fixes #119

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Added a few comments about TLS and metrics selection.

I also notice that you didn't add any metrics support to the jetstream driver. Is this an intentional omission?

main.go Outdated Show resolved Hide resolved
pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
@zqzten
Copy link
Contributor Author

zqzten commented May 25, 2022

@brandond Thanks for the comments! I have made changes accrodingly, PTAL.

For the JetStream driver, I omit it because it's not a SQL backend and cannot be included in the metrics introduced in this PR. Also, I'm not quite familiar with it so maybe we shall leave its own metric implementations to some other one who really need them.

@zqzten zqzten requested a review from brandond May 26, 2022 10:06
Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than just labeling these with success/failure, I think we can extract the actual error code/number? It'll need to be done on a per-driver basis, but if you add type ErrCode func(error) string to generic.go, and ErrCode ErrCode to the Generic struct, you should be able to do something like this on each driver:

+       dialect.ErrCode = func(err error) string {
+               if err == nil {
+                       return ""
+               }
+               if err, ok := err.(*mysql.MySQLError); ok {
+                       return fmt.Sprint(err.Number)
+               }
+               return err.Error()
+       }

That's an example of mysql. postgres has err.Code, sqlite has err.ExtendedCode, and so on.

pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
pkg/drivers/generic/metrics.go Outdated Show resolved Hide resolved
pkg/drivers/generic/tx.go Outdated Show resolved Hide resolved
Signed-off-by: Zach Zhu <[email protected]>
@zqzten
Copy link
Contributor Author

zqzten commented May 29, 2022

@brandond Brilliant solution 👍 I have modifed the code accroding to your solution, PTAL.

@zqzten zqzten requested a review from brandond May 29, 2022 11:25
@zqzten
Copy link
Contributor Author

zqzten commented Jun 6, 2022

@brandond Hi, are there any blocking issue on this PR? Seems that the test can't proceed with arm64 build for some unknown reason.

@brandond
Copy link
Member

Restarted CI

@zqzten
Copy link
Contributor Author

zqzten commented Jun 14, 2022

@brandond Mind merging this? We'd like to use this feature in community vendor, thanks!

@brandond brandond merged commit 27bd5e7 into k3s-io:master Jun 14, 2022
@zqzten zqzten deleted the metrics branch June 15, 2022 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

metrics support
2 participants