Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional collector for SLM stats #558

Merged
merged 7 commits into from
May 12, 2022

Conversation

Evesy
Copy link
Contributor

@Evesy Evesy commented Apr 1, 2022

Closes #555

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@Evesy
Copy link
Contributor Author

Evesy commented Apr 7, 2022

Thanks for your review @sysadmind

With regards to counter suffix what's your opinion around the below?

Currently there's e.g. total_snapshots_taken (Copied verbatim from the ES API response) which represents that value across all policies, and snapshots_taken (Also copied verbatim) which is labelled by policy.

Adding a _total suffix to these to align with best practices for naming counters would yield:
total_snapshots_taken_total & snapshots_taken_total which is nice as it still aligns with the ES API but feels like a mouthful.
Another option would be to go away from keeping parity with the API response and instead try and name these more descriptively to more clearly call out one is for overall totals and the other for policies?

There's also an argument that some of the total metrics could be removed in favour of just the policy metrics (as you can just sum them to get the same results they currently give), but for some users in large clusters they may not want the cardinality of per policy metrics and choose instead to just keep the totals in their scrape config.

Not sure what's best here? I'm leaning towards the first option of e.g. total_snapshots_taken_total & snapshots_taken_total

@Evesy
Copy link
Contributor Author

Evesy commented Apr 7, 2022

Made the relevant metrics counters now and suffixed appropriately, happy to review/change if you have any differing thoughts on the above though

@Evesy Evesy requested a review from sysadmind May 5, 2022 16:03
Copy link
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Just a few minor changes. Otherwise, looks good!

// curl -XPUT http://127.0.0.1:9200/_slm/policy/everything -H 'Content-Type: application/json' -d '{"schedule":"0 */15 * * * ?","name":"<everything-{now/d}>","repository":"my_repository","config":{"indices":".*","include_global_state":true,"ignore_unavailable":true},"retention":{"expire_after":"7d"}}'
// curl http://127.0.0.1:9200/_slm/stats (Numbers manually tweaked)

tcs := map[string][]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more readable if it was a map[string]testData with a new type testData struct {}. That would mean instead of the array being positional in the test server, the actual response could be identified by name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just made this into a string rather than an array as the tests only use a single fixture as it stands, and I can't see there needing to be additional fixtures for the same Elastic version with this current collector, however it could always be introduced in the future if needed

Does that feel OK to you?

collector/slm.go Outdated Show resolved Hide resolved
Signed-off-by: Mike Eves <[email protected]>
@sysadmind sysadmind merged commit 4456c97 into prometheus-community:master May 12, 2022
bobo333 added a commit to bobo333/elasticsearch_exporter that referenced this pull request May 18, 2022
* master:
  Refactor cluster info collector (prometheus-community#536)
  Fix linting that was missed in CI run (prometheus-community#568)
  Grafana dashboard: use new node exporter metric names (prometheus-community#501)
  publish total shards on a node (prometheus-community#535)
  Add additional collector for SLM stats (prometheus-community#558)
  Update common Prometheus files (prometheus-community#565)
  Update build (prometheus-community#562)

Signed-off-by: Steven Cipriano <[email protected]>
bobo333 added a commit to bobo333/elasticsearch_exporter that referenced this pull request May 18, 2022
* master:
  Refactor cluster info collector (prometheus-community#536)
  Fix linting that was missed in CI run (prometheus-community#568)
  Grafana dashboard: use new node exporter metric names (prometheus-community#501)
  publish total shards on a node (prometheus-community#535)
  Add additional collector for SLM stats (prometheus-community#558)
  Update common Prometheus files (prometheus-community#565)
  Update build (prometheus-community#562)

Signed-off-by: Steven Cipriano <[email protected]>
iishabakaev pushed a commit to iishabakaev/elasticsearch_exporter that referenced this pull request Jun 8, 2022
* Add additional collector for SLM stats

Signed-off-by: Mike Eves <[email protected]>

* Add additional metric for SLM status (operation mode)

Signed-off-by: Mike Eves <[email protected]>

* Update README

Signed-off-by: Mike Eves <[email protected]>

* Record time metrics in seconds

Signed-off-by: Mike Eves <[email protected]>

* Update metrics to be counters where appropriate

Signed-off-by: Mike Eves <[email protected]>

* Modify tests and update label on operation_mode metric

Signed-off-by: Mike Eves <[email protected]>

* Simplify test fixture

Signed-off-by: Mike Eves <[email protected]>
Signed-off-by: iishabakaev <[email protected]>
jnadler pushed a commit to jnadler/elasticsearch_exporter that referenced this pull request Oct 27, 2022
* Add additional collector for SLM stats

Signed-off-by: Mike Eves <[email protected]>

* Add additional metric for SLM status (operation mode)

Signed-off-by: Mike Eves <[email protected]>

* Update README

Signed-off-by: Mike Eves <[email protected]>

* Record time metrics in seconds

Signed-off-by: Mike Eves <[email protected]>

* Update metrics to be counters where appropriate

Signed-off-by: Mike Eves <[email protected]>

* Modify tests and update label on operation_mode metric

Signed-off-by: Mike Eves <[email protected]>

* Simplify test fixture

Signed-off-by: Mike Eves <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose SLM Metrics
2 participants