Docs: Missing Prometheus metrics or wrong metrics name or removed. (Grafana related) #4772

Closed
MichelDiz opened this issue Feb 13, 2020 · 4 comments · Fixed by #5089
Labels
area/documentation Documentation related issues. area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc. kind/enhancement Something could be better. status/accepted We accept to investigate/work on it.

Comments

@MichelDiz
Contributor

MichelDiz commented Feb 13, 2020

Documentation

What version of Dgraph are you using?

Dgraph version   : v2.0.0-beta1
Dgraph SHA-256   : 178663a98a3d59879a3d5c42928c89eb5f83afc2bfc0093272941e7a53515847
Commit SHA-1     : 6fac5d7c4
Commit timestamp : 2020-01-30 14:45:54 +1100
Branch           : HEAD
Go version       : go1.13.7

Steps to reproduce the issue (command/config used to run Dgraph).

Check docs and compare with the endpoints:
http://localhost:6080/debug/prometheus_metrics
http://localhost:8080/debug/prometheus_metrics

Docs source

https://github.com/dgraph-io/dgraph/blob/master/wiki/content/deploy/index.md#metrics

All badger metrics are in
http://localhost:8080/debug/vars
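A minimal sketch of pulling the Badger entries out of a `/debug/vars` response, assuming the body is a single JSON object as Go's expvar package produces. The sample payload and its values are made up for illustration; the `badger_v2_*` names are the ones quoted later in this thread, and `badger_vars` is a hypothetical helper.

```python
import json


def badger_vars(expvar_json):
    """Return only the Badger entries from a /debug/vars JSON document."""
    data = json.loads(expvar_json)
    return {key: value for key, value in data.items() if key.startswith("badger_")}


# Hypothetical payload; real output contains many more keys and other values.
SAMPLE_VARS = """
{
  "cmdline": ["dgraph", "alpha"],
  "badger_v2_gets_total": 10,
  "badger_v2_puts_total": 4
}
"""

BADGER = badger_vars(SAMPLE_VARS)
print(sorted(BADGER))
```

Against a running Alpha, the JSON text would come from http://localhost:8080/debug/vars instead of the inline sample.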

This statement:

Dgraph exposes metrics via the /debug/vars endpoint in JSON format and the /debug/prometheus_metrics endpoint in Prometheus's text-based format.

makes users (and me) think that the Badger metrics are part of the Prometheus metrics. They can't be, because /debug/vars is a JSON response. So, logically, they should be in /debug/prometheus_metrics, but they aren't there either.

What doesn't exist but is in the docs

dgraph_goroutines_total (maybe it was changed to go_goroutines?)
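The comparison between the docs and the endpoint can be scripted. This is a rough sketch, assuming the endpoint returns the standard Prometheus text format; `metric_names`, `missing_from_endpoint`, the sample text, and the short `DOCUMENTED` list are all hypothetical and only stand in for the real docs and scrape output.

```python
def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        end = len(line)
        for separator in ("{", " "):
            index = line.find(separator)
            if index != -1:
                end = min(end, index)
        names.add(line[:end])
    return names


def missing_from_endpoint(documented, exposition_text):
    """Return documented metric names that the endpoint does not expose."""
    return sorted(set(documented) - metric_names(exposition_text))


# Hypothetical scrape output and a hypothetical excerpt of the documented names.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
dgraph_alpha_health_status{method="",status="ok"} 1
"""
DOCUMENTED = ["dgraph_goroutines_total", "dgraph_alpha_health_status"]

MISSING = missing_from_endpoint(DOCUMENTED, SAMPLE)
print(MISSING)
```

In practice the text would be fetched from http://localhost:8080/debug/prometheus_metrics (Alpha) or http://localhost:6080/debug/prometheus_metrics (Zero) rather than taken from an inline string.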

Update

One thing that confuses me, and would certainly confuse users, is that the metric dgraph_alpha_health_status
also appears in the Zero metrics, but with a different status:

dgraph_alpha_health_status{method="",status="error"} 1

But the Alphas have no errors at all, yet Zero reports this status with a positive number (which means false by logic).
The Alpha reports:

dgraph_alpha_health_status{method="",status="ok"} 1

Following the Grafana logic, both are false. Following the status logic, "has error = 1/false" and "has status ok = 1/false".
Why?
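One way to look at the two lines: in the Prometheus data model every distinct label set is its own time series, so `status="error"` and `status="ok"` are two different series that both happen to carry the value 1. The sketch below only demonstrates that; it does not settle what Dgraph intends the value to mean. `parse_sample` is a hypothetical helper and assumes sample lines without a trailing timestamp.

```python
import re

# name, optional {labels}, then the value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one text-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sample line: %r" % line)
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


# The two lines quoted in this issue.
ZERO_LINE = 'dgraph_alpha_health_status{method="",status="error"} 1'
ALPHA_LINE = 'dgraph_alpha_health_status{method="",status="ok"} 1'

ZERO = parse_sample(ZERO_LINE)
ALPHA = parse_sample(ALPHA_LINE)

# Same metric name and value, but different label sets: two separate series.
SAME_SERIES = (ZERO[0], ZERO[1]) == (ALPHA[0], ALPHA[1])
print(ZERO[1]["status"], ALPHA[1]["status"], SAME_SERIES)
```

A dashboard that reads only the value, without filtering on the `status` label, cannot tell these two series apart.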

Also, some metrics have negative numbers, which doesn't make sense. Why would I have -1k pending proposals, or -100 go_threads? These things can't have negative values.

e.g:

dgraph_active_mutations_total{method="n.proposeAndWait",status=""} -1000
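To find every series that reports a negative value, the scrape output can be filtered. This is a sketch, assuming the standard text format with no trailing timestamps; `negative_samples` is a hypothetical helper and the `go_threads` line in the sample is invented for contrast.

```python
def negative_samples(exposition_text):
    """Return (series, value) pairs whose value is below zero."""
    found = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes the value is the last space-separated token (no timestamp).
        series, _, raw_value = line.rpartition(" ")
        try:
            value = float(raw_value)
        except ValueError:
            continue  # not a numeric sample line
        if value < 0:
            found.append((series, value))
    return found


SAMPLE = """\
# TYPE go_threads gauge
go_threads 12
dgraph_active_mutations_total{method="n.proposeAndWait",status=""} -1000
"""

NEGATIVE = negative_samples(SAMPLE)
print(NEGATIVE)
```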
@MichelDiz MichelDiz added the area/documentation Documentation related issues. label Feb 13, 2020
@danielmai
Contributor

Goroutines total is go_goroutines.

@lgalatin lgalatin added the status/accepted We accept to investigate/work on it. label Feb 13, 2020
@MichelDiz
Contributor Author

MichelDiz commented Feb 13, 2020

Please consider adding the following metrics.

All these metrics I've found in other DBs like "cassandra" and "mongodb".

Alphas uptime

Network I/O

Disk I/O Utilization
Disk Reads Completed
Disk Writes Completed

Dgraph version (e.g. we already have go_info)

A RAM usage metric specific to Zero

I'm using go_memstats_heap_inuse_bytes+go_memstats_heap_idle_bytes to make it work right.
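For reference, the same sum can be computed from a raw scrape. This sketch assumes both gauges appear without labels, as the Go client library normally exposes them; `gauge_value` is a hypothetical helper and the byte values are made up.

```python
def gauge_value(exposition_text, name):
    """Return the value of an unlabelled metric from text-format output."""
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    raise KeyError(name)


# Hypothetical values, written the way the Go client prints large numbers.
SAMPLE = """\
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.0e+07
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.0e+07
"""

HEAP_TOTAL = (
    gauge_value(SAMPLE, "go_memstats_heap_inuse_bytes")
    + gauge_value(SAMPLE, "go_memstats_heap_idle_bytes")
)
print(HEAP_TOTAL)
```

Running it against the Zero endpoint (http://localhost:6080/debug/prometheus_metrics) gives the per-Zero figure described above.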

Maybe add a
"Total of transactions done"
"Total of uncommitted transactions"
"Total of Snapshots"
"Snapshot running or not"

@sleto-it sleto-it added the kind/enhancement Something could be better. label Feb 26, 2020
@sleto-it sleto-it added the area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc. label Mar 16, 2020
@fl-max

fl-max commented Mar 18, 2020

@MichelDiz most of those metrics can be obtained from other exporters.

I too would like clarity on the dgraph_alpha_health_status metric issue/discrepancy between alpha and zero. It would also make more sense to have a dgraph_zero_health_status metric instead on the Zero nodes.

@prashant-shahi
Contributor

Some of the issues mentioned here have been resolved in #4948.

danielmai added a commit that referenced this issue Apr 23, 2020
Fixes #4772 

This adds the following badger metrics into /debug/prometheus_metrics:

badger_v2_disk_reads_total
badger_v2_disk_writes_total
badger_v2_gets_total
badger_v2_lsm_bloom_hits_total (per level)
badger_v2_lsm_level_gets_total (per level)
badger_v2_memtable_gets_total
badger_v2_puts_total
badger_v2_read_bytes
badger_v2_written_bytes

This is added via the Prometheus expvar collector.
Update metrics_test.go for the seven initial Badger metrics (excluding the LSM metrics).

This adds to the exposed Prometheus metrics. These metrics were already accessible via /debug/vars.

The LSM metrics don't show up immediately. They show up after there are hits/gets to the LSM tree.

Changes
* Add Badger metrics to Prometheus
* Update metrics test to check for badger metrics.

Co-authored-by: Ibrahim Jarif <[email protected]>
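Because the LSM metrics only appear after the first hits/gets, a check against a freshly started node should probably treat them as optional. The sketch below illustrates that idea under that assumption; it is not the contents of metrics_test.go. `check_badger_metrics` and the sample scrape are hypothetical; the metric names are the ones listed in the commit message above.

```python
import re

# Expected on every scrape, per the commit message above.
REQUIRED = [
    "badger_v2_disk_reads_total",
    "badger_v2_disk_writes_total",
    "badger_v2_gets_total",
    "badger_v2_memtable_gets_total",
    "badger_v2_puts_total",
    "badger_v2_read_bytes",
    "badger_v2_written_bytes",
]
# Only show up after there are hits/gets to the LSM tree.
LAZY = ["badger_v2_lsm_bloom_hits_total", "badger_v2_lsm_level_gets_total"]


def check_badger_metrics(exposition_text):
    """Return (missing required names, lazy names not yet present)."""
    present = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' or whitespace character.
        present.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    missing = [name for name in REQUIRED if name not in present]
    pending = [name for name in LAZY if name not in present]
    return missing, pending


# Hypothetical scrape of a freshly started node: all seven present, no LSM yet.
FRESH = "\n".join("%s 0" % name for name in REQUIRED)

RESULT = check_badger_metrics(FRESH)
print(RESULT)
```

An empty first list means the seven always-present metrics were found; the second list simply reports which LSM metrics have not appeared yet.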
danielmai added a commit that referenced this issue Apr 24, 2020
danielmai added a commit that referenced this issue Apr 24, 2020
dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 18, 2020