This repository has been archived by the owner on Aug 19, 2022. It is now read-only.

feat: Usable out of the box metrics #50

Closed
MarcoPolo opened this issue Jun 14, 2022 · 4 comments · Fixed by #54

@MarcoPolo
Contributor

The resource manager should expose metrics on current resource usage, aggregated where necessary.

It doesn't make sense to record every peer's resource usage individually, but aggregates (avg, p90, p99, max, min) would still be very valuable to operators.
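As a rough illustration of the per-peer aggregation described above, here is a minimal Go sketch that collapses per-peer usage values into one summary: count, min, max, average, and nearest-rank p50/p90/p99. The `Summary` type, the `Summarize` function, and the sample stream counts are hypothetical and not part of the resource manager's API; nearest-rank is only one of several common percentile definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Summary holds aggregate statistics over per-peer values.
// Hypothetical type for illustration; not a resource manager API.
type Summary struct {
	N   int   // number of values aggregated (e.g. number of peers)
	Min int64 // smallest value
	Max int64 // largest value (same as p100)
	Avg float64
	P50 int64
	P90 int64
	P99 int64
}

// percentile returns the nearest-rank p-th percentile (0 <= p <= 100)
// of a sorted, non-empty slice, using integer math to avoid rounding issues.
func percentile(sorted []int64, p int) int64 {
	rank := (p*len(sorted) + 99) / 100 // ceil(p * n / 100)
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Summarize aggregates per-peer values (e.g. open streams per peer).
// It sorts a copy of the input, so the caller's slice is left untouched
// and the cost is O(n log n).
func Summarize(values []int64) Summary {
	if len(values) == 0 {
		return Summary{}
	}
	sorted := append([]int64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var sum int64
	for _, v := range sorted {
		sum += v
	}
	return Summary{
		N:   len(sorted),
		Min: sorted[0],
		Max: sorted[len(sorted)-1],
		Avg: float64(sum) / float64(len(sorted)),
		P50: percentile(sorted, 50),
		P90: percentile(sorted, 90),
		P99: percentile(sorted, 99),
	}
}

func main() {
	// Hypothetical open-stream counts for ten peers.
	streams := []int64{3, 1, 4, 1, 5, 9, 2, 6, 5, 3}
	fmt.Printf("%+v\n", Summarize(streams))
}
```

Only one summary per scope per metric interval would be exported, which keeps the metric count independent of the number of peers.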

In most use cases the end user will want this information, and we already record it, so we should expose it.

@marten-seemann
Contributor

Should this be part of libp2p/go-libp2p#1356? Unfortunately, the API we were going to use is still in alpha.

@MarcoPolo
Contributor Author

It could be, but this is tightly scoped, and I don't think it should be blocked on 1356 or on waiting for OpenTelemetry metrics to come out of alpha.

@MarcoPolo MarcoPolo self-assigned this Jun 16, 2022
@BigLep
Contributor

BigLep commented Jun 18, 2022

I know there is work underway on this. I'm putting in my notes for the high-level dashboard I think would be useful for an operator to see:
[Image: mock-up of the proposed operator dashboard] (mocked from here)

The system/transient graphs are straight copy/pastes from the observable. The other scopes show an aggregation. At a minimum, for the aggregated scopes I would want to see:

  • n - the number of values that were aggregated, graphed on its own (left) axis. For example, for the peer scope it would denote how many peers were in that metric interval at the time the datapoint was generated.
  • p50 - the median value at the time of datapoint generation, graphed on a separate (right) axis
  • p90 - the 90th percentile
  • p100 - the max value

We could obviously have other percentiles too.
If percentiles are too expensive to calculate because of the sorting required, we could get away with average and max for now.
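The average-and-max fallback needs no sorting at all: both can be maintained incrementally as values are observed. Below is a minimal Go sketch assuming a hypothetical `RunningAgg` type (not an existing libp2p API) that keeps only a count, a sum, and a running max, so each observation is O(1) and no per-peer values are stored.

```go
package main

import "fmt"

// RunningAgg tracks count, sum, and max of observed values without
// storing or sorting them. Hypothetical type for illustration only.
type RunningAgg struct {
	n   int
	sum int64
	max int64
}

// Observe records one value, e.g. one peer's memory reservation in bytes.
func (a *RunningAgg) Observe(v int64) {
	if a.n == 0 || v > a.max {
		a.max = v
	}
	a.n++
	a.sum += v
}

// N returns how many values were observed (the "n" series).
func (a *RunningAgg) N() int { return a.n }

// Avg returns the mean of the observed values, or 0 if none were observed.
func (a *RunningAgg) Avg() float64 {
	if a.n == 0 {
		return 0
	}
	return float64(a.sum) / float64(a.n)
}

// Max returns the largest observed value (p100), or 0 if none were observed.
func (a *RunningAgg) Max() int64 { return a.max }

// Reset clears the aggregate, e.g. at the start of each metric interval.
func (a *RunningAgg) Reset() { *a = RunningAgg{} }

func main() {
	var agg RunningAgg
	// Hypothetical per-peer memory reservations.
	for _, v := range []int64{4096, 1024, 65536, 2048} {
		agg.Observe(v)
	}
	fmt.Println(agg.N(), agg.Avg(), agg.Max())
}
```

The trade-off is that the average hides skew that a p90 would reveal; streaming quantile sketches exist if percentiles are wanted later without a full sort.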

I could imagine an operator looking at that dashboard and, if things look fishy, then taking the tracing/observable route for a deeper dive.

@BigLep
Contributor

BigLep commented Jun 28, 2022

Once this feature is developed, we also need to make sure users can discover it easily. There is a docs issue to give it visibility here: libp2p/docs#158
