feat: export rcmgr metrics to prometheus #8785

Merged on Apr 5, 2022 (6 commits)

@marten-seemann (Member) commented Mar 11, 2022

Part of #8761

Adds basic metrics under libp2p_rcmgr_*

Demo sample

```
# HELP libp2p_rcmgr_conns_allowed_total allowed connections
# TYPE libp2p_rcmgr_conns_allowed_total counter
libp2p_rcmgr_conns_allowed_total{direction="inbound",usesFD="false"} 682
libp2p_rcmgr_conns_allowed_total{direction="inbound",usesFD="true"} 614
libp2p_rcmgr_conns_allowed_total{direction="outbound",usesFD="false"} 7647
libp2p_rcmgr_conns_allowed_total{direction="outbound",usesFD="true"} 9107
# HELP libp2p_rcmgr_conns_blocked_total blocked connections
# TYPE libp2p_rcmgr_conns_blocked_total counter
libp2p_rcmgr_conns_blocked_total{direction="inbound",usesFD="false"} 2274
libp2p_rcmgr_conns_blocked_total{direction="inbound",usesFD="true"} 1938
libp2p_rcmgr_conns_blocked_total{direction="outbound",usesFD="false"} 15206
libp2p_rcmgr_conns_blocked_total{direction="outbound",usesFD="true"} 16344
# HELP libp2p_rcmgr_memory_allocations_allowed_total allowed memory allocations
# TYPE libp2p_rcmgr_memory_allocations_allowed_total counter
libp2p_rcmgr_memory_allocations_allowed_total 9808
# HELP libp2p_rcmgr_memory_allocations_blocked_total blocked memory allocations
# TYPE libp2p_rcmgr_memory_allocations_blocked_total counter
libp2p_rcmgr_memory_allocations_blocked_total 0
# HELP libp2p_rcmgr_peer_blocked_total blocked peers
# TYPE libp2p_rcmgr_peer_blocked_total counter
libp2p_rcmgr_peer_blocked_total 0
# HELP libp2p_rcmgr_peers_allowed_total allowed peers
# TYPE libp2p_rcmgr_peers_allowed_total counter
libp2p_rcmgr_peers_allowed_total 17416
# HELP libp2p_rcmgr_protocols_allowed_total allowed streams attached to a protocol
# TYPE libp2p_rcmgr_protocols_allowed_total counter
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/bitswap/1.1.0"} 23
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/bitswap/1.2.0"} 5065
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/id/1.0.0"} 6213
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/id/push/1.0.0"} 859
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/kad/1.0.0"} 4920
libp2p_rcmgr_protocols_allowed_total{protocol="/ipfs/ping/1.0.0"} 936
libp2p_rcmgr_protocols_allowed_total{protocol="/libp2p/autonat/1.0.0"} 11
libp2p_rcmgr_protocols_allowed_total{protocol="/libp2p/circuit/relay/0.1.0"} 29
libp2p_rcmgr_protocols_allowed_total{protocol="/p2p/id/delta/1.0.0"} 18
# HELP libp2p_rcmgr_services_allowed_total allowed streams attached to a service
# TYPE libp2p_rcmgr_services_allowed_total counter
libp2p_rcmgr_services_allowed_total{service="libp2p.autonat"} 11
libp2p_rcmgr_services_allowed_total{service="libp2p.identify"} 6247
libp2p_rcmgr_services_allowed_total{service="libp2p.ping"} 827
# HELP libp2p_rcmgr_services_blocked_total blocked streams attached to a service
# TYPE libp2p_rcmgr_services_blocked_total counter
libp2p_rcmgr_services_blocked_total{service="libp2p.ping"} 109
# HELP libp2p_rcmgr_streams_allowed_total allowed streams
# TYPE libp2p_rcmgr_streams_allowed_total counter
libp2p_rcmgr_streams_allowed_total{direction="inbound"} 9199
libp2p_rcmgr_streams_allowed_total{direction="outbound"} 9743
```
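
To make the sample easier to reason about, here is a minimal sketch that parses a few of the lines above and computes the fraction of blocked inbound connections. It is not part of this PR. The parser is a simplified assumption that only handles the plain `name{labels} value` lines shown in the sample, not the full Prometheus exposition format, and the helper names (`parse_samples`, `total`, `blocked_fraction`) are hypothetical.

```python
import re

# A few lines copied from the demo sample above.
SAMPLE = """\
# TYPE libp2p_rcmgr_conns_allowed_total counter
libp2p_rcmgr_conns_allowed_total{direction="inbound",usesFD="false"} 682
libp2p_rcmgr_conns_allowed_total{direction="inbound",usesFD="true"} 614
libp2p_rcmgr_conns_blocked_total{direction="inbound",usesFD="false"} 2274
libp2p_rcmgr_conns_blocked_total{direction="inbound",usesFD="true"} 1938
libp2p_rcmgr_memory_allocations_blocked_total 0
"""

# Simplified patterns (assumption): label values contain no '}' or escaped quotes.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def total(samples, name, **match):
    """Sum all samples of one metric whose labels include the given pairs."""
    return sum(
        value
        for metric, labels, value in samples
        if metric == name and all(labels.get(k) == v for k, v in match.items())
    )


def blocked_fraction(samples, direction):
    """Blocked connections divided by all connection attempts for a direction."""
    allowed = total(samples, "libp2p_rcmgr_conns_allowed_total", direction=direction)
    blocked = total(samples, "libp2p_rcmgr_conns_blocked_total", direction=direction)
    attempts = allowed + blocked
    return blocked / attempts if attempts else 0.0
```

Because these are counters, a dashboard would normally look at their rate over time rather than at the raw totals; the ratio here only summarizes the snapshot in the sample.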

@vyzo (Contributor) left a comment

looks fine to me, but I don't know how this prometheus business works in ipfs.

@BigLep (Contributor) commented Mar 14, 2022

@marten-seemann : just checking my understanding, but it looks like one sharness test is not passing: https://app.circleci.com/pipelines/github/ipfs/go-ipfs/6284/workflows/1dc038d2-dd08-442e-93e5-30512e10193d/jobs/68339

I assume:

  1. it's related given the prometheus changes.
  2. we'll update the sharness test before we merge this PR.

@BigLep mentioned this pull request Mar 14, 2022
@marten-seemann (Member, Author) commented

Yes, that test checks all exported metrics against a list and fails if any metric is missing or unexpected. I was planning to fix this later.

Note: once libp2p gets a coherent metrics story (see libp2p/go-libp2p#1356), this test should probably be modified to exclude libp2p metrics. I'll leave that decision to the IPFS stewards though.
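
The kind of check described in this comment can be sketched as a set comparison. This is an illustration only, not the actual sharness test: the function name `diff_metric_names`, the `ignore_prefixes` parameter, and the metric name `example_existing_metric` are hypothetical. The optional prefix filter shows one way libp2p metrics could be excluded, as the note suggests.

```python
def diff_metric_names(expected, exported, ignore_prefixes=()):
    """Compare an expected list of metric names with the exported ones.

    Returns (missing, unexpected), both sorted. Names starting with any
    prefix in ignore_prefixes are left out of the comparison entirely.
    """
    def keep(name):
        return not any(name.startswith(prefix) for prefix in ignore_prefixes)

    expected_set = {name for name in expected if keep(name)}
    exported_set = {name for name in exported if keep(name)}
    missing = sorted(expected_set - exported_set)
    unexpected = sorted(exported_set - expected_set)
    return missing, unexpected


# Hypothetical expected list; only the libp2p_rcmgr_* name comes from this PR.
EXPECTED = ["example_existing_metric"]
EXPORTED = ["example_existing_metric", "libp2p_rcmgr_conns_allowed_total"]

# Strict comparison: the new metric is reported as unexpected.
strict = diff_metric_names(EXPECTED, EXPORTED)

# Excluding libp2p metrics by prefix: nothing is reported.
relaxed = diff_metric_names(EXPECTED, EXPORTED, ignore_prefixes=("libp2p_",))
```

With the strict comparison, any new metric requires updating the expected list, which is why the test failed for this PR.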

@BigLep added the need/author-input label Mar 18, 2022
@BigLep (Contributor) commented Mar 18, 2022

@marten-seemann : please comment/ping when the test is updated/passing. Also, I assume we need to update this PR so that it only shows the incremental diff on top of #8680? I'll then make sure we get reviewer eyes to land this.

@lidel changed the title from "export rcmgr metrics to Prometheus" to "feat: export rcmgr metrics to prometheus" Apr 5, 2022
@lidel removed the need/author-input label Apr 5, 2022
@lidel assigned lidel and unassigned marten-seemann Apr 5, 2022
@lidel (Member) left a comment

LGTM, thank you!

@lidel merged commit 41db58d into update-libp2p-v018 Apr 5, 2022
@lidel deleted the rcmgr-metrics branch April 5, 2022 22:51
guseggert pushed a commit that referenced this pull request Apr 8, 2022
* update go-libp2p to v0.18.0

* initialize the resource manager

* add resource manager stats/limit commands

* load limit file when building resource manager

* log absent limit file

* write rcmgr to file when IPFS_DEBUG_RCMGR is set

* fix: mark swarm limit|stats as experimental

* feat(cfg): opt-in Swarm.ResourceMgr

This ensures we can safely test the resource manager without impacting
default behavior.

- Resource manager is disabled by default
    - Default for Swarm.ResourceMgr.Enabled is false for now
- Swarm.ResourceMgr.Limits allows user to tweak limits per specific
  scope in a way that is persisted across restarts
- 'ipfs swarm limit system' outputs human-readable json
- 'ipfs swarm limit system new-limits.json' sets new runtime limits
  (but does not change Swarm.ResourceMgr.Limits in the config)

Conventions to make libp2p devs' lives easier:
- 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager
- 'limit.json' overrides implicit defaults from libp2p (if present)

* docs(config): small tweaks

* fix: skip libp2p.ResourceManager if disabled

This ensures 'ipfs swarm limit|stats' work only when enabled.

* fix: use NullResourceManager when disabled

This reverts commit b19f7c9.
after clarification feedback from
#8680 (comment)

* style: rename IPFS_RCMGR to LIBP2P_RCMGR

preexisting libp2p toggles use LIBP2P_ prefix

* test: Swarm.ResourceMgr

* fix: location of opt-in limit.json and rcmgr.json.gz

Places these files inside of IPFS_PATH

* Update docs/config.md

* feat: expose rcmgr metrics when enabled (#8785)

* add metrics for the resource manager
* export protocol and service name in Prometheus metrics
* fix: expose rcmgr metrics only when enabled

Co-authored-by: Marcin Rataj <[email protected]>

* refactor: rcmgr_metrics.go

* refactor: rcmgr_defaults.go

This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled
is true.

We keep a vendored copy to ensure go-ipfs is not impacted when go-libp2p
decides to change defaults in any future release.

* refactor: adjustedDefaultLimits

Cleans up the way we initialize defaults and adds a fix for the case
when the connection manager runs with high limits.

It also hides `Swarm.ResourceMgr.Limits` until we have a better
understanding of what syntax makes sense.

* chore: cleanup after a review

* fix: restore go-ipld-prime v0.14.2

* fix: restore go-ds-flatfs v0.5.1

Co-authored-by: Lucas Molas <[email protected]>
Co-authored-by: Marcin Rataj <[email protected]>
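
The opt-in behaviour described in the commit message above (disabled by default, an environment variable overriding the config, a null resource manager when disabled, metrics only when enabled) can be sketched as follows. This is a rough model under stated assumptions, not the go-ipfs implementation: the Python classes are illustrative stand-ins for the Go types, `max_conns` is a hypothetical single limit, and treating only the value `1` as an override is an assumption based on the `=1` usage shown in the commit message.

```python
import os


def rcmgr_enabled(config, env=None, env_var="LIBP2P_RCMGR"):
    """Resolve whether the resource manager should be enabled.

    The config default is False (opt-in). Assumption: only the value "1"
    in the environment variable overrides the config and enables it.
    """
    env = os.environ if env is None else env
    resource_mgr = config.get("Swarm", {}).get("ResourceMgr", {})
    enabled = bool(resource_mgr.get("Enabled", False))
    if env.get(env_var) == "1":
        enabled = True
    return enabled


class NullResourceManager:
    """Stand-in for a no-op manager: allows everything, exports no metrics."""

    exports_metrics = False

    def allow_conn(self):
        return True


class LimitingResourceManager:
    """Stand-in for an enabled manager with one hypothetical limit."""

    exports_metrics = True

    def __init__(self, max_conns):
        self.max_conns = max_conns
        self.allowed = 0  # mirrors the idea of a conns_allowed counter
        self.blocked = 0  # mirrors the idea of a conns_blocked counter

    def allow_conn(self):
        if self.allowed >= self.max_conns:
            self.blocked += 1
            return False
        self.allowed += 1
        return True


def build_resource_manager(config, env=None, max_conns=2):
    """Pick the limiting manager when enabled, the null manager otherwise."""
    if rcmgr_enabled(config, env):
        return LimitingResourceManager(max_conns)
    return NullResourceManager()
```

Returning a no-op object instead of nothing means callers can always call the manager without checking whether the feature is on, which is the usual motivation for a null-object design.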
4 participants