feat[bc]: rename gauges, add seeder metrics, and eagerly open blob core on indexers #1692
Merged
yuranich merged 5 commits into tetherto:feature-qvac-lib-registry-server-metrics-monitoring on Apr 22, 2026
…egistration Made-with: Cursor
…ian for view-derived stat panels Made-with: Cursor
ad15947 into tetherto:feature-qvac-lib-registry-server-metrics-monitoring
6 checks passed
yuranich added a commit that referenced this pull request on Apr 24, 2026

…stry server (#1724)

* QVAC-17131 feat: add Prometheus metrics monitoring to registry server (#1600)
* feat: add Prometheus metrics monitoring to registry server
* fix: restrict registry ping RPC to role and timestamp to avoid exposing operational data
* fix: make metrics bind host configurable and move off port 9090
* feat: replace per-model size gauge with view-derived total blob bytes (#1689)
* feat[bc]: rename gauges, add seeder metrics, and eagerly open blob core on indexers (#1692)
* feat[bc]: rename gauge metrics off _total suffix and pre-initialise rpc counters
* feat: add core seeder metrics and eagerly open blob core on indexers
* style: drop eslint-disable directives via helper function for gauge registration
* refactor[bc]: drop core_name label from blob core metrics and use median for view-derived stat panels
* style: drop noisy comment above registerGauge helper
* feat[bc]: replace blob_core_fully_downloaded with length/contiguous_length pair and drop blind-peer metrics (#1702)
* feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels (#1716)
* feat: expand Grafana dashboard with blob-core replication, seeders, and Holepunch P2P panels
* fix: use vm_name label in QVAC and Holepunch panel legends instead of raw instance IP:port
* fix: apply $vm template filter to QVAC and Holepunch selectors for consistent per-node filtering
* chore[docs]: tighten registry Grafana dashboard panels based on staging review (#1718)
* chore[docs]: tighten registry Grafana dashboard panels based on staging review
* chore[docs]: drop redundant Blob Core Contiguous stat, cluster blob panels near the top
* chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section (#1719)
* chore[docs]: promote View Core Replication and Blob Core Bytes to the top of the metrics section
* chore[docs]: split View Core Replication into length, contiguous, and gap panels
* chore: remove dead blind-peer helpers and fix stale metrics docs
  - Drop unreferenced getConnectedBlindPeerKeys / getConfiguredBlindPeerKeys / isBlindPeerConnected chain and the _peerConnectionCounts map that only existed to back isBlindPeerConnected. Left over from the dropped blob_core_blind_peers gauge (1de851b).
  - Fix DEPLOYMENT_GUIDE.md: default metrics port is 9210, not 9090; drop the hypermetrics reference since it is not a dependency (abandoned, incompatible with Hypercore v11) and per-core visibility is provided by the registry_blob_core_* / registry_view_core_* gauges.
Proletter pushed a commit that referenced this pull request on May 24, 2026: …stry server (#1724), with a commit message identical to the one above.
What problem does this PR solve?

Four observability gaps that surfaced on the first staging scrapes after #1689 landed:

- The `_total` suffix on gauges violates Prometheus / OpenMetrics naming conventions. `qvac_registry_models_total` and `qvac_registry_blob_cores_total` are gauges (values go up and down). Scrape linters flag them and OpenMetrics parsers can reject them.
- `rate()` returns `NaN` on fresh dashboards, so panels look broken on cold start.
- `qvac_registry_blob_core_*` metrics are empty on indexer nodes that don't use `QVAC_BLIND_PEER_KEYS` mirror replication. The blob core is populated into `blobsCores` lazily (only on `addModel` or `_setupBlindPeering`), so on an indexer that runs the topic-pull flow it stays `Map(0)` indefinitely.
- `hypercore_total_peers` is too coarse: it unions all unique UDX streams across every hypercore, including RPC-only clients.

How does it solve it?

- `qvac_registry_models_total` → `qvac_registry_model_count` and `qvac_registry_blob_cores_total` → `qvac_registry_blob_core_count`. RPC counters keep `_total` because they are actual counters.
- Pre-initialises the RPC counters (`add-model`, `put-license`, `update-model-metadata`, `delete-model`, `ping`) at `0` in the `QvacMetrics` constructor, so `rate()` returns `0` from the first scrape.
- Eagerly opens the blob core in `_open()` when `base.isIndexer || base.localWriter`, so `blob_core_peers`, `blob_core_byte_length`, `blob_core_fully_downloaded`, etc. populate on indexers that use the topic-pull blind-peer flow. Reader-only nodes skip this (their `writable: true` would create a local core with the wrong key).
- `qvac_registry_view_core_seeders`: peers that hold the view core fully and advertise `remoteUploading`. The view is small (a few MB of autobase metadata), so this converges to the connected-peer count within an RTT, and no separate raw-peers metric is needed.
- `qvac_registry_blob_core_seeders{core_name}`: the same signal per blob core. Paired with the existing `qvac_registry_blob_core_peers`, the `peers - seeders` gap exposes peers currently downloading vs. serving.
- `DEPLOYMENT_GUIDE.md`: metric table, replication-durability alerting guidance, and a note that `_seeders` metrics use `p.remoteOpened && p.remoteUploading && p.remoteContiguousLength >= core.length` so partial/mid-handshake peers aren't counted.
- … `0`, and `view_core_seeders` is exported as a single series that is `0` with no connected peers.

Verified with `npm run lint`, `npm run test:unit` (37/37), and `npm run test:integration` (30/30, 146/146 asserts).

Breaking changes

- `qvac_registry_models_total` → `qvac_registry_model_count`
- `qvac_registry_blob_cores_total` → `qvac_registry_blob_core_count`

Anyone scraping these series must update dashboards / alerts. The in-tree Grafana dashboard is updated in this PR; no other consumers are known.
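
To make the seeder predicate from the description concrete, here is a minimal sketch that applies it to mock data. The helper name `countSeeders` and the plain-object shapes for `core` and its peers are assumptions for illustration only; this is not the code from the PR, and real Hypercore peer objects carry far more state.

```javascript
// Hypothetical helper: counts peers matching the predicate quoted in the
// description. `core` only needs `length` and a `peers` array here.
function countSeeders (core) {
  return core.peers.filter((p) =>
    p.remoteOpened &&
    p.remoteUploading &&
    p.remoteContiguousLength >= core.length
  ).length
}

// Mock data with assumed shapes (not real Hypercore objects).
const mockCore = {
  length: 100,
  peers: [
    { remoteOpened: true, remoteUploading: true, remoteContiguousLength: 100 }, // full copy, serving
    { remoteOpened: true, remoteUploading: true, remoteContiguousLength: 40 }, // partial copy
    { remoteOpened: false, remoteUploading: true, remoteContiguousLength: 100 }, // mid-handshake
    { remoteOpened: true, remoteUploading: false, remoteContiguousLength: 100 } // not uploading
  ]
}

const peers = mockCore.peers.length
const seeders = countSeeders(mockCore)
console.log({ peers, seeders, gap: peers - seeders })
```

In this mock only the first peer counts as a seeder, so the `peers - seeders` gap covers every connected peer that is not yet a full, serving replica.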
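
The counter pre-initialisation can be sketched without any metrics library. The `LabelledCounter` class and the metric name `example_rpc_requests_total` below are hypothetical stand-ins, not the PR's `QvacMetrics` implementation; only the RPC method names come from the description. The point is that every known label value gets a series at `0` in the constructor, so each series exists before the first request arrives.

```javascript
// RPC method names taken from the PR description.
const RPC_METHODS = ['add-model', 'put-license', 'update-model-metadata', 'delete-model', 'ping']

// Hypothetical stand-in for a labelled Prometheus counter.
class LabelledCounter {
  constructor (name, labelName, initialLabelValues = []) {
    this.name = name
    this.labelName = labelName
    this.values = new Map()
    // Pre-initialise: every known label value starts as a series at 0.
    for (const v of initialLabelValues) this.values.set(v, 0)
  }

  inc (labelValue, by = 1) {
    this.values.set(labelValue, (this.values.get(labelValue) || 0) + by)
  }

  // Render the counter in Prometheus text exposition format.
  render () {
    const lines = [`# TYPE ${this.name} counter`]
    for (const [v, n] of this.values) {
      lines.push(`${this.name}{${this.labelName}="${v}"} ${n}`)
    }
    return lines.join('\n')
  }
}

const counter = new LabelledCounter('example_rpc_requests_total', 'method', RPC_METHODS)
counter.inc('ping')
console.log(counter.render())
```

Without the constructor loop, a method that has never been called would have no series at all on a cold start, which is the situation the description associates with broken-looking `rate()` panels.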