Merged
Changes from 9 commits
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -30,7 +30,10 @@
### Added

- [#6057](https://github.com/ChainSafe/forest/issues/6057) Added `--no-progress-timeout` to `forest-cli f3 ready` subcommand to exit when F3 is stuck for the given timeout.
- [#6000](https://github.com/ChainSafe/forest/pull/6000) Add support for the `Filecoin.StateDecodeParams` API methods to enable decoding actors method params.

- [#6000](https://github.com/ChainSafe/forest/pull/6000) Added support for the `Filecoin.StateDecodeParams` API methods to enable decoding actors method params.

- [#6079](https://github.com/ChainSafe/forest/pull/6079) Added prometheus metrics `network_height`, `network_version`, `network_version_revision` and `actor_version`.

### Changed

40 changes: 40 additions & 0 deletions docs/docs/users/reference/metrics.md
@@ -16,7 +16,11 @@ title: Metrics
| `peer_failure_total` | Counter | Count | Total number of failed peer requests |
| `full_peers` | Gauge | Count | Number of healthy peers recognized by the node |
| `bad_peers` | Gauge | Count | Number of bad peers recognized by the node |
| `network_height` | Gauge | Count | The current network height |

Member

It'd be good to explain the difference between those metrics in the docs (dedicated section).

head_epoch 3021420
network_height 3021421
expected_network_height 3023095
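
To make the gap between these values concrete, here is a minimal sketch of how an expected height could be derived from wall-clock time and how far the head trails it. The function `expected_epoch` is a hypothetical stand-in for `calculate_expected_epoch`, whose body is not shown in this diff, so the formula is an assumption; `sync_lag` and the 30-second block delay are also illustrative.

```rust
/// Hypothetical stand-in for `calculate_expected_epoch`; the formula is an
/// assumption, since the real body is not part of this diff.
fn expected_epoch(now_secs: u64, genesis_secs: u64, block_delay_secs: u32) -> i64 {
    // One epoch is assumed to elapse every `block_delay_secs` seconds after genesis.
    (now_secs.saturating_sub(genesis_secs) / u64::from(block_delay_secs)) as i64
}

/// Number of epochs by which the local head trails the wall-clock expectation.
fn sync_lag(head_epoch: i64, expected_network_height: i64) -> i64 {
    expected_network_height - head_epoch
}

fn main() {
    // 300 seconds after genesis with a 30-second block delay.
    let expected = expected_epoch(1_700_000_300, 1_700_000_000, 30);
    println!("expected epoch: {expected}");

    // Values quoted in the comment above.
    let lag = sync_lag(3_021_420, 3_023_095);
    println!("head trails the expected height by {lag} epochs");
    // Assuming a 30-second block delay, convert the lag to chain time.
    println!("that is about {} seconds of chain time", lag * 30);
}
```

Read this way, `head_epoch` and `network_height` both describe the local chain head, while `expected_network_height` depends only on the clock and the genesis timestamp.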

Contributor Author

Just realized network_height is a duplicate of head_epoch, removed to avoid confusion.

Member

@hanabi1224 It's not exactly a duplicate, is it? I mean, the values differ, and from what I saw, head_epoch is not available just when the node starts.

@hanabi1224 hanabi1224 Sep 16, 2025

Contributor Author

@LesnyRumcajs The metrics are the same, and the difference is due to how the value is updated. I can update head_epoch to use the collector approach which is more reliable.
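
One possible reading of "how the value is updated" is the difference between a gauge that some task must set on every head change and a value that is computed when the metrics endpoint is scraped. The sketch below illustrates that distinction with the standard library only; `PushedGauge`, `PulledGauge`, and the atomic standing in for the chain head are hypothetical and are not types from `prometheus_client` or Forest.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Push style: another task must call `set` whenever the head changes.
struct PushedGauge(AtomicI64);

impl PushedGauge {
    fn set(&self, value: i64) {
        self.0.store(value, Ordering::Relaxed);
    }
    fn scrape(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Pull style: the value is read from the source of truth at scrape time.
struct PulledGauge<F: Fn() -> i64> {
    read: F,
}

impl<F: Fn() -> i64> PulledGauge<F> {
    fn scrape(&self) -> i64 {
        (self.read)()
    }
}

fn main() {
    // Hypothetical chain head, standing in for the chain store.
    let head = Arc::new(AtomicI64::new(3_021_420));
    let pushed = PushedGauge(AtomicI64::new(0));
    let pulled = PulledGauge {
        read: {
            let head = Arc::clone(&head);
            move || head.load(Ordering::Relaxed)
        },
    };

    // Before anything has pushed a value, only the pulled gauge is meaningful.
    println!("pushed={} pulled={}", pushed.scrape(), pulled.scrape());

    // The head advances; the pushed gauge is stale until `set` is called.
    head.store(3_021_421, Ordering::Relaxed);
    println!("pushed={} pulled={}", pushed.scrape(), pulled.scrape());
    pushed.set(head.load(Ordering::Relaxed));
    println!("pushed={} pulled={}", pushed.scrape(), pulled.scrape());
}
```

Under this reading, two metrics backed by the same head can briefly disagree, or one can be missing at start-up, without being semantically different.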

Contributor Author

Fixed.

| `expected_network_height` | Gauge | Count | The expected network height based on the current time and the genesis block time |
| `network_version` | Gauge | Count | Network version of the current chain head |
| `network_version_revision` | Gauge | Count | Network version revision of the current chain head |
| `actor_version` | Gauge | Count | Actor version of the current chain head |
| `forest_db_size` | Gauge | Bytes | Size of Forest database in bytes |
| `bitswap_message_count` | Counter | Count | Number of `bitswap` messages. Indexed by `type` |
| `bitswap_container_capacities` | Gauge | Count | Capacity for each `bitswap` container. Indexed by `type` |
@@ -279,6 +283,15 @@ head_epoch 2519530
```
</details>

<details>
<summary>Example `network_height` output</summary>
```
# HELP network_height The current network height
# TYPE network_height gauge
network_height 3020349
```
</details>

<details>
<summary>Example `expected_network_height` output</summary>
```
@@ -288,6 +301,33 @@ expected_network_height 2519530
```
</details>

<details>
<summary>Example `network_version` output</summary>
```
# HELP network_version Network version of the current chain head
# TYPE network_version gauge
network_version 27
```
</details>

<details>
<summary>Example `network_version_revision` output</summary>
```
# HELP network_version_revision Network version revision of the current chain head
# TYPE network_version_revision gauge
network_version_revision 0
```
</details>

<details>
<summary>Example `actor_version` output</summary>
```
# HELP actor_version Actor version of the current chain head
# TYPE actor_version gauge
actor_version 17
```
</details>

<details>
<summary>Example `build_info` output</summary>
```
51 changes: 47 additions & 4 deletions src/daemon/mod.rs
@@ -15,8 +15,10 @@ use crate::cli_shared::{
chain_path,
cli::{CliOpts, Config},
};
use crate::daemon::context::{AppContext, DbType};
use crate::daemon::db_util::import_chain_as_forest_car;
use crate::daemon::{
context::{AppContext, DbType},
db_util::import_chain_as_forest_car,
};
use crate::db::gc::SnapshotGarbageCollector;
use crate::db::ttl::EthMappingCollector;
use crate::libp2p::{Libp2pService, PeerManager};
@@ -26,6 +28,7 @@ use crate::rpc::RPCState;
use crate::rpc::eth::filter::EthEventHandler;
use crate::rpc::start_rpc;
use crate::shim::clock::ChainEpoch;
use crate::shim::state_tree::StateTree;
use crate::shim::version::NetworkVersion;
use crate::utils;
use crate::utils::{proofs_api::ensure_proof_params_downloaded, version::FOREST_VERSION_STRING};
@@ -202,10 +205,47 @@ async fn maybe_start_metrics_service(
);
let db_directory = crate::db::db_engine::db_root(&chain_path(config))?;
let db = ctx.db.writer().clone();
services.spawn(async {
crate::metrics::init_prometheus(prometheus_listener, db_directory, db)

let get_chain_head_height = Arc::new({
// Use `Weak` to avoid deadlocking the GC.
let chain_store = Arc::downgrade(ctx.state_manager.chain_store());
move || {
chain_store
.upgrade()
.map(|cs| cs.heaviest_tipset().epoch())
.unwrap_or_default()
}
});
let get_chain_head_actor_version = Arc::new({
// Use `Weak` to avoid deadlocking the GC.
let chain_store = Arc::downgrade(ctx.state_manager.chain_store());
move || {
if let Some(cs) = chain_store.upgrade()
&& let Ok(state) =
StateTree::new_from_root(cs.db.clone(), cs.heaviest_tipset().parent_state())
&& let Ok(bundle_meta) = state.get_actor_bundle_metadata()
&& let Ok(actor_version) = bundle_meta.actor_major_version()
{
return actor_version;
}
0
}
});
services.spawn({
let chain_config = ctx.chain_config().clone();
let get_chain_head_height = get_chain_head_height.clone();
async {
crate::metrics::init_prometheus(
prometheus_listener,
db_directory,
db,
chain_config,
get_chain_head_height,
get_chain_head_actor_version,
)
.await
.context("Failed to initiate prometheus server")
}
});

crate::metrics::default_registry().register_collector(Box::new(
@@ -215,6 +255,7 @@ async fn maybe_start_metrics_service(
.chain_store()
.genesis_block_header()
.timestamp,
get_chain_head_height,
),
));
}
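
The closures in the hunk above hold a `Weak` handle to the chain store instead of a strong `Arc`. The code comment says this avoids deadlocking the GC; the diff does not explain the mechanism. The sketch below only demonstrates the ownership behaviour: a getter built from `Arc::downgrade` does not keep the store alive and falls back to a default once the store is dropped. `ChainStore` here is a hypothetical one-field stand-in, and `head_height_getter` is an illustrative name.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for the chain store; only the head epoch matters here.
struct ChainStore {
    head_epoch: i64,
}

/// Builds a getter that observes the store without owning it.
fn head_height_getter(store: &Arc<ChainStore>) -> impl Fn() -> i64 + Send + Sync + 'static {
    let weak: Weak<ChainStore> = Arc::downgrade(store);
    // `upgrade` returns `None` once every strong reference is gone,
    // in which case the getter reports the default value, 0.
    move || weak.upgrade().map(|cs| cs.head_epoch).unwrap_or_default()
}

fn main() {
    let store = Arc::new(ChainStore { head_epoch: 3_020_349 });
    let get = head_height_getter(&store);
    println!("head while store is alive: {}", get());

    // The closure did not add a strong reference.
    assert_eq!(Arc::strong_count(&store), 1);

    drop(store);
    println!("head after store is dropped: {}", get());
}
```

One consequence of this design is that a scrape taken while the store is gone reports `0` rather than an error, which matches the `unwrap_or_default()` and trailing `0` fallbacks in the hunk.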
@@ -533,6 +574,8 @@ pub(super) async fn start_services(
shutdown_send: mpsc::Sender<()>,
on_app_context_and_db_initialized: impl Fn(&AppContext),
) -> anyhow::Result<()> {
// Cleanup the default prometheus metrics registry
*crate::metrics::default_registry() = Default::default();
let mut services = JoinSet::new();
let network = config.chain();
let ctx = AppContext::init(opts, &config).await?;
12 changes: 11 additions & 1 deletion src/metrics/mod.rs
@@ -3,7 +3,7 @@

pub mod db;

use crate::db::DBStatistics;
use crate::{db::DBStatistics, networks::ChainConfig, shim::clock::ChainEpoch};
use axum::{Router, http::StatusCode, response::IntoResponse, routing::get};
use parking_lot::{RwLock, RwLockWriteGuard};
use prometheus_client::{
@@ -69,6 +69,9 @@ pub async fn init_prometheus<DB>(
prometheus_listener: TcpListener,
db_directory: PathBuf,
db: Arc<DB>,
chain_config: Arc<ChainConfig>,
get_chain_head_height: Arc<impl Fn() -> ChainEpoch + Send + Sync + 'static>,
get_chain_head_actor_version: Arc<impl Fn() -> u64 + Send + Sync + 'static>,
) -> anyhow::Result<()>
where
DB: DBStatistics + Send + Sync + 'static,
@@ -86,6 +89,13 @@ where
DEFAULT_REGISTRY
.write()
.register_collector(Box::new(crate::metrics::db::DBCollector::new(db_directory)));
DEFAULT_REGISTRY.write().register_collector(Box::new(
crate::networks::metrics::NetworkVersionCollector::new(
chain_config,
get_chain_head_height,
get_chain_head_actor_version,
),
));

// Create and configure HTTP server
let app = Router::new()
159 changes: 137 additions & 22 deletions src/networks/metrics.rs
@@ -1,47 +1,162 @@
// Copyright 2019-2025 ChainSafe Systems
// SPDX-License-Identifier: Apache-2.0, MIT

use prometheus_client::{collector::Collector, encoding::EncodeMetric, metrics::gauge::Gauge};
use std::sync::Arc;

use educe::Educe;
use prometheus_client::{
collector::Collector,
encoding::{DescriptorEncoder, EncodeMetric},
metrics::gauge::Gauge,
};

use super::calculate_expected_epoch;
use crate::{networks::ChainConfig, shim::clock::ChainEpoch};

#[derive(Debug)]
pub struct NetworkHeightCollector {
#[derive(Educe)]
#[educe(Debug)]
pub struct NetworkHeightCollector<F>
where
F: Fn() -> ChainEpoch,
{
block_delay_secs: u32,
genesis_timestamp: u64,
network_height: Gauge,
#[educe(Debug(ignore))]
get_chain_head_height: Arc<F>,
}

impl NetworkHeightCollector {
pub fn new(block_delay_secs: u32, genesis_timestamp: u64) -> Self {
impl<F> NetworkHeightCollector<F>
where
F: Fn() -> ChainEpoch,
{
pub fn new(
block_delay_secs: u32,
genesis_timestamp: u64,
get_chain_head_height: Arc<F>,
) -> Self {
Self {
block_delay_secs,
genesis_timestamp,
network_height: Gauge::default(),
get_chain_head_height,
}
}
}

impl Collector for NetworkHeightCollector {
impl<F> Collector for NetworkHeightCollector<F>
where
F: Fn() -> ChainEpoch + Send + Sync + 'static,
{
fn encode(
&self,
mut encoder: prometheus_client::encoding::DescriptorEncoder,
) -> Result<(), std::fmt::Error> {
let metric_encoder = encoder.encode_descriptor(
"expected_network_height",
"The expected network height based on the current time and the genesis block time",
None,
self.network_height.metric_type(),
)?;

let expected_epoch = calculate_expected_epoch(
chrono::Utc::now().timestamp() as u64,
self.genesis_timestamp,
self.block_delay_secs,
);
self.network_height.set(expected_epoch);
self.network_height.encode(metric_encoder)?;
{
let network_height: Gauge = Default::default();
let epoch = (self.get_chain_head_height)();
network_height.set(epoch);
let metric_encoder = encoder.encode_descriptor(
"network_height",
"The current network height",
None,
network_height.metric_type(),
)?;
network_height.encode(metric_encoder)?;
}
{
let expected_network_height: Gauge = Default::default();
let expected_epoch = calculate_expected_epoch(
chrono::Utc::now().timestamp() as u64,
self.genesis_timestamp,
self.block_delay_secs,
);
expected_network_height.set(expected_epoch);
let metric_encoder = encoder.encode_descriptor(
"expected_network_height",
"The expected network height based on the current time and the genesis block time",
None,
expected_network_height.metric_type(),
)?;
expected_network_height.encode(metric_encoder)?;
}
Ok(())
}
}

#[derive(Educe)]
#[educe(Debug)]
pub struct NetworkVersionCollector<F1, F2>
where
F1: Fn() -> ChainEpoch,
F2: Fn() -> u64,
{
chain_config: Arc<ChainConfig>,
#[educe(Debug(ignore))]
get_chain_head_height: Arc<F1>,
#[educe(Debug(ignore))]
get_chain_head_actor_version: Arc<F2>,
}

impl<F1, F2> NetworkVersionCollector<F1, F2>
where
F1: Fn() -> ChainEpoch,
F2: Fn() -> u64,
{
pub fn new(
chain_config: Arc<ChainConfig>,
get_chain_head_height: Arc<F1>,
get_chain_head_actor_version: Arc<F2>,
) -> Self {
Self {
chain_config,
get_chain_head_height,
get_chain_head_actor_version,
}
}
}

impl<F1, F2> Collector for NetworkVersionCollector<F1, F2>
where
F1: Fn() -> ChainEpoch + Send + Sync + 'static,
F2: Fn() -> u64 + Send + Sync + 'static,
{
fn encode(&self, mut encoder: DescriptorEncoder) -> Result<(), std::fmt::Error> {
let epoch = (self.get_chain_head_height)();
{
let network_version = self.chain_config.network_version(epoch);
let nv_gauge: Gauge = Default::default();
nv_gauge.set(u32::from(network_version) as _);
let metric_encoder = encoder.encode_descriptor(
"network_version",
"Network version of the current chain head",
None,
nv_gauge.metric_type(),
)?;
nv_gauge.encode(metric_encoder)?;
}
{
let network_version_revision = self.chain_config.network_version_revision(epoch);
let nv_gauge: Gauge = Default::default();
nv_gauge.set(network_version_revision);
let metric_encoder = encoder.encode_descriptor(
"network_version_revision",
"Network version revision of the current chain head",
None,
nv_gauge.metric_type(),
)?;
nv_gauge.encode(metric_encoder)?;
}
{
let actor_version = (self.get_chain_head_actor_version)();
let av_gauge: Gauge = Default::default();
av_gauge.set(actor_version as _);
let metric_encoder = encoder.encode_descriptor(
"actor_version",
"Actor version of the current chain head",
None,
av_gauge.metric_type(),
)?;
av_gauge.encode(metric_encoder)?;
}
Ok(())
Comment on lines +122 to 160

@coderabbitai coderabbitai Bot Sep 16, 2025

Contributor

⚠️ Potential issue

🧩 Analysis chain

Operational note: avoid heavy work in actor_version closure

If get_chain_head_actor_version traverses state on each scrape, consider caching per-head and updating on head changes to avoid scrape latency spikes.


🏁 Script executed:

#!/bin/bash
# Verify where the closure is defined and whether it re-walks state each time
rg -nP 'get_chain_head_actor_version|get_actor_bundle_metadata|NetworkVersionCollector::new' -C3

Length of output: 4398


Avoid rebuilding the state on each Prometheus scrape: cache the actor version and update it on head changes.
`get_chain_head_actor_version` in src/daemon/mod.rs (lines 219–229) constructs a `StateTree` and reads the actor bundle metadata on every invocation. Consider replacing it with a cached value (e.g., an `AtomicU64` shared through an `Arc`) that is updated on chain-head changes, so the metrics closure returns immediately and avoids scrape latency spikes.
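
For illustration, the caching approach suggested here could look roughly like the sketch below. It is not part of the PR: `ActorVersionCache`, `on_head_change`, and the idea of a head-change hook are hypothetical names and assumptions, and the loader closure stands in for the state-tree lookup.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical cache for the actor version; not part of the PR.
#[derive(Default)]
struct ActorVersionCache(AtomicU64);

impl ActorVersionCache {
    /// Intended to be called wherever head changes are observed (an assumed hook).
    /// `load_actor_version` stands in for the state-tree lookup.
    fn on_head_change(&self, load_actor_version: impl FnOnce() -> Option<u64>) {
        // Keep the previous value if the lookup fails.
        if let Some(version) = load_actor_version() {
            self.0.store(version, Ordering::Relaxed);
        }
    }

    /// Called on every scrape; a single atomic load.
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

fn main() {
    let cache = Arc::new(ActorVersionCache::default());
    // The metrics closure only reads the cached value.
    let getter = {
        let cache = Arc::clone(&cache);
        move || cache.get()
    };

    println!("before first head change: {}", getter());
    cache.on_head_change(|| Some(17));
    println!("after head change: {}", getter());
}
```

Whether the extra moving parts are worthwhile depends on how costly the per-scrape state lookup actually is.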

Contributor Author

Loading a state tree is not expensive

Contributor

@hanabi1224, thank you for the clarification! I appreciate you sharing that domain knowledge about StateTree loading performance in the Forest codebase.

✏️ Learnings added
Learnt from: hanabi1224
PR: ChainSafe/forest#6079
File: src/networks/metrics.rs:122-160
Timestamp: 2025-09-16T12:56:31.565Z
Learning: In the Forest codebase, loading a StateTree is not expensive and does not cause performance concerns for Prometheus metrics collection.


🧠 Learnings used
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

}
}