Merged
Changes from 3 commits

2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -29,6 +29,8 @@

### Added

+- [#6057](https://github.com/ChainSafe/forest/issues/6057) Added `--no-progress-timeout` to `forest-cli f3 ready` subcommand to exit when F3 is stuck for the given timeout.
+
### Changed

### Removed
2 changes: 1 addition & 1 deletion scripts/tests/calibnet_export_f3_check.sh
@@ -13,7 +13,7 @@ echo "Cleaning up the initial snapshot"
rm --force --verbose ./*.{car,car.zst,sha256sum}

echo "Wait for F3 to sync"
-timeout 10m $FOREST_CLI_PATH f3 ready --wait
+timeout 10m $FOREST_CLI_PATH f3 ready --wait --no-progress-timeout 5m

echo "Exporting zstd compressed snapshot in v2 format"
$FOREST_CLI_PATH snapshot export --format v2
36 changes: 30 additions & 6 deletions src/cli/subcommands/f3_cmd.rs
@@ -4,7 +4,11 @@
#[cfg(test)]
mod tests;

-use std::{borrow::Cow, sync::LazyLock, time::Duration};
+use std::{
+borrow::Cow,
+sync::LazyLock,
+time::{Duration, Instant},
+};

use crate::{
blocks::{Tipset, TipsetKey},
@@ -92,13 +96,17 @@ pub enum F3Commands {
#[command(subcommand, name = "powertable", visible_alias = "pt")]
PowerTable(F3PowerTableCommands),
/// Checks if F3 is in sync.
+#[group(args = ["no_progress_timeout"], requires = "wait")]
Ready {
/// Wait until F3 is in sync.
#[arg(long)]
wait: bool,
/// The threshold of the epoch gap between chain head and F3 head within which F3 is considered in sync.
#[arg(long, default_value_t = 20)]
threshold: usize,
+/// Exit after F3 making no progress for this duration.
+#[arg(long, default_value = "10m")]
+no_progress_timeout: humantime::Duration,
},

@coderabbitai coderabbitai Bot Sep 9, 2025

Contributor

💡 Verification agent

❓ Verification inconclusive

Bug: --no-progress-timeout applies even when the flag isn’t passed (due to default_value)

With default_value = "10m", the timeout is always set, so the no-progress exit triggers after 10 minutes even if the user did not pass --no-progress-timeout. The PR objective states that the flag should cause an exit when provided. Fix this by making the argument optional, enabling the behavior only when the flag is present, and using default_missing_value so that a bare --no-progress-timeout still defaults to 10m.

Apply these diffs:

  • Make the arg optional and bind requires directly on the arg (drop the group):
-    #[group(args = ["no_progress_timeout"], requires = "wait")]
+    // requires handled at arg level
     Ready {
@@
-        /// Exit after F3 making no progress for this duration.
-        #[arg(long, default_value = "10m")]
-        no_progress_timeout: humantime::Duration,
+        /// Exit after F3 making no progress for this duration (only when the flag is provided).
+        #[arg(long, requires = "wait", num_args(0..=1), default_missing_value = "10m")]
+        no_progress_timeout: Option<humantime::Duration>,
  • Store an optional duration:
-                let no_progress_timeout_duration: Duration = no_progress_timeout.into();
+                let no_progress_timeout_duration: Option<Duration> =
+                    no_progress_timeout.map(Into::into);
  • Guard the timeout check and use elapsed() (more idiomatic, avoids time arithmetic):
-                    if last_progress + no_progress_timeout_duration < Instant::now() {
-                        eprintln!(
-                            "Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..."
-                        );
-                        std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT);
-                    }
+                    if let Some(timeout) = no_progress_timeout_duration {
+                        if last_progress.elapsed() >= timeout {
+                            pb.finish();
+                            eprintln!(
+                                "Warning: F3 made no progress in the past {}. Exiting...",
+                                humantime::format_duration(timeout)
+                            );
+                            std::process::exit(EXIT_CODE_F3_NO_PROGRESS_TIMEOUT);
+                        }
+                    }

To confirm behavior, after the change please verify:

  • forest-cli f3 ready --wait waits indefinitely without exiting on stall.
  • forest-cli f3 ready --wait --no-progress-timeout uses a default 10m and exits on stall.
  • forest-cli f3 ready --wait --no-progress-timeout 30s exits after ~30s stall.

Also applies to: 167-171, 220-225


Make --no-progress-timeout truly optional and only active when passed
default_value = "10m" means the 10-minute timeout always applies. Refactor in src/cli/subcommands/f3_cmd.rs (lines 99–110, 167–171, 220–225):

  • Drop the #[group(...)] and put requires = "wait" on the flag itself
  • Change
    #[arg(long, default_value = "10m")] no_progress_timeout: humantime::Duration,
    to
    #[arg(long, requires = "wait", num_args(0..=1), default_missing_value = "10m")] no_progress_timeout: Option<humantime::Duration>,
  • Map to Option<Duration> and wrap your timeout check in
    if let Some(timeout) = no_progress_timeout.map(Into::into) {
        if last_progress.elapsed() >= timeout {
            /* exit */
        }
    }

This ensures --no-progress-timeout only takes effect when provided.

🤖 Prompt for AI Agents
In src/cli/subcommands/f3_cmd.rs around lines 99–110 (and update uses at 167–171
and 220–225), the current #[group(...)] and default_value = "10m" make the
10-minute no-progress timeout always active; instead remove the #[group(...)]
and apply the attributes on the flag itself, change the field to an optional
humantime::Duration (use #[arg(long, requires = "wait", num_args(0..=1),
default_missing_value = "10m")] no_progress_timeout:
Option<humantime::Duration>), update all code that consumes it to map
Option<humantime::Duration> into Option<std::time::Duration> (e.g.,
no_progress_timeout.map(Into::into)) and wrap the timeout check in if let
Some(timeout) = ... { if last_progress.elapsed() >= timeout { /* exit */ } } so
the timeout only takes effect when the flag is provided.
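For comparison, the optional-timeout behavior proposed above can be sketched with the standard library alone. This is a hypothetical helper (`no_progress_timed_out` is not part of Forest): `None` stands in for the flag being absent, and the clap wiring (`num_args`, `default_missing_value`) is deliberately left out.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper, not from the Forest codebase.
/// `timeout == None` models `--no-progress-timeout` not being passed,
/// in which case the command never exits on a stall.
fn no_progress_timed_out(last_progress: Instant, timeout: Option<Duration>) -> bool {
    // Only compare when a timeout is configured; `elapsed()` avoids
    // adding a Duration to an Instant by hand.
    timeout.is_some_and(|t| last_progress.elapsed() >= t)
}
```

The author chose to keep an always-on 10m default instead, which corresponds to always passing `Some(Duration::from_secs(600))` here.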

Contributor Author

Simplified the group usage. Always having a 10m timeout is fine.

Contributor

@hanabi1224 Understood! Having a default 10-minute timeout makes sense from a fail-fast perspective - it prevents the command from hanging indefinitely even when users don't explicitly specify the timeout flag. This aligns well with preventing operational issues in automated scenarios.


✏️ Learnings added
Learnt from: hanabi1224
PR: ChainSafe/forest#6057
File: src/cli/subcommands/f3_cmd.rs:0-0
Timestamp: 2025-09-09T10:37:17.905Z
Learning: hanabi1224 prefers having default timeouts (like 10m for --no-progress-timeout) to prevent commands from hanging indefinitely, even when the timeout flag isn't explicitly provided by users. This fail-fast approach is preferred over requiring explicit flag usage.

🧠 Learnings used
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.

}

@@ -128,7 +136,11 @@ impl F3Commands {
}
Self::Certs(cmd) => cmd.run(client).await,
Self::PowerTable(cmd) => cmd.run(client).await,
-Self::Ready { wait, threshold } => {
+Self::Ready {
+wait,
+threshold,
+no_progress_timeout,
+} => {
let is_running = client.call(F3IsRunning::request(())?).await?;
if !is_running {
anyhow::bail!("F3 is not running");
@@ -148,16 +160,21 @@ impl F3Commands {
);
pb.enable_steady_tick(std::time::Duration::from_millis(100));
let mut num_consecutive_fetch_failtures = 0;
+let no_progress_timeout_duration: Duration = no_progress_timeout.into();
let mut interval = tokio::time::interval(Duration::from_secs(1));
+let mut last_f3_head_epoch = 0;
+let mut last_progress = Instant::now();
loop {
interval.tick().await;
match get_heads(&client).await {
Ok((chain_head, cert_head)) => {
num_consecutive_fetch_failtures = 0;
-if cert_head
-.chain_head()
-.epoch
-.saturating_add(threshold.try_into()?)
+let f3_head_epoch = cert_head.chain_head().epoch;
+if f3_head_epoch != last_f3_head_epoch {
+last_f3_head_epoch = f3_head_epoch;
+last_progress = Instant::now();
+}
+if f3_head_epoch.saturating_add(threshold.try_into()?)
>= chain_head.epoch()
{
let text = format!(
@@ -195,6 +212,13 @@ impl F3Commands {
}
}
}
+
+if last_progress + no_progress_timeout_duration < Instant::now() {
+eprintln!(
+"Warning: F3 made no progress in the past {no_progress_timeout}. Exiting..."
+);
+std::process::exit(3);

Member

Why 3?

Contributor Author

Exit codes 1 and 2 are already used above.

Member

Let's use some constants to easily indicate the exit code cause.

Contributor Author

Fixed.
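A minimal sketch of the kind of constant this thread asks for. The name `EXIT_CODE_F3_NO_PROGRESS_TIMEOUT` comes from the bot's suggested diff; its value of 3 is assumed from `std::process::exit(3)` in the hunk, and `no_progress_exit` is a hypothetical helper that returns the message and code instead of exiting, so the mapping can be checked without terminating the process.

```rust
/// Exit code for "F3 made no progress within the timeout".
/// Assumption: value 3, matching `std::process::exit(3)` in the diff.
pub const EXIT_CODE_F3_NO_PROGRESS_TIMEOUT: i32 = 3;

// Compile-time guard: codes 1 and 2 are already used by other exit paths
// in this command, so the new code must not collide with them.
const _: () = assert!(
    EXIT_CODE_F3_NO_PROGRESS_TIMEOUT != 1 && EXIT_CODE_F3_NO_PROGRESS_TIMEOUT != 2
);

/// Hypothetical helper: builds the warning and returns the exit code
/// that the caller would pass to `std::process::exit`.
pub fn no_progress_exit(timeout_desc: &str) -> (String, i32) {
    let msg = format!("Warning: F3 made no progress in the past {timeout_desc}. Exiting...");
    (msg, EXIT_CODE_F3_NO_PROGRESS_TIMEOUT)
}
```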

+}
}
Ok(())
}
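The progress check added in this PR (`last_f3_head_epoch`, `last_progress`, `no_progress_timeout_duration`) behaves like a small watchdog: any change in the F3 head epoch resets the clock, and the command gives up once the clock exceeds the timeout. The sketch below isolates that logic using only `std`. `ProgressWatchdog` is a hypothetical type, and the epoch is assumed to be an `i64`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical type isolating the no-progress logic from the diff.
struct ProgressWatchdog {
    last_epoch: i64,        // mirrors `last_f3_head_epoch` (assumed i64)
    last_progress: Instant, // mirrors `last_progress`
    timeout: Duration,      // mirrors `no_progress_timeout_duration`
}

impl ProgressWatchdog {
    fn new(timeout: Duration) -> Self {
        // The diff likewise starts from epoch 0 and the current instant.
        Self { last_epoch: 0, last_progress: Instant::now(), timeout }
    }

    /// Records the latest F3 head epoch; any change counts as progress.
    fn observe(&mut self, epoch: i64) {
        if epoch != self.last_epoch {
            self.last_epoch = epoch;
            self.last_progress = Instant::now();
        }
    }

    /// Same condition as `last_progress + timeout < Instant::now()` in the diff.
    fn stalled(&self) -> bool {
        self.last_progress.elapsed() > self.timeout
    }
}
```

Writing the check with `elapsed()` keeps the strict comparison of the original while avoiding the panic that `Instant + Duration` raises on overflow, which would only matter for absurdly large timeouts.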