fix(test): prevent TestDailyFees from overshooting second proving window#13551

Open
wjmelements wants to merge 2 commits intomasterfrom
fix/daily-fees-flaky-test
Conversation

@wjmelements
Contributor

Summary

  • Fixes a flaky TestDailyFees failure where the post-PoST wait could advance the chain past a sector's second proving window close, causing paymentsPast > 1 and a mismatch in checkFeeRecords
  • Replaces head.Height()+WPoStChallengeWindow+10 with dlInfo.Close+5 — a targeted wait to just after the current deadline window closes, which is always within the current proving period
  • Verified with 10 consecutive local passes

Root cause

After feePostWg.Wait(), the chain head could already be near the end of the proving period (e.g. epoch 5890 with the period ending at 5960). Adding a full WPoStChallengeWindow+10 (70 epochs) pushed past epoch 5960 — the close of deadline 0 in the second period — making paymentsPast = 2 for those sectors and causing checkFeeRecords to fail.

dlInfo.Close+5 instead targets a point just 5 epochs past the close of the current deadline window. Because the wait starts from within that window, it advances at most WPoStChallengeWindow+5 = 65 epochs, and because dlInfo.Close always lies inside the current proving period, the target stays short of deadline 0's second-period close (epoch 5960 in the failing run).
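As a sanity check, the epoch arithmetic from the failing run can be replayed in a tiny standalone program. The concrete numbers (60-epoch WPoStChallengeWindow, head at 5890, second-window close at 5960) come from the report above; the deadline close at 5900 is an assumed illustrative value, not taken from lotus:

```go
package main

import "fmt"

func main() {
	// Assumed itest parameters from the failure report.
	const wPoStChallengeWindow = 60
	head := 5890              // chain head after feePostWg.Wait()
	secondWindowClose := 5960 // close of deadline 0's second proving window

	// Old wait: a fixed offset from wherever the head happens to be.
	oldTarget := head + wPoStChallengeWindow + 10
	fmt.Println(oldTarget, oldTarget >= secondWindowClose) // 5960 true: overshoots

	// New wait: anchored to the current deadline's close, which by
	// definition lies inside the current proving period.
	dlClose := 5900 // assumed close of the deadline containing epoch 5890
	newTarget := dlClose + 5
	fmt.Println(newTarget, newTarget >= secondWindowClose) // 5905 false: safe
}
```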

Replace the fixed head+WPoStChallengeWindow+10 wait with a targeted wait
to dlInfo.Close+5. The old approach could advance past the close of a
sector's second proving window (making paymentsPast > 1) when the chain
head was already near the end of the proving period after feePostWg.Wait().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FilOz Mar 17, 2026
@wjmelements wjmelements requested review from rjan90 and rvagg and removed request for rjan90 March 17, 2026 01:21
@wjmelements
Contributor Author

See also #12973 (comment)

Comment thread on itests/daily_fees_test.go:

```diff
 dlInfo, err := client.StateMinerProvingDeadline(ctx, mminer.ActorAddr, types.EmptyTSK)
 req.NoError(err)
-client.WaitTillChain(ctx, kit.HeightAtLeast(head.Height()+miner.WPoStChallengeWindow()+10))
+client.WaitTillChain(ctx, kit.HeightAtLeast(dlInfo.Close+5))
```
Contributor
I wouldn't block on this, but I think there is still a race here, as is the sad story for many lotus itests. We really need a way to halt mining when querying state to actually get rid of them. WaitTillChain is used everywhere but is inherently racy.

For example, if the PoSt lands on the last epoch of the deadline and your call to grab dlInfo only executes an epoch later, once this first proving period is over, then you again lose the race and are waiting until after the 2nd proving period.

But setting aside the race, which is a systemic problem, this is a cleaner way to express what we are doing, so I support merging this.

Contributor
We should use dlInfo.Close+1 since it's the first epoch the condition should be met on, and we are now being more exact.

Contributor Author
I saw a failure when I did dlInfo.Close+1. Claude explains:

> The most likely reason: in Filecoin, the deadline cron — which actually burns the fees — runs as part of the block execution AT dlInfo.Close. But in the test's simulated chain, there can be null rounds around the close epoch. If there's a null round at dlInfo.Close, no block is produced at that exact epoch, and the cron gets deferred to the first block produced after it.

So even with +5 we might still see some failures. I just haven't seen any yet.

Member
I think the problem is slightly more subtle than that: null-round crons are run before messages anyway, so there's an effective catch-up in place, but they do contribute to the problem here. I'm pretty sure just using dlInfo.Close would be enough, because the miner is enrolled at close - 1. The problem is that all our APIs use ParentState on the tipset you call them with, so with null rounds you're reaching into the parent before the null rounds. Cron has run, but you're still going back into history.

A properly robust way to deal with this might be to wait for 1 non-null tipset, then wait for 1 more after that, so ParentState is always a post-close state:

```go
ts := client.WaitTillChain(ctx, kit.HeightAtLeast(dlInfo.Close)) // or use +1, whatever
client.WaitTillChain(ctx, kit.HeightAtLeast(ts.Height()+1))
```

Pretty janky, might be something we could put into the test kit though because this kind of problem isn't atypical and the solution might apply elsewhere.
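The effect of that two-step wait can be sketched against a toy chain with null rounds. Here `tipset` and `waitTillHeight` are hypothetical stand-ins for the lotus test kit's tipsets and WaitTillChain, not real lotus APIs:

```go
package main

import "fmt"

// tipset is a toy tipset carrying only a height.
type tipset struct{ height int }

// waitTillHeight returns the first non-null tipset at or above h,
// mimicking what a WaitTillChain(HeightAtLeast(h)) style call observes.
func waitTillHeight(chain []tipset, h int) tipset {
	for _, ts := range chain {
		if ts.height >= h {
			return ts
		}
	}
	return chain[len(chain)-1]
}

func main() {
	// Epochs 3 and 4 are null rounds: no tipset exists at those heights.
	chain := []tipset{{0}, {1}, {2}, {5}, {6}}
	closeEpoch := 3 // the deadline close falls on a null round

	// Step 1: first non-null tipset at or past the close (height 5).
	ts := waitTillHeight(chain, closeEpoch)
	// Step 2: one more tipset after that, so a ParentState-based query
	// against it looks at height 5, which is already post-close.
	ts2 := waitTillHeight(chain, ts.height+1)
	fmt.Println(ts.height, ts2.height) // 5 6
}
```

Waiting only for step 1 would leave a query's ParentState pointing at height 2, before the close, which is the history-reaching problem described above.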

Member
`client.WaitTillStateIncludes(condition)`

@github-project-automation github-project-automation Bot moved this from 📌 Triage to ✔️ Approved by reviewer in FilOz Mar 19, 2026