fix(test): prevent TestDailyFees from overshooting second proving window #13551
wjmelements wants to merge 2 commits into master
Replace the fixed head+WPoStChallengeWindow+10 wait with a targeted wait to dlInfo.Close+5. The old approach could advance past the close of a sector's second proving window (making paymentsPast > 1) when the chain head was already near the end of the proving period after feePostWg.Wait().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
See also #12973 (comment)
 dlInfo, err := client.StateMinerProvingDeadline(ctx, mminer.ActorAddr, types.EmptyTSK)
 req.NoError(err)
-client.WaitTillChain(ctx, kit.HeightAtLeast(head.Height()+miner.WPoStChallengeWindow()+10))
+client.WaitTillChain(ctx, kit.HeightAtLeast(dlInfo.Close+5))
I wouldn't block on this, but I think there is still a race here, as is the sad story for many lotus itests. We really need a way to halt mining while querying state to actually get rid of these races. WaitTillChain is used everywhere but is inherently racy.
For example, if the PoSt lands on the last epoch of the deadline and your call to fetch dlInfo only executes an epoch later, when this first proving period is over, then you lose the race again and end up waiting until after the second proving period.
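To make the race concrete, here is a toy Go model (not lotus code). It assumes 60-epoch deadline windows, the WPoStChallengeWindow implied by the PR's "+10 (70 epochs)", aligned to a hypothetical origin epoch, and treats Close as the first epoch after a window. The helpers `deadlineClose` and `waitTarget` are illustrative only. Reading dlInfo one epoch too late makes Close point at the following window, so the Close+5 target lands past that window's close as well.

```go
package main

import "fmt"

// challengeWindow is an assumption: 60 epochs, as implied by the PR text.
const challengeWindow int64 = 60

// deadlineClose returns Close (the first epoch after the window) of the
// deadline window containing epoch, for windows aligned to origin.
// It assumes epoch >= origin.
func deadlineClose(epoch, origin int64) int64 {
	idx := (epoch - origin) / challengeWindow
	return origin + (idx+1)*challengeWindow
}

// waitTarget models the new test: read dlInfo at queryEpoch, wait to Close+5.
func waitTarget(queryEpoch, origin int64) int64 {
	return deadlineClose(queryEpoch, origin) + 5
}

func main() {
	const origin = 5840 // hypothetical: a deadline window opens here
	// The PoSt lands at 5899, the last epoch of the window [5840, 5900).
	onTime := waitTarget(5899, origin) // dlInfo read inside the window
	late := waitTarget(5900, origin)   // dlInfo read one epoch later
	fmt.Println(onTime, late)          // 5905 5965
}
```

In the late case the target, 5965, is already past 5960, the close of the window that follows the PoSt's window.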
But setting the race aside, since it is a systemic problem, this is a cleaner way to express what we are doing, so I support merging this.
We should use dlInfo.Close+1, since it's the first epoch on which the condition should be met, and we are now being more exact.
I saw a failure when I used dlInfo.Close+1. Claude's explanation:
The most likely reason: in Filecoin, the deadline cron (which actually burns the fees) runs as part of block execution at dlInfo.Close. But in the test's simulated chain, there can be null rounds around the close epoch. If there's a null round at dlInfo.Close, no block is produced at that exact epoch, and the cron is deferred to the first block produced after it.
So even with +5 we might still see some failures. I just haven't seen any yet.
I think the problem is slightly more subtle than that. Null-round crons are run before messages anyway, so there's an effective catch-up in place, but null rounds do contribute to the problem here. I'm pretty sure just using dlInfo.Close would be enough, because the miner is enrolled at Close-1. The problem is that all our APIs use ParentState on the tipset you call them with, so with null rounds you're reaching into the parent from before the null rounds. Cron has run, but you're still looking back into history.
A properly robust way to deal with this might be to wait for one non-null tipset, then wait for one more after that, so ParentState is always a post-close state:
ts := client.WaitTillChain(ctx, kit.HeightAtLeast(dlInfo.Close)) // or use +1, whatever
client.WaitTillChain(ctx, kit.HeightAtLeast(ts.Height()+1))
Pretty janky, but it might be something we could put into the test kit, because this kind of problem isn't atypical and the solution might apply elsewhere.
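A toy Go model of the ParentState argument, under stated assumptions: non-null tipset heights are listed explicitly, a state query at height h sees execution only through the previous non-null tipset, and the relevant cron runs at Close-1 as described in the review. The type `chain` and its methods `firstAtLeast`, `parentHeight`, and `sees` are hypothetical stand-ins, not lotus APIs. With null rounds at 5898 through 5900, a single wait misses the cron even at Close+1, while the two-step wait sees it.

```go
package main

import (
	"fmt"
	"sort"
)

// chain is a toy model, not lotus code: sorted heights of non-null tipsets.
// Heights absent from the slice are null rounds.
type chain []int64

// firstAtLeast mimics WaitTillChain(ctx, kit.HeightAtLeast(h)) by returning
// the first non-null height >= h, or -1 if the toy chain has none.
func (c chain) firstAtLeast(h int64) int64 {
	i := sort.Search(len(c), func(i int) bool { return c[i] >= h })
	if i == len(c) {
		return -1
	}
	return c[i]
}

// parentHeight returns the last non-null height below h. In this model a
// state query at h sees execution only through that height; crons for the
// null rounds in between are applied when h itself is executed.
func (c chain) parentHeight(h int64) int64 {
	i := sort.Search(len(c), func(i int) bool { return c[i] >= h })
	if i == 0 {
		return -1
	}
	return c[i-1]
}

// sees reports whether a query at height h observes a cron run at cronEpoch.
func (c chain) sees(h, cronEpoch int64) bool { return c.parentHeight(h) >= cronEpoch }

func main() {
	const closeEpoch = 5900                  // hypothetical dlInfo.Close
	const cronEpoch = closeEpoch - 1         // enrolled at Close-1, per the review
	c := chain{5896, 5897, 5901, 5902, 5903} // 5898 to 5900 are null rounds

	single := c.firstAtLeast(closeEpoch + 1)
	fmt.Println(single, c.sees(single, cronEpoch)) // 5901 false

	first := c.firstAtLeast(closeEpoch)
	second := c.firstAtLeast(first + 1)
	fmt.Println(second, c.sees(second, cronEpoch)) // 5902 true
}
```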
client.WaitTillStateIncludes(condition)
Summary
- Fixes a TestDailyFees failure where the post-PoSt wait could advance the chain past a sector's second proving window close, causing paymentsPast > 1 and a mismatch in checkFeeRecords
- Replaces head.Height()+WPoStChallengeWindow+10 with dlInfo.Close+5, a targeted wait to just after the current deadline window closes, which always lands before the following deadline window closes
Root cause
After feePostWg.Wait(), the chain head could already be near the end of the proving period (e.g. epoch 5890, with the first period ending at 5900 and deadline 0's window in the second period running from 5900 to 5960). Adding a full WPoStChallengeWindow+10 (70 epochs) reached epoch 5960, the close of deadline 0 in the second period, making paymentsPast = 2 for those sectors and causing checkFeeRecords to fail. dlInfo.Close+5 advances at most WPoStChallengeWindow+5 = 65 epochs from within the current window, and, unlike the old wait, it always lands exactly 5 epochs past the current window's close, i.e. 5 epochs into the next 60-epoch window, so it can never reach that window's close.
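The arithmetic above can be checked by sweeping every offset of the head within a deadline window. This Go sketch assumes a 60-epoch WPoStChallengeWindow (implied by "+10 (70 epochs)") and uses the example window [5840, 5900), whose successor closes at 5960; it treats reaching that close as a failure, as the 5890 + 70 = 5960 example does. The helper names are illustrative, not test code from the PR.

```go
package main

import "fmt"

// challengeWindow is an assumption: 60 epochs, implied by the PR's
// "WPoStChallengeWindow+10 (70 epochs)".
const challengeWindow int64 = 60

// oldTarget is the previous wait: head + WPoStChallengeWindow + 10.
func oldTarget(head int64) int64 { return head + challengeWindow + 10 }

// newTarget is the new wait: dlInfo.Close + 5, where Close = Open + window.
func newTarget(open int64) int64 { return open + challengeWindow + 5 }

// overshootingOffsets returns each offset k (head = open + k) for which
// target(open, k) reaches the close of the window after the current one.
func overshootingOffsets(open int64, target func(open, k int64) int64) []int64 {
	nextClose := open + 2*challengeWindow
	var ks []int64
	for k := int64(0); k < challengeWindow; k++ {
		if target(open, k) >= nextClose {
			ks = append(ks, k)
		}
	}
	return ks
}

func main() {
	const open = 5840 // example window [5840, 5900); the next one closes at 5960
	oldBad := overshootingOffsets(open, func(o, k int64) int64 { return oldTarget(o + k) })
	newBad := overshootingOffsets(open, func(o, k int64) int64 { return newTarget(o) })
	fmt.Println("old wait overshoots at offsets:", oldBad)
	fmt.Println("new wait overshoots at offsets:", newBad)
}
```

It reports offsets 50 through 59 for the old wait (a head in the last 10 epochs of its window, such as 5890) and none for the new one.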