refactor: extract flashblock timing code and test it#380
Conversation
e7f4fe1 to
98fe236
Compare
|
we can remove --flashblocks.build-at-interval-end --flashblocks.leeway-time and --flashblocks.fixed flags to simplify the logic |
| // lag=300-500ms: 3-4 flashblocks each | ||
| // lag=600-800ms: 2-3 flashblocks each | ||
| // Total = 42 flashblocks across all 9 blocks. | ||
| assert_eq!(42, flashblocks.len()); |
There was a problem hiding this comment.
What exactly changed that made the total flashblocks jump from 34 to 42?
There was a problem hiding this comment.
This old test used leeway_time to calculate the deadline and I think I ported the calculation using that field slightly incorrectly.
With the deprecation of leeway_time though I added the equivalent test using the end buffer: progressive_lag_reduces_flashblocks. It also has assertions for the number of flashblocks for each block so we'll be able to catch this kind of drift more precisely in the future.
98fe236 to
a0220d7
Compare
Done |
a0220d7 to
45d7222
Compare
| }; | ||
| } | ||
|
|
||
| // FLASHBLOCK TIMING SCENARIOS |
There was a problem hiding this comment.
could we keep this visualisation in the code or docs? I think its easier to understand when visualised like this
15a22f9 to
fec8153
Compare
fec8153 to
ab15ea4
Compare
ab15ea4 to
c2cb471
Compare
| let reference_system = std::time::SystemTime::now(); | ||
| let reference_instant = tokio::time::Instant::now(); | ||
|
|
||
| let target_flashblocks = (block_time.as_millis() / config.interval.as_millis()) as u64; |
There was a problem hiding this comment.
Is this another breaking change? The old logic (on the old and discontinued config flashblocks.fixed = false) uses the dynamic remaining time/time drift (which is the minimum of the time diff from fcu arrival vs block time) instead.
There was a problem hiding this comment.
This PR is breaking overall since it remove some flags, deprecating old timing calculations. I'm not sure what the rest of this comment is referring to since this PR only allows for dynamically adjusting flashblock timing based on the remaining time
There was a problem hiding this comment.
I see, thanks for clarifying and i can see better now the intent of this PR. When merging from upstream we didnt think too much into this commit since it was titled as a refactor, however on stability testings and deeper digging did we find that this fix is buggy and backwards breaking
| } | ||
|
|
||
| /// Calculates when the first flashblock should be triggered. | ||
| fn calculate_first_flashblock_offset(remaining_time: Duration, interval: Duration) -> Duration { |
There was a problem hiding this comment.
Can we rename this to calculate_first_flashblock_timing? Since we dont apply the offset here which can be misleading
There was a problem hiding this comment.
The function returns an offset, which is applied later to create the timing
There was a problem hiding this comment.
The configurable offset is only applied to the first and last flashblock with your new PR.
| let first_flashblock_offset = apply_offset(first_flashblock_offset, send_offset_ms); | ||
| let flashblocks_deadline = apply_offset( | ||
| remaining_time.saturating_sub(Duration::from_millis(end_buffer_ms)), | ||
| send_offset_ms, | ||
| ); |
There was a problem hiding this comment.
I think theres some issue with calculating send offset like this? Send offset should be applied to flashblocks intervals, which should be applied to the flashblock interval timing.
I think the logic here is incorrect since offsets are only applied on first and last flashblock send timing
There was a problem hiding this comment.
I don't understand what you're trying to say. Send offset should be applied to flashblock send times, not to the interval.
The other configurable offset is the end one which applies to the end.
There was a problem hiding this comment.
At the intervals calculated, we dont just start the next flashblock index building but we also close and the current flashblock. the interval is now only applied to the first and last interval.
| _ = &mut deadline_sleep => { | ||
| // Deadline reached (with leeway applied to end). Cancel current payload building job | ||
| fb_cancel.cancel(); | ||
| if tx.send(block_cancel.child_token()).is_err() { | ||
| error!( | ||
| target: "payload_builder", | ||
| "Did not trigger next flashblock build due to payload building error or block building being cancelled", | ||
| ); | ||
| } | ||
| return; | ||
| } | ||
| _ = block_cancel.cancelled() => { | ||
| drop(tx); | ||
| return; | ||
| } | ||
| } |
There was a problem hiding this comment.
I believe that there is a potential breaking change here compared to the new refatored FlashblockScheduler::run. Shen testing this refactoring change (on latest main b63a563), we noted that during high chain load (full blocks), we could potentially skip resolving the final flashblock if the FCU arrives too early - example log:
2026-02-13T08:25:11.726785Z INFO payload_builder: Flashblock built event="flashblock_built" id=0x03398c8aabb5519d flashblock_index=2 current_gas=79900707 current_da=227900 target_flashblocks=5
2026-02-13T08:25:11.727545Z DEBUG flashblocks-p2p: broadcasted message to 0 peers
2026-02-13T08:25:11.802570Z INFO payload_builder: Building flashblock block_number=8594146 flashblock_index=3 target_gas=120000000 gas_used=79900707 da_used=227900 block_gas_used=200000000 target_da_footprint=120000000
2026-02-13T08:25:11.887827Z INFO payload_builder: event="flashblock_sent" Sending flashblock to rollup-boost id=0x03398c8aabb5519d index=3 base=false
2026-02-13T08:25:11.888639Z DEBUG flashblocks-p2p: received message to broadcast on protocol /flashblocks/1.0.0
2026-02-13T08:25:11.888863Z INFO payload_builder: Flashblock built event="flashblock_built" id=0x03398c8aabb5519d flashblock_index=3 current_gas=119915245 current_da=342100 target_flashblocks=5
2026-02-13T08:25:11.890015Z DEBUG flashblocks-p2p: broadcasted message to 0 peers
2026-02-13T08:25:11.976273Z INFO op_rbuilder::builders::generator: Resolve kind Earliest
2026-02-13T08:25:11.976314Z WARN payload_builder: Missing flashblocks because the payload building job was cancelled too early id=0x03398c8aabb5519d missed_count=2 target_flashblocks=5
There was a problem hiding this comment.
This was happening before too, but the refactoring and this log message actually captures it now
There was a problem hiding this comment.
This definitely wasnt happening before, because the old logic uses a div ceil and starts payload building immediately.
However, the new logic waits till the first flashblock offset before starting the first flashblock building.
| let target_flashblocks = self.send_times.len(); | ||
| for (i, send_time) in self.send_times.into_iter().enumerate() { | ||
| tokio::select! { | ||
| _ = tokio::time::sleep_until(send_time) => { |
There was a problem hiding this comment.
The bug is located here wrt this comment - https://github.com/flashbots/op-rbuilder/pull/380/changes/BASE..c2cb47177d6e8d74fb40aec1884325f6f1064e2f#r2826086602.
We need to trigger the first flashblock building immediately, however the first flashblock building is only triggered on the first flashblock building time.
There was a problem hiding this comment.
If you want to trigger the first flashblock immediately, you can set the send offset flag to an appropriate value (e.g. maybe send offset = -190 while flashblock interval = 200). Though I'm not sure why you'd want to do that since then flashblocks are going to be skewed to the front of the slot.
There was a problem hiding this comment.
This is definitely not our intention. The old logic allows for a fixed interval flashblock building where we definitively stuck to fixed slots, as well as a dynamic approach - which is our use case.
to be explicit, dynamic approach means a div ceil from the remaining time (whenever fcu arrives up till protocol block time) with the configured flashblock interval.
however, your new fixes no longer allows this, and the builder waits idle until the first flashblock index, and thus wasting precious sequencing time.
| if !build_at_interval_end && tx.send(fb_cancel.clone()).is_err() { | ||
| error!( | ||
| target: "payload_builder", | ||
| "Did not trigger first flashblock build due to payload building error or block building being cancelled"); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Bug is located here - we do not trigger the first flashblock building immediately once flashblocks building begins inside the scheduler.
There was a problem hiding this comment.
This is not a bug, this is intended behavior as we're deprecating the build_at_interval_end flag
| if ctx.flashblock_index() > ctx.target_flashblock_count() { | ||
| self.record_flashblocks_metrics(&ctx, &info, flashblocks_per_block, &span); | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
We should maintain this old logic? The only reason unit tests are passing now is because this refactoring fix contains a bug, that actually only starts building the first flashblock payload on the first offset timing (instead of minting the first flashblock on the first timing)
There was a problem hiding this comment.
Again, not a bug. From the PR description:
This also deprecates the following flags
- --flashblocks.fixed -- we only support dynamics timing calculations
- --flashblocks.build-at-interval-end -- always choose to build at the end
- --flashblocks.leeway-time -- only was used when building at the start
📝 Summary
Extracts flashblock timing logic from payload.rs into a dedicated timing.rs module. Consolidates duplicate timing calculation code paths into pure functions with clear branching.
Also precomputes the times at which we trigger flashblock builds. This allows for thorough unit testing and decouples the trigger loop from the timing calculations. So the tests don't need any kind of mocking or interaction with tokio.
In addition to all the unit tests for computing flashblock send times I also manually tested locally using builder playground.
This also deprecates the following flags
--flashblocks.fixed-- we only support dynamics timing calculations--flashblocks.build-at-interval-end-- always choose to build at the end--flashblocks.leeway-time-- only was used when building at the start💡 Motivation and Context
The flashblock timing code is currently tangled with less related code in payload.rs. There are also separate code paths for build_at_interval_end and fixed/dynamic mode, leading to duplicated logic. This PR addresses these quality concerns.
As we're working on making flashblocks more reliable, we need to have code clear, isolated, and testable enough so we can make changes quickly.
✅ I have completed the following steps:
make lintmake test