op-node: Switch channel bank and channel in reader to a pull based stage #3596
mslipper merged 5 commits into jg/l1retrieval
Conversation
This PR changes implementation code, but doesn't include a changeset. Did you forget to add one?
protolambda
left a comment
LGTM. Left some comments about the interesting bits (to me), but nothing blocking merge, nice work 👍
```go
} else if err != nil && errors.Is(err, derive.NotEnoughData) {
	stepAttempts = 0 // don't do a backoff for this error
	reqStep()
	continue
```
This is quite subtle, but it preserves one of the best things from the old design, which I think was the most undervalued part of the redesign work.
While the pipeline is deriving, a stage can choose to stop and return this error, handing back control to the caller (driver). This keeps the derivation pipeline nice and synchronous for Cannon usage, while handling cases where there's a slow background fetching thing or something requiring repeated smaller updates (like here with the channels). And thus we can put a short and strict timeout on the pipeline work, while keeping the whole thing snappy and simple.
The alternative would be to make the pipeline blocking and essentially halt until we pull either a batch or an EOF, but then it would hold the Engine control captive unless we decouple that further (and deal with queuing and async stuff there without breaking Cannon compatibility), which I think isn't worth it (but maybe with a good design can be done eventually at a later point, since it's not a breaking protocol change).
cc @norswap this is related to the thing we started to discuss.
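The yield-back-to-the-driver pattern described above can be sketched roughly like this. All names below are hypothetical stand-ins; only `ErrNotEnoughData` mirrors the role of `derive.NotEnoughData` from the diff:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotEnoughData is a sentinel error (a stand-in for derive.NotEnoughData):
// a stage returns it to hand control back to the driver without signaling
// a real failure.
var ErrNotEnoughData = errors.New("not enough data")

// Stage is a minimal pull-based stage: each Step either makes progress,
// yields with ErrNotEnoughData, or returns a real error.
type Stage struct{ buffered int }

func (s *Stage) Step() error {
	if s.buffered < 3 {
		s.buffered++ // e.g. ingest one more frame into the channel bank
		return ErrNotEnoughData
	}
	return nil // enough data buffered; a batch can now be produced
}

// drive mirrors the driver loop in the diff: no backoff for
// ErrNotEnoughData, backoff counting for any other error.
func drive(s *Stage) (attempts int) {
	for {
		err := s.Step()
		if err != nil && errors.Is(err, ErrNotEnoughData) {
			attempts = 0 // don't back off; request another step immediately
			continue
		} else if err != nil {
			attempts++ // real error: count toward backoff
			continue
		}
		return attempts
	}
}

func main() {
	fmt.Println("attempts after deriving:", drive(&Stage{}))
}
```

Because each `Step` returns quickly, the driver keeps the engine free between steps, which is the property the comment argues for.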
This error was mainly introduced to simplify some of the code (so it doesn't need to do a loop inside a function). Honestly, I'm really skeptical about the benefit of yielding back to the caller: it tends to introduce a lot of extra internal state to be able to resume, and I'm not sure what the benefit is. Having a blocking pipeline is really simple and easy to understand. In addition, because the fetcher is just an interface, it's really easy to sub out for Cannon.
Blocking in the pipeline means the driver can't interact with the engine without risking interference with the engine state, or having state change underneath it while the engine is in use; handing control back to the caller so the engine isn't locked is important IMHO. Agreed that blocking is easier, but I don't believe we should block for too long on something that wraps a resource as important as the engine.
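The "fetcher is just an interface" point above can be sketched as follows. The types and method names here are illustrative, not the actual op-node API:

```go
package main

import "fmt"

// DataFetcher sketches the kind of interface the comment refers to: the
// pipeline depends only on this, so a live node can plug in an RPC-backed
// fetcher while Cannon plugs in a preimage-backed one.
type DataFetcher interface {
	FetchData(blockHash string) ([]byte, error)
}

type liveFetcher struct{}   // e.g. backed by an L1 RPC client
type cannonFetcher struct{} // e.g. backed by preloaded preimages

func (liveFetcher) FetchData(h string) ([]byte, error)   { return []byte("rpc:" + h), nil }
func (cannonFetcher) FetchData(h string) ([]byte, error) { return []byte("pre:" + h), nil }

// runPipeline sees only the interface, so swapping implementations
// requires no pipeline changes at all.
func runPipeline(f DataFetcher) string {
	d, _ := f.FetchData("0xabc")
	return string(d)
}

func main() {
	fmt.Println(runPipeline(liveFetcher{}))
	fmt.Println(runPipeline(cannonFetcher{}))
}
```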
The attributes queue actually needed fairly few modifications to work with the progress API. Switching over the batch queue was more complex because the batch queue is very stateful, but it still wasn't the worst.
The progress API is now almost entirely removed from the engine queue stage.
op-node: Switch batch queue to be pull based
Description
This is two commits because the intermediate commit was failing a test; the second commit fixes it.
op-node: Switch channel bank to be pull based
This again requires a fair amount of changes to channel_in_reader.go
for the channel in reader to maintain its progress state.
op-node: Switch channel in reader to a pull based stage
Like the rest of the changes, this also required modifications to
the next stage - the batch queue in order for it to manage the
progress API. This commit required even more changes than usual.
I changed the pipeline to be reset to a common starting point
and now use the L2SafeHead block to filter out adding batches &
L1 blocks to the batch queue.