-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-542] - Update delaymsg module to execute messages with a cached context #214
Conversation
CORE-542 Delaymsg execute messages with a cached context
See audit result highlighted as a critical issue: https://informalsystems.atlassian.net/wiki/spaces/DQ2/pages/107937920/Dispatching+of+delayed+messages+should+be+done+over+cached+context |
protocol/lib/abci/util.go
Outdated
|
||
// RunCached wraps a function with a cache context and writes the cache to the context if the | ||
// function call succeeds. If the function call fails, the cache is discarded. | ||
func RunCached(c sdk.Context, f func(sdk.Context) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also do the following, similar to Axelar RunCached function:
- Add panic recovery. Since x/delay-msg
DispatchMessagesForBlock
happens in theEndBlocker
, a panic inf(ctx)
, if not recovered leads to consensus failure. We should also log error in this case, similar to the behavior whenrunTx
panic is recovered - (?) Emit events from cached context. This is also done in the x/gov. However I'm not sure if this is necessary, since the
writeCache
function returned fromCacheContext
also includes this logic . Do we have any e2e tests that verify that events from in delayed messages are correctly emitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added panic recovery and check in the test cases that events are added to the context's event manager only when the function succeeeds. PTAL!
tc.setupMocks(t, ctx, k) | ||
|
||
keeper.DispatchMessagesForBlock(k, ctx) | ||
|
||
k.AssertExpectations(t) | ||
mock.AssertExpectationsForObjects(t, k, ms, cms) | ||
}) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking for this PR: I was looking for some way where we can verify that the messages are executed atomically (e.g. state change 1 succeeds, state change 2 fails, no state changes are committed). But it doesn't look like our curren setup supports this, given DispatchMessagesForBlocks
tests are done through mocks
Can we create a ticket to add some test for this? Also do we have any integration test plan onx/delay-msg
that can test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No current integration test plan for this. I'm unsure of any specific examples of messages dispatched by delay-msg that we could finagle to fail after executing some mutations, and may need to work with the rest of the team to come up with a good example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I could write an e2e test that includes multiple complete bridge events scheduled for the same block in delaymsg genesis and verifies that the bridge module account balance only decrements by the amounts included in the valid messages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean multiple MsgCompleteBridge
messages scheduled for the same block? Those are by design executed separately, so wouldn't check "each message is executed atomically" right?
Do we have any plan for supporting a list of sdk.Msg
in a single MsgDelayMessage
, similar to DeliverTx and governance? That may be a good time to atomicity test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't currently have a plan to implement that, as far as I'm aware, but I think based on my reading of the governance doc, it could make a lot of sense to make this change. Filed this and slacked it to Tian.
This issue came up as a critical issue in our audit discussions, and I agree, so here is a proposed fix.
Summary: delaymsg module message execution happens in the EndBlocker context. If message execution fails, partial state transitions are captured and persisted to state. This is inconsistent with how governance and other solutions, like the pattern Mirel suggested for the fix, works now.
Solution: