Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-542] - Update delaymsg module to execute messages with a cached context #214

Merged
merged 3 commits into from
Sep 13, 2023

Conversation

clemire
Copy link
Contributor

@clemire clemire commented Sep 8, 2023

This issue came up as a critical issue in our audit discussions, and I agree, so here is a proposed fix.

Summary: delaymsg module message execution happens in the EndBlocker context. If message execution fails, partial state transitions are captured and persisted to state. This is inconsistent with how governance and other solutions, like the pattern Mirel suggested for the fix, works now.

Solution:

  • construct a cachedCtx, execute each message independently with it's own cached ctx, and write the context to state if the message succeeds.

@clemire clemire changed the title [CORE-542] - Update module to execute messages with a cached context [CORE-542] - Update delaymsg module to execute messages with a cached context Sep 8, 2023

// RunCached wraps a function with a cache context and writes the cache to the context if the
// function call succeeds. If the function call fails, the cache is discarded.
func RunCached(c sdk.Context, f func(sdk.Context) error) error {
Copy link
Contributor

@teddyding teddyding Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also do the following, similar to Axelar RunCached function:

  • Add panic recovery. Since x/delay-msg DispatchMessagesForBlock happens in the EndBlocker, a panic in f(ctx), if not recovered leads to consensus failure. We should also log error in this case, similar to the behavior when runTx panic is recovered
  • (?) Emit events from cached context. This is also done in the x/gov. However I'm not sure if this is necessary, since the writeCache function returned from CacheContext also includes this logic . Do we have any e2e tests that verify that events from in delayed messages are correctly emitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added panic recovery and check in the test cases that events are added to the context's event manager only when the function succeeeds. PTAL!

tc.setupMocks(t, ctx, k)

keeper.DispatchMessagesForBlock(k, ctx)

k.AssertExpectations(t)
mock.AssertExpectationsForObjects(t, k, ms, cms)
})
}
}
Copy link
Contributor

@teddyding teddyding Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking for this PR: I was looking for some way where we can verify that the messages are executed atomically (e.g. state change 1 succeeds, state change 2 fails, no state changes are committed). But it doesn't look like our curren setup supports this, given DispatchMessagesForBlocks tests are done through mocks

Can we create a ticket to add some test for this? Also do we have any integration test plan onx/delay-msg that can test this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@clemire clemire Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No current integration test plan for this. I'm unsure of any specific examples of messages dispatched by delay-msg that we could finagle to fail after executing some mutations, and may need to work with the rest of the team to come up with a good example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I could write an e2e test that includes multiple complete bridge events scheduled for the same block in delaymsg genesis and verifies that the bridge module account balance only decrements by the amounts included in the valid messages?

Copy link
Contributor

@teddyding teddyding Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean multiple MsgCompleteBridge messages scheduled for the same block? Those are by design executed separately, so wouldn't check "each message is executed atomically" right?

Do we have any plan for supporting a list of sdk.Msg in a single MsgDelayMessage, similar to DeliverTx and governance? That may be a good time to atomicity test.

Copy link
Contributor Author

@clemire clemire Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't currently have a plan to implement that, as far as I'm aware, but I think based on my reading of the governance doc, it could make a lot of sense to make this change. Filed this and slacked it to Tian.

@clemire clemire merged commit 1befa27 into main Sep 13, 2023
9 checks passed
@clemire clemire deleted the crystal/CORE-542-delaymsg-with-cached-ctx branch September 13, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

2 participants