-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: allow for hardcoded upgrade schedules #2583
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2583 +/- ##
==========================================
+ Coverage 20.61% 21.09% +0.47%
==========================================
Files 132 138 +6
Lines 15338 15767 +429
==========================================
+ Hits 3162 3326 +164
- Misses 11872 12117 +245
- Partials 304 324 +20
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this approach and its kinda nice to have a version of signalling from the get go. Have a few questions/discussions for the first round, and I'll continue to review after chewing on it a bit more.
since we're incrementing the protocol version during DeliverTx instead of endblock, will it be weird that we're executing txs for a height using the incremented version, depsite the version for that header not yet being incremented? I could be missing something, and I know that deferred execution will sort of cover this up, but just seemed like a departure from the current system. Before we always used the app version in the header in every scenario, now we use the app version in the header in every scenario, except for the first height of an upgrade.
where do migrations occur? should these also occur during DeliverTx? If so, are we running into the same migration issue that we discussed earlier, where process proposal is running on a potentially different version of the state than when the txs are executed?
app/deliver_tx.go
Outdated
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | ||
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | ||
if err == nil { | ||
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok { | ||
if !IsSupported(appVersion) { | ||
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | ||
} | ||
app.SetProtocolVersion(appVersion) | ||
// TODO: we may want to emit an event for this | ||
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | ||
} | ||
} | ||
return app.BaseApp.DeliverTx(req) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since v1 nodes won't be running this, what will they do with this tx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're setting the protocol version here, I think we have to be extra careful to never use the app.Version
method before this in process or prepare. we might even want to remove it. The reason being that we don't want different versions used for different abci methods during the same height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since v1 nodes won't be running this, what will they do with this tx?
They will just error and skip over it. In the following block they will panic (not sure if in cometbft or in app) because of the app version difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[non-blocking] I think de-indenting and following an exit early strategy makes the code easier to follow and easier to test. Additionally I think it reduces the chances of future changes causing bugs.
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | |
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | |
if err == nil { | |
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok { | |
if !IsSupported(appVersion) { | |
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | |
} | |
app.SetProtocolVersion(appVersion) | |
// TODO: we may want to emit an event for this | |
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | |
} | |
} | |
return app.BaseApp.DeliverTx(req) | |
} | |
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | |
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | |
if err != nil { | |
return app.BaseApp.DeliverTx(req) | |
} | |
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); !ok { | |
return app.BaseApp.DeliverTx(req) | |
} | |
if !IsSupported(appVersion) { | |
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | |
} | |
app.SetProtocolVersion(appVersion) | |
// TODO: we may want to emit an event for this | |
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | |
} |
Now it's super clear that there are 4 test cases that need to be created.
err != nil
fromapp.txConfig.TxDecoder()(req.Tx)
!ok
fromupgrade.IsUpgradeMsg(sdkTx.GetMsgs())
!IsSupported(appVersion)
- happy case
func(b *baseapp.BaseApp) { | ||
b.SetProtocolVersion(appconsts.LatestVersion) | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that this is in init, will it still get called after rebooting? in the past we've had bugs where the chain-id was not set after rebooting because its supposed to be set during init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[proposal for a future PR, no change needed]
perhaps we should just make this a stateful value? its a one off benefit as it would only apply to v1, however it would guarantee nodes running v1 will hit an apphash error instead of ignoring txs that they don't know about. We'll probably hit one of those anyway, but this just guarantees it asap. The other benefit being that we never have to think about loading a consensus critical value into the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should just make this a stateful value?
I agree. This is what later versions of the SDK have. Maybe we can add it in v2 as a migration
if appVersion, ok := upgrade.IsUpgradeMsg(msgs); ok { | ||
if idx != 0 { | ||
logInvalidPropBlock(app.Logger(), req.Header, fmt.Sprintf("upgrade message %d is not the first transaction", idx)) | ||
return reject() | ||
} | ||
|
||
if !IsSupported(appVersion) { | ||
logInvalidPropBlock(app.Logger(), req.Header, fmt.Sprintf("block proposes an unsupported app version %d", appVersion)) | ||
return reject() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a scenario where nodes running v1 that don't know how to parse this msg or have this logic vote for the proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about this. This was why I chose to wrap it as a tx rather than having it unparseable which would then be accepted. As a tx, it would fail the ante handler because there is no signature attached to it. Ideally I can write a test to check it out.
app/deliver_tx.go
Outdated
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | ||
if err == nil { | ||
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok { | ||
if !IsSupported(appVersion) { | ||
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | ||
} | ||
app.SetProtocolVersion(appVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is meant to be done as the first transaction, should we just do this in BeginBlock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be more complex as we would need to track all the different proposed blocks and match them with the one that eventually gets committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be easier in finalize block because it can reside as part of the preBlock
call back function
app/deliver_tx.go
Outdated
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | ||
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | ||
if err == nil { | ||
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok { | ||
if !IsSupported(appVersion) { | ||
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | ||
} | ||
app.SetProtocolVersion(appVersion) | ||
// TODO: we may want to emit an event for this | ||
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | ||
} | ||
} | ||
return app.BaseApp.DeliverTx(req) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[non-blocking] I think de-indenting and following an exit early strategy makes the code easier to follow and easier to test. Additionally I think it reduces the chances of future changes causing bugs.
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | |
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | |
if err == nil { | |
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok { | |
if !IsSupported(appVersion) { | |
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | |
} | |
app.SetProtocolVersion(appVersion) | |
// TODO: we may want to emit an event for this | |
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | |
} | |
} | |
return app.BaseApp.DeliverTx(req) | |
} | |
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx { | |
sdkTx, err := app.txConfig.TxDecoder()(req.Tx) | |
if err != nil { | |
return app.BaseApp.DeliverTx(req) | |
} | |
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); !ok { | |
return app.BaseApp.DeliverTx(req) | |
} | |
if !IsSupported(appVersion) { | |
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion)) | |
} | |
app.SetProtocolVersion(appVersion) | |
// TODO: we may want to emit an event for this | |
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK} | |
} |
Now it's super clear that there are 4 test cases that need to be created.
err != nil
fromapp.txConfig.TxDecoder()(req.Tx)
!ok
fromupgrade.IsUpgradeMsg(sdkTx.GetMsgs())
!IsSupported(appVersion)
- happy case
@@ -0,0 +1,92 @@ | |||
package upgrade_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend having unit tests for all the functions as well as this integration test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup for sure. Will add all the tests once we're happy with the design
…tions and more tests
v1moduleVersionMap = module.VersionMap{ | ||
"bank": bank.AppModule{}.ConsensusVersion(), | ||
"auth": auth.AppModule{}.ConsensusVersion(), | ||
"authz": authzmodule.AppModule{}.ConsensusVersion(), | ||
"staking": staking.AppModule{}.ConsensusVersion(), | ||
"mint": mint.AppModule{}.ConsensusVersion(), | ||
"distribution": distribution.AppModule{}.ConsensusVersion(), | ||
"slashing": slashing.AppModule{}.ConsensusVersion(), | ||
"gov": gov.AppModule{}.ConsensusVersion(), | ||
"params": params.AppModule{}.ConsensusVersion(), | ||
"vesting": vesting.AppModule{}.ConsensusVersion(), | ||
"feegrant": feegrantmodule.AppModule{}.ConsensusVersion(), | ||
"evidence": evidence.AppModule{}.ConsensusVersion(), | ||
"crisis": crisis.AppModule{}.ConsensusVersion(), | ||
"genutil": genutil.AppModule{}.ConsensusVersion(), | ||
"capability": capability.AppModule{}.ConsensusVersion(), | ||
"blob": blob.AppModule{}.ConsensusVersion(), | ||
"qgb": qgb.AppModule{}.ConsensusVersion(), | ||
"ibc": ibc.AppModule{}.ConsensusVersion(), | ||
"transfer": transfer.AppModule{}.ConsensusVersion(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we map app versions to module versions will need to be modified later on as it doesn't quite fit to how the SDK are used to doing migrations. I will follow up with this in a later PR but for now this makes the tests pass and somewhat signals how migrations will work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can dig it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense to me!! not only is this hardcoded upgrades, but this is "protected" or arguably even a form of signalling! had a few non-blocking questions
x/upgrade/upgrade_test.go
Outdated
func TestUpgradeAppVersion(t *testing.T) { | ||
testApp := setupTestApp(t, upgrade.NewSchedule(upgrade.NewPlan(3, 5, 2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[can be done in a future PR]
heading in the direction where we rely on the test test is certainly a good thing, and I think this test is superior in many ways.
HoWeVeRr, since the upgrade module is passing important information back to comet, I actually think it would be good to also test that communication as well.
as a side note, I think there are a ton of improvement to the existing testapp we do to make it more ergonomic ref #2535
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can also write an integration test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast.ToUint(appOpts.Get(server.FlagInvCheckPeriod)), | ||
encoding.MakeConfig(app.ModuleEncodingRegisters...), // Ideally, we would reuse the one created by NewRootCmd. | ||
nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's your current preferred way to pass this in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the main question I have from your question is whether we want to allow node operators to be able to modify this in their app.toml config. Either way we will have some hardcoded value that gets ported in the binary. I'm just not sure yet whether that value is a default that can be overrided or a built-in value that can only be changed by switching to a different binary
// ClearIBCState clears any planned IBC state | ||
func (k Keeper) ClearIBCState(ctx sdk.Context, lastHeight int64) { | ||
// delete IBC client and consensus state from store if this is IBC plan | ||
store := ctx.KVStore(k.storeKey) | ||
store.Delete(types.UpgradedClientKey(lastHeight)) | ||
store.Delete(types.UpgradedConsStateKey(lastHeight)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when would we call this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IBC can call this when an upgrade message is executed. Since we don't allow IBC upgrade proposal messages, this path should never get hit.
require.EqualValues(t, 1, testApp.AppVersion()) | ||
respEndBlock := testApp.EndBlock(abci.RequestEndBlock{Height: 2}) | ||
// now the app version changes | ||
require.EqualValues(t, 2, respEndBlock.ConsensusParamUpdates.Version.AppVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not entirely evident where the consensus app version gets updated here, where in the sdk are we upgrading it? I remember running into this issue when we did the bsr upgrade
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little convoluted. We update the app version which then gets returned in the ConsensusParams
in EndBlock
. I've changed it so the app version is only updated right at the end
Certainly not suggesting we change this in this PR, this is just general comment. With a more complex signalling mechanism, we might want to consider only proceeding with the upgrade if more than some fraction greater than 2/3 signal their readiness. |
Yeah I also thought about this but theoretically since it only takes 2/3 to approve the block then only 2/3 are necessary to signal the upgrade. What we could do is inject some delay but I don't think it's necessary. Another idea is to have a more intelligent downgrade mechanism. If we're not reaching consensus across multiple rounds then a validator should be free to propose a lower version that might get the quorum required. While I've thought a little about downgrades, they haven't yet been properly addressed and should do in the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR implements the protocol work behind ADR018. After several iterations I have come across a design which I think best meets our requirements. I will need to update the ADR afterwards.
This still relies on a hardcoded upgrade schedule but rather than simply upgrading at that height through
EndBlocker
, a proposer will modify the proposed block of the height before with a message indicating a version change. Nodes will vote on that version change inProcessProposal
. If 2/3+ support that app version then the proposal will pass and the block be committed. In the following height the upgraded nodes will perform the state migration if any and propose a block corresponding to that new app version. Nodes that don't support that app version will panic inDeliverTx
.