Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: allow for hardcoded upgrade schedules #2583

Merged
merged 16 commits into from
Oct 10, 2023
Merged

feat!: allow for hardcoded upgrade schedules #2583

merged 16 commits into from
Oct 10, 2023

Conversation

cmwaters
Copy link
Contributor

@cmwaters cmwaters commented Sep 27, 2023

This PR implements the protocol work behind ADR018. After several iterations I have come across a design which I think best meets our requirements. I will need to update the ADR afterwards.

This still relies on a hardcoded upgrade schedule but rather than simply upgrading at that height through EndBlocker, a proposer will modify the proposed block of the height before with a message indicating a version change. Nodes will vote on that version change in ProcessProposal. If 2/3+ support that app version then the proposal will pass and the block be committed. In the following height the upgraded nodes will perform the state migration if any and propose a block corresponding to that new app version. Nodes that don't support that app version will panic in DeliverTx.

@codecov-commenter
Copy link

codecov-commenter commented Sep 27, 2023

Codecov Report

Merging #2583 (d82c2af) into main (1fee9de) will increase coverage by 0.47%.
The diff coverage is 36.96%.

@@            Coverage Diff             @@
##             main    #2583      +/-   ##
==========================================
+ Coverage   20.61%   21.09%   +0.47%     
==========================================
  Files         132      138       +6     
  Lines       15338    15767     +429     
==========================================
+ Hits         3162     3326     +164     
- Misses      11872    12117     +245     
- Partials      304      324      +20     
Files Coverage Δ
test/tokenfilter/setup.go 100.00% <100.00%> (ø)
x/blob/types/query.pb.go 1.73% <ø> (ø)
x/qgb/types/query.pb.go 0.79% <ø> (ø)
app/default_overrides.go 15.10% <0.00%> (ø)
x/upgrade/upgrade.go 75.00% <75.00%> (ø)
app/deliver_tx.go 0.00% <0.00%> (ø)
app/version.go 50.00% <50.00%> (ø)
x/upgrade/types.go 83.13% <83.13%> (ø)
app/process_proposal.go 0.00% <0.00%> (ø)
app/app.go 4.36% <0.00%> (-0.16%) ⬇️
... and 3 more

... and 2 files with indirect coverage changes

@cmwaters cmwaters marked this pull request as ready for review October 4, 2023 12:10
@cmwaters cmwaters changed the title feat: allow for hardcoded upgrade schedules feat!: allow for hardcoded upgrade schedules Oct 4, 2023
@celestia-bot celestia-bot requested a review from a team October 4, 2023 12:13
@cmwaters cmwaters self-assigned this Oct 4, 2023
@cmwaters cmwaters added this to the v2 milestone Oct 4, 2023
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like this approach and its kinda nice to have a version of signalling from the get go. Have a few questions/discussions for the first round, and I'll continue to review after chewing on it a bit more.

since we're incrementing the protocol version during DeliverTx instead of endblock, will it be weird that we're executing txs for a height using the incremented version, depsite the version for that header not yet being incremented? I could be missing something, and I know that deferred execution will sort of cover this up, but just seemed like a departure from the current system. Before we always used the app version in the header in every scenario, now we use the app version in the header in every scenario, except for the first height of an upgrade.

where do migrations occur? should these also occur during DeliverTx? If so, are we running into the same migration issue that we discussed earlier, where process proposal is running on a potentially different version of the state than when the txs are executed?

Comment on lines 10 to 23
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err == nil {
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok {
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}
}
return app.BaseApp.DeliverTx(req)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since v1 nodes won't be running this, what will they do with this tx?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're setting the protocol version here, I think we have to be extra careful to never use the app.Version method before this in process or prepare. we might even want to remove it. The reason being that we don't want different versions used for different abci methods during the same height.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since v1 nodes won't be running this, what will they do with this tx?

They will just error and skip over it. In the following block they will panic (not sure if in cometbft or in app) because of the app version difference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-blocking] I think de-indenting and following an exit early strategy makes the code easier to follow and easier to test. Additionally I think it reduces the chances of future changes causing bugs.

Suggested change
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err == nil {
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok {
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}
}
return app.BaseApp.DeliverTx(req)
}
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err != nil {
return app.BaseApp.DeliverTx(req)
}
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); !ok {
return app.BaseApp.DeliverTx(req)
}
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}

Now it's super clear that there are 4 test cases that need to be created.

  1. err != nil from app.txConfig.TxDecoder()(req.Tx)
  2. !ok from upgrade.IsUpgradeMsg(sdkTx.GetMsgs())
  3. !IsSupported(appVersion)
  4. happy case

app/prepare_proposal.go Show resolved Hide resolved
app/deliver_tx.go Show resolved Hide resolved
app/process_proposal.go Show resolved Hide resolved
Comment on lines -252 to -254
func(b *baseapp.BaseApp) {
b.SetProtocolVersion(appconsts.LatestVersion)
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that this is in init, will it still get called after rebooting? in the past we've had bugs where the chain-id was not set after rebooting because its supposed to be set during init

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[proposal for a future PR, no change needed]

perhaps we should just make this a stateful value? its a one off benefit as it would only apply to v1, however it would guarantee nodes running v1 will hit an apphash error instead of ignoring txs that they don't know about. We'll probably hit one of those anyway, but this just guarantees it asap. The other benefit being that we never have to think about loading a consensus critical value into the application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should just make this a stateful value?

I agree. This is what later versions of the SDK have. Maybe we can add it in v2 as a migration

Comment on lines +79 to +88
if appVersion, ok := upgrade.IsUpgradeMsg(msgs); ok {
if idx != 0 {
logInvalidPropBlock(app.Logger(), req.Header, fmt.Sprintf("upgrade message %d is not the first transaction", idx))
return reject()
}

if !IsSupported(appVersion) {
logInvalidPropBlock(app.Logger(), req.Header, fmt.Sprintf("block proposes an unsupported app version %d", appVersion))
return reject()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a scenario where nodes running v1 that don't know how to parse this msg or have this logic vote for the proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about this. This was why I chose to wrap it as a tx rather than having it unparseable which would then be accepted. As a tx, it would fail the ante handler because there is no signature attached to it. Ideally I can write a test to check it out.

Comment on lines 11 to 17
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err == nil {
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok {
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is meant to be done as the first transaction, should we just do this in BeginBlock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more complex as we would need to track all the different proposed blocks and match them with the one that eventually gets committed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be easier in finalize block because it can reside as part of the preBlock call back function

Comment on lines 10 to 23
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err == nil {
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok {
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}
}
return app.BaseApp.DeliverTx(req)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-blocking] I think de-indenting and following an exit early strategy makes the code easier to follow and easier to test. Additionally I think it reduces the chances of future changes causing bugs.

Suggested change
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err == nil {
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); ok {
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}
}
return app.BaseApp.DeliverTx(req)
}
func (app *App) DeliverTx(req abci.RequestDeliverTx) abci.ResponseDeliverTx {
sdkTx, err := app.txConfig.TxDecoder()(req.Tx)
if err != nil {
return app.BaseApp.DeliverTx(req)
}
if appVersion, ok := upgrade.IsUpgradeMsg(sdkTx.GetMsgs()); !ok {
return app.BaseApp.DeliverTx(req)
}
if !IsSupported(appVersion) {
panic(fmt.Sprintf("network has upgraded to version %d which is not supported by this node. Please upgrade and restart", appVersion))
}
app.SetProtocolVersion(appVersion)
// TODO: we may want to emit an event for this
return abci.ResponseDeliverTx{Code: abci.CodeTypeOK}
}

Now it's super clear that there are 4 test cases that need to be created.

  1. err != nil from app.txConfig.TxDecoder()(req.Tx)
  2. !ok from upgrade.IsUpgradeMsg(sdkTx.GetMsgs())
  3. !IsSupported(appVersion)
  4. happy case

@@ -0,0 +1,92 @@
package upgrade_test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend having unit tests for all the functions as well as this integration test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup for sure. Will add all the tests once we're happy with the design

@celestia-bot celestia-bot requested a review from a team October 5, 2023 13:30
Comment on lines +34 to +54
v1moduleVersionMap = module.VersionMap{
"bank": bank.AppModule{}.ConsensusVersion(),
"auth": auth.AppModule{}.ConsensusVersion(),
"authz": authzmodule.AppModule{}.ConsensusVersion(),
"staking": staking.AppModule{}.ConsensusVersion(),
"mint": mint.AppModule{}.ConsensusVersion(),
"distribution": distribution.AppModule{}.ConsensusVersion(),
"slashing": slashing.AppModule{}.ConsensusVersion(),
"gov": gov.AppModule{}.ConsensusVersion(),
"params": params.AppModule{}.ConsensusVersion(),
"vesting": vesting.AppModule{}.ConsensusVersion(),
"feegrant": feegrantmodule.AppModule{}.ConsensusVersion(),
"evidence": evidence.AppModule{}.ConsensusVersion(),
"crisis": crisis.AppModule{}.ConsensusVersion(),
"genutil": genutil.AppModule{}.ConsensusVersion(),
"capability": capability.AppModule{}.ConsensusVersion(),
"blob": blob.AppModule{}.ConsensusVersion(),
"qgb": qgb.AppModule{}.ConsensusVersion(),
"ibc": ibc.AppModule{}.ConsensusVersion(),
"transfer": transfer.AppModule{}.ConsensusVersion(),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we map app versions to module versions will need to be modified later on as it doesn't quite fit to how the SDK are used to doing migrations. I will follow up with this in a later PR but for now this makes the tests pass and somewhat signals how migrations will work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can dig it

evan-forbes
evan-forbes previously approved these changes Oct 6, 2023
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me!! not only is this hardcoded upgrades, but this is "protected" or arguably even a form of signalling! had a few non-blocking questions

Comment on lines 23 to 24
func TestUpgradeAppVersion(t *testing.T) {
testApp := setupTestApp(t, upgrade.NewSchedule(upgrade.NewPlan(3, 5, 2)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[can be done in a future PR]

heading in the direction where we rely on the test test is certainly a good thing, and I think this test is superior in many ways.

HoWeVeRr, since the upgrade module is passing important information back to comet, I actually think it would be good to also test that communication as well.

as a side note, I think there are a ton of improvement to the existing testapp we do to make it more ergonomic ref #2535

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can also write an integration test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast.ToUint(appOpts.Get(server.FlagInvCheckPeriod)),
encoding.MakeConfig(app.ModuleEncodingRegisters...), // Ideally, we would reuse the one created by NewRootCmd.
nil,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's your current preferred way to pass this in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the main question I have from your question is whether we want to allow node operators to be able to modify this in their app.toml config. Either way we will have some hardcoded value that gets ported in the binary. I'm just not sure yet whether that value is a default that can be overrided or a built-in value that can only be changed by switching to a different binary

Comment on lines +95 to +100
// ClearIBCState clears any planned IBC state
func (k Keeper) ClearIBCState(ctx sdk.Context, lastHeight int64) {
// delete IBC client and consensus state from store if this is IBC plan
store := ctx.KVStore(k.storeKey)
store.Delete(types.UpgradedClientKey(lastHeight))
store.Delete(types.UpgradedConsStateKey(lastHeight))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would we call this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IBC can call this when an upgrade message is executed. Since we don't allow IBC upgrade proposal messages, this path should never get hit.

require.EqualValues(t, 1, testApp.AppVersion())
respEndBlock := testApp.EndBlock(abci.RequestEndBlock{Height: 2})
// now the app version changes
require.EqualValues(t, 2, respEndBlock.ConsensusParamUpdates.Version.AppVersion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not entirely evident where the consensus app version gets updated here, where in the sdk are we upgrading it? I remember running into this issue when we did the bsr upgrade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little convoluted. We update the app version which then gets returned in the ConsensusParams in EndBlock. I've changed it so the app version is only updated right at the end

@evan-forbes
Copy link
Member

evan-forbes commented Oct 7, 2023

Certainly not suggesting we change this in this PR, this is just general comment.

With a more complex signalling mechanism, we might want to consider only proceeding with the upgrade if more than some fraction greater than 2/3 signal their readiness.

@cmwaters
Copy link
Contributor Author

cmwaters commented Oct 9, 2023

With a more complex signalling mechanism, we might want to consider only proceeding with the upgrade if more than some fraction greater than 2/3 signal their readiness.

Yeah I also thought about this but theoretically since it only takes 2/3 to approve the block then only 2/3 are necessary to signal the upgrade. What we could do is inject some delay but I don't think it's necessary.

Another idea is to have a more intelligent downgrade mechanism. If we're not reaching consensus across multiple rounds then a validator should be free to propose a lower version that might get the quorum required. While I've thought a little about downgrades, they haven't yet been properly addressed and should do in the future

@cmwaters cmwaters mentioned this pull request Oct 9, 2023
4 tasks
rootulp
rootulp previously approved these changes Oct 10, 2023
Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

app/prepare_proposal.go Show resolved Hide resolved
proto/celestia/upgrade/v1/types.proto Outdated Show resolved Hide resolved
proto/celestia/upgrade/v1/types.proto Outdated Show resolved Hide resolved
x/upgrade/keeper.go Outdated Show resolved Hide resolved
x/upgrade/upgrade.go Show resolved Hide resolved
x/upgrade/upgrade_test.go Show resolved Hide resolved
evan-forbes
evan-forbes previously approved these changes Oct 10, 2023
rootulp
rootulp previously approved these changes Oct 10, 2023
x/upgrade/upgrade_test.go Show resolved Hide resolved
@cmwaters cmwaters dismissed stale reviews from rootulp and evan-forbes via dccd2b5 October 10, 2023 13:36
@cmwaters cmwaters requested a review from rach-id as a code owner October 10, 2023 13:36
@celestia-bot celestia-bot requested a review from a team October 10, 2023 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants