Conversation
|
I expect the current run to be failing on the upgrade kind tests as those have not been updated yet. Kind brokenness seemed to come from the readiness probe being hardcoded to check for version 1. |
|
Altering the version slightly such that the hash includes the genesis state will also make it much simpler to deal with the versioning that the p2p and such is dealing with atm. |
03bbd32 to
e6bb6e0
Compare
962dd6e to
878821f
Compare
|
|
||
| function test_GivenSufficientBalance(uint256 _numberOfRollups) external { | ||
| function test_GivenSufficientBalance() external { | ||
| // it should create a message for the newest version |
There was a problem hiding this comment.
This no longer deals with newest version per say
There was a problem hiding this comment.
💀 never writing comments again, will only come back to hount me.
Practically here though it is still the newest and it would fail if not due to the inbox being pulled based on what is the canonical in the registry - but I get your point 😆
| $EXE vote-on-governance-proposal --proposal-id $PROPOSAL_ID --in-favor true --wait true -r $REGISTRY | ||
|
|
||
| $EXE execute-governance-proposal --proposal-id $PROPOSAL_ID --wait true -r $REGISTRY | ||
| $EXE deploy-new-rollup -r $REGISTRY --salt $SALT --json $TEST_ACCOUNTS $SPONSORED_FPC |
There was a problem hiding this comment.
We dont want to stop testing this functionality, even if we are not using it for the existing deployment
There was a problem hiding this comment.
The things that we stop testing inside the test using this:
- minting and depositing funds into gov
- proposing with lock
- voting on our own proposal
- executing it
All things that are already tested in the solidity contracts.
The thing we kept in is essentially the new deployment making it into registry, and seeing that they when restarted will switch to and use that.
So while we are not doing it from the aztec-cli, I would say this functionality is still tested, just in the solidity.
There was a problem hiding this comment.
Am adding an explicit solidity tests for this so it is simpler to see 👍
| # echo "$hash timeout -v 30m ./spartan/bootstrap.sh test-prod-deployment" | ||
| echo "$hash ./spartan/bootstrap.sh test-cli-upgrade-with-lock" | ||
| echo "$hash ./spartan/bootstrap.sh test-cli-upgrade" | ||
| fi |
There was a problem hiding this comment.
If smoke takes 20m we are in serious trouble
There was a problem hiding this comment.
Not following here?
You are not pointing at the smoke test, and the smoke test ran in 381 seconds in ci, where is 20m from? test-cli-upgrade running in 277 second 🤷
There was a problem hiding this comment.
There was a diff where a timeout of 20 minutes was added to the smoke, that was concerning me, it seems the diff has disappeared now
| ): Fr { | ||
| ): Promise<Fr> { | ||
| const version = await this.outbox.read.VERSION(); | ||
|
|
There was a problem hiding this comment.
If we need the rollup contract, lets make in the constructor - getVersion will be memoized so it will only need to fetch it once, if we construct it each time we will need to request it each time
There was a problem hiding this comment.
Rollup contract was removed from there (it increased bundle size) and the outbox got the data directly. Theres no wrapper of the outbox so just did it this way
| const rollup = new RollupContract(l1PublicClient, originalL1ContractAddresses.rollupAddress.toString()); | ||
| const governanceProposer = new GovernanceProposerContract( | ||
| l1PublicClient, | ||
| originalL1ContractAddresses.governanceProposerAddress.toString(), |
There was a problem hiding this comment.
again, im pretty scared of explicitly removing all of this
There was a problem hiding this comment.
Adding an explicit solidity test to showcase the flow. Also, we have an E2E for doing the votes and such so it is essentially already there in the case without killing nodes to see connection so should be similarly tested but with reduced runtime
| curl -s -X POST -H 'content-type: application/json' \ | ||
| -d '{"jsonrpc":"2.0","method":"pxe_getNodeInfo","params":[],"id":67}' \ | ||
| 127.0.0.1:{{ .Values.pxe.service.nodePort }} | grep -q '"protocolVersion":1' | ||
| 127.0.0.1:{{ .Values.pxe.service.nodePort }} | grep -q '"protocolVersion":[0-9]*[1-9][0-9]*' |
There was a problem hiding this comment.
Ye problably better, should not be with a 0 start if number so don't have to cover 👍
Maddiaa0
left a comment
There was a problem hiding this comment.
overall lgtm
one worry is removing all tests for gov based upgrades rn, but if others are cool in removing non stage 0 things, then lets do it
It is removing it from the kind, there is a E2E. And also adding an explicit solidity test to showcase this case more simply. |
878821f to
3440cdb
Compare
3440cdb to
00384e9
Compare
|
Will add explicit test for full flow to add new version. See #13208 for description and then we |
🤖 I have created a new Aztec Packages release --- ## [0.83.0](v0.82.3...v0.83.0) (2025-04-03) ### ⚠ BREAKING CHANGES * operation mouthwash ([#13171](#13171)) * change version and protocol version to rollupVersion ([#13145](#13145)) * processing events in Aztec.nr ([#12957](#12957)) ### Features * accept multiple consensus client endpoints ([#13022](#13022)) ([1d0185d](1d0185d)) * add ecdsa non ssh account to cli wallet ([#13085](#13085)) ([c5f9984](c5f9984)) * Add parent log link to top of CI logs. ([#13170](#13170)) ([a851087](a851087)) * Add public data read gadget ([#13138](#13138)) ([6bb76db](6bb76db)) * **avm:** merkle db hints (part 3) ([#13199](#13199)) ([7f96676](7f96676)) * **Barretenberg:** static analyzer's routine ([#13207](#13207)) ([84890c2](84890c2)) * derived pending notes capsules slot ([#13102](#13102)) ([6307ba0](6307ba0)) * get mana limit from rollup by default. ([#13029](#13029)) ([a406c54](a406c54)) * Increase CIVC depth with no rollup cost ([#13106](#13106)) ([3a555ef](3a555ef)) * making SyncDataProvider throw before sync ([#13151](#13151)) ([9840241](9840241)) * more benchmarks ([#13103](#13103)) ([7a2c9b7](7a2c9b7)) * **noir:** Allow missing optional fields in msgpack ([#13141](#13141)) ([493dede](493dede)) * processing events in Aztec.nr ([#12957](#12957)) ([88c0e04](88c0e04)) * Prover id defaults to publisher address ([#13206](#13206)) ([f459c3e](f459c3e)) * purge of log decryption in TS ([#12992](#12992)) ([800ab8d](800ab8d)) * register private-only contracts in cli-wallet and misc improvements ([#13245](#13245)) ([ad0530f](ad0530f)) * remove unary trick in decomposition and constraints polishing ([#13080](#13080)) ([0e60255](0e60255)) * split public inputs from proof ([#12816](#12816)) ([ca7151e](ca7151e)) * use magic address for fee juice portal in msgs ([#13241](#13241)) ([6fbc378](6fbc378)) * util for computing proposer/forwarder address ([#13169](#13169)) ([87809b2](87809b2)) * Zw/goblin avm tests ([#12904](#12904)) ([baea4b3](baea4b3)) ### Bug Fixes * add check for rollup version in tx validator ([#13197](#13197)) ([c8220c9](c8220c9)), closes [#13192](#13192) * add flake ([13bdfa7](13bdfa7)) * add flakes ([547d50a](547d50a)) * **avm:** alu interface ([#13115](#13115)) ([101ff78](101ff78)) * await transaction to make proposal ([#13132](#13132)) ([180db9e](180db9e)) * bb mac publish on tag push ([#13174](#13174)) ([8d46c8c](8d46c8c)) * bb workdir permission issue ([#13095](#13095)) ([3685a80](3685a80)) * **bb:** dont publish .zst ([#13173](#13173)) ([4dba4e9](4dba4e9)) * boolean config helper for cli args works now ([#13110](#13110)) ([a93ce6e](a93ce6e)) * **docs:** Register FPC docs, contract deployment, events ([#13222](#13222)) ([49aabfb](49aabfb)) * don't log sepolia acc mnemonic on github action ([#13178](#13178)) ([809b44d](809b44d)) * eq instead of !== ([#13161](#13161)) ([921e347](921e347)) * fetch the correct vk in getSolidityVerifier ([#13157](#13157)) ([71b6719](71b6719)) * force anvil/blob networking to ipv4 on localhost. attempt to fix port flakes ([#13099](#13099)) ([970dae5](970dae5)) * fuzzing build issues ([#13114](#13114)) ([bacae3d](bacae3d)) * Handle proven chain events referring to unseen blocks ([#13144](#13144)) ([229515f](229515f)), closes [#13142](#13142) * handling multiple identical logs in a tx ([#13184](#13184)) ([c100499](c100499)) * incorrect blocknumber in syncTaggedLogs ([e58ea95](e58ea95)) * indexeddb multimap dupes ([#13254](#13254)) ([1e470f0](1e470f0)) * load two more points ([#13119](#13119)) ([2a2904a](2a2904a)) * Masternet to run with the nightly tag ([#13239](#13239)) ([f16b70e](f16b70e)) * newline weirdness in package.json ([#13111](#13111)) ([244ea99](244ea99)) * nightly deploy quotes ([#13253](#13253)) ([775eecf](775eecf)) * prover config read from L1 ([#13237](#13237)) ([13426fe](13426fe)) * Race condition while unwinding blocks ([#13148](#13148)) ([1c2291a](1c2291a)) * read and pass rollup version ([#13232](#13232)) ([4ee1e4a](4ee1e4a)) * recursive sumcheck bugs ([#12885](#12885)) ([a784802](a784802)) * reenable test cache ([4dfca94](4dfca94)) * remove [skip ci] edge condition ([#13228](#13228)) ([4e699ea](4e699ea)) * separator in pending partial notes capsule array slot ([#13153](#13153)) ([0d6ec63](0d6ec63)) * sysdig oops ([5cf6ad1](5cf6ad1)) * tagging bug ([#13061](#13061)) ([280e1a6](280e1a6)) * test tracking failures in merge queue ([#13235](#13235)) ([650fdc1](650fdc1)) * title and external check need to run in merge queue ([e326ed6](e326ed6)) * Transpile cmov ([#13194](#13194)) ([6b94555](6b94555)) * trying to fix EADDRINUSE ([#13176](#13176)) ([260a057](260a057)) * Uninstall gossipsub event handler on service stop ([#13190](#13190)) ([081f30d](081f30d)) * use version from registry for rollup instead of config ([#12938](#12938)) ([7dac390](7dac390)) * validate private double spends in txs with public funcs ([#13088](#13088)) ([4555871](4555871)) * yolo flake and alert fix ([f8de52b](f8de52b)) ### Miscellaneous * add container id to netlog ([9ecdf15](9ecdf15)) * Add e2e test to bootstrap test_cmds ([#13146](#13146)) ([09e4722](09e4722)) * add rollup version as universal cli option ([#13205](#13205)) ([#13213](#13213)) ([568d9e9](568d9e9)) * Add ultra versions of fuzzers in stdlib ([#13139](#13139)) ([aea210b](aea210b)) * align TS and C++ indexed leaf types ([#13185](#13185)) ([d2574cc](d2574cc)) * Assign bb test flake ([#13127](#13127)) ([69fdb04](69fdb04)) * **avm:** remove check_interaction from tests ([#13136](#13136)) ([7d875a6](7d875a6)) * change version and protocol version to rollupVersion ([#13145](#13145)) ([24d7f8b](24d7f8b)) * Cleanup config vars for alpha-testnet ([#13204](#13204)) ([153a606](153a606)) * convenient way to run app ivc from bb ([#13158](#13158)) ([cb1a857](cb1a857)) * Cron snapshot upload in spartan ([#13108](#13108)) ([7c520a8](7c520a8)) * Do not close store before stopping p2p client in tests ([#13223](#13223)) ([63a843e](63a843e)) * docker flake diagnostics. ([48da272](48da272)) * **docs:** alpha-testnet versioning ([#13016](#13016)) ([9ca5d3c](9ca5d3c)) * **docs:** Update CLI faucet command ([#13104](#13104)) ([6eb71de](6eb71de)) * **docs:** Update versions-updating.md ([#13090](#13090)) ([0310d4e](0310d4e)) * Enable debug logging for annoying unit tests ([#13191](#13191)) ([5556524](5556524)) * enable sentinel in sentinel test ([#13219](#13219)) ([f2cfe3f](f2cfe3f)) * Exclude nightly versions from stable fn ([#13196](#13196)) ([a832e90](a832e90)) * Fix flake in e2e fees failures ([#13229](#13229)) ([5defe47](5defe47)) * fix kind logs ([#13023](#13023)) ([c6c2727](c6c2727)), closes [#13053](#13053) * fuzzing build in ci ([#13105](#13105)) ([1c08d38](1c08d38)) * Improve callstacks for public dispatch fns ([#13120](#13120)) ([f67375d](f67375d)) * log out the slash factory when a new rollup is deployed ([#13131](#13131)) ([4643a31](4643a31)) * Merge alpha back to master ([#13128](#13128)) ([504c338](504c338)) * metric attributes ([#13126](#13126)) ([f87d5e3](f87d5e3)), closes [#13063](#13063) * minor tagging API improvement ([#13092](#13092)) ([f5bcece](f5bcece)) * more benchmarking ([#13211](#13211)) ([4b0a4ad](4b0a4ad)) * move unbound impl generics to their functions ([#13147](#13147)) ([62f497f](62f497f)) * operation mouthwash ([#13171](#13171)) ([384f4e5](384f4e5)) * remove catch-all branch from opcode match in transpiler ([#13210](#13210)) ([2bc9ca2](2bc9ca2)) * remove templating by flavor in merge protocol ([#13098](#13098)) ([cf5e217](cf5e217)) * rename journal dir and file to state manager & mv to up to public/ ([#13159](#13159)) ([f13be09](f13be09)) * replace relative paths to noir-protocol-circuits ([f18336d](f18336d)) * replace relative paths to noir-protocol-circuits ([d620c12](d620c12)) * replace relative paths to noir-protocol-circuits ([be3e74e](be3e74e)) * replace relative paths to noir-protocol-circuits ([5bfac63](5bfac63)) * replace relative paths to noir-protocol-circuits ([f41ab2c](f41ab2c)) * replace relative paths to noir-protocol-circuits ([806e560](806e560)) * streamlined log processing ([#13107](#13107)) ([b184865](b184865)) * use testnet optimized trace ([#13135](#13135)) ([5a4f2ac](5a4f2ac)) * uses a new larger redis cache. flake file tweaks. ([#13167](#13167)) ([0951fb6](0951fb6)) ### Documentation * Add intermediate migration note versions ([#13183](#13183)) ([f5de7b8](f5de7b8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Operation Mouthwash
Named so because it is me washing out the taste of shit in my mouth after my registry code made it very easy to misuse which have put us into a hard spot.
The Problem
The registry was supposed to be "per family", meaning that whenever there is a no inheritance (state) that goes from parent to child a new deployment would require a new registry.
Because of this inheritance assumption the registry stores its own view of "versions" e.g., 1, 2, 3..., for the "history".
The fee-asset-portal for example follows what the registry says, AND pulls from its versioning to figure out what it should include in its messages. This puts us into a weird case when the registry is misused because the rollup might force the circuits to use
version = 1but the portal will pullversion = 3making the message non-consumable (unless our circuit code is buggy).aztec-packages/l1-contracts/src/core/FeeJuicePortal.sol
Line 70 in 0951fb6
Furthermore, because we just used the same implementation we practically had that the version number were the same across all deployments (from the rollup contracts point of view). This have caused some confusion.
How did it become a problem?
The misuse of the registry was something that we did because for EMPTY BLOCKS it is not a problem that messages are not arriving or that fee assets are lost, as there are no need for any of them. This meant that for a "quick fix" for the zombie-chain, we could simple add another rollup from a different family to the registry and it would not cause issues.
However, as this was not clear we ended up not getting back to it, making it cause a mess now.
Solution
Do deal with the problem. I think we need to replace the registry and fee asset portal implementations with ones that are less implied and can keep the same registry across families.
There is really two paths that I think we can take to get to that point.
The hard way is that we use the governance (which is only us) and do a big update where we use its ownership of different things to essentially replace EVERYTHING but the governance contract.
The easy way, we abandon that deployment in its tracks, and we just run a fresh start with code that don't have the same issue.
I don't think the hard way is alluring at all, it is more pain that I would really prefer, and I think it will be much simpler to deal with going the clean slate.
In the following sections, we will look at the changes suggested in code, and then the stages of gov and how they match.
Changes and reasoning:
Shared and individual deployments
Deployments are to be more clearly be split out into a
sharedpart that we only need to do once, and anindividualpart that is done for each rollup.Shared deployment include:
Individual rollup include:
Where all of these contracts are tightly coupled.
When an individual deployment is made, the rollup contract should be added to the registry.
The rollup contract deploys its dependencies
To make it a bit simpler to deal with, I have altered the
RollupCore.solsuch that the rollup contract itself will deploy and link itshard-dependencies. Essentially removing the ability to pass in contracts that would be shared but should not be.Version becomes "kinda" chainid
Atm the version was a hardcoded number that was intended to go up as new deployments are made as code changes etc. However as there was often new deployments without L1 code changes the number would not change.
Also, the number was the same regardless of being on what network it was deployed to. To make the value more telling, and move it closer to a chainid usage for the future. I have altered it such that it is computed from the
block.chainidof the L1, the stringaztec_rollupand the rollup address.The ensures that devnet, testnet and such all will have different "version"s which should at least make that a bit clearer and easier to figure out where you are looking.
Right now, it is based on a hash, and not really following the things that a normal chain id, does so I have not changed the name of it.
Registry and portal
The registry and portal are altered such that the registry is not longer influencing the portals, and now just keeps a list of rollups deployed without addings its own logic around their versions.
The registry is now taking care of deploying the
reward-distributor, since I believe that at least makes it a big clearer that they are linked more strictly and just to avoid things being deployed nedlessly and cause confusion.Lastly, we allow separation of the
ownerof theregistryand thegovernancesuch that we can specify gov as the governance while the owner is just us, helping on the stages mentioned below.Inbox and Outbox
The inbox and outbox is currently made to go between versions (old thing) which means that users are providing a
versionas input. This is not needed when not following, so we can remove the input from user as there is only one acceptable value anyway.However, instead of doing this, we will simply limit it strictly. Reasoning being that we expect to use this interface again when we get to the point of moving state around, so keeping it should minimize pain to devs.
Stages of Governance
Furthermore, the combination of the governance setup and how we are using it inflict more pain on us than needed. Namely, we have a gov setup with timelocks, but we are the only actor, and we are in full control, so might as well get rid of some of the extra steps for now.
How does a new deployment look
This depends on what exact stage we are at, but I'm inclined to alter the setup so we don't have scripts for going through and faking governance as much as we are now since we might as well just skip it and deal with it MUCH simpler directly in the deployment.
This means that for the early stages, e.g., 0 and 1, we could just directly update. And we are just testing the signalling to test.
Conclusion
The mix of staging the governance and altering the contracts, means that it is possible for us to give just a single registry address that is stable across all the versions. But also gives us the power to deal with upgrades more simply short term because the deployment can simply add it to the registry skipping a bunch of these extra steps.
TODOs: