Skip to content

refactor!: operation mouthwash#13171

Merged
LHerskind merged 8 commits intomasterfrom
lh/operation-mouthwash
Apr 1, 2025
Merged

refactor!: operation mouthwash#13171
LHerskind merged 8 commits intomasterfrom
lh/operation-mouthwash

Conversation

@LHerskind
Copy link
Contributor

@LHerskind LHerskind commented Mar 30, 2025

Operation Mouthwash

Named so because it is me washing out the taste of shit in my mouth after my registry code made it very easy to misuse which have put us into a hard spot.

The Problem

The registry was supposed to be "per family", meaning that whenever there is a no inheritance (state) that goes from parent to child a new deployment would require a new registry.

Because of this inheritance assumption the registry stores its own view of "versions" e.g., 1, 2, 3..., for the "history".

The fee-asset-portal for example follows what the registry says, AND pulls from its versioning to figure out what it should include in its messages. This puts us into a weird case when the registry is misused because the rollup might force the circuits to use version = 1 but the portal will pull version = 3 making the message non-consumable (unless our circuit code is buggy).

uint256 version = REGISTRY.getVersionFor(rollup);

Furthermore, because we just used the same implementation we practically had that the version number were the same across all deployments (from the rollup contracts point of view). This have caused some confusion.

How did it become a problem?

The misuse of the registry was something that we did because for EMPTY BLOCKS it is not a problem that messages are not arriving or that fee assets are lost, as there are no need for any of them. This meant that for a "quick fix" for the zombie-chain, we could simple add another rollup from a different family to the registry and it would not cause issues.

However, as this was not clear we ended up not getting back to it, making it cause a mess now.

Solution

Do deal with the problem. I think we need to replace the registry and fee asset portal implementations with ones that are less implied and can keep the same registry across families.

There is really two paths that I think we can take to get to that point.

  1. The hard way
  2. The easy way, but take a small L

The hard way is that we use the governance (which is only us) and do a big update where we use its ownership of different things to essentially replace EVERYTHING but the governance contract.

The easy way, we abandon that deployment in its tracks, and we just run a fresh start with code that don't have the same issue.

I don't think the hard way is alluring at all, it is more pain that I would really prefer, and I think it will be much simpler to deal with going the clean slate.

In the following sections, we will look at the changes suggested in code, and then the stages of gov and how they match.

Changes and reasoning:

Shared and individual deployments

Deployments are to be more clearly be split out into a shared part that we only need to do once, and an individual part that is done for each rollup.

Shared deployment include:

  • governance
  • governance proposer
  • fee asset
  • fee asset handler
  • staking asset
  • staking asset handler
  • registry
  • reward distributor
  • coin issuer

Individual rollup include:

  • Rollup contract
  • Fee Asset Portal
  • Inbox
  • Outbox
  • Verifier

Where all of these contracts are tightly coupled.

When an individual deployment is made, the rollup contract should be added to the registry.

The rollup contract deploys its dependencies

To make it a bit simpler to deal with, I have altered the RollupCore.sol such that the rollup contract itself will deploy and link its hard-dependencies. Essentially removing the ability to pass in contracts that would be shared but should not be.

Version becomes "kinda" chainid

Atm the version was a hardcoded number that was intended to go up as new deployments are made as code changes etc. However as there was often new deployments without L1 code changes the number would not change.

Also, the number was the same regardless of being on what network it was deployed to. To make the value more telling, and move it closer to a chainid usage for the future. I have altered it such that it is computed from the block.chainid of the L1, the string aztec_rollup and the rollup address.

The ensures that devnet, testnet and such all will have different "version"s which should at least make that a bit clearer and easier to figure out where you are looking.

Right now, it is based on a hash, and not really following the things that a normal chain id, does so I have not changed the name of it.

Registry and portal

The registry and portal are altered such that the registry is not longer influencing the portals, and now just keeps a list of rollups deployed without addings its own logic around their versions.

The registry is now taking care of deploying the reward-distributor, since I believe that at least makes it a big clearer that they are linked more strictly and just to avoid things being deployed nedlessly and cause confusion.

Lastly, we allow separation of the owner of the registry and the governance such that we can specify gov as the governance while the owner is just us, helping on the stages mentioned below.

Inbox and Outbox

The inbox and outbox is currently made to go between versions (old thing) which means that users are providing a version as input. This is not needed when not following, so we can remove the input from user as there is only one acceptable value anyway.

However, instead of doing this, we will simply limit it strictly. Reasoning being that we expect to use this interface again when we get to the point of moving state around, so keeping it should minimize pain to devs.

Stages of Governance

Furthermore, the combination of the governance setup and how we are using it inflict more pain on us than needed. Namely, we have a gov setup with timelocks, but we are the only actor, and we are in full control, so might as well get rid of some of the extra steps for now.

  • Stage 0: We own of all the contracts. This allow us to just add a new rollup whenever we want. And is practically what we have now, just with less pain around how we are running it.
  • Stage 1: We own all the contracts, but will look at signals that sequencers are sending. This is practically 0, just with them signalling.
  • Stage 2: We pass ownership of most contracts to the governance, but still keep the staking asset such that we can mint and force things.
  • Stage 3: We hand tokens out to others to vote, and practically don't have to be part of it anymore.

How does a new deployment look

This depends on what exact stage we are at, but I'm inclined to alter the setup so we don't have scripts for going through and faking governance as much as we are now since we might as well just skip it and deal with it MUCH simpler directly in the deployment.

This means that for the early stages, e.g., 0 and 1, we could just directly update. And we are just testing the signalling to test.

Conclusion

The mix of staging the governance and altering the contracts, means that it is possible for us to give just a single registry address that is stable across all the versions. But also gives us the power to deal with upgrades more simply short term because the deployment can simply add it to the registry skipping a bunch of these extra steps.

TODOs:

  • Figure out why kind tests are broken
    • It was using a hardcoded version in the readinessProbe
  • Update the upgrade kind tests to follow flow.
    • Some of them seem to just be broken (not run in CI so have been for some time as well it seems...)
  • update inbox outbox

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@LHerskind LHerskind changed the title chore: solidity chunks chore: operation mouthwash Mar 30, 2025
@LHerskind
Copy link
Contributor Author

LHerskind commented Mar 30, 2025

I expect the current run to be failing on the upgrade kind tests as those have not been updated yet. Kind brokenness seemed to come from the readiness probe being hardcoded to check for version 1.

@LHerskind LHerskind marked this pull request as ready for review March 30, 2025 14:03
@LHerskind LHerskind added the ci-full Run all master checks. label Mar 30, 2025
@LHerskind
Copy link
Contributor Author

Altering the version slightly such that the hash includes the genesis state will also make it much simpler to deal with the versioning that the p2p and such is dealing with atm.

@LHerskind LHerskind requested a review from charlielye as a code owner March 30, 2025 17:57
@LHerskind LHerskind force-pushed the lh/operation-mouthwash branch 2 times, most recently from 03bbd32 to e6bb6e0 Compare March 30, 2025 23:04
@LHerskind LHerskind requested a review from Thunkar as a code owner March 30, 2025 23:04
@LHerskind LHerskind changed the title chore: operation mouthwash refactor!: operation mouthwash Mar 31, 2025
@LHerskind LHerskind force-pushed the lh/operation-mouthwash branch 2 times, most recently from 962dd6e to 878821f Compare March 31, 2025 10:35

function test_GivenSufficientBalance(uint256 _numberOfRollups) external {
function test_GivenSufficientBalance() external {
// it should create a message for the newest version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer deals with newest version per say

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💀 never writing comments again, will only come back to hount me.

Practically here though it is still the newest and it would fail if not due to the inbox being pulled based on what is the canonical in the registry - but I get your point 😆

$EXE vote-on-governance-proposal --proposal-id $PROPOSAL_ID --in-favor true --wait true -r $REGISTRY

$EXE execute-governance-proposal --proposal-id $PROPOSAL_ID --wait true -r $REGISTRY
$EXE deploy-new-rollup -r $REGISTRY --salt $SALT --json $TEST_ACCOUNTS $SPONSORED_FPC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont want to stop testing this functionality, even if we are not using it for the existing deployment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The things that we stop testing inside the test using this:

  • minting and depositing funds into gov
  • proposing with lock
  • voting on our own proposal
  • executing it

All things that are already tested in the solidity contracts.
The thing we kept in is essentially the new deployment making it into registry, and seeing that they when restarted will switch to and use that.

So while we are not doing it from the aztec-cli, I would say this functionality is still tested, just in the solidity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am adding an explicit solidity tests for this so it is simpler to see 👍

# echo "$hash timeout -v 30m ./spartan/bootstrap.sh test-prod-deployment"
echo "$hash ./spartan/bootstrap.sh test-cli-upgrade-with-lock"
echo "$hash ./spartan/bootstrap.sh test-cli-upgrade"
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If smoke takes 20m we are in serious trouble

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that happening now ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following here?

You are not pointing at the smoke test, and the smoke test ran in 381 seconds in ci, where is 20m from? test-cli-upgrade running in 277 second 🤷

Copy link
Member

@Maddiaa0 Maddiaa0 Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a diff where a timeout of 20 minutes was added to the smoke, that was concerning me, it seems the diff has disappeared now

): Fr {
): Promise<Fr> {
const version = await this.outbox.read.VERSION();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need the rollup contract, lets make in the constructor - getVersion will be memoized so it will only need to fetch it once, if we construct it each time we will need to request it each time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollup contract was removed from there (it increased bundle size) and the outbox got the data directly. Theres no wrapper of the outbox so just did it this way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

const rollup = new RollupContract(l1PublicClient, originalL1ContractAddresses.rollupAddress.toString());
const governanceProposer = new GovernanceProposerContract(
l1PublicClient,
originalL1ContractAddresses.governanceProposerAddress.toString(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, im pretty scared of explicitly removing all of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an explicit solidity test to showcase the flow. Also, we have an E2E for doing the votes and such so it is essentially already there in the case without killing nodes to see connection so should be similarly tested but with reduced runtime

curl -s -X POST -H 'content-type: application/json' \
-d '{"jsonrpc":"2.0","method":"pxe_getNodeInfo","params":[],"id":67}' \
127.0.0.1:{{ .Values.pxe.service.nodePort }} | grep -q '"protocolVersion":1'
127.0.0.1:{{ .Values.pxe.service.nodePort }} | grep -q '"protocolVersion":[0-9]*[1-9][0-9]*'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1-9][0-9]* ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye problably better, should not be with a 0 start if number so don't have to cover 👍

Copy link
Member

@Maddiaa0 Maddiaa0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm

one worry is removing all tests for gov based upgrades rn, but if others are cool in removing non stage 0 things, then lets do it

@LHerskind
Copy link
Contributor Author

overall lgtm

one worry is removing all tests for gov based upgrades rn, but if others are cool in removing non stage 0 things, then lets do it

It is removing it from the kind, there is a E2E. And also adding an explicit solidity test to showcase this case more simply.

@LHerskind LHerskind force-pushed the lh/operation-mouthwash branch from 3440cdb to 00384e9 Compare April 1, 2025 10:23
@LHerskind
Copy link
Contributor Author

LHerskind commented Apr 1, 2025

Will add explicit test for full flow to add new version. See #13208 for description and then we

@LHerskind LHerskind merged commit 384f4e5 into master Apr 1, 2025
8 checks passed
@LHerskind LHerskind deleted the lh/operation-mouthwash branch April 1, 2025 15:13
charlielye pushed a commit that referenced this pull request Apr 3, 2025
🤖 I have created a new Aztec Packages release
---


##
[0.83.0](v0.82.3...v0.83.0)
(2025-04-03)


### ⚠ BREAKING CHANGES

* operation mouthwash
([#13171](#13171))
* change version and protocol version to rollupVersion
([#13145](#13145))
* processing events in Aztec.nr
([#12957](#12957))

### Features

* accept multiple consensus client endpoints
([#13022](#13022))
([1d0185d](1d0185d))
* add ecdsa non ssh account to cli wallet
([#13085](#13085))
([c5f9984](c5f9984))
* Add parent log link to top of CI logs.
([#13170](#13170))
([a851087](a851087))
* Add public data read gadget
([#13138](#13138))
([6bb76db](6bb76db))
* **avm:** merkle db hints (part 3)
([#13199](#13199))
([7f96676](7f96676))
* **Barretenberg:** static analyzer's routine
([#13207](#13207))
([84890c2](84890c2))
* derived pending notes capsules slot
([#13102](#13102))
([6307ba0](6307ba0))
* get mana limit from rollup by default.
([#13029](#13029))
([a406c54](a406c54))
* Increase CIVC depth with no rollup cost
([#13106](#13106))
([3a555ef](3a555ef))
* making SyncDataProvider throw before sync
([#13151](#13151))
([9840241](9840241))
* more benchmarks
([#13103](#13103))
([7a2c9b7](7a2c9b7))
* **noir:** Allow missing optional fields in msgpack
([#13141](#13141))
([493dede](493dede))
* processing events in Aztec.nr
([#12957](#12957))
([88c0e04](88c0e04))
* Prover id defaults to publisher address
([#13206](#13206))
([f459c3e](f459c3e))
* purge of log decryption in TS
([#12992](#12992))
([800ab8d](800ab8d))
* register private-only contracts in cli-wallet and misc improvements
([#13245](#13245))
([ad0530f](ad0530f))
* remove unary trick in decomposition and constraints polishing
([#13080](#13080))
([0e60255](0e60255))
* split public inputs from proof
([#12816](#12816))
([ca7151e](ca7151e))
* use magic address for fee juice portal in msgs
([#13241](#13241))
([6fbc378](6fbc378))
* util for computing proposer/forwarder address
([#13169](#13169))
([87809b2](87809b2))
* Zw/goblin avm tests
([#12904](#12904))
([baea4b3](baea4b3))


### Bug Fixes

* add check for rollup version in tx validator
([#13197](#13197))
([c8220c9](c8220c9)),
closes
[#13192](#13192)
* add flake
([13bdfa7](13bdfa7))
* add flakes
([547d50a](547d50a))
* **avm:** alu interface
([#13115](#13115))
([101ff78](101ff78))
* await transaction to make proposal
([#13132](#13132))
([180db9e](180db9e))
* bb mac publish on tag push
([#13174](#13174))
([8d46c8c](8d46c8c))
* bb workdir permission issue
([#13095](#13095))
([3685a80](3685a80))
* **bb:** dont publish .zst
([#13173](#13173))
([4dba4e9](4dba4e9))
* boolean config helper for cli args works now
([#13110](#13110))
([a93ce6e](a93ce6e))
* **docs:** Register FPC docs, contract deployment, events
([#13222](#13222))
([49aabfb](49aabfb))
* don't log sepolia acc mnemonic on github action
([#13178](#13178))
([809b44d](809b44d))
* eq instead of !==
([#13161](#13161))
([921e347](921e347))
* fetch the correct vk in getSolidityVerifier
([#13157](#13157))
([71b6719](71b6719))
* force anvil/blob networking to ipv4 on localhost. attempt to fix port
flakes
([#13099](#13099))
([970dae5](970dae5))
* fuzzing build issues
([#13114](#13114))
([bacae3d](bacae3d))
* Handle proven chain events referring to unseen blocks
([#13144](#13144))
([229515f](229515f)),
closes
[#13142](#13142)
* handling multiple identical logs in a tx
([#13184](#13184))
([c100499](c100499))
* incorrect blocknumber in syncTaggedLogs
([e58ea95](e58ea95))
* indexeddb multimap dupes
([#13254](#13254))
([1e470f0](1e470f0))
* load two more points
([#13119](#13119))
([2a2904a](2a2904a))
* Masternet to run with the nightly tag
([#13239](#13239))
([f16b70e](f16b70e))
* newline weirdness in package.json
([#13111](#13111))
([244ea99](244ea99))
* nightly deploy quotes
([#13253](#13253))
([775eecf](775eecf))
* prover config read from L1
([#13237](#13237))
([13426fe](13426fe))
* Race condition while unwinding blocks
([#13148](#13148))
([1c2291a](1c2291a))
* read and pass rollup version
([#13232](#13232))
([4ee1e4a](4ee1e4a))
* recursive sumcheck bugs
([#12885](#12885))
([a784802](a784802))
* reenable test cache
([4dfca94](4dfca94))
* remove [skip ci] edge condition
([#13228](#13228))
([4e699ea](4e699ea))
* separator in pending partial notes capsule array slot
([#13153](#13153))
([0d6ec63](0d6ec63))
* sysdig oops
([5cf6ad1](5cf6ad1))
* tagging bug
([#13061](#13061))
([280e1a6](280e1a6))
* test tracking failures in merge queue
([#13235](#13235))
([650fdc1](650fdc1))
* title and external check need to run in merge queue
([e326ed6](e326ed6))
* Transpile cmov
([#13194](#13194))
([6b94555](6b94555))
* trying to fix EADDRINUSE
([#13176](#13176))
([260a057](260a057))
* Uninstall gossipsub event handler on service stop
([#13190](#13190))
([081f30d](081f30d))
* use version from registry for rollup instead of config
([#12938](#12938))
([7dac390](7dac390))
* validate private double spends in txs with public funcs
([#13088](#13088))
([4555871](4555871))
* yolo flake and alert fix
([f8de52b](f8de52b))


### Miscellaneous

* add container id to netlog
([9ecdf15](9ecdf15))
* Add e2e test to bootstrap test_cmds
([#13146](#13146))
([09e4722](09e4722))
* add rollup version as universal cli option
([#13205](#13205))
([#13213](#13213))
([568d9e9](568d9e9))
* Add ultra versions of fuzzers in stdlib
([#13139](#13139))
([aea210b](aea210b))
* align TS and C++ indexed leaf types
([#13185](#13185))
([d2574cc](d2574cc))
* Assign bb test flake
([#13127](#13127))
([69fdb04](69fdb04))
* **avm:** remove check_interaction from tests
([#13136](#13136))
([7d875a6](7d875a6))
* change version and protocol version to rollupVersion
([#13145](#13145))
([24d7f8b](24d7f8b))
* Cleanup config vars for alpha-testnet
([#13204](#13204))
([153a606](153a606))
* convenient way to run app ivc from bb
([#13158](#13158))
([cb1a857](cb1a857))
* Cron snapshot upload in spartan
([#13108](#13108))
([7c520a8](7c520a8))
* Do not close store before stopping p2p client in tests
([#13223](#13223))
([63a843e](63a843e))
* docker flake diagnostics.
([48da272](48da272))
* **docs:** alpha-testnet versioning
([#13016](#13016))
([9ca5d3c](9ca5d3c))
* **docs:** Update CLI faucet command
([#13104](#13104))
([6eb71de](6eb71de))
* **docs:** Update versions-updating.md
([#13090](#13090))
([0310d4e](0310d4e))
* Enable debug logging for annoying unit tests
([#13191](#13191))
([5556524](5556524))
* enable sentinel in sentinel test
([#13219](#13219))
([f2cfe3f](f2cfe3f))
* Exclude nightly versions from stable fn
([#13196](#13196))
([a832e90](a832e90))
* Fix flake in e2e fees failures
([#13229](#13229))
([5defe47](5defe47))
* fix kind logs
([#13023](#13023))
([c6c2727](c6c2727)),
closes
[#13053](#13053)
* fuzzing build in ci
([#13105](#13105))
([1c08d38](1c08d38))
* Improve callstacks for public dispatch fns
([#13120](#13120))
([f67375d](f67375d))
* log out the slash factory when a new rollup is deployed
([#13131](#13131))
([4643a31](4643a31))
* Merge alpha back to master
([#13128](#13128))
([504c338](504c338))
* metric attributes
([#13126](#13126))
([f87d5e3](f87d5e3)),
closes
[#13063](#13063)
* minor tagging API improvement
([#13092](#13092))
([f5bcece](f5bcece))
* more benchmarking
([#13211](#13211))
([4b0a4ad](4b0a4ad))
* move unbound impl generics to their functions
([#13147](#13147))
([62f497f](62f497f))
* operation mouthwash
([#13171](#13171))
([384f4e5](384f4e5))
* remove catch-all branch from opcode match in transpiler
([#13210](#13210))
([2bc9ca2](2bc9ca2))
* remove templating by flavor in merge protocol
([#13098](#13098))
([cf5e217](cf5e217))
* rename journal dir and file to state manager & mv to up to public/
([#13159](#13159))
([f13be09](f13be09))
* replace relative paths to noir-protocol-circuits
([f18336d](f18336d))
* replace relative paths to noir-protocol-circuits
([d620c12](d620c12))
* replace relative paths to noir-protocol-circuits
([be3e74e](be3e74e))
* replace relative paths to noir-protocol-circuits
([5bfac63](5bfac63))
* replace relative paths to noir-protocol-circuits
([f41ab2c](f41ab2c))
* replace relative paths to noir-protocol-circuits
([806e560](806e560))
* streamlined log processing
([#13107](#13107))
([b184865](b184865))
* use testnet optimized trace
([#13135](#13135))
([5a4f2ac](5a4f2ac))
* uses a new larger redis cache. flake file tweaks.
([#13167](#13167))
([0951fb6](0951fb6))


### Documentation

* Add intermediate migration note versions
([#13183](#13183))
([f5de7b8](f5de7b8))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-full Run all master checks.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants