test(DO NOT MERGE): Try running integration tests with msgpack#13021
Closed
test(DO NOT MERGE): Try running integration tests with msgpack#13021
msgpack#13021Conversation
dbf2762 to
31ef57b
Compare
8 tasks
31ef57b to
8837281
Compare
msgpack
5 tasks
31f6ce5 to
d39ebcf
Compare
30f2854 to
0a8cb58
Compare
aakoshh
added a commit
that referenced
this pull request
Mar 27, 2025
Adds `msgpack` serialisation to the generated Acir and Witness C++ code. I moved the alterations described in `dsl/README.md` into the code generation itself, so no manual work is required. The PR is running against a feature branch with the same name in Noir, here's the upstream PR: noir-lang/noir#7716 With this PR is merged, `bb` should be able to handle `msgpack` or `bincode`. Once that's released we can switch to using `msgpack` in Noir in both native and wasm by merging noir-lang/noir#7810. And then we can remove the `msgpack` format detection and the fallback to `bincode`. **TODO**: - [x] Get it to compile - [x] Change `nargo` to allow compiling contracts with `msgpack` format: added `NOIR_SERIALIZATION_FORMAT` env var - [x] Add a first byte to the data to say which serialization format it is. There is a chance that it would clash with existing bincode data though, so a fallback would anyway be necessary. (Or we should ascertain that bincode never starts with some specific bit sequence). - [x] ~Change the `bb` code so it tries `bincode`, then falls back to `msgpack` - this way the currently used format stays fast, but we can feed it new data.~ _This looks problematic, as exceptions in the wasm build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm. Instead we run [msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968) first to check if the data looks like msgpack; if not, we use bincode._ - [x] Run integration tests with `msgpack` on both sides in #13021 - [x] Ignore the Brillig opcodes in Barretenberg - [x] Change the serialization of `WitnessStack` and `WitnessMap` to use the env var, add fallbacks in `bb` for them - [x] Revert the change to `noir-repo-ref` before merging ### Use of `MSGPACK_FIELDS` The generated code is using `MSGPACK_FIELDS` for structs, to keep it more terse. At some point during debugging the memory issue below I changed it so that I can have more direct control by generating code for individual fields. That needed some helper functions which I looted from the `msgpack-c` library and injected into the namespaces as a `Helpers` struct. This approach might be useful if we wanted to have extra checks, for example rejecting the data if there are extra fields, indicating a type has been extended with things we don't recognise, or if we wanted handle renamed fields. I left it out so there is less code to maintain, but if we need it we can recover it from the [commit history](noir-lang/noir@b0a612d). ### Compile `nargo` with the `msgpack` feature ```bash echo af/msgpack-codegen > noir/noir-repo-ref noir/bootstrap.sh ``` ### Generate and compile C++ code ```bash cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd - cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp cd barretenberg/cpp && ./format.sh changed && cd - barretenberg/cpp/bootstrap.sh ``` ### Test `nargo` with `bb` One example of an integration test that uses `bb` and noir in the Noir repo is https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964 We can call it like this: ```bash cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd - ./barretenberg/cpp/bootstrap.sh export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb export NOIR_SERIALIZATION_FORMAT=msgpack noir/noir-repo/examples/prove_and_verify/test.sh ``` If it works, it should print this: ```console % unset NOIR_SERIALIZATION_FORMAT % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Finalized circuit size: 18 Proof saved to "./proofs/proof" Finalized circuit size: 18 VK saved to "./target/vk" Proof verified successfully ``` Whereas if it doesn't: ```console % export NOIR_SERIALIZATION_FORMAT=msgpack % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Length is too large ``` I attached the final artefacts to the PR so it's easier to test with just `bb`. [hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json) [witness.gz](https://github.com/user-attachments/files/19391074/witness.gz) ### Further testing With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen` feature branch, we can run all the contract compilations and tests with `msgpack` as follows: ```shell export NOIR_SERIALIZATION_FORMAT=msgpack ./bootstrap.sh ci ``` This is tested in #13021 ### Peek into artefacts We can inspect the file in JSON format using [this](https://crates.io/crates/msgpack-cli) msgpack CLI tool. ```shell jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq ``` Thanks Tom for the [spell](AztecProtocol/msgpack-c#5 (comment)) 🙏 ### False bug At some point I thought had to make some fixes in `msgpack-c` itself to make this work: AztecProtocol/msgpack-c#5 A similar [blocking bug](#12841 (comment)) was encountered when running the entire `ci` build with msgpack format. It turned out it was a [dangling pointer](msgpack/msgpack-c#695 (comment)) issue, fixed in 5810e3b Much of the comments below are related to my struggles that came from this mistake; you can ignore them.
0a8cb58 to
674b4c0
Compare
AztecBot
pushed a commit
to AztecProtocol/barretenberg
that referenced
this pull request
Mar 28, 2025
Adds `msgpack` serialisation to the generated Acir and Witness C++ code. I moved the alterations described in `dsl/README.md` into the code generation itself, so no manual work is required. The PR is running against a feature branch with the same name in Noir, here's the upstream PR: noir-lang/noir#7716 With this PR is merged, `bb` should be able to handle `msgpack` or `bincode`. Once that's released we can switch to using `msgpack` in Noir in both native and wasm by merging noir-lang/noir#7810. And then we can remove the `msgpack` format detection and the fallback to `bincode`. **TODO**: - [x] Get it to compile - [x] Change `nargo` to allow compiling contracts with `msgpack` format: added `NOIR_SERIALIZATION_FORMAT` env var - [x] Add a first byte to the data to say which serialization format it is. There is a chance that it would clash with existing bincode data though, so a fallback would anyway be necessary. (Or we should ascertain that bincode never starts with some specific bit sequence). - [x] ~Change the `bb` code so it tries `bincode`, then falls back to `msgpack` - this way the currently used format stays fast, but we can feed it new data.~ _This looks problematic, as exceptions in the wasm build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm. Instead we run [msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968) first to check if the data looks like msgpack; if not, we use bincode._ - [x] Run integration tests with `msgpack` on both sides in AztecProtocol/aztec-packages#13021 - [x] Ignore the Brillig opcodes in Barretenberg - [x] Change the serialization of `WitnessStack` and `WitnessMap` to use the env var, add fallbacks in `bb` for them - [x] Revert the change to `noir-repo-ref` before merging ### Use of `MSGPACK_FIELDS` The generated code is using `MSGPACK_FIELDS` for structs, to keep it more terse. At some point during debugging the memory issue below I changed it so that I can have more direct control by generating code for individual fields. That needed some helper functions which I looted from the `msgpack-c` library and injected into the namespaces as a `Helpers` struct. This approach might be useful if we wanted to have extra checks, for example rejecting the data if there are extra fields, indicating a type has been extended with things we don't recognise, or if we wanted handle renamed fields. I left it out so there is less code to maintain, but if we need it we can recover it from the [commit history](noir-lang/noir@b0a612d). ### Compile `nargo` with the `msgpack` feature ```bash echo af/msgpack-codegen > noir/noir-repo-ref noir/bootstrap.sh ``` ### Generate and compile C++ code ```bash cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd - cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp cd barretenberg/cpp && ./format.sh changed && cd - barretenberg/cpp/bootstrap.sh ``` ### Test `nargo` with `bb` One example of an integration test that uses `bb` and noir in the Noir repo is https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964 We can call it like this: ```bash cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd - ./barretenberg/cpp/bootstrap.sh export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb export NOIR_SERIALIZATION_FORMAT=msgpack noir/noir-repo/examples/prove_and_verify/test.sh ``` If it works, it should print this: ```console % unset NOIR_SERIALIZATION_FORMAT % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Finalized circuit size: 18 Proof saved to "./proofs/proof" Finalized circuit size: 18 VK saved to "./target/vk" Proof verified successfully ``` Whereas if it doesn't: ```console % export NOIR_SERIALIZATION_FORMAT=msgpack % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Length is too large ``` I attached the final artefacts to the PR so it's easier to test with just `bb`. [hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json) [witness.gz](https://github.com/user-attachments/files/19391074/witness.gz) ### Further testing With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen` feature branch, we can run all the contract compilations and tests with `msgpack` as follows: ```shell export NOIR_SERIALIZATION_FORMAT=msgpack ./bootstrap.sh ci ``` This is tested in AztecProtocol/aztec-packages#13021 ### Peek into artefacts We can inspect the file in JSON format using [this](https://crates.io/crates/msgpack-cli) msgpack CLI tool. ```shell jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq ``` Thanks Tom for the [spell](AztecProtocol/msgpack-c#5 (comment)) 🙏 ### False bug At some point I thought had to make some fixes in `msgpack-c` itself to make this work: AztecProtocol/msgpack-c#5 A similar [blocking bug](AztecProtocol/aztec-packages#12841 (comment)) was encountered when running the entire `ci` build with msgpack format. It turned out it was a [dangling pointer](msgpack/msgpack-c#695 (comment)) issue, fixed in AztecProtocol/aztec-packages@5810e3b Much of the comments below are related to my struggles that came from this mistake; you can ignore them.
674b4c0 to
e49faa7
Compare
aakoshh
added a commit
that referenced
this pull request
Mar 28, 2025
Followup for #12841 Changes code generation for msgpack so that it doesn't throw an error if an optional field of a `struct` is not present in the data. This is to allow @TomAFrench and @asterite to delete `Opcode::MemoryOp::predicate` which is an optional field that we no longer use. Removing such a field should be a non-breaking change, as the field was optional to begin with, so while even if it's present in C++ it should already handle it being empty. Unfortunately the `msgpack-c` library as far as I can see [would throw an error](https://github.com/AztecProtocol/msgpack-c/blob/54e9865b84bbdc73cfbf8d1d437dbf769b64e386/include/msgpack/v1/adaptor/detail/cpp11_define_map.hpp#L33-L45) if the optional field would not be present in the data as NIL. For this to work the PR re-introduces the `Helpers` and enumerates fields explicitly, instead of using `MSGPACK_FIELDS`. I changed the unmerged noir-lang/noir#7716 to do codegen with this change. I rebased #13021 on top of this PR to see if it works when msgpack is actually in use. ### Note for future migration path @ludamad reached out that while the bytecode size increase shown in noir-lang/noir#7690 doesn't seem too bad, and compression compensates for the inclusion of string field names, it's still wasteful to have to parse them, and it would be better to use arrays. I established in [tests](https://github.com/noir-lang/noir/pull/7690/files#diff-2d66028e5a8966511a76d1740d752be294c0b6a46e0a567bc2959f91d9ce224bR169-R176) that we what we call `msgpack-compact` format uses arrays for structs, but still tags enums with types. We use a lot of enums, so there is still quite a few strings. Ostensibly we could use [serde attributes](https://serde.rs/container-attrs.html) to shorten names, but it would probably be a nightmare and ugly. Nevertheless if we generated C++ code to deal with arrays, we could save some space. And if we want to stick to `bincode`, we can use `NOIR_SERIALIZATION_FORMAT=bincode`, which I back ported to the Noir codegen PR, to generate `bincode` with a format byte marker. There is also `bincode-legacy`, but unfortunately we only have one shot at deserialising bincode in C++: if it fails, we can't catch the exception. Therefore the path to be able to use the bincode format marker is: 1. Release `bb` which can handle the `msgpack` format (which has a probe, doesn't have to throw) 2. Start producing msgpack data from `nargo` 3. Stop accepting unmarked bincode in `bb` and look for format byte == 1 to show that bincode is in use 4. Tell `nargo` which format to use EDIT: Unfortunately if we use `binpack` with today's data types it forces us to parse the Brillig part, as established by #13143 which would mean the Noir team can't make any changes to Brillig opcodes without breaking `bb`. We would need to change the format again to use two tier encoding, or use msgpack arrays.
AztecBot
pushed a commit
to AztecProtocol/barretenberg
that referenced
this pull request
Mar 29, 2025
Followup for AztecProtocol/aztec-packages#12841 Changes code generation for msgpack so that it doesn't throw an error if an optional field of a `struct` is not present in the data. This is to allow @TomAFrench and @asterite to delete `Opcode::MemoryOp::predicate` which is an optional field that we no longer use. Removing such a field should be a non-breaking change, as the field was optional to begin with, so while even if it's present in C++ it should already handle it being empty. Unfortunately the `msgpack-c` library as far as I can see [would throw an error](https://github.com/AztecProtocol/msgpack-c/blob/54e9865b84bbdc73cfbf8d1d437dbf769b64e386/include/msgpack/v1/adaptor/detail/cpp11_define_map.hpp#L33-L45) if the optional field would not be present in the data as NIL. For this to work the PR re-introduces the `Helpers` and enumerates fields explicitly, instead of using `MSGPACK_FIELDS`. I changed the unmerged noir-lang/noir#7716 to do codegen with this change. I rebased AztecProtocol/aztec-packages#13021 on top of this PR to see if it works when msgpack is actually in use. ### Note for future migration path @ludamad reached out that while the bytecode size increase shown in noir-lang/noir#7690 doesn't seem too bad, and compression compensates for the inclusion of string field names, it's still wasteful to have to parse them, and it would be better to use arrays. I established in [tests](https://github.com/noir-lang/noir/pull/7690/files#diff-2d66028e5a8966511a76d1740d752be294c0b6a46e0a567bc2959f91d9ce224bR169-R176) that we what we call `msgpack-compact` format uses arrays for structs, but still tags enums with types. We use a lot of enums, so there is still quite a few strings. Ostensibly we could use [serde attributes](https://serde.rs/container-attrs.html) to shorten names, but it would probably be a nightmare and ugly. Nevertheless if we generated C++ code to deal with arrays, we could save some space. And if we want to stick to `bincode`, we can use `NOIR_SERIALIZATION_FORMAT=bincode`, which I back ported to the Noir codegen PR, to generate `bincode` with a format byte marker. There is also `bincode-legacy`, but unfortunately we only have one shot at deserialising bincode in C++: if it fails, we can't catch the exception. Therefore the path to be able to use the bincode format marker is: 1. Release `bb` which can handle the `msgpack` format (which has a probe, doesn't have to throw) 2. Start producing msgpack data from `nargo` 3. Stop accepting unmarked bincode in `bb` and look for format byte == 1 to show that bincode is in use 4. Tell `nargo` which format to use EDIT: Unfortunately if we use `binpack` with today's data types it forces us to parse the Brillig part, as established by AztecProtocol/aztec-packages#13143 which would mean the Noir team can't make any changes to Brillig opcodes without breaking `bb`. We would need to change the format again to use two tier encoding, or use msgpack arrays.
Contributor
|
Hello, we have moved to a development model where |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Testing the version of
bbthat can handlemsgpackencoded Noir artefacts with anargothat actually produces them, instead of the defaultbincode.We could tell
nargoto usemsgpackby setting theNOIR_SERIALIZATION_FORMATenv var inbootstrap.sh, but then some of the Noir tests would fail, so we build against noir-lang/noir#7810 instead.We don't have to merge this PR, it's just here to check whether it's safe to start using
msgpackinnargoby default, once #12841 is in. Then, the changes on the Noir side will be part of thenightlyupdates.