Skip to content

test(DO NOT MERGE): Try running integration tests with msgpack#13021

Closed
aakoshh wants to merge 4 commits intomasterfrom
af/msgpack-test
Closed

test(DO NOT MERGE): Try running integration tests with msgpack#13021
aakoshh wants to merge 4 commits intomasterfrom
af/msgpack-test

Conversation

@aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Mar 25, 2025

Testing the version of bb that can handle msgpack encoded Noir artefacts with a nargo that actually produces them, instead of the default bincode.

We could tell nargo to use msgpack by setting the NOIR_SERIALIZATION_FORMAT env var in bootstrap.sh, but then some of the Noir tests would fail, so we build against noir-lang/noir#7810 instead.

We don't have to merge this PR, it's just here to check whether it's safe to start using msgpack in nargo by default, once #12841 is in. Then, the changes on the Noir side will be part of the nightly updates.

@aakoshh aakoshh requested a review from charlielye as a code owner March 25, 2025 18:46
@aakoshh aakoshh requested a review from TomAFrench as a code owner March 25, 2025 21:48
@aakoshh aakoshh marked this pull request as draft March 25, 2025 21:49
@aakoshh aakoshh added the trigger-workflow Used by ci script to manually trigger CI3. label Mar 25, 2025
@aakoshh aakoshh marked this pull request as ready for review March 25, 2025 21:50
@aakoshh aakoshh changed the title test(DRAFT): Try running integration tests with msgpack test(DO NOT MERGE): Try running integration tests with msgpack Mar 25, 2025
@aakoshh aakoshh changed the title test(DO NOT MERGE): Try running integration tests with msgpack test(DO NOT MERGE): Try running integration tests with msgpack Mar 25, 2025
@aakoshh aakoshh force-pushed the af/msgpack-codegen branch from 31f6ce5 to d39ebcf Compare March 26, 2025 13:17
aakoshh added a commit that referenced this pull request Mar 27, 2025
Adds `msgpack` serialisation to the generated Acir and Witness C++ code.

I moved the alterations described in `dsl/README.md` into the code
generation itself, so no manual work is required. The PR is running
against a feature branch with the same name in Noir, here's the upstream
PR: noir-lang/noir#7716

With this PR is merged, `bb` should be able to handle `msgpack` or
`bincode`. Once that's released we can switch to using `msgpack` in Noir
in both native and wasm by merging
noir-lang/noir#7810. And then we can remove the
`msgpack` format detection and the fallback to `bincode`.

**TODO**:
- [x] Get it to compile 
- [x] Change `nargo` to allow compiling contracts with `msgpack` format:
added `NOIR_SERIALIZATION_FORMAT` env var
- [x] Add a first byte to the data to say which serialization format it
is. There is a chance that it would clash with existing bincode data
though, so a fallback would anyway be necessary. (Or we should ascertain
that bincode never starts with some specific bit sequence).
- [x] ~Change the `bb` code so it tries `bincode`, then falls back to
`msgpack` - this way the currently used format stays fast, but we can
feed it new data.~ _This looks problematic, as exceptions in the wasm
build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm.
Instead we run
[msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968)
first to check if the data looks like msgpack; if not, we use bincode._
- [x] Run integration tests with `msgpack` on both sides in
#13021
- [x] Ignore the Brillig opcodes in Barretenberg
- [x] Change the serialization of `WitnessStack` and `WitnessMap` to use
the env var, add fallbacks in `bb` for them
- [x] Revert the change to `noir-repo-ref` before merging


### Use of `MSGPACK_FIELDS`

The generated code is using `MSGPACK_FIELDS` for structs, to keep it
more terse.

At some point during debugging the memory issue below I changed it so
that I can have more direct control by generating code for individual
fields. That needed some helper functions which I looted from the
`msgpack-c` library and injected into the namespaces as a `Helpers`
struct. This approach might be useful if we wanted to have extra checks,
for example rejecting the data if there are extra fields, indicating a
type has been extended with things we don't recognise, or if we wanted
handle renamed fields. I left it out so there is less code to maintain,
but if we need it we can recover it from the [commit
history](noir-lang/noir@b0a612d).

### Compile `nargo` with the `msgpack` feature

```bash
echo af/msgpack-codegen > noir/noir-repo-ref
noir/bootstrap.sh
```

### Generate and compile C++ code

```bash
cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd -
cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp
cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp
cd barretenberg/cpp && ./format.sh changed && cd -

barretenberg/cpp/bootstrap.sh
```

### Test `nargo` with `bb`

One example of an integration test that uses `bb` and noir in the Noir
repo is
https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964

We can call it like this:

```bash
cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd -
./barretenberg/cpp/bootstrap.sh
export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb
export NOIR_SERIALIZATION_FORMAT=msgpack
noir/noir-repo/examples/prove_and_verify/test.sh
```

If it works, it should print this:
```console
% unset NOIR_SERIALIZATION_FORMAT                       
% noir/noir-repo/examples/prove_and_verify/test.sh      
[hello_world] Circuit witness successfully solved
[hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz
Finalized circuit size: 18
Proof saved to "./proofs/proof"
Finalized circuit size: 18
VK saved to "./target/vk"
Proof verified successfully
```

Whereas if it doesn't:
```console
% export NOIR_SERIALIZATION_FORMAT=msgpack                                                                                                                                                            
% noir/noir-repo/examples/prove_and_verify/test.sh             
[hello_world] Circuit witness successfully solved
[hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz
Length is too large
```

I attached the final artefacts to the PR so it's easier to test with
just `bb`.

[hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json)

[witness.gz](https://github.com/user-attachments/files/19391074/witness.gz)


### Further testing

With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen`
feature branch, we can run all the contract compilations and tests with
`msgpack` as follows:

```shell
export NOIR_SERIALIZATION_FORMAT=msgpack       
./bootstrap.sh ci
```

This is tested in
#13021

### Peek into artefacts

We can inspect the file in JSON format using
[this](https://crates.io/crates/msgpack-cli) msgpack CLI tool.

```shell
jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq
```

Thanks Tom for the
[spell](AztecProtocol/msgpack-c#5 (comment))
🙏

### False bug

At some point I thought had to make some fixes in `msgpack-c` itself to
make this work: AztecProtocol/msgpack-c#5
A similar [blocking
bug](#12841 (comment))
was encountered when running the entire `ci` build with msgpack format.

It turned out it was a [dangling
pointer](msgpack/msgpack-c#695 (comment))
issue, fixed in
5810e3b

Much of the comments below are related to my struggles that came from
this mistake; you can ignore them.
Base automatically changed from af/msgpack-codegen to master March 27, 2025 19:51
AztecBot pushed a commit to AztecProtocol/barretenberg that referenced this pull request Mar 28, 2025
Adds `msgpack` serialisation to the generated Acir and Witness C++ code.

I moved the alterations described in `dsl/README.md` into the code
generation itself, so no manual work is required. The PR is running
against a feature branch with the same name in Noir, here's the upstream
PR: noir-lang/noir#7716

With this PR is merged, `bb` should be able to handle `msgpack` or
`bincode`. Once that's released we can switch to using `msgpack` in Noir
in both native and wasm by merging
noir-lang/noir#7810. And then we can remove the
`msgpack` format detection and the fallback to `bincode`.

**TODO**:
- [x] Get it to compile 
- [x] Change `nargo` to allow compiling contracts with `msgpack` format:
added `NOIR_SERIALIZATION_FORMAT` env var
- [x] Add a first byte to the data to say which serialization format it
is. There is a chance that it would clash with existing bincode data
though, so a fallback would anyway be necessary. (Or we should ascertain
that bincode never starts with some specific bit sequence).
- [x] ~Change the `bb` code so it tries `bincode`, then falls back to
`msgpack` - this way the currently used format stays fast, but we can
feed it new data.~ _This looks problematic, as exceptions in the wasm
build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm.
Instead we run
[msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968)
first to check if the data looks like msgpack; if not, we use bincode._
- [x] Run integration tests with `msgpack` on both sides in
AztecProtocol/aztec-packages#13021
- [x] Ignore the Brillig opcodes in Barretenberg
- [x] Change the serialization of `WitnessStack` and `WitnessMap` to use
the env var, add fallbacks in `bb` for them
- [x] Revert the change to `noir-repo-ref` before merging


### Use of `MSGPACK_FIELDS`

The generated code is using `MSGPACK_FIELDS` for structs, to keep it
more terse.

At some point during debugging the memory issue below I changed it so
that I can have more direct control by generating code for individual
fields. That needed some helper functions which I looted from the
`msgpack-c` library and injected into the namespaces as a `Helpers`
struct. This approach might be useful if we wanted to have extra checks,
for example rejecting the data if there are extra fields, indicating a
type has been extended with things we don't recognise, or if we wanted
handle renamed fields. I left it out so there is less code to maintain,
but if we need it we can recover it from the [commit
history](noir-lang/noir@b0a612d).

### Compile `nargo` with the `msgpack` feature

```bash
echo af/msgpack-codegen > noir/noir-repo-ref
noir/bootstrap.sh
```

### Generate and compile C++ code

```bash
cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd -
cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp
cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp
cd barretenberg/cpp && ./format.sh changed && cd -

barretenberg/cpp/bootstrap.sh
```

### Test `nargo` with `bb`

One example of an integration test that uses `bb` and noir in the Noir
repo is
https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964

We can call it like this:

```bash
cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd -
./barretenberg/cpp/bootstrap.sh
export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb
export NOIR_SERIALIZATION_FORMAT=msgpack
noir/noir-repo/examples/prove_and_verify/test.sh
```

If it works, it should print this:
```console
% unset NOIR_SERIALIZATION_FORMAT                       
% noir/noir-repo/examples/prove_and_verify/test.sh      
[hello_world] Circuit witness successfully solved
[hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz
Finalized circuit size: 18
Proof saved to "./proofs/proof"
Finalized circuit size: 18
VK saved to "./target/vk"
Proof verified successfully
```

Whereas if it doesn't:
```console
% export NOIR_SERIALIZATION_FORMAT=msgpack                                                                                                                                                            
% noir/noir-repo/examples/prove_and_verify/test.sh             
[hello_world] Circuit witness successfully solved
[hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz
Length is too large
```

I attached the final artefacts to the PR so it's easier to test with
just `bb`.

[hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json)

[witness.gz](https://github.com/user-attachments/files/19391074/witness.gz)


### Further testing

With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen`
feature branch, we can run all the contract compilations and tests with
`msgpack` as follows:

```shell
export NOIR_SERIALIZATION_FORMAT=msgpack       
./bootstrap.sh ci
```

This is tested in
AztecProtocol/aztec-packages#13021

### Peek into artefacts

We can inspect the file in JSON format using
[this](https://crates.io/crates/msgpack-cli) msgpack CLI tool.

```shell
jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq
```

Thanks Tom for the
[spell](AztecProtocol/msgpack-c#5 (comment))
🙏

### False bug

At some point I thought had to make some fixes in `msgpack-c` itself to
make this work: AztecProtocol/msgpack-c#5
A similar [blocking
bug](AztecProtocol/aztec-packages#12841 (comment))
was encountered when running the entire `ci` build with msgpack format.

It turned out it was a [dangling
pointer](msgpack/msgpack-c#695 (comment))
issue, fixed in
AztecProtocol/aztec-packages@5810e3b

Much of the comments below are related to my struggles that came from
this mistake; you can ignore them.
@aakoshh aakoshh changed the base branch from master to af/msgpack-optional March 28, 2025 17:39
aakoshh added a commit that referenced this pull request Mar 28, 2025
Followup for #12841

Changes code generation for msgpack so that it doesn't throw an error if
an optional field of a `struct` is not present in the data. This is to
allow @TomAFrench and @asterite to delete `Opcode::MemoryOp::predicate`
which is an optional field that we no longer use. Removing such a field
should be a non-breaking change, as the field was optional to begin
with, so while even if it's present in C++ it should already handle it
being empty.

Unfortunately the `msgpack-c` library as far as I can see [would throw
an
error](https://github.com/AztecProtocol/msgpack-c/blob/54e9865b84bbdc73cfbf8d1d437dbf769b64e386/include/msgpack/v1/adaptor/detail/cpp11_define_map.hpp#L33-L45)
if the optional field would not be present in the data as NIL.

For this to work the PR re-introduces the `Helpers` and enumerates
fields explicitly, instead of using `MSGPACK_FIELDS`. I changed the
unmerged noir-lang/noir#7716 to do codegen with
this change.

I rebased #13021 on
top of this PR to see if it works when msgpack is actually in use.

### Note for future migration path

@ludamad reached out that while the bytecode size increase shown in
noir-lang/noir#7690 doesn't seem too bad, and
compression compensates for the inclusion of string field names, it's
still wasteful to have to parse them, and it would be better to use
arrays.

I established in
[tests](https://github.com/noir-lang/noir/pull/7690/files#diff-2d66028e5a8966511a76d1740d752be294c0b6a46e0a567bc2959f91d9ce224bR169-R176)
that we what we call `msgpack-compact` format uses arrays for structs,
but still tags enums with types. We use a lot of enums, so there is
still quite a few strings. Ostensibly we could use [serde
attributes](https://serde.rs/container-attrs.html) to shorten names, but
it would probably be a nightmare and ugly.

Nevertheless if we generated C++ code to deal with arrays, we could save
some space.

And if we want to stick to `bincode`, we can use
`NOIR_SERIALIZATION_FORMAT=bincode`, which I back ported to the Noir
codegen PR, to generate `bincode` with a format byte marker. There is
also `bincode-legacy`, but unfortunately we only have one shot at
deserialising bincode in C++: if it fails, we can't catch the exception.
Therefore the path to be able to use the bincode format marker is:
1. Release `bb` which can handle the `msgpack` format (which has a
probe, doesn't have to throw)
2. Start producing msgpack data from `nargo` 
3. Stop accepting unmarked bincode in `bb` and look for format byte == 1
to show that bincode is in use
4. Tell `nargo` which format to use

EDIT: Unfortunately if we use `binpack` with today's data types it
forces us to parse the Brillig part, as established by
#13143 which would
mean the Noir team can't make any changes to Brillig opcodes without
breaking `bb`. We would need to change the format again to use two tier
encoding, or use msgpack arrays.
Base automatically changed from af/msgpack-optional to master March 28, 2025 18:40
AztecBot pushed a commit to AztecProtocol/barretenberg that referenced this pull request Mar 29, 2025
Followup for AztecProtocol/aztec-packages#12841

Changes code generation for msgpack so that it doesn't throw an error if
an optional field of a `struct` is not present in the data. This is to
allow @TomAFrench and @asterite to delete `Opcode::MemoryOp::predicate`
which is an optional field that we no longer use. Removing such a field
should be a non-breaking change, as the field was optional to begin
with, so while even if it's present in C++ it should already handle it
being empty.

Unfortunately the `msgpack-c` library as far as I can see [would throw
an
error](https://github.com/AztecProtocol/msgpack-c/blob/54e9865b84bbdc73cfbf8d1d437dbf769b64e386/include/msgpack/v1/adaptor/detail/cpp11_define_map.hpp#L33-L45)
if the optional field would not be present in the data as NIL.

For this to work the PR re-introduces the `Helpers` and enumerates
fields explicitly, instead of using `MSGPACK_FIELDS`. I changed the
unmerged noir-lang/noir#7716 to do codegen with
this change.

I rebased AztecProtocol/aztec-packages#13021 on
top of this PR to see if it works when msgpack is actually in use.

### Note for future migration path

@ludamad reached out that while the bytecode size increase shown in
noir-lang/noir#7690 doesn't seem too bad, and
compression compensates for the inclusion of string field names, it's
still wasteful to have to parse them, and it would be better to use
arrays.

I established in
[tests](https://github.com/noir-lang/noir/pull/7690/files#diff-2d66028e5a8966511a76d1740d752be294c0b6a46e0a567bc2959f91d9ce224bR169-R176)
that we what we call `msgpack-compact` format uses arrays for structs,
but still tags enums with types. We use a lot of enums, so there is
still quite a few strings. Ostensibly we could use [serde
attributes](https://serde.rs/container-attrs.html) to shorten names, but
it would probably be a nightmare and ugly.

Nevertheless if we generated C++ code to deal with arrays, we could save
some space.

And if we want to stick to `bincode`, we can use
`NOIR_SERIALIZATION_FORMAT=bincode`, which I back ported to the Noir
codegen PR, to generate `bincode` with a format byte marker. There is
also `bincode-legacy`, but unfortunately we only have one shot at
deserialising bincode in C++: if it fails, we can't catch the exception.
Therefore the path to be able to use the bincode format marker is:
1. Release `bb` which can handle the `msgpack` format (which has a
probe, doesn't have to throw)
2. Start producing msgpack data from `nargo` 
3. Stop accepting unmarked bincode in `bb` and look for format byte == 1
to show that bincode is in use
4. Tell `nargo` which format to use

EDIT: Unfortunately if we use `binpack` with today's data types it
forces us to parse the Brillig part, as established by
AztecProtocol/aztec-packages#13143 which would
mean the Noir team can't make any changes to Brillig opcodes without
breaking `bb`. We would need to change the format again to use two tier
encoding, or use msgpack arrays.
@alexghr
Copy link
Contributor

alexghr commented Jun 9, 2025

Hello, we have moved to a development model where master only accepts bugfixes. All development work should now target the next branch. If this change is still important, rebase on top of next and open a new PR.

@alexghr alexghr closed this Jun 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

trigger-workflow Used by ci script to manually trigger CI3.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants