feat(ssa): constant_folding with revisits#10013
Conversation
Changes to Brillig bytecode sizes
🧾 Summary (10% most significant diffs)
Full diff report 👇
|
Changes to number of Brillig opcodes executed
🧾 Summary (10% most significant diffs)
Full diff report 👇
|
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Execution Time'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
rollup-block-root-single-tx |
0.004 s |
0.002 s |
2 |
semaphore-depth-10 |
0.047 s |
0.036 s |
1.31 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Execution Memory'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: b445198 | Previous: 075a31b | Ratio |
|---|---|---|---|
rollup-checkpoint-root-single-block |
9680 MB |
1030 MB |
9.40 |
rollup-checkpoint-root |
9690 MB |
1030 MB |
9.41 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Test Suite Duration'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
test_report_zkpassport_noir-ecdsa_ |
2 s |
1 s |
2 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
|
|
||
| // Because we removed some instruction from this block, we will have to (re)visit its successors. | ||
| if origin != block { | ||
| origins.reused.insert(origin); |
There was a problem hiding this comment.
I think these cascading optimizations only occur in the NeedToHoistToCommonBlock case. Looking at #9963 (comment), b1 does not dominate b3 (as every path from the entry to b3 does not have to go through b1). If b1 were to dominate b3 we would simply remove the instruction in b3 and be left with the single instruction in b1. The issue is that upon hoisting we are not actually folding the instruction that triggered the hoisting in the first place.
There was a problem hiding this comment.
You are right, and the revisits are only triggered by hoisting. This instruction is here for a different reason, it is to prevent the "missing instruction" failure in avoid_unmapped_instructions_during_revisit.
There is an explanation in that test. The problem is that if between b1 and b3 we were to find a block b2 which b1 dominates, then in b2 we would just reuse the instruction results from b1. Great, but then b3 causes the instruction that created those results to be hoisted from b3 into b0, and then be replaced in b1 by the new results of the hoisted instruction. If we don't revisit b2, then it will have some instruction ID in its block that doesn't exist any more, so we need to go and do the values_to_replace chore on it.
There was a problem hiding this comment.
Got it. I did only skim the PR, but it is a little bit difficult to keep track of these revisits. I think it may be simpler to just do a fixed point loop but with some informed starting point as outlined in #9963 (comment). Rather than trying to be so granular in a single pass, we can just track the first origin for each NeedToHoistToCommonBlock case. After the entire constant_folding pass is completed we just kick start another pass at the first origin in our RPO. We then keep looping until no more instructions are folded.
There was a problem hiding this comment.
That would also give us a chance to clear out the values_to_replace cache, although I think it would still keep the transient instructions in the DFG.
A slightly less granular option would be to just clear the "visited" from the block_queue to allow all descendant to be reprocessed, not just the dependant ones.
There was a problem hiding this comment.
We could also experiment with not even trying to start at the first origin of the RPO and just go through all of the origins until instructions are no longer being folded.
There was a problem hiding this comment.
It looks like the performance alert is just mislabeled. We can see the compilation memory degradation here #10013 (review)
There was a problem hiding this comment.
If that's the degradation, it's all +/- 1%, that would be great.
I just can't find the following on the list at #10013 (review):

Which one is mislabelled you think?
There was a problem hiding this comment.
Sorry I was looking at the wrong thing 🤦. If we look at the comment history for #10013 (review) we see those circuits which we alerted for in the first comment but not the later ones. Possibly a bug in the reports.
There was a problem hiding this comment.
Ah those reports got cancelled https://github.com/noir-lang/noir/actions/runs/18041209670/job/51340643425?pr=10013
There was a problem hiding this comment.
Yes, I suspected those could have been the ones to bring fresh data.
Some tests are much slower as well, in particular execution_success/slices takes ages, while it's fast on master. I'll try to see if there is some simple limit I could add to stop it from spinning its wheels.
These turn out to be slow on ❯ cargo nextest run --cargo-quiet -p nargo_cli --test execute 5252
────────────
Nextest run ID 3375c065-65fe-474f-a30b-2524979cf728 with nextest profile: default
Starting 14 tests across 1 binary (5581 tests skipped)
PASS [ 2.820s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_true_inliner_i64_max_expects
PASS [ 2.979s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_true_inliner_0_expects
PASS [ 3.055s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_false_inliner_i64_min_expects
PASS [ 3.056s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_false_inliner_0_expects
PASS [ 3.062s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_false_inliner_i64_max_expects
PASS [ 3.069s] nargo_cli::execute tests::interpret_execution_success::test_regression_5252::forcebrillig_true_inliner_i64_min_expects
PASS [ 18.579s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_false_inliner_i64_min_expects
PASS [ 18.707s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_true_inliner_0_expects
PASS [ 21.785s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_false_inliner_i64_max_expects
PASS [ 21.917s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_true_inliner_i64_min_expects
PASS [ 22.064s] nargo_cli::execute tests::minimal_execution_success::test_regression_5252::forcebrillig_false_inliner_0_expects
PASS [ 25.064s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_false_inliner_0_expects
PASS [ 25.209s] nargo_cli::execute tests::execution_success::test_regression_5252::forcebrillig_true_inliner_i64_max_expects
PASS [ 34.463s] nargo_cli::execute tests::nargo_expand_execution_success::test_regression_5252
────────────
Summary [ 34.522s] 14 tests run: 14 passed, 5581 skipped
❯ cargo nextest run --cargo-quiet -p nargo_cli --test execute fold_2_to_17
────────────
Nextest run ID 8eefa93a-0e63-40c0-aa29-10fc243422bf with nextest profile: default
Starting 14 tests across 1 binary (5581 tests skipped)
PASS [ 8.060s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_false_inliner_i64_min_expects
PASS [ 15.213s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_false_inliner_0_expects
PASS [ 20.649s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_true_inliner_i64_min_expects
PASS [ 23.991s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_true_inliner_i64_max_expects
PASS [ 27.638s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_true_inliner_0_expects
PASS [ 33.247s] nargo_cli::execute tests::interpret_execution_success::test_fold_2_to_17::forcebrillig_false_inliner_i64_max_expects
PASS [ 37.328s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_true_inliner_0_expects
PASS [ 37.462s] nargo_cli::execute tests::minimal_execution_success::test_fold_2_to_17::forcebrillig_false_inliner_0_expects
PASS [ 39.481s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_false_inliner_i64_max_expects
PASS [ 41.508s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_false_inliner_i64_min_expects
PASS [ 41.548s] nargo_cli::execute tests::nargo_expand_execution_success::test_fold_2_to_17
PASS [ 41.672s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_true_inliner_i64_max_expects
PASS [ 43.480s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_false_inliner_0_expects
PASS [ 43.636s] nargo_cli::execute tests::execution_success::test_fold_2_to_17::forcebrillig_true_inliner_i64_min_expects
────────────
Summary [ 43.679s] 14 tests run: 14 passed, 5581 skipped
❯ git rev-parse HEAD
6425a803f3045687174ce97650b88f4adc286787 |
|
This seems like something that can be delayed until post 1.0. |
I agree it's speculative, but I'd lift the two "Other changes:" from it, as they have a benefit on their own and require no looping, just doing less work. |
There was a problem hiding this comment.
ACVM Benchmarks
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
purely_sequential_opcodes |
251117 ns/iter (± 535) |
251566 ns/iter (± 1107) |
1.00 |
perfectly_parallel_opcodes |
221914 ns/iter (± 7275) |
222720 ns/iter (± 1455) |
1.00 |
perfectly_parallel_batch_inversion_opcodes |
2778277 ns/iter (± 1279) |
2781636 ns/iter (± 2135) |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Compilation Time
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
1.802 s |
1.806 s |
1.00 |
private-kernel-reset |
7.746 s |
8.17 s |
0.95 |
private-kernel-tail |
1.452 s |
1.372 s |
1.06 |
rollup-block-root-first-empty-tx |
1.434 s |
1.452 s |
0.99 |
rollup-block-root-single-tx |
1.34 s |
1.38 s |
0.97 |
rollup-block-root |
1.48 s |
1.42 s |
1.04 |
rollup-checkpoint-merge |
1.492 s |
1.502 s |
0.99 |
rollup-checkpoint-root-single-block |
193 s |
192 s |
1.01 |
rollup-checkpoint-root |
195 s |
202 s |
0.97 |
rollup-root |
1.472 s |
1.488 s |
0.99 |
rollup-tx-base-private |
18.7 s |
18.82 s |
0.99 |
rollup-tx-base-public |
77.32 s |
76.54 s |
1.01 |
rollup-tx-merge |
1.416 s |
1.436 s |
0.99 |
semaphore-depth-10 |
0.763 s |
0.776 s |
0.98 |
sha512-100-bytes |
1.75 s |
1.576 s |
1.11 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Execution Time
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
0.014 s |
0.013 s |
1.08 |
private-kernel-reset |
0.161 s |
0.161 s |
1 |
private-kernel-tail |
0.01 s |
0.01 s |
1 |
rollup-block-root-first-empty-tx |
0.003 s |
0.003 s |
1 |
rollup-block-root-single-tx |
0.004 s |
0.002 s |
2 |
rollup-block-root |
0.004 s |
0.004 s |
1 |
rollup-checkpoint-merge |
0.003 s |
0.004 s |
0.75 |
rollup-checkpoint-root-single-block |
14.4 s |
14.1 s |
1.02 |
rollup-checkpoint-root |
12.6 s |
12.7 s |
0.99 |
rollup-root |
0.004 s |
0.004 s |
1 |
rollup-tx-base-private |
0.308 s |
0.31 s |
0.99 |
rollup-tx-base-public |
0.247 s |
0.252 s |
0.98 |
rollup-tx-merge |
0.002 s |
0.002 s |
1 |
semaphore-depth-10 |
0.047 s |
0.036 s |
1.31 |
sha512-100-bytes |
0.061 s |
0.052 s |
1.17 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Artifact Size
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
740.3 KB |
740.3 KB |
1 |
private-kernel-reset |
2087.3 KB |
2087.3 KB |
1 |
private-kernel-tail |
543.8 KB |
543.7 KB |
1.00 |
rollup-block-root-first-empty-tx |
172.2 KB |
172.2 KB |
1 |
rollup-block-root-single-tx |
170.9 KB |
170.9 KB |
1 |
rollup-block-root |
234 KB |
234 KB |
1 |
rollup-checkpoint-merge |
386.4 KB |
386.4 KB |
1 |
rollup-checkpoint-root-single-block |
27907 KB |
27899.6 KB |
1.00 |
rollup-checkpoint-root |
27939.7 KB |
27936.7 KB |
1.00 |
rollup-root |
415.1 KB |
415.1 KB |
1 |
rollup-tx-base-private |
5120.6 KB |
5120.6 KB |
1 |
rollup-tx-base-public |
5108.5 KB |
5108.5 KB |
1 |
rollup-tx-merge |
186.9 KB |
186.9 KB |
1 |
semaphore-depth-10 |
570.7 KB |
570.7 KB |
1 |
sha512-100-bytes |
506.4 KB |
506.4 KB |
1 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Opcode count
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
15937 opcodes |
15937 opcodes |
1 |
private-kernel-reset |
76858 opcodes |
76858 opcodes |
1 |
private-kernel-tail |
11710 opcodes |
11710 opcodes |
1 |
rollup-block-root-first-empty-tx |
1351 opcodes |
1351 opcodes |
1 |
rollup-block-root-single-tx |
1047 opcodes |
1047 opcodes |
1 |
rollup-block-root |
2340 opcodes |
2340 opcodes |
1 |
rollup-checkpoint-merge |
2334 opcodes |
2334 opcodes |
1 |
rollup-checkpoint-root-single-block |
962971 opcodes |
962971 opcodes |
1 |
rollup-checkpoint-root |
964281 opcodes |
964281 opcodes |
1 |
rollup-root |
2835 opcodes |
2835 opcodes |
1 |
rollup-tx-base-private |
270240 opcodes |
270240 opcodes |
1 |
rollup-tx-base-public |
273204 opcodes |
273204 opcodes |
1 |
rollup-tx-merge |
1426 opcodes |
1426 opcodes |
1 |
semaphore-depth-10 |
5700 opcodes |
5700 opcodes |
1 |
sha512-100-bytes |
13173 opcodes |
13173 opcodes |
1 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Execution Memory
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
255.51 MB |
255.51 MB |
1 |
private-kernel-reset |
292.82 MB |
292.82 MB |
1 |
private-kernel-tail |
243.73 MB |
243.73 MB |
1 |
rollup-block-root |
336.49 MB |
336.49 MB |
1 |
rollup-checkpoint-merge |
335.47 MB |
335.47 MB |
1 |
rollup-checkpoint-root-single-block |
1030 MB |
1030 MB |
1 |
rollup-checkpoint-root |
1030 MB |
1030 MB |
1 |
rollup-root |
336.67 MB |
336.67 MB |
1 |
rollup-tx-base-private |
456.44 MB |
456.44 MB |
1 |
rollup-tx-base-public |
540.08 MB |
540.08 MB |
1 |
rollup-tx-merge |
334.77 MB |
334.77 MB |
1 |
semaphore_depth_10 |
73.48 MB |
73.48 MB |
1 |
sha512_100_bytes |
71.73 MB |
71.73 MB |
1 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Compilation Memory
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
private-kernel-inner |
286.48 MB |
286.46 MB |
1.00 |
private-kernel-reset |
562.6 MB |
585.23 MB |
0.96 |
private-kernel-tail |
257.13 MB |
258.72 MB |
0.99 |
rollup-block-root-first-empty-tx |
340.95 MB |
341.03 MB |
1.00 |
rollup-block-root-single-tx |
338.18 MB |
338.17 MB |
1.00 |
rollup-block-root |
341.99 MB |
341.99 MB |
1 |
rollup-checkpoint-merge |
342.07 MB |
342.09 MB |
1.00 |
rollup-checkpoint-root-single-block |
9680 MB |
9680 MB |
1 |
rollup-checkpoint-root |
9690 MB |
9690 MB |
1 |
rollup-root |
344.63 MB |
347.38 MB |
0.99 |
rollup-tx-base-private |
1470 MB |
1490 MB |
0.99 |
rollup-tx-base-public |
6960 MB |
6960 MB |
1 |
rollup-tx-merge |
339.44 MB |
339.44 MB |
1 |
semaphore_depth_10 |
96.82 MB |
107.46 MB |
0.90 |
sha512_100_bytes |
251.26 MB |
251.19 MB |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Test Suite Duration
Details
| Benchmark suite | Current: d73fed9 | Previous: 8ca4af7 | Ratio |
|---|---|---|---|
test_report_AztecProtocol_aztec-packages_noir-projects_aztec-nr |
116 s |
119 s |
0.97 |
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_blob |
256 s |
244 s |
1.05 |
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_private-kernel-lib |
211 s |
237 s |
0.89 |
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_reset-kernel-lib |
35 s |
35 s |
1 |
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_types |
133 s |
127 s |
1.05 |
test_report_noir-lang_noir-bignum_ |
159 s |
164 s |
0.97 |
test_report_noir-lang_sha256_ |
15 s |
16 s |
0.94 |
test_report_noir-lang_sha512_ |
12 s |
14 s |
0.86 |
test_report_zkpassport_noir-ecdsa_ |
2 s |
1 s |
2 |
test_report_zkpassport_noir_rsa_ |
1 s |
1 s |
1 |
This comment was automatically generated by workflow using github-action-benchmark.
|
Closing in favour of #10019 which at least doesn't fail to compile |
Description
Problem*
Resolves #9963
Summary*
Changes
constant_foldingto schedule revisits of blocks under the following circumstances:b0 -> [b1, b2]. When inb2we find a duplicate instruction fromb1and hoist the instruction fromb2to the common denominatorb0, schedule a revisit ofb1and thenb2, so that we can deduplicate the instruction inb1with that ofb0and then we can look for any instruction that we can now again hoist fromb2orb1intob0. (It could potentially be enough to revisitb1, as it can trigger hoisting fromb2, but if that schedules a revisit ofb2, that revisit will make the original schedule be skipped).b0 -> [b1 -> b2 -> b3, b4], and we replaced an instruction inb2with the results of its duplicate inb1, then we found another duplicate of the oneb1inb4. We hoisted the instruction fromb4intob0, and revisitedb1, where we replaced it with the instruction now inb0. Then, revisitb2andb3again as a descendant ofb1, so that we can re-resolve the instructions in them.Other changes:
simplifyto not insert a new result if the instruction simplified to itself (can happen withBinary: instead ofNoneit returnsSimplifiedToInstruction(<maybe-self>), andConstraincan simplify toSimplifiedToInstructionMultiple(vec![<maybe-self>])).b1inb2, and we hoist intob0, then let the result inb0immediately become active in the cache, so that an instruction inb3is replaced, rather than hoisted.TODO:
regression_5252andfold_2_to_17Additional Context
Documentation*
Check one:
PR Checklist*
cargo fmton default settings.