chore: improve register moves in brillig return code-gen#10305
chore: improve register moves in brillig return code-gen#10305
Conversation
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Execution Time'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: 2ea4158 | Previous: b34828f | Ratio |
|---|---|---|---|
rollup-block-root-single-tx |
0.003 s |
0.002 s |
1.50 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Execution Memory'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: d89896e | Previous: dd8b7e8 | Ratio |
|---|---|---|---|
rollup-checkpoint-root-single-block |
5910 MB |
1020 MB |
5.79 |
rollup-checkpoint-root |
5920 MB |
1020 MB |
5.80 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
Changes to Brillig bytecode sizes
🧾 Summary (10% most significant diffs)
Full diff report 👇
|
Changes to number of Brillig opcodes executed
🧾 Summary (10% most significant diffs)
Full diff report 👇
|
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Test Suite Duration'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: 2ea4158 | Previous: b34828f | Ratio |
|---|---|---|---|
test_report_zkpassport_noir_rsa_ |
2 s |
1 s |
2 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'ACVM Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.
| Benchmark suite | Current: 4fa38db | Previous: 6826de8 | Ratio |
|---|---|---|---|
perfectly_parallel_batch_inversion_opcodes |
2794853 ns/iter (± 4395) |
2259734 ns/iter (± 10347) |
1.24 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @TomAFrench
vezenovm
left a comment
There was a problem hiding this comment.
I like the removal of the loop detector. It would be nice to unify our index representation though.
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
|
I'll go over this again to understand it; it passes the randomised test, so it should be correct. On a conceptual level, I liked having the more general algorithm available, even if it was only used for this specific task of copying to the beginning of the memory space, and wouldn't handle more more movements than the stack size due to recursion. |
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
|
Do you think it would be difficult to have an iterative algorithm that isn't restricted in its destinations to be 1,2,3... ? |
aakoshh
left a comment
There was a problem hiding this comment.
I believe the algorithm is correct, just a bit more difficult for me to understand than the original, but that's personal, for you it was the opposite.
- It removes the potential stack overflow of the original recursive algorithm, but we assume the input will never be large enough to hit that limit.
- It also removes the generality of the original when it comes to the range of acceptable destinations, which feels wrong to do, but we never call it with anything but the input it expects.
I'll leave it up to @vezenovm to give a final verdict ![]()
I don't think it would that difficult, just a bit more complicated. One more indirection where currently, the i-th destination is simply i, and testing if a node is a destination is easy: 'is it below n?' |
There was a problem hiding this comment.
I am good with the new algorithm. It is a bit harder to understand but it should be more efficient and avoids recursion. I tried to trigger a recursive stack overflow in Noir but I was unable to do so. I think it is better to be defensive here as our stack limits or register allocation algorithms could change in the future.
I was able to trigger a stack overflow on master with this test:
/// This creates a chain 0->1->2->...->N where N is large enough to overflow the stack
#[test]
fn test_deep_chain() {
// Each movement is i -> i+1, creating a single long chain
const CHAIN_DEPTH: usize = 10_000;
let movements: Vec<(usize, usize)> = (0..CHAIN_DEPTH)
.map(|i| (i, i + 1))
.collect();
let (sources, destinations) = movements_to_source_and_destinations(movements);
let mut context = create_context();
// This should overflow the stack with recursive implementation
context.codegen_mov_registers_to_registers(&sources, &destinations);
}Maybe we could try and add a similar test before merging? Although we know it will not stack overflow as it is now an iterative algorithm so maybe it is redundant.
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
compiler/noirc_evaluator/src/brillig/brillig_ir/codegen_stack.rs
Outdated
Show resolved
Hide resolved
Co-authored-by: Maxim Vezenov <mvezenov@gmail.com>
|
I created issue #10494, which is a follow-up to this PR. |
Automated pull of nightly from the [noir](https://github.com/noir-lang/noir) programming language, a dependency of Aztec. BEGIN_COMMIT_OVERRIDE fix(fuzzer): Set `in_dynamic` in `gen_match` (noir-lang/noir#10470) chore(elaborator): Check that assert message fragments are ABI compatible (noir-lang/noir#10491) feat(fuzz): Add support for more functions in comptime_vs_brillig_direct (noir-lang/noir#10500) chore(frontend): Correct type for struct field on type mismatch and extra negative case unit tests (noir-lang/noir#10493) chore: improve register moves in brillig return code-gen (noir-lang/noir#10305) END_COMMIT_OVERRIDE
Automated pull of nightly from the [noir](https://github.com/noir-lang/noir) programming language, a dependency of Aztec. BEGIN_COMMIT_OVERRIDE fix(fuzzer): Set `in_dynamic` in `gen_match` (noir-lang/noir#10470) chore(elaborator): Check that assert message fragments are ABI compatible (noir-lang/noir#10491) feat(fuzz): Add support for more functions in comptime_vs_brillig_direct (noir-lang/noir#10500) chore(frontend): Correct type for struct field on type mismatch and extra negative case unit tests (noir-lang/noir#10493) chore: improve register moves in brillig return code-gen (noir-lang/noir#10305) END_COMMIT_OVERRIDE
Description
Problem*
For returning n values from a Brillig functions, code-gen copy them into registers 1,..n, so that the caller knows where to fetch the returned values.
When the return values overlap with the target registers 1,..,n, we cannot simply move each register to its target.
In order to minimise the cost of these copies, an analysis is performed on the movement graph of the registers.
As part of the internal audit, I have noticed a few improvements and implemented them here.
Summary*
This PR improves on the previous implementation on the following points;
In the worst case, create one temporary register at most, instead of one per loop, and re-use a register from the graph when possible.Additional Context
I had to remove the 'one register for all the loops' optimisation, because the data written to it, coming from several registers, can have different types, in which case the Brillig VM will complain.
Documentation*
Check one:
PR Checklist*
cargo fmton default settings.