Skip to content

fix(licm): Account for nested loops being control dependent when analyzing outer loops#8593

Merged
vezenovm merged 10 commits intomasterfrom
mv/check-for-control-dep-nested-loop
May 20, 2025
Merged

fix(licm): Account for nested loops being control dependent when analyzing outer loops#8593
vezenovm merged 10 commits intomasterfrom
mv/check-for-control-dep-nested-loop

Conversation

@vezenovm
Copy link
Contributor

@vezenovm vezenovm commented May 20, 2025

Description

Problem*

Resolves #8586

The bug was that we were not appropriately accounting for a nested loop being control dependent. Taking the issue snippet:

fn main(a: pub bool) {
    for _ in 0..1 {
        if a {
            for _ in 0..1 {
                let _ = (1 / (a as Field));
            }
        };
    }
}

For completeness this has the following SSA before LICM:

Details

        acir(inline) predicate_pure fn main f0 {
          b0(v0: u1):
            jmp b1(u32 0)
          b1(v1: u32):
            v4 = eq v1, u32 0
            jmpif v4 then: b2, else: b3
          b2():
            jmpif v0 then: b4, else: b5
          b3():
            return
          b4():
            jmp b6(u32 0)
          b5():
            v7 = unchecked_add v1, u32 1
            jmp b1(v7)
          b6(v2: u32):
            v5 = eq v2, u32 0
            jmpif v5 then: b7, else: b8
          b7():
            v8 = cast v0 as Field
            v10 = div Field 1, v8
            v11 = unchecked_add v2, u32 1
            jmp b6(v11)
          b8():
            jmp b5()
        }

When checking whether inner loop block (b7) is control dependent, we will get false. This is because we are not accounting for transitive control dependence. When initially adding control dependence to LICM I felt doing so (which would most likely require a full control dependence graph (CDG)) was a bit overkill as an initial control dependent LICM could simply check for control dependence after the loop header.

Summary*

A full CDG I think would still be overkill at the moment and I simply expanded our control dependence to check for whether the block we are processing is within a nested loop that is control dependent.

Changes:

  1. We now pass all the remaining loops we have yet to process
  2. When checking whether a block is control dependent we now iterate through all loops, skipping potential nested loops where the block being checked for control dependence is not in the nested loop (this simply means we do not have a relevant nested loop).
  3. If we find a nested loop we run the same check where we look at whether the current block is control dependent on any of the predecessors between itself and the outer loop header. Except now, we are checking the control dependence on the nested loop header rather than the current block.
  4. If we have a control dependent nested loop (like in the issue snippet), that block will be marked as control dependent for the outer loop LICM. Once we get to the inner loop LICM the control dependence checks will reset, and if appropriate we can still hoist out of the inner loop.
  5. I also added a cache of nested loop control dependent blocks. This is an optimization to remove redundant control dependence checks.
    As per the new field description:
    /// Caches all blocks that belong to nested loops determined to be control dependent
    /// on blocks in an outer loop. This allows short circuiting future control dependence
    /// checks during loop invariant analysis, as these blocks are guaranteed to be
    /// control dependent due to the entire nested loop being control dependent.
    ///
    /// Populated during analysis of outer loops and reset for each new loop.
    nested_loop_control_dependent_blocks: HashSet<BasicBlockId>,

Additional Context

I marked to show benchmarks as this PR's approach requires checking all loops to find nested loops. If we were to model nested loops explicitly using some kind of tree structure, the approach taken in this PR would be simplified and most likely much more efficient. See #8593 (comment). I felt an explicit nested loop structure is best left for follow-up or if this code becomes a bottleneck.

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@vezenovm vezenovm added the bench-show Display benchmark results on PR label May 20, 2025
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACVM Benchmarks

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
purely_sequential_opcodes 257808 ns/iter (± 1615) 266084 ns/iter (± 1966) 0.97
perfectly_parallel_opcodes 225618 ns/iter (± 1777) 232810 ns/iter (± 1166) 0.97
perfectly_parallel_batch_inversion_opcodes 3582833 ns/iter (± 9513) 3231134 ns/iter (± 53818) 1.11

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation Time

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
private-kernel-inner 2.624 s 2.326 s 1.13
private-kernel-reset 6.556 s 6.47 s 1.01
private-kernel-tail 1.162 s 1.128 s 1.03
rollup-base-private 15.92 s 16.4 s 0.97
rollup-base-public 12.88 s 13.2 s 0.98
rollup-block-root-empty 1.276 s 1.294 s 0.99
rollup-block-root-single-tx 128 s 132 s 0.97
rollup-block-root 139 s 124 s 1.12
rollup-merge 1.086 s 1.102 s 0.99
rollup-root 1.668 s 1.82 s 0.92
semaphore-depth-10 0.816 s 0.847 s 0.96

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execution Time

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
private-kernel-inner 0.028 s 0.028 s 1
private-kernel-reset 0.162 s 0.162 s 1
private-kernel-tail 0.012 s 0.011 s 1.09
rollup-base-private 0.306 s 0.31 s 0.99
rollup-base-public 0.194 s 0.195 s 0.99
rollup-block-root 11.8 s 11.6 s 1.02
rollup-merge 0.004 s 0.004 s 1
rollup-root 0.013 s 0.014 s 0.93
semaphore-depth-10 0.019 s 0.02 s 0.95

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Artifact Size

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
private-kernel-inner 1129.7 KB 1129.7 KB 1
private-kernel-reset 2050.5 KB 2050.5 KB 1
private-kernel-tail 583.5 KB 583.5 KB 1
rollup-base-private 5129.9 KB 5129.9 KB 1
rollup-base-public 3952.1 KB 3952.1 KB 1
rollup-block-root-empty 256.7 KB 256.7 KB 1
rollup-block-root-single-tx 25697.5 KB 25697.5 KB 1
rollup-block-root 25720.8 KB 25720.8 KB 1
rollup-merge 181.7 KB 181.7 KB 1
rollup-root 478.4 KB 478.4 KB 1
semaphore-depth-10 636.3 KB 636.3 KB 1

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Suite Duration

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
test_report_AztecProtocol_aztec-packages_noir-projects_aztec-nr 64 s 65 s 0.98
test_report_AztecProtocol_aztec-packages_noir-projects_noir-contracts 99 s 96 s 1.03
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_blob 44 s 43 s 1.02
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_private-kernel-lib 194 s 196 s 0.99
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_rollup-lib 200 s 189 s 1.06
test_report_AztecProtocol_aztec-packages_noir-projects_noir-protocol-circuits_crates_types 61 s 60 s 1.02
test_report_noir-lang_noir-bignum_ 378 s 379 s 1.00
test_report_noir-lang_noir_bigcurve_ 219 s 224 s 0.98
test_report_noir-lang_sha512_ 30 s 31 s 0.97
test_report_zkpassport_noir_rsa_ 23 s 23 s 1

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compilation Memory

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
private-kernel-inner 332.43 MB 332.45 MB 1.00
private-kernel-reset 571.71 MB 571.71 MB 1
private-kernel-tail 232.23 MB 232.23 MB 1
rollup-base-private 1440 MB 1440 MB 1
rollup-base-public 1480 MB 1480 MB 1
rollup-block-root-empty 399.73 MB 399.72 MB 1.00
rollup-block-root-single-tx 7210 MB 7210 MB 1
rollup-block-root 7220 MB 7220 MB 1
rollup-merge 384 MB 383.98 MB 1.00
rollup-root 446.09 MB 446.06 MB 1.00
semaphore_depth_10 128.35 MB 128.35 MB 1

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execution Memory

Details
Benchmark suite Current: 330d430 Previous: 0e62fed Ratio
private-kernel-inner 240.77 MB 240.77 MB 1
private-kernel-reset 263.02 MB 263.02 MB 1
private-kernel-tail 216.27 MB 216.27 MB 1
rollup-base-private 547.95 MB 547.95 MB 1
rollup-base-public 540.41 MB 540.41 MB 1
rollup-block-root 1440 MB 1440 MB 1
rollup-merge 369.9 MB 369.9 MB 1
rollup-root 375.9 MB 375.9 MB 1
semaphore_depth_10 92.92 MB 92.92 MB 1

This comment was automatically generated by workflow using github-action-benchmark.

@vezenovm
Copy link
Contributor Author

rollup-block-root 9.85 s 11.6 s 0.85

Uhm, this looks to actually have improved execution quite a decent bit. We must have been hoisting instructions from inactive branches and that hoisted instruction was itself executed many times. This isn't totally surprising though as that program uses nested loops heavily.

@vezenovm vezenovm requested a review from a team May 20, 2025 20:16
@jfecher
Copy link
Contributor

jfecher commented May 20, 2025

rollup-block-root 9.85 s 11.6 s 0.85

Looks like it updated with numbers more similar to the previous values now unfortunately

Copy link
Contributor

@jfecher jfecher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vezenovm
Copy link
Contributor Author

vezenovm commented May 20, 2025

Looks like it updated with numbers more similar to the previous values now unfortunately

Ah dang. Yeah looks like we execute this circuit a single time:

This was set quite some time ago (rollup-block-root used to take >40s to execute). Perhaps it is time to bump this to 5 runs (or maybe 3 runs if we think 11s is still too long for 5 runs).

@vezenovm vezenovm added this pull request to the merge queue May 20, 2025
Merged via the queue into master with commit 99ac8a0 May 20, 2025
120 checks passed
@vezenovm vezenovm deleted the mv/check-for-control-dep-nested-loop branch May 20, 2025 21:15
github-merge-queue bot pushed a commit to AztecProtocol/aztec-packages that referenced this pull request May 22, 2025
Automated pull of nightly from the
[noir](https://github.com/noir-lang/noir) programming language, a
dependency of Aztec.
BEGIN_COMMIT_OVERRIDE
fix(licm): Account for nested loops being control dependent when
analyzing outer loops (noir-lang/noir#8593)
chore(refactor): Switch unreachable function removal to use centralized
call graph (noir-lang/noir#8578)
chore(test): Allow lambdas in fuzzing
(noir-lang/noir#8584)
chore: use insta for execution_success stdout
(noir-lang/noir#8576)
chore: use generator instead of zero for ec-add predicate
(noir-lang/noir#8552)
fix: use predicate expression as binary result
(noir-lang/noir#8583)
fix(ssa): Do not generate apply functions when no lambda variants exist
(noir-lang/noir#8573)
chore: put `nargo expand` snapshosts in the same directory
(noir-lang/noir#8577)
chore: Use FxHashMap for TypeBindings
(noir-lang/noir#8574)
chore(experimental): use larger stack size for parsing
(noir-lang/noir#8347)
chore: use insta snapshots for compile_failure stderr
(noir-lang/noir#8569)
chore(inlining): Mark functions with <= 10 instructions and no control
flow as inline always (noir-lang/noir#8533)
chore(ssa): Add weighted edges to call graph, move callers and callees
methods to call graph (noir-lang/noir#8513)
fix(frontend): Override to allow empty array input
(noir-lang/noir#8568)
fix: avoid logging all unused params in DIE pass
(noir-lang/noir#8566)
chore: bump external pinned commits
(noir-lang/noir#8562)
chore(deps): bump base-x from 3.0.9 to 3.0.11
(noir-lang/noir#8555)
chore(fuzz): Call function pointers
(noir-lang/noir#8531)
feat: C++ codegen for msgpack
(noir-lang/noir#7716)
feat(performance): brillig array set optimization
(noir-lang/noir#8550)
chore(fuzz): AST fuzzer to use function valued arguments (Part 1)
(noir-lang/noir#8514)
fix(licm): Check whether the loop is executed when hoisting with a
predicate (noir-lang/noir#8546)
feat: Implement $crate (noir-lang/noir#8537)
fix: add offset to ArrayGet
(noir-lang/noir#8536)
chore: remove some unused enum variants and functions
(noir-lang/noir#8538)
fix: disallow `()` in entry points
(noir-lang/noir#8529)
chore: Remove println in ssa interpreter
(noir-lang/noir#8528)
fix: don't overflow when casting signed value to u128
(noir-lang/noir#8526)
chore(performance): Enable hoisting pure with predicate calls
(noir-lang/noir#8522)
feat(fuzz): AST fuzzing with SSA interpreter
(noir-lang/noir#8436)
chore: Add u1 ops to interpreter, convert Value panics to errors
(noir-lang/noir#8469)
chore: Release Noir(1.0.0-beta.6)
(noir-lang/noir#8438)
chore(fuzz): AST generator to add `ctx_limit` to all functions
(noir-lang/noir#8507)
fix(inlining): Use centralized CallGraph structure for inline info
computation (noir-lang/noir#8489)
fix: remove private builtins from `Field` impl
(noir-lang/noir#8496)
feat: primitive types are no longer keywords
(noir-lang/noir#8470)
fix: parenthesized pattern, and correct 1-element tuple printing
(noir-lang/noir#8482)
fix: fix visibility of methods in `std::meta`
(noir-lang/noir#8497)
fix: Change `can_be_main` to be recursive
(noir-lang/noir#8501)
chore: add SSA interpreter test for higher order functions
(noir-lang/noir#8486)
fix(frontend)!: Ban zero sized arrays and strings as program input
(noir-lang/noir#8491)
fix!: remove `to_be_radix` and `to_le_radix` from stdlib interface
(noir-lang/noir#8495)
END_COMMIT_OVERRIDE

---------

Co-authored-by: AztecBot <tech@aztecprotocol.com>
Co-authored-by: Tom French <15848336+TomAFrench@users.noreply.github.com>
AztecBot added a commit to AztecProtocol/aztec-nr that referenced this pull request May 23, 2025
Automated pull of nightly from the
[noir](https://github.com/noir-lang/noir) programming language, a
dependency of Aztec.
BEGIN_COMMIT_OVERRIDE
fix(licm): Account for nested loops being control dependent when
analyzing outer loops (noir-lang/noir#8593)
chore(refactor): Switch unreachable function removal to use centralized
call graph (noir-lang/noir#8578)
chore(test): Allow lambdas in fuzzing
(noir-lang/noir#8584)
chore: use insta for execution_success stdout
(noir-lang/noir#8576)
chore: use generator instead of zero for ec-add predicate
(noir-lang/noir#8552)
fix: use predicate expression as binary result
(noir-lang/noir#8583)
fix(ssa): Do not generate apply functions when no lambda variants exist
(noir-lang/noir#8573)
chore: put `nargo expand` snapshosts in the same directory
(noir-lang/noir#8577)
chore: Use FxHashMap for TypeBindings
(noir-lang/noir#8574)
chore(experimental): use larger stack size for parsing
(noir-lang/noir#8347)
chore: use insta snapshots for compile_failure stderr
(noir-lang/noir#8569)
chore(inlining): Mark functions with <= 10 instructions and no control
flow as inline always (noir-lang/noir#8533)
chore(ssa): Add weighted edges to call graph, move callers and callees
methods to call graph (noir-lang/noir#8513)
fix(frontend): Override to allow empty array input
(noir-lang/noir#8568)
fix: avoid logging all unused params in DIE pass
(noir-lang/noir#8566)
chore: bump external pinned commits
(noir-lang/noir#8562)
chore(deps): bump base-x from 3.0.9 to 3.0.11
(noir-lang/noir#8555)
chore(fuzz): Call function pointers
(noir-lang/noir#8531)
feat: C++ codegen for msgpack
(noir-lang/noir#7716)
feat(performance): brillig array set optimization
(noir-lang/noir#8550)
chore(fuzz): AST fuzzer to use function valued arguments (Part 1)
(noir-lang/noir#8514)
fix(licm): Check whether the loop is executed when hoisting with a
predicate (noir-lang/noir#8546)
feat: Implement $crate (noir-lang/noir#8537)
fix: add offset to ArrayGet
(noir-lang/noir#8536)
chore: remove some unused enum variants and functions
(noir-lang/noir#8538)
fix: disallow `()` in entry points
(noir-lang/noir#8529)
chore: Remove println in ssa interpreter
(noir-lang/noir#8528)
fix: don't overflow when casting signed value to u128
(noir-lang/noir#8526)
chore(performance): Enable hoisting pure with predicate calls
(noir-lang/noir#8522)
feat(fuzz): AST fuzzing with SSA interpreter
(noir-lang/noir#8436)
chore: Add u1 ops to interpreter, convert Value panics to errors
(noir-lang/noir#8469)
chore: Release Noir(1.0.0-beta.6)
(noir-lang/noir#8438)
chore(fuzz): AST generator to add `ctx_limit` to all functions
(noir-lang/noir#8507)
fix(inlining): Use centralized CallGraph structure for inline info
computation (noir-lang/noir#8489)
fix: remove private builtins from `Field` impl
(noir-lang/noir#8496)
feat: primitive types are no longer keywords
(noir-lang/noir#8470)
fix: parenthesized pattern, and correct 1-element tuple printing
(noir-lang/noir#8482)
fix: fix visibility of methods in `std::meta`
(noir-lang/noir#8497)
fix: Change `can_be_main` to be recursive
(noir-lang/noir#8501)
chore: add SSA interpreter test for higher order functions
(noir-lang/noir#8486)
fix(frontend)!: Ban zero sized arrays and strings as program input
(noir-lang/noir#8491)
fix!: remove `to_be_radix` and `to_le_radix` from stdlib interface
(noir-lang/noir#8495)
END_COMMIT_OVERRIDE

---------

Co-authored-by: AztecBot <tech@aztecprotocol.com>
Co-authored-by: Tom French <15848336+TomAFrench@users.noreply.github.com>
Thunkar pushed a commit to AztecProtocol/aztec-packages that referenced this pull request May 23, 2025
Automated pull of nightly from the
[noir](https://github.com/noir-lang/noir) programming language, a
dependency of Aztec.
BEGIN_COMMIT_OVERRIDE
fix(licm): Account for nested loops being control dependent when
analyzing outer loops (noir-lang/noir#8593)
chore(refactor): Switch unreachable function removal to use centralized
call graph (noir-lang/noir#8578)
chore(test): Allow lambdas in fuzzing
(noir-lang/noir#8584)
chore: use insta for execution_success stdout
(noir-lang/noir#8576)
chore: use generator instead of zero for ec-add predicate
(noir-lang/noir#8552)
fix: use predicate expression as binary result
(noir-lang/noir#8583)
fix(ssa): Do not generate apply functions when no lambda variants exist
(noir-lang/noir#8573)
chore: put `nargo expand` snapshosts in the same directory
(noir-lang/noir#8577)
chore: Use FxHashMap for TypeBindings
(noir-lang/noir#8574)
chore(experimental): use larger stack size for parsing
(noir-lang/noir#8347)
chore: use insta snapshots for compile_failure stderr
(noir-lang/noir#8569)
chore(inlining): Mark functions with <= 10 instructions and no control
flow as inline always (noir-lang/noir#8533)
chore(ssa): Add weighted edges to call graph, move callers and callees
methods to call graph (noir-lang/noir#8513)
fix(frontend): Override to allow empty array input
(noir-lang/noir#8568)
fix: avoid logging all unused params in DIE pass
(noir-lang/noir#8566)
chore: bump external pinned commits
(noir-lang/noir#8562)
chore(deps): bump base-x from 3.0.9 to 3.0.11
(noir-lang/noir#8555)
chore(fuzz): Call function pointers
(noir-lang/noir#8531)
feat: C++ codegen for msgpack
(noir-lang/noir#7716)
feat(performance): brillig array set optimization
(noir-lang/noir#8550)
chore(fuzz): AST fuzzer to use function valued arguments (Part 1)
(noir-lang/noir#8514)
fix(licm): Check whether the loop is executed when hoisting with a
predicate (noir-lang/noir#8546)
feat: Implement $crate (noir-lang/noir#8537)
fix: add offset to ArrayGet
(noir-lang/noir#8536)
chore: remove some unused enum variants and functions
(noir-lang/noir#8538)
fix: disallow `()` in entry points
(noir-lang/noir#8529)
chore: Remove println in ssa interpreter
(noir-lang/noir#8528)
fix: don't overflow when casting signed value to u128
(noir-lang/noir#8526)
chore(performance): Enable hoisting pure with predicate calls
(noir-lang/noir#8522)
feat(fuzz): AST fuzzing with SSA interpreter
(noir-lang/noir#8436)
chore: Add u1 ops to interpreter, convert Value panics to errors
(noir-lang/noir#8469)
chore: Release Noir(1.0.0-beta.6)
(noir-lang/noir#8438)
chore(fuzz): AST generator to add `ctx_limit` to all functions
(noir-lang/noir#8507)
fix(inlining): Use centralized CallGraph structure for inline info
computation (noir-lang/noir#8489)
fix: remove private builtins from `Field` impl
(noir-lang/noir#8496)
feat: primitive types are no longer keywords
(noir-lang/noir#8470)
fix: parenthesized pattern, and correct 1-element tuple printing
(noir-lang/noir#8482)
fix: fix visibility of methods in `std::meta`
(noir-lang/noir#8497)
fix: Change `can_be_main` to be recursive
(noir-lang/noir#8501)
chore: add SSA interpreter test for higher order functions
(noir-lang/noir#8486)
fix(frontend)!: Ban zero sized arrays and strings as program input
(noir-lang/noir#8491)
fix!: remove `to_be_radix` and `to_le_radix` from stdlib interface
(noir-lang/noir#8495)
END_COMMIT_OVERRIDE

---------

Co-authored-by: AztecBot <tech@aztecprotocol.com>
Co-authored-by: Tom French <15848336+TomAFrench@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bench-show Display benchmark results on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loop invariant motion: failing constraint hoisted out of for/if/for

2 participants