feat(bb): extend_edges optimization for zero values past end_index#12703
feat(bb): extend_edges optimization for zero values past end_index#12703
Conversation
| for (auto [poly, full_poly] : zip_view(get_all(), full_polynomials.get_all())) { | ||
| // After the initial sumcheck round, the new size is CEIL(size/2). | ||
| size_t desired_size = std::min(full_poly.end_index() / 2 + full_poly.end_index() % 2, circuit_size / 2); | ||
| size_t desired_size = full_poly.end_index() / 2 + full_poly.end_index() % 2; |
There was a problem hiding this comment.
This was a change which was forgotten in a previous PR.
| extended_edge = edge.template extend_to<MAX_PARTIAL_RELATION_LENGTH>(); | ||
| if (multivariate.end_index() < edge_idx) { | ||
| extended_edge = | ||
| bb::Univariate<FF, MAX_PARTIAL_RELATION_LENGTH>(std::array<FF, MAX_PARTIAL_RELATION_LENGTH>{}); |
There was a problem hiding this comment.
(Just taking a peek out of curiosity). I think it would be clearer here to use something like bb::Univariate<FF, MAX_PARTIAL_RELATION_LENGTH>::zero() (which I believe exists). Possibly even better to construct such a thing statically once and for all but probably not a big difference
There was a problem hiding this comment.
I also wonder if this indicates there's a more fundamental optimization somewhere. E.g. does this imply that we're going on to do a bunch of computation with zero univariates?
There was a problem hiding this comment.
@ledwards2225 In accumulate_relation_univariates(), we then have computation on zero univariates but I think that this is strongly mitigated by the skippable mechanism. There might be some fine tuning but I think this is less trivial.
There was a problem hiding this comment.
@ledwards2225 I pushed the suggested changes on using zero() and using a static const variable.
534bdab to
575f636
Compare
|
Nice! I like these low hanging fruits. |
575f636 to
4af8259
Compare
lucasxia01
left a comment
There was a problem hiding this comment.
Nice! I wonder if there's any other juice to be squeezed by using end_index instead of round_size in some of these other functions. Seems harder though
* master: fix: filter for empty attestations when pulling from block (#12740) feat: one-way noir sync (#12592) feat(avm): Address derivation gadget (#12721) chore(ci): add workflow dispatch to masternet (#12739) feat: add default accounts (#12734) feat: gas reports and snapshots (#12724) fix(avm): fix vm1 fake proof size (#12733) feat(bb): extend_edges optimization for zero values past end_index (#12703) fix: remove hard coding of constructor for account manager (#12678) git subrepo push --branch=master noir-projects/aztec-nr git_subrepo.sh: Fix parent in .gitrepo file. [skip ci] chore: replace relative paths to noir-protocol-circuits git subrepo push --branch=master barretenberg fix: verify_honk_proof inputs generation in bootstrap (#12457) fix: Patches to cycle_group and cycle_group fuzzer (#12385)
…2707) This PR introduces a more even vertically distributed trace chunks processing among threads in the univariate computation as part of sumcheck. This leads to a substantial speed up of sumcheck. Let `t` be the number of threads. The processing of the rows of a circuit used to be * thread_1 (`round_size/t` rows) * ... * thread_t (`round_size/t` rows) [end of circuit] while this PR introduces the possibility of having the processing be interleaved, and therefore the load is more uniformly balanced across threads: * thread_1 (`chunk_thread_portion_size` rows) * ... * thread_t (`chunk_thread_portion_size` rows) * thread_1 (`chunk_thread_portion_size` rows) * ... * thread_t (`chunk_thread_portion_size` rows) * ... * [end of circuit] This PR improves #12703 measurement from 8.5 seconds to 2.4 seconds. --------- Co-authored-by: fcarreiro <facundo@aztecprotocol.com>
On 16 cores, avm bulk test v2, sumcheck time improves from 16 seconds to 8.5 seconds.