Conversation
With #44494, this cuts about 22M allocations (out of 59M) from the compiler benchmark in #44492. Without #44494, it still reduces the number of allocations, but not as much. This was originally optimized in 100666b, but the behavior of our compiler has changed to allow inlining the Tuple{UseRef, Int} into the outer struct, forcing a reallocation on every iteration.
        return OOB_TOKEN
    end
end
@inline getindex(x::UseRef) = _useref_getindex(x.urs.stmt, x.op)
I had hoped our calling convention would have been enough already to make this inlining awkwardness unnecessary. What causes it to be needed?
We want UseRef to be SROA'd, so that UseRefIterator can be SROA'd. Without it UseRefIterator gets allocated.
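For intuition, here is a minimal self-contained sketch with made-up toy types (ToyUseRef/ToyIterator are stand-ins, not the compiler's real UseRef/UseRefIterator): once the accessor is @inline, the call boundary disappears, so SROA can see that neither wrapper escapes and can drop both allocations.

```julia
# Toy model (hypothetical names, not the compiler's real types): with the
# accessor marked @inline, SROA can prove that neither wrapper escapes and
# eliminate both allocations; without the inlining, the mutable wrapper
# would be heap-allocated on every iteration.
mutable struct ToyUseRef
    stmt::Vector{Int}
    op::Int
end

struct ToyIterator            # immutable outer struct holding the mutable ref
    ref::ToyUseRef
end

@inline toy_getindex(it::ToyIterator) = it.ref.stmt[it.ref.op]

function sum_uses(stmt::Vector{Int})
    s = 0
    for op in eachindex(stmt)
        s += toy_getindex(ToyIterator(ToyUseRef(stmt, op)))  # fresh wrappers each time
    end
    return s
end

const data = collect(1:1000)
sum_uses(data)                      # warm up / compile
@show @allocated sum_uses(data)     # expected to be (near) zero when SROA fires
```

Swapping the @inline for @noinline should make the per-iteration allocations reappear, which is the situation the forced inlining here is meant to avoid.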
vtjnash left a comment
It does seem slightly odd that that needs to be mutable, since that implies we eventually need to copy the stmt back to the Instruction stream.
Yes, that's how this API works. At the end you need to put the stmt back.
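To make that contract concrete, here is a rough, runnable sketch using invented stand-in types (ToyURS/ToyOp are made up for illustration, not the compiler's actual definitions); the shape mirrors how the iterator is meant to be used: read and rewrite uses through the handles, then fetch the updated statement back at the end and store it into the instruction stream yourself.

```julia
# Toy stand-ins (ToyURS/ToyOp are invented names, not the compiler's types);
# the point is the last line: after rewriting uses through the handles, the
# caller has to take the updated statement back out and store it in the IR.
mutable struct ToyURS
    stmt::Vector{Any}                       # stand-in for a statement's operands
end

struct ToyOp                                # stand-in for a UseRef-like handle
    urs::ToyURS
    i::Int
end

Base.iterate(u::ToyURS, i::Int = 1) =
    i > length(u.stmt) ? nothing : (ToyOp(u, i), i + 1)
Base.getindex(op::ToyOp) = op.urs.stmt[op.i]            # read a use
Base.setindex!(op::ToyOp, v) = (op.urs.stmt[op.i] = v)  # record a replacement
Base.getindex(u::ToyURS) = u.stmt                       # the "put the stmt back" step

urs = ToyURS(Any[:f, :x, 1])
for op in urs                               # walk the uses
    op[] === :x && (op[] = :x_renamed)      # rewrite one operand
end
new_stmt = urs[]                            # caller copies this back into the stream
```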
Merging this now - the test for zero allocation will be added in #44557.