Skip to content

Make UseRefs more memory efficient#44495

Merged
Keno merged 1 commit intomasterfrom
kf/userefsallocs
Mar 15, 2022
Merged

Make UseRefs more memory efficient#44495
Keno merged 1 commit intomasterfrom
kf/userefsallocs

Conversation

@Keno
Copy link
Member

@Keno Keno commented Mar 7, 2022

With #44494, this cuts about 22M allocations (out of 59M) from the
compiler benchmark in #44492. Without #44494, it still reduces
the number of allocations, but not as much. This was originally
optimized in 100666b, but the
behavior of our compiler has changed to allow inling the Tuple{UseRef, Int}
into the outer struct, forcing a reallocation on every iteration.

With #44494, this cuts about 22M allocations (out of 59M) from the
compiler benchmark in #44492. Without #44494, it still reduces
the number of allocations, but not as much. This was originally
optimized in 100666b, but the
behavior of our compiler has changed to allow inling the Tuple{UseRef, Int}
into the outer struct, forcing a reallocation on every iteration.
return OOB_TOKEN
end
end
@inline getindex(x::UseRef) = _useref_getindex(x.urs.stmt, x.op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had hoped our calling convention would have been enough already to make this inlining awkwardness unnecessary. What causes it to be needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want UseRef to be SROA'd, so that UseRefIterator can be SROA'd. Without it UseRefIterator gets allocated.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem slightly odd that that needs to be mutable, since that implies we eventually need to copy the stmt back to the Instruction steam.

@Keno
Copy link
Member Author

Keno commented Mar 7, 2022

It does seem slightly odd that that needs to be mutable, since that implies we eventually need to copy the stmt back to the Instruction steam.

Yes, that's how this API works. At the end you need to put urs[] back into the instruction stream (or whatever else you were modifying).

@Keno
Copy link
Member Author

Keno commented Mar 15, 2022

Merging this now - the test for zero allocation will be added in #44557.

@Keno Keno merged commit e082917 into master Mar 15, 2022
@Keno Keno deleted the kf/userefsallocs branch March 15, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

latency Latency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants