More thorough aliasing detection in nonscalar copy, broadcast and assignment #25890
base/abstractarray.jl
Outdated
"""
    unalias(dest, A)

Return either A or a copy of A, in a rough effort to prevent modifications to `dest` from
Missing backticks around `A`.
base/abstractarray.jl
Outdated
This function must return an object of exactly the same type as A for performance and type-stability.

See also `mightalias` and `dataids`.
Use `@ref`.
base/abstractarray.jl
Outdated
"""
unalias(dest, A) = A
unalias(dest, A::Array) = mightalias(dest, A) ? copy(A) : A
@inline unalias(dest, As::Tuple) = (unalias(dest, As[1]), unalias(dest, tail(As))...)
Why not use varargs?
Why not? I liked this symmetry in the calling conventions:
function setindex(dest, source, I...)
source′ = unalias(dest, source)
I′ = unalias(dest, I)
It's also nice in that the tuple case mostly fits the explanation above without needing to call it out — it'll either return exactly the tuple or a (partial) copy.
Yeah, I guess you need to choose between symmetry and consistency with `setindex`.
base/abstractarray.jl
Outdated
Perform a rudimentary test to check if arrays A and B might share the same memory.

Defaults to false unless `dataids` is specialized for both `A` and `B`.
When in doubt, wouldn't it be better to return `true`? More generally, shouldn't the functions require/guarantee that any aliasing is detected and that in doubt a copy will be made? The use of terms like "rough" and "rudimentary" doesn't inspire confidence. ;-) We don't want silent corruption to happen by default...
Or is it frequently impossible to determine whether arrays alias?
Indeed, I don't want to inspire confidence. That's why this isn't exported. I just want to do incrementally better than we had been doing before for the builtin array types. There's an asymmetry in costs here: returning `true` by default would lead to lots of unnecessary copies. But we also need to perform this check quite frequently, so I don't want the check itself to be expensive. This is the tradeoff I came to.
I initially had default fallbacks for strided arrays based on `pointer(A, firstindex(A))` and `pointer(A, lastindex(A))`, and for all other arrays based on `objectid(A)`; that led to #25858. We could try to extend this more generically later.
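To make the tradeoff concrete, here is a minimal sketch of an id-based check. The `sketch_` names are hypothetical stand-ins and are not the definitions in this PR; the sketch assumes one id per backing buffer and treats "no ids known" as "does not alias", which is the cheap default described above.

```julia
# Hypothetical stand-ins for the internal functions discussed in this thread;
# the `sketch_` names are illustrative and are not Base's definitions.
sketch_dataids(A) = ()                                    # unknown types report no ids
sketch_dataids(A::Array) = (UInt(pointer(A)),)            # one id per backing buffer
sketch_dataids(A::SubArray) = sketch_dataids(parent(A))   # a view shares its parent's buffer

# Two arrays might alias when any pair of ids coincides. An empty id tuple
# means "don't know", which this sketch treats as "does not alias".
sketch_mightalias(A, B) =
    any(a == b for a in sketch_dataids(A), b in sketch_dataids(B))

x = [1, 2, 3]
v = view(x, 1:2)   # reads x's memory
y = [1, 2, 3]      # equal contents, separate buffer

sketch_mightalias(x, v)     # the view and its parent share an id
sketch_mightalias(x, y)     # separate buffers, no shared id
sketch_mightalias(x, 1:3)   # ranges report no ids in this sketch
```

The check costs a handful of integer comparisons, so it is cheap enough to run on every assignment, at the price of missing aliasing between types that never specialize the id function.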
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Thanks Nanosoldier. It had flagged bonus copies coming from |
Any further thoughts here? I think this infrastructure gives us a nice place to expand this functionality in the future. The biggest question I had in implementing it is if we should try harder to detect aliasing in the generic case. I had initially had these definitions:
unalias(dest, A::AbstractArray) = mightalias(dest, A) ? deepcopy(A) : A
dataids(A::AbstractArray) = A === parent(A) ? (objectid(A):objectid(A),) : dataids(parent(A))
_increasingrange(a, b) = min(a,b):max(a,b)
dataids(A::StridedArray) = (_increasingrange(UInt(pointer(A, firstindex(A))), UInt(pointer(A, lastindex(A)))),)
But we could add these at any point. In fact, we could eventually lean on the GC or more private struct construction internals to recursively identify and dealias fields if we wished.
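As an illustration of what the range-based ids would buy, the sketch below compares address spans of views into one parent. `_increasingrange` is taken from the definitions quoted above; `addrspan` and `overlaps` are hypothetical helper names, and the sketch only looks at contiguous views of a `Vector`.

```julia
# `_increasingrange` is quoted from the comment above; `addrspan` and
# `overlaps` are hypothetical helper names for this sketch.
_increasingrange(a, b) = min(a, b):max(a, b)

# Span of element start addresses, from the first to the last element.
addrspan(A) = _increasingrange(UInt(pointer(A, firstindex(A))),
                               UInt(pointer(A, lastindex(A))))

# Two closed ranges intersect when each starts before the other ends.
overlaps(r, s) = first(r) <= last(s) && first(s) <= last(r)

x = collect(1:10)
a = view(x, 1:4)
b = view(x, 3:8)
c = view(x, 6:10)

overlaps(addrspan(a), addrspan(b))   # a and b both cover x[3:4]
overlaps(addrspan(a), addrspan(c))   # a and c touch disjoint parts of x
```

A single id per parent would flag `a` and `c` as possibly aliasing; the span comparison can tell them apart, which is the extra precision these fallbacks were after.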
base/abstractarray.jl
Outdated
"""
unalias(dest, A) = A
unalias(dest, A::Array) = mightalias(dest, A) ? copy(A) : A
@inline unalias(dest, As::Tuple) = (unalias(dest, As[1]), unalias(dest, tail(As))...)
I think this would make more sense as a varargs function; otherwise it seems like you're unaliasing the tuple itself (which is somewhat array-like, so could be confusing).
Yeah, that was Milan's comment, too. I can make that change. #25890 (comment)
base/array.jl
Outdated
@@ -694,22 +694,18 @@ function setindex! end
# These are redundant with the abstract fallbacks but needed for bootstrap
function setindex!(A::Array, x, I::AbstractVector{Int})
    @_propagate_inbounds_meta
    A === I && (I = copy(I))
    for i in I
    J = unalias(A, I)
Since we were just calling `copy` before, maybe we should just keep doing that, and just improve the `A === I` test. We won't really need the `unalias` function then (which looks rather complex to implement), and this would just be written `mayalias(A, I) && (I = copy(I))`.
I assume that in some cases, the code might also just instead branch to an alternative implementation, so the copy isn't always necessary, just a defensive algorithm?
This is a slightly more special case, since we know that `A::Array`, so if `A === I` then `I::Array`, too. And so we know that `copy(I)::typeof(I)`. Note that `unalias(dest, A::Array)` is exactly this definition: it just calls `copy(A)`.
In the general case, a branch with `copy(I)` or `I` will be type-unstable, necessitating loop switching or a function barrier or some such. That's why custom arrays need to implement it on their own.
Sure, you'd be free to call `mightalias` instead of `unalias` if you know about an alternative.
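The type-instability point can be seen with a small custom array. This is only a sketch: `Wrapped` and `sametypecopy` are made-up names, and the wrapper deliberately defines no `similar` method so that the generic `copy` fallback applies.

```julia
# Hypothetical wrapper type, used only for illustration.
struct Wrapped{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(w::Wrapped) = size(w.data)
Base.getindex(w::Wrapped, i::Int) = w.data[i]

w = Wrapped([1, 2, 3])

# The generic `copy` fallback allocates its result with `similar`, which for a
# type without its own `similar` method is a plain Array. A branch like
# `cond ? copy(w) : w` can therefore produce two different types.
generic = copy(w)
typeof(generic)      # Vector{Int}, not Wrapped{Int}

# A type-preserving copy written by the author of the array type.
# `sametypecopy` is a made-up name for this sketch.
sametypecopy(w::Wrapped) = Wrapped(copy(w.data))
preserved = sametypecopy(w)
typeof(preserved)    # Wrapped{Int}, same as the input
```

Only the author of `Wrapped` knows how to rebuild it around copied fields, which is why the type-preserving branch has to be supplied per array type rather than by a generic fallback.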
I still (see #25890 (comment)) find it unsatisfying that |
I've pushed a refactoring of the API to just use @nanosoldier |
So as a compromise I tried implementing the slightly better fallback methods I described in #25890 (comment), but I ran into performance issues since One reason not to do the |
99a0697
to
6bbaa50
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
The performance regressions in
|
…ignment. This commit puts together the beginnings of an "interface" for detecting and avoiding aliasing between arrays before performing some mutating operation on one of them.
`A′ = unalias(dest, A)` is the main entry point; it checks to see if `dest` and `A` might alias one another, and then either returns `A` or a copy of `A`. When it returns a copy of `A`, it absolutely must preserve the same type. As such, this function is designed to be specialized on `A`, typically like:
    Base.unalias(dest, A::T) = mightalias(dest, A) ? typeof(A)(unalias(dest, A.a), unalias(dest, A.b), ...) : A
`mightalias(A, B)` is a quick and rough check to see if two arrays alias each other. Instead of requiring every array to know about every other array type, it simply asks both arrays for their `dataids` and then looks for an empty intersection between every pair.
Finally, `dataids(A)` returns a tuple of ranges that describe something about the region of memory the array occupies. It could be simply `(objectid(A):objectid(A),)` or for an array with multiple fields, `(dataids(A.a)..., dataids(A.b)..., ...)`.
And fixup the assumption that `broadcast!(f, C, A)` is safe
a builtin mapper method. Using varargs would require de-structuring the one arg case: `(a,) = unalias(dest, a)`.
…rray... to only look at the linear data segment of its parent that it can reach
6bbaa50
to
d97d0c3
Compare
Ok, I've added two commits that try to ameliorate those performance concerns (2254dc5 and d97d0c3 — GitHub is showing them out of order due to a rebase). Unfortunately, d97d0c3 isn't sufficient to fix the performance regression in It feels rather unsatisfactory, but I think I'll need to change |
Can we just have a way of expressing "just trust me, don't worry about aliasing here"? |
Alright, I'd like to merge this guy tonight or tomorrow so I can then merge #24368. |
base/abstractarray.jl
Outdated
unalias(dest, A) = mightalias(dest, A) ? copypreservingtype(A) : A

copypreservingtype(A::Array) = copy(A)
copypreservingtype(A::AbstractArray) = (@_noinline_meta; deepcopy(A)::typeof(A))
I'm not sure `deepcopy` is ever correct. Maybe better just to give a method error?
Ah, you're right, I didn't consider how this would change the identities of the values that get assigned into the array. Perhaps we should just fall back to `copy(A)`. It'll be type-unstable in some cases, but this function exists to allow you to fix it.
This will do what we want in almost all cases. We only hit this method if `A` is aliasing a mutable array... which means that `A` is mutable as well, and thus in many cases it has defined an appropriate `similar(A)` method that returns the same type.
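The identity concern with `deepcopy` is easy to reproduce with an array of mutable elements. The sketch below uses only `copy` and `deepcopy` from Base; the variable names are illustrative.

```julia
inner = [1, 2]
A = [inner, [3, 4]]    # an array whose elements are themselves mutable

shallow = copy(A)      # new outer array, same element objects
deep = deepcopy(A)     # new outer array and new element objects

shallow[1] === inner   # the shallow copy still holds the original element
deep[1] === inner      # the deep copy holds an equal but distinct element
deep[1] == inner       # contents still compare equal
```

Assigning from `deep` into a destination would store objects the caller never passed in, so later mutations of `inner` would not be visible through the destination. That is why `copy` is the safer fallback here.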
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
How fast do you want a garbage answer? |
I should be able to fix this without too much trouble. I had it working just fine a few commits ago, and the current design is even simpler. |
@nanosoldier |
Maybe I'll just try again? @nanosoldier |
Same error — this is very strange.
I don't touch |
But you do touch broadcasting I guess, which is used in the stackframe above? |
Turns out |
base/abstractarray.jl
Outdated
unalias(dest, A::AbstractRange) = A
unalias(dest, A) = A

copypreservingtype(A::Array) = copy(A)
This sounds like "copy-preserving type" and too similar to `eltype()`, `keytype()` etc.
Maybe something along `sametypecopy()`/`keeptypecopy()`/`unaliasingcopy()` (my fav)?
Fair point. I also like `unaliasingcopy` for now; it's likely that we'll try to separate the two meanings of copy in the future. By naming this more narrowly it gives us space for that design process.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Ok, the other few regressions here are all due to taking views of complicated non- I pushed a final (I hope) commit here that renames |
This commit puts together the beginnings of an "interface" for detecting and avoiding aliasing between arrays before performing some mutating operation on one of them.
`A′ = unalias(dest, A)` is the main entry point; it checks to see if `dest` and `A` might alias one another, and then either returns `A` or a copy of `A`. When it returns a copy of `A`, it absolutely must preserve the same type. As such, this function calls an internal `unaliascopy` function, which defaults to `copy` but ensures it returns the same type.
`mightalias(A, B)` is a quick and conservative check to see if two arrays alias each other. Instead of requiring every array to know about every other array type, it simply asks both arrays for their `dataids` and then looks for an empty intersection.
Finally, `dataids(A)` returns a tuple of `UInt`s that describe something about the region of memory the array occupies. It defaults to simply `(objectid(A),)`, or for an array with multiple fields, `(dataids(A.a)..., dataids(A.b)..., ...)`.
Fixes #21693, fixes #15209, and a pre-req to get tests passing for #24368 (since we had spot checks for `Array` aliasing, but moving to broadcast means we need a slightly more generic solution). I haven't exported any of these functions since I'd like us to use them a bit more before we officially document and support them for all custom arrays.
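For readers who want to try the entry points described above, here is a small usage sketch. It assumes a Julia version where `Base.mightalias` and `Base.unalias` exist under those names; they are internal and unexported, so their behavior may differ between versions.

```julia
# Base.mightalias and Base.unalias are internal and unexported;
# their exact behavior may differ between Julia versions.
x = [1, 2, 3, 4]
v = view(x, 1:2)      # shares memory with x
y = [10, 20]          # independent array

aliased = Base.mightalias(x, v)     # expected to report possible aliasing
separate = Base.mightalias(x, y)    # expected to report no aliasing

v2 = Base.unalias(x, v)             # a copy, because v might alias x
typeof(v2) === typeof(v)            # the copy keeps the input's type
parent(v2) !== x                    # and is no longer backed by x

y2 = Base.unalias(x, y)             # y is handed back untouched
```

A mutating method would call `unalias(dest, src)` once up front and then read from the result while writing into `dest`.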