Avoid undefined checks for fields that are always initialized #8827
This is great. I have almost nothing to add. You might want to use I think we will eventually want to use the minimum number of initialized fields instead of a boolean, but we can start with what you have here. |
If boolean flags were passed to jl_new_datatype using power-of-two enum then the signature wouldn't change every time you add some kind of metadata like this. Of course, if you were tracking minimum number of initialized fields, then this wouldn't be a boolean flag. So not sure if it's really a win. |
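To make the power-of-two enum suggestion concrete, here is a minimal sketch in C. Everything in it (toy_datatype_t, toy_new_datatype, the DT_FLAG_* names) is hypothetical and only illustrates the pattern; it is not the actual signature of jl_new_datatype.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical flag set; the names are illustrative, not Julia's runtime API. */
typedef enum {
    DT_FLAG_NONE      = 0,
    DT_FLAG_ABSTRACT  = 1u << 0,
    DT_FLAG_MUTABLE   = 1u << 1,
    DT_FLAG_UNDEFFREE = 1u << 2   /* no field can hold an undefined reference */
} dt_flags_t;

typedef struct {
    const char *name;
    uint32_t flags;               /* bitwise OR of dt_flags_t values */
} toy_datatype_t;

/* A single flags argument stands in for a growing list of boolean parameters,
   so adding a new flag does not change this signature. */
static toy_datatype_t toy_new_datatype(const char *name, uint32_t flags)
{
    toy_datatype_t dt = { name, flags };
    return dt;
}

/* Test one bit of the flag word. */
static bool toy_is_undeffree(const toy_datatype_t *dt)
{
    return (dt->flags & DT_FLAG_UNDEFFREE) != 0;
}
```

A caller would write something like toy_new_datatype("A", DT_FLAG_MUTABLE | DT_FLAG_UNDEFFREE). As the comment above notes, this only helps while the metadata stays boolean; a minimum count of initialized fields would need its own integer argument.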
Did you run any simple benchmarks? I'm curious what the performance overhead of this check is. |
It's pretty bad sometimes – it often forces a GC frame because there might be an exception. |
The UndefRefError is a singleton so throwing it doesn't allocate:
|
That is what I thought, so it is just the cost of a pointer load / comparison? This is usually not that expensive. |
It will get rid of a lot of branches and calls, which is good for code size if nothing else. Several changes like this could eventually make a dent in compilation time and/or memory use. |
Here's an example where it makes a difference:

type A
    x::Vector{Int}
end # undef-free
type B
    x::Vector{Int}
    B() = new()
    B(x) = new(x)
end # not undef-free
function f(x)
    y = 0
    for i = 1:length(x.x)
        @inbounds y += x.x[i]
    end
    y
end
a = rand(0:1000, 100000000);
julia> @time for i = 1:10; f(A(a)); end
elapsed time: 0.637712013 seconds (160 bytes allocated)
julia> @time for i = 1:10; f(B(a)); end
elapsed time: 0.889516562 seconds (160 bytes allocated)

In this case, getting rid of the undefined check allows the array pointer load to be hoisted, which lets the loop vectorizer do its magic. |
Yes, this can help a lot in simple loops on "wrapped" types. |
Cool! Just curious. |
In fact, #8809 might be a good real-world testcase... |
This alone is not quite enough to achieve good performance with ArrayViews in the test case from #8809. The other necessary optimization is "hoist access to fields in immutable values" from #3440. LLVM can hoist the pointer load in the case above because there are no stores, but it can't hoist the pointer load in the case from #8809 because it thinks the array can alias the ContiguousView object. |
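The aliasing problem described above can be illustrated with a rough C analogue. This is only a sketch under stated assumptions: toy_view_t is a hypothetical stand-in for a view object such as ContiguousView, and unsigned char buffers are used because stores through a character pointer may legally alias any object in C, which mimics the situation where the compiler cannot rule out the array overlapping the view.

```c
#include <stddef.h>

/* Hypothetical stand-in for a view object wrapping an array pointer. */
typedef struct {
    unsigned char *data;
    size_t len;
} toy_view_t;

/* The store to dst[i] goes through a character pointer, which may alias any
   object, including *v itself. Unless the compiler can prove otherwise, it
   has to assume v->data and v->len may change and reload them each iteration. */
static void copy_reload(const toy_view_t *v, unsigned char *dst)
{
    for (size_t i = 0; i < v->len; i++)
        dst[i] = v->data[i];
}

/* Hoisting by hand: read the fields once before the loop. This is valid only
   when the view is not modified during the loop, which is the guarantee an
   immutable value would give the compiler. */
static void copy_hoisted(const toy_view_t *v, unsigned char *dst)
{
    unsigned char *data = v->data;
    size_t len = v->len;
    for (size_t i = 0; i < len; i++)
        dst[i] = data[i];
}
```

Both functions produce the same result whenever dst does not overlap the view; the second simply states in code what "hoist access to fields in immutable values" would let the compiler conclude on its own.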
I tried to add an OTOH, maybe both |
Bump. @JeffBezanson, what's the right approach here? |
I don't see why adding |
Done. |
Let's merge this unless there are objections. |
👍. But I am getting some warnings:
|
Thanks! |
The approach is as follows:
- Types get an undeffree field that specifies whether they may contain undefined references.
- We look for calls to new in the type block where the number of arguments is not equal to the number of fields. (Not sure I did this right, since I don't really know Scheme.) If no such cases exist (or the type is a bits type), undeffree is set to true.
- getfield skips the undefined check for pointer fields if undeffree is true.

Another option would be to count the minimum number of arguments passed to new and skip undefined checks for fields at indices <= the minimum the type can be instantiated with. I could probably do that (as long as the rest of this approach seems reasonable).

Ref #3440
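The two variants described above (a boolean undeffree flag versus a minimum count of initialized fields) can be modelled with a small C sketch. All names here (toy_type_t, ninitialized, toy_getfield) are hypothetical and the layout is a toy model, not the runtime's actual data structures. Indices are 0-based in the sketch, so "indices <= the minimum" in the 1-based description becomes i < ninitialized.

```c
#include <stddef.h>

/* Hypothetical toy model of a type descriptor. */
typedef struct {
    size_t nfields;
    size_t ninitialized;  /* smallest argument count passed to any new(...) call */
} toy_type_t;

typedef struct {
    const toy_type_t *type;
    void **fields;        /* a NULL entry marks an undefined reference */
} toy_object_t;

/* The boolean flag is the special case where every field is always initialized. */
static int toy_type_undeffree(const toy_type_t *t)
{
    return t->ninitialized == t->nfields;
}

/* Field i (0-based) needs an undefined check only if some constructor
   call can leave it unset. */
static int toy_needs_undef_check(const toy_type_t *t, size_t i)
{
    return i >= t->ninitialized;
}

/* Stores the field in *out and returns 0, or returns -1 for an undefined
   reference (where a real runtime would raise its undefined-reference error). */
static int toy_getfield(const toy_object_t *obj, size_t i, void **out)
{
    void *v = obj->fields[i];
    if (toy_needs_undef_check(obj->type, i) && v == NULL)
        return -1;
    *out = v;
    return 0;
}
```

In a compiler the result of toy_needs_undef_check would be known when code is generated for the field access, so the branch would be omitted entirely rather than evaluated at run time as it is here.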