Create a function to use type inference for eltype #54157
Comments
As far as I remember, […]. Apart from that, the run-time cost of type inference wouldn't be acceptable for […] |
Consider, though, that […]. Also, I see very good performance using it on some micro-benchmarks:
julia> using BenchmarkTools
julia> iter = (x for x in 1:10)
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:10)
julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.182 ns … 6.061 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.192 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.197 ns ± 0.056 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █▆ ▄▃ ▁ ▃▂ ▁ ▁
█▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁█ █
1.18 ns Histogram: log(frequency) by time 1.24 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.182 ns … 6.843 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.192 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.202 ns ± 0.085 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
1.18 ns Histogram: frequency by time 1.24 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> iter = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)
julia> @benchmark eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.182 ns … 22.753 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.192 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.203 ns ± 0.218 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▃▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▄▃▁▁▁▁▁▁▁▁▃ ▂
1.18 ns Histogram: frequency by time 1.24 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark Base.@default_eltype($iter)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.182 ns … 6.622 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.193 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.209 ns ± 0.073 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▂▁▁▁▁▁▁▁▁█▆▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▇▄▁▁▁▁▁▁▁▁▅ ▂
1.18 ns Histogram: frequency by time 1.24 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
|
xref #54207 |
Indeed, it would be really okay to have something like
infer_eltype(itr) = (T = eltype(itr)) === Any ? @default_eltype(itr) : T
|
Another possibility is to create a new function as above and export it, or even something maybe a bit stronger such as
infer_eltype(itr) = ifelse((T2 = Base.@default_eltype(itr)) <: (T1 = eltype(itr)), T2, T1)
with all the needed disclaimers in the docstring |
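For concreteness, here is a sketch of how the first of these one-liners would behave; infer_eltype is the hypothetical new function defined just above, not anything that exists in Base:
julia> infer_eltype(itr) = (T = eltype(itr)) === Any ? Base.@default_eltype(itr) : T;

julia> infer_eltype(x for x in 1:10)   # eltype gives Any for a generator, so the inferred Int64 wins
Int64

julia> infer_eltype([1, 2, 3])         # eltype is already informative, so inference is never consulted
Int64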
We might want to special case this:
julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
sum([1]) == 0
end
Int64
julia> Base.@default_eltype Iterators.Filter([1,2,3]) do x
sum(1) == 0
end
Union{} |
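One way that special case could look is sketched below; eltype_or_inferred is a hypothetical helper, not anything in Base, and it simply falls back to the declared eltype whenever inference proves the iterator can't yield anything, much like the infer_eltype variants discussed elsewhere in the thread:
# hypothetical helper: prefer the inferred element type, but never return the
# uninhabited Union{} that inference produces for provably empty iterators
function eltype_or_inferred(itr)
    T = Base.@default_eltype(itr)
    return T === Union{} ? eltype(itr) : T
end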
A thing I don't understand: given
julia> s = (x < 5 ? x : 2.0*x for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#5#6"}(var"#5#6"(), 1:10)
julia> T = Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(s)})
Union{Float64, Int64}
julia> Base.promote_typejoin_union(T)
Real
why can't we use that Union? For instance:
julia> collect(s)
10-element Vector{Real}:
... |
collect does not return a Union, as it gets the types from the real values and not from inference |
Right, now I see: it widens the type. Nonetheless, I guess with an […] |
@vtjnash this is a bit derailing the conversation, but I'd like to ask you: is it a bad idea to return a |
Yes. That will never be the type of the object when it actually contains data, so it slows down the case where there is data to benefit the case where there is no data |
You typically want the static and dynamic types to nearly coincide, for optimal results |
It would be really unfortunate, especially if this avoidance of […]. Do you suggest turning […]? |
If we are going the route where we export a new function, which to me seems preferable, I (maybe mis)understood what @LilithHafner has suggested to be implemented as, for example,
function infer_eltype(itr)
    T1, T2 = eltype(itr), Base.@default_eltype(itr)
    ifelse(T2 !== Union{} && T2 <: T1, T2, T1)
end |
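A quick sketch of what this variant returns on the two cases that came up earlier in the thread (outputs as reported there; the T2 !== Union{} guard is what makes the Filter case fall back to the underlying eltype, which for Iterators.Filter forwards to the wrapped iterator):
julia> infer_eltype(x < 5 ? x : 2.0*x for x in 1:10)  # inference gives Real here, which is narrower than Any
Real

julia> infer_eltype(Iterators.Filter(x -> sum(1) == 0, [1, 2, 3]))  # inference gives Union{}, so fall back to eltype
Int64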
Inference is tasked with finding the narrowest type it can come up with that is a supertype of all types of instances that may actually occur. Coming up with a supertype of all types that could actually occur is a correctness thing: it has to get that right every time or we have compiler bugs. Picking a narrow type is on a best-effort basis.
We cannot guarantee that in any case, so long as we rely on inference, because inference may decide to provide an arbitrarily wide eltype for A and/or B. Even turning […] |
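A small illustration of that best-effort aspect (a sketch; the widening here comes from the loosely typed source array, even though every element produced at run time is an Int64):
julia> xs = Any[1, 2, 3];

julia> g = (x + 1 for x in xs);     # run-time values are all Int64

julia> Base.@default_eltype(g)      # but inference only sees x::Any, so the inferred eltype is Any
Any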
If we want to return the narrowest type possible, would it then be fine to use (a macro version of) […]? |
This is great – inference is improving!
And this seems to be the actual issue. |
Which raises the question: when would someone use this |
I would say at least whoever is using |
Seems like around 20 packages, some of which are very popular, are using it: https://juliahub.com/ui/Search?q=@default_eltype&type=code |
Yeah, for example changing
function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = Base.@default_eltype(xs)
    (isconcretetype(T) || T === Union{}) || return Base.grow_to!(DisjointSet{T}(), xs)
    return DisjointSet{T}(xs)
end
to
function _DisjointSet(xs, ::Base.EltypeUnknown)
    T = infer_eltype(xs)
    isconcretetype(T) || return Base.grow_to!(DisjointSet{T}(), xs)
    return DisjointSet{T}(xs)
end
has one semantic change: when inference can infer that the iterator is empty you get a […] |
IMO a pull request adding |
I could try to make a PR but I'm still unsure about a thing; consider:
julia> struct A end
julia> struct B end
julia> itr = (x < 5 ? A() : B() for x in 1:10)
Base.Generator{UnitRange{Int64}, var"#1#2"}(var"#1#2"(), 1:10)
julia> Base.@default_eltype(itr)
Any
julia> Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(itr)})
Union{A, B}
I would really prefer the second one, so I think a possible […] |
ok, maybe we could actually have two versions, with a |
Sure, IIUC that |
It was a recent performance improvement for collect, which is designed not to return random Union types normally (only having trouble with the empty case). The exact PR for it should be findable by git blame if interested in the full picture |
The PR that changed […] |
Yes, looks like the main explanation of that PR is in #30485 though |
Base.@default_eltype for iterators and generators whose eltype currently returns Any?
Tried to make a PR to create |
We don't plan on supporting this for reasons mentioned in #54909 (comment) |
I'm using this function to compute the type generically for iterators:

[…]

and I'm asking myself if this couldn't be applied by default to some generators and iterators whose eltype currently returns Any (actually, if it is applied specifically to just those types, it can only be used where Base.@default_eltype(iter) is used, I think). For example, consider […]

Performance-wise I see a cost of a couple of ns. I found that there is this other issue and associated PR about a special case of this: #48249 and #48277, but I think that a function as above could be applied more generally. Would this be okay? Or is this problematic?
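The function itself didn't survive in this copy of the issue. Judging from the internals referenced elsewhere in the thread (Core.Compiler.return_type applied to Base._iterator_upper_bound, then Base.promote_typejoin_union), it was presumably something along these lines; this is a reconstruction, not the author's original code:
# hypothetical reconstruction of the function described above
infer_eltype(itr) =
    Base.promote_typejoin_union(
        Core.Compiler.return_type(Base._iterator_upper_bound, Tuple{typeof(itr)}))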