Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define last for non-O(1) types? #42943

Closed
MasonProtter opened this issue Nov 4, 2021 · 8 comments · Fixed by #42991
Closed

Define last for non-O(1) types? #42943

MasonProtter opened this issue Nov 4, 2021 · 8 comments · Fixed by #42991

Comments

@MasonProtter
Copy link
Contributor

MasonProtter commented Nov 4, 2021

Currently, our fallback definition of last is last(x) = x[end] and the docstring for it says

Get the last element of an ordered collection, if it can be computed in O(1) time. This is accomplished by calling lastindex to get the last index. Return the end point of an AbstractRange even if it is empty.

It was brought up on Zulip today (and I've also been personally bitten by this in the past) that this doesn't work for lazy iterators, which can be kinda annoying.

I was playing around and found to my surprise that the compiler is smart enough in every case I could find except two to make foldl((_, r) -> r, x) just as fast as last(x) for types with O(1) indexing, while also naturally generalizing to non-O(1) collections.

Here are the cases I tried where it gets this right:

julia> foldlast(x) = foldl((_, r) -> r, x);

julia> let x = Ref(1:10000)
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.255 ns (0 allocations: 0 bytes)
  1.256 ns (0 allocations: 0 bytes)

julia> let x = Ref(collect(1:10000))
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.500 ns (0 allocations: 0 bytes)
  1.501 ns (0 allocations: 0 bytes)

julia> let x = Ref(rand(1000, 1000, 1000))
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.499 ns (0 allocations: 0 bytes)
  1.499 ns (0 allocations: 0 bytes)

julia> let x = Ref(ntuple(_ -> rand((1, 2.0, "three", :four, (5, 6))), 20))
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.497 ns (0 allocations: 0 bytes)
  1.497 ns (0 allocations: 0 bytes)

and here are two cases where it loses:

julia> let x = Ref(ntuple(_ -> rand((1, 2.0, "three", :four, (5, 6))), 40))
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.497 ns (0 allocations: 0 bytes)
  1.245 μs (22 allocations: 976 bytes)

julia> let x = Ref(:a => 1)
           @btime     last($x[])
           @btime foldlast($x[])
       end;
  1.258 ns (0 allocations: 0 bytes)
  71.571 ns (3 allocations: 96 bytes)

The Pair case is quite bizarre and maybe should be a separate issue(?).

Anyways, this suggests to me that maybe the fallback definition of last should be to just foldl over the collection, allowing it to be used on types which don't support indexing, such as general iterators (and transducers!), and this can be done nearly effortlessly, seemingly just requiring that we define an overload for Tuple and Pair that use indexing.

Is this something we'd want to do, or is there a good reason to not let this work on non-indexable types?

@alecloudenback
Copy link
Contributor

Copying my original question on Zulip here:

Is this a case of "we don't know in advance what the latst element would be in the general case?" and that's why it errors? or should this be defined?

julia> Iterators.map(x -> x^2, 1:3)
Base.Generator{UnitRange{Int64}, var"#2#3"}(var"#2#3"(), 1:3)

julia> m = Iterators.map(x -> x^2, 1:3)
Base.Generator{UnitRange{Int64}, var"#4#5"}(var"#4#5"(), 1:3)

julia> first(m)
1

julia> last(m)
ERROR: MethodError: no method matching lastindex(::Base.Generator{UnitRange{Int64}, var"#4#5"})

Seems like it could be known without collecting all the values?

julia> dump(m)
Base.Generator{UnitRange{Int64}, var"#4#5"}
  f: #4 (function of type var"#4#5")
  iter: UnitRange{Int64}
    start: Int64 1
    stop: Int64 3

julia> m.f(m.iter.stop)
9

@CameronBieganek
Copy link
Contributor

CameronBieganek commented Nov 5, 2021

Do you want last(itr) == last(itr) to always be true? Because unfortunately that's not the case for stateful iterators:

julia> foldlast(x) = foldl((_, r) -> r, x);

julia> itr = Iterators.Stateful("abc");

julia> foldlast(itr)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

julia> foldlast(itr)
ERROR: ArgumentError: reducing over an empty collection is not allowed

Though I'm not sure if that is a sufficient argument against this, because stateful iterators can often be counter-intuitive if you're not careful about the statefulness.

@MasonProtter
Copy link
Contributor Author

MasonProtter commented Nov 5, 2021

Yeah, stateful iterators definitely make this dangerous, certainty it’d make it so that @alecloudenback ’s suggestion can’t be done in any but the most restrictive of cases, (though if it’s necessary, one can use MappedArrays.jl)

I think though that the existence of stateful iterators don’t stop this from being useful so long as it’s carefully documented. For instance, the first function already does this

julia> x = Iterators.Stateful("hello")
Base.Iterators.Stateful{String, Union{Nothing, Tuple{Char, Int64}}}("hello", ('h', 2), 0)

julia> first(x)
'h': ASCII/Unicode U+0068 (category Ll: Letter, lowercase)

julia> first(x)
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)

@Seelengrab
Copy link
Contributor

Seelengrab commented Nov 5, 2021

first(x) on Stateful (or any iterator, I think) already does the "correct" thing by hitting the fallback definition for first, not because its special cased to work. IMO there can't be another/non-throwing behavior for a second call to last of a non-O(1) iterator, since by definition you have to iterate over the Stateful to get to its last element, precisely because there's no O(1) indexing. "Cheating" here would mean to iterate over a copy of the iterator instead, to preserve the original iterator and make a subsequent last work (which breaks if the iterator has a sideeffect and you rely on that sideeffect).

1.6 brought us last(itr, n), delivering the last n elements provided that iterate(::Reverse{typeof{itr}) is defined (which it can't be in general). So either way you end up having to special case things on the specific iterator type (and possibly its nested type parameter(s)), as e.g. you run into the same kinds of problems with infinite iterators, filters or other generic iterators that may depend on a lot of state (+sideeffects) to calculate the nth element. We sadly don't have a trait for "O(1) computation of nth element at any given point of iteration over a lazy iterator". To me, this indicates that there shouldn't be a generic fallback for last, but I'm open for special casing last for lazily mapped UnitRange.

Is there a motivating example for this, other than "we could do it for Generator{UnitRange{...},...}"?

@MasonProtter
Copy link
Contributor Author

first(x) on Stateful (or any iterator, I think) already does the "correct" thing by hitting the fallback definition for first, not because its special cased to work

Right, and I am suggesting a similar fallback (not a special case) for last that would also do the correct thing. If it's okay for first to consume one element of a Stateful iterator, I don't really see why it shouldn't be okay for last(::Stateful) to mean "consume the entire iterator".

There are cases with finite iterators (including stateful ones) where one might only be interested in the final element. E.g. one example I saw recently was an iterator that generates the first n fibonacci numbers.

struct Fibn
    n::Int
end
import Base: iterate, length
length(f::Fibn)=f.n
iterate(f::Fibn) = (0,(0, 1, 1))
iterate(f::Fibn, st) = st[3] < f.n ? (st[2], (st[2], st[1]+st[2],st[3]+1)) : nothing

This thing can be collected to return all of the first n Fibonacci numbers, but one could reasonably want to call last on this to just get the final one.

@simeonschaub
Copy link
Member

I don't think an O(1) fallback method for last is really a good idea, since such a fallback would make performance quite unpredictable for generic code that's written assuming last is O(1). You can always just write this as foldl((_, r) -> r, x) explicitly if you really need it.

We could use first(Iterators.reverse(x)) as a fallback though instead of assuming x is indexable, which would work for every iterator that supports fast reverse iteration.

@MasonProtter
Copy link
Contributor Author

That sounds reasonable to me.

@Seelengrab
Copy link
Contributor

I don't think an O(1) fallback method for last is really a good idea

I think you meant O(n) here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants