Unfold() is an iterable based on a transition function #44873

Open
wants to merge 33 commits into
base: master

nlw0
Contributor

@nlw0 nlw0 commented Apr 6, 2022

I'm not sure whether there's already an alternative for this available in Julia, so I came up with this proposal. Please let me know if there's another way available. I'm especially interested in having a way to use do-syntax to create an iterator.

The idea is that defining iterators using the Base.iterate is a little bit inconvenient. Then I noticed that it's pretty convenient to create a Channel. But there's no equally easy way to create a simpler iterator.

Some languages allow you to create an iterator based on a closure that returns an optional type. The proposed type and its corresponding iterate method should allow you to do that. Using this, it's easy to write a function that defines a closure and returns an iterator based on it, as the example illustrates.

julia> function myrange(a, b)
           state = a
           IterableClosure() do
               if state <= b
                   prevstate, state = state, state + 1
                   Some(prevstate)
               else
                   nothing
               end
           end
       end
myrange (generic function with 1 method)

julia> for x in myrange(3,5)
           println(x)
       end
3
4
5
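
The description above uses IterableClosure without showing its definition. A minimal sketch that is consistent with the example might look as follows. The struct layout and the `_state` argument name are assumptions made for illustration, not the PR's actual code.

```julia
# Hypothetical definition inferred from the usage above; not the PR's actual code.
struct IterableClosure{F}
    f::F
end

# The closure owns all state, so the iteration state passed around is a dummy `nothing`.
function Base.iterate(it::IterableClosure, _state=nothing)
    result = it.f()
    result === nothing && return nothing   # the closure signalled exhaustion
    return something(result), nothing      # unwrap `Some(x)` to `x`
end

# The number of elements is not known in advance.
Base.IteratorSize(::Type{<:IterableClosure}) = Base.SizeUnknown()

function myrange(a, b)
    state = a
    IterableClosure() do
        if state <= b
            prevstate, state = state, state + 1
            Some(prevstate)
        else
            nothing
        end
    end
end

collected = collect(myrange(3, 5))   # elements 3, 4, 5
```

Because no element type is declared here, `collect` has nothing better than `Any` to work with.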

@Seelengrab
Contributor

Duplicate of #43203?

@nlw0
Contributor Author

nlw0 commented Apr 6, 2022

Thanks @Seelengrab , there are indeed some similarities. I hope we can talk about these ideas here and then maybe we can turn that frown upside-down!

I suppose what I implemented is indeed a bit similar to some patterns you can do in Python. Also there's the connection to Channel. The thing about Channel is that it is going to be async, as I understand. Usually, though, I would prefer to produce an iterator and only make it a Channel later if desired, like Jeff mentions in that PR. And I believe I have been seeing some uses of Channel that could have been synchronous iterators, and are Channels only because those are handier to define, especially using do-syntax. E.g. #44845

I find iterators in Julia generally hard to write, unless you're writing a generator expression, which are pretty limited in what you can do. If you're doing anything more complex, you'll miss the flexibility of writing a do-block, or a for-loop. And writing a full struct plus Base.iterate(x,y) plus wonder about the fact that you need a separate Base.iterate(z) as well, it just feels overwhelming. This is not a dunk on the iterator API, I think it's just fine. I just feel a need for tools to enable writing more complex iterators. Just removing the need to have to pick a good name for your structure in order to create the iterator is already such a blessing.

With a tool like this, it becomes a lot easier to write code in the following fashion: first you write a while true that depends on a lot of variables everywhere, changes state everywhere, etc. Go crazy. Then you just "print" values and figure out if they are correct. Then you figure out how the body of the loop can be turned into this optional output. Hopefully your break statements, of which there will be many, will directly turn into return nothing, and then you can just gather everything you were printing in your loop and return it wrapped in Some.

Then you wrap this up in a function, add something like Iterator() do, and voilà, you transformed a complex, mutable block of imperative code into a beautiful encapsulated iterator. I've done it a thousand times in multiple languages. Only recently I have been doing more varied work in Julia, and started noticing I couldn't create iterators like that. In my opinion, it might be a big reason iterators aren't more popular. At least it's my impression. It's a hugely important tool, and I also find it's lacking support compared to asynchronous programming. Again, I don't mean any criticism, it's just my impression recently using Julia in practice. I'm just trying to scratch my own itch.

Regarding unfold, there's indeed the similarity with the use of the optional output, it's exactly the kind of thing I have in mind. Although I'm kind of more interested in being able to wrap a nameless closure / callback style of function with no arguments than having to think what should be the input arguments. Just assume this is defined inside a wrapper function, and the "input arguments" are somewhere there in the outer scope of the do body. This kind of thing is hard to explain, I hope anyone who has an interest in writing iterators can appreciate how this can be handy. I can imagine other variations that work more like e.g. a fold, or how the iterator interface works, might be good as well. I think the proposed version here would be the most handy one, feeling pretty close to what you can do with Python generator functions or Scala Streams. The main variation I would also like to see is something that processes an input iterator, but of course you can already do it here just using "take"...

I can also imagine having a version where the function is more intended to be pure, and returns a value and "new state", which gets fed to iterate and becomes the new input. But that could be closure as well anyways, we could try to leverage the two approaches.

@nlw0
Contributor Author

nlw0 commented Apr 6, 2022

This is a version that still lets you create an iterator using do-syntax, but here we have an initial state, and the function takes the state as an input, and returns an output value and the next state. I think it might be nice to have something like this as well, fine by me. My first wish is just being able to write a function straight away with do-syntax, and get an iterator. I still think the version with the closure is important to have, though. It's an implicit impure version that can be very handy, while this version here would be the necessary one for pure functions, which is neat to have if you wanna put in the work. And we can also have something more like a Mealy/Moore machine, that would also take input values, (input value, state) -> (output value, next state).

struct IterableClosure2
    f
    initialstate
end

function Base.iterate(it::IterableClosure2, state=nothing)
    valuestate = it.f(something(state, it.initialstate))
    if isnothing(valuestate)
        return nothing
    else
        value, state = valuestate
        return something(value), state
    end
end

myfibs() = IterableClosure2((1,0)) do state
    f1, f2 = state
    newf = f1 + f2
    # f1, (newf, f1)
    if f1<100
        f1, (newf, f1)
    else
        nothing
    end
end

# for x in zip(1:10, myfibs())
for x in myfibs()
    println(x)
end

@Seelengrab
Contributor

I'm not opposed to making iterators easier to write, I've written a few a bit involved ones myself. I just think that this approach to making it easier has some disadvantages:

  • Wrapping the result in Some is imo redundant, union splitting should already take care of this (it's the very basis for why the iterator protocol works in the first place).
  • Having f untyped disallows specialization based on the function, so it won't be inlined and there will be a dynamic lookup for it. Depending on how complex the "iterable closure" is, this can be very detrimental to performance.
  • The eltype of the iterator is not specializable, in contrast to writing a custom one, so you'll always hit the eltype(...) = Any fallback (though this is possible to avoid by adding a type parameter, but then you rely on the user to specify the correct eltype and they're not saving anything compared to just implementing the iterator interface in the first place).
  • The length of the iterator is not specializable and has to fall back to SizeUnknown, disallowing a length and thus there's no possibility for loop unrolling.

All of these are fixable, of course, but they more or less amount to.. reimplementing the iterator protocol with a different interface. If you then decide to e.g. specialize iterate(::IterableClosure{MyFunc,MyRetType}, state=...) for performance reasons, you haven't saved anything code-wise and instead have mysterious type parameters that just don't exist in a "proper" iterator. After all, that's what you're bound to by the way loops are lowered.

On top of this - there is still #15276, which I don't dare assess the impact of on this.

In my opinion, it might be a big reason iterators aren't more popular. At least it's my impression.

I'm not sure how you got that impression - I have quite the opposite, there are iterators hiding absolutely everywhere in the ecosystem. There are just fewer functional ones, but I think that has more to do with the people using the language than the protocol or its interface. Most folks using julia aren't functional programmers or call themselves programmers at all.

voilà, you transformed a complex, mutable block of imperative code into a beautiful encapsulated iterator. I've done it a thousand times in multiple languages.
[...]
I'm just trying to scratch my own itch.

That's great! Please remember though that julia is not those other (possibly hardcore functional) languages like haskell. I've used a similar approach to turn imperative code into iterator structs - after all, there are only three things you need to know to implement an iterator:

  1. What do you need to get started?
    • This is what ends up in the fields of your iterator struct and is needed to either get the initial state or to compute the next state. This is the "setup" part/the captured variables in your closure (which is actually what happens! Closures in julia are callable structs and captured variables become fields in that struct). Sometimes this isn't even needed.
  2. What do you need to keep around for the next iteration?
    • This is the second part of the tuple that iterate returns and what gets passed to the iterate(itr, state) method. In IterableClosure you have to manage that manually and you don't get the advantage of having a function barrier & possible dispatch for specialization.
  3. How do you get to the next state from an initial state?
    • This is the part you have in your closure, minus the explicit state management.

Most of the time, this results in an iterate(itr::MyItr) = iterate(itr, initial_state(itr)) or just a single iterate(itr::MyItr, state=initial_state(itr)) = ... function. I think this & the disadvantages above (even when they're fixed) show that this is more brittle & won't lead to less code in the general case.
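
To make the three questions concrete, here is a sketch of the myrange example from the PR description written against the iterator protocol directly. The name MyRange and the Int-only fields are assumptions chosen for illustration.

```julia
# 1. What is needed to get started: the bounds, stored as fields.
struct MyRange   # hypothetical name
    a::Int
    b::Int
end

# 2. and 3. The state is the next value to emit; the transition adds one.
# A single method with a default argument covers both `iterate(r)` and `iterate(r, state)`.
Base.iterate(r::MyRange, state=r.a) = state <= r.b ? (state, state + 1) : nothing

# Optional traits that the closure-based version cannot easily provide.
Base.length(r::MyRange) = max(0, r.b - r.a + 1)
Base.eltype(::Type{MyRange}) = Int

values_seen = collect(MyRange(3, 5))   # a Vector{Int} holding 3, 4, 5
```

With length and eltype defined, collect can allocate a Vector{Int} of the right size up front.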

Personally, I don't think the iterator protocol is too complicated, it's just different from what functional languages do because you have to actively think about what you need & have to carry around.

@Seelengrab
Contributor

Seelengrab commented Apr 6, 2022

And we can also have something more like a Mealy/Moore machine, that would also take input values, (input value, state) -> (output value, next state).

I mean.. to me that just already is the current iterator protocol. iterate(itr::MyItr, state) is a transition function for the automaton MyItr which currently is in state. It even has the same function signature as the one you propose.

@nlw0
Contributor Author

nlw0 commented Apr 6, 2022

Thanks for your insights, @Seelengrab . I'm aware there may be performance penalties. It's just part of the things that programmers must learn and take into account during their work, in my opinion. Just one of many compromises. On the other hand there can be practical matters like I brought up. The main concern here is being able to write code quickly to produce an iterator. Generator expressions are not powerful enough. A map with a do block can be great in this regard, but many times you don't really have the case for a map. Then this might push you to a foldl, but if I'm not feeling hard-core enough, this doesn't tend to be my choice. Then what we are left with is writing a big ol' for-loop, or you bite the bullet to define a whole new struct with a matching Base.iterate. I personally find it overwhelming. I beg you to consider that not every programmer out there might have the same skills, experience or requirements you personally have to be productive with this specific paradigm.

It should be made clear that I'm not proposing to replace the iterator protocol, or to use this construct while implementing core libraries, etc. I feel this is a simple thing that can be very handy to many people, especially for prototyping. If it's not available in Julia, I might have it in my ~/.julia/config/startup.jl, which I must say does possess some amazing gems. This is just one that I'm eager to share with my fellow Julians, many of whom I believe might greatly appreciate the beauty and usefulness of it.

@JeffBezanson
Sponsor Member

I do think we should have something like this. I like some variation of unfold.

The main problem here is, of course, the poor performance of mutable captured variables. I would not want to add a nice-looking feature that encourages people to fall into that performance trap. But, this should probably be added to IterTools.jl if it's not there already.
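
A small sketch of the trap mentioned here, for readers who have not met it: a local variable that is reassigned inside a closure is stored in an untyped Core.Box, whereas a Ref whose binding is never reassigned keeps a concrete field type. The function names are made up for this illustration.

```julia
# `n` is reassigned inside the closure, so the closure stores it in a Core.Box.
function counter_boxed()
    n = 0
    return () -> (n += 1)
end

# The binding `n` is never reassigned (only its contents are mutated),
# so the closure field keeps the concrete type of the Ref.
function counter_ref()
    n = Ref(0)
    return () -> (n[] += 1)
end

boxed_fields = fieldtypes(typeof(counter_boxed()))
ref_fields   = fieldtypes(typeof(counter_ref()))
```

Inspecting boxed_fields and ref_fields shows the difference without any benchmarking. Both counters behave identically from the caller's side.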

@Seelengrab
Contributor

Seelengrab commented Apr 7, 2022

Ah shoot, did my reply sound too dismissive? Sorry, that was not my intention! I just think that Base should encourage good & performant code whenever possible, so when there's room for improvement I like to give reasons why.

As Jeff mentions, captured variables have that well-known performance trap that people run into all the time and it's really not transparent when it might occur, especially when you're not deeply involved with all these gotchas.

In any case, here's a version that would fix most of the other problems I mentioned above:

struct IterableClosure{FuncType, Eltype}
    f::FuncType
    IterableClosure{Eltype}(f::FuncType) where {Eltype, FuncType} = new{FuncType, Eltype}(f)
end
IterableClosure(f) = IterableClosure{Any}(f)

Base.eltype(::Type{IterableClosure{F, Eltype}}) where  {F, Eltype} = Eltype
Base.IteratorEltype(::Type{<:IterableClosure}) = Base.HasEltype()
Base.IteratorSize(::Type{<:IterableClosure}) = Base.SizeUnknown()

function Base.iterate(ic::IterableClosure{F, Eltype}, _=nothing) where {F, Eltype}
    nextvalue = ic.f()
    if isnothing(nextvalue)
        return nothing
    else
        return nextvalue::Eltype, nothing
    end
end

The eltype type parameter is required to make e.g. collect(myrange(3,5)) type stable (else it would return Vector{Any}, another common performance pitfall), if so desired & the return type is known. Specializing more is IMO not practical, since the type of the closure is generally not known, but performance wise this should recover most of it (aside from the captured variables issue). This does however introduce a new gotcha, in the form of the type parameter having to match what's actually returned by f when that performance is desired, effectively forcing the user to do inference themselves instead of letting the compiler do it.
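
As a usage sketch of the definition above (repeated in condensed form so the snippet runs on its own), the following compares collect with and without a declared element type. The three-argument myrange helper and its make parameter are hypothetical and exist only to reuse one closure body.

```julia
# Condensed copy of the definition above, so that this sketch runs on its own.
struct IterableClosure{FuncType, Eltype}
    f::FuncType
    IterableClosure{Eltype}(f::FuncType) where {Eltype, FuncType} = new{FuncType, Eltype}(f)
end
IterableClosure(f) = IterableClosure{Any}(f)

Base.eltype(::Type{IterableClosure{F, Eltype}}) where {F, Eltype} = Eltype
Base.IteratorEltype(::Type{<:IterableClosure}) = Base.HasEltype()
Base.IteratorSize(::Type{<:IterableClosure}) = Base.SizeUnknown()

function Base.iterate(ic::IterableClosure{F, Eltype}, _state=nothing) where {F, Eltype}
    nextvalue = ic.f()
    nextvalue === nothing && return nothing
    return nextvalue::Eltype, nothing
end

# Hypothetical helper: the same closure body, wrapped by whichever constructor `make` is.
function myrange(a, b, make)
    state = a
    make() do
        state > b && return nothing
        value = state
        state += 1
        return value      # plain value, no `Some` wrapper in this variant
    end
end

typed   = collect(myrange(3, 5, f -> IterableClosure{Int}(f)))   # Vector{Int}
untyped = collect(myrange(3, 5, IterableClosure))                # Vector{Any}
```

If the declared type did not match what the closure returns, the `nextvalue::Eltype` assertion would throw a TypeError at the first mismatching element.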

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

Thank you @Seelengrab , this already seems a very productive collaboration so far! You definitely know a lot more about Julia type inference and iterators than I do, I'm surely at least going to learn a lot from this PR.

The point of using Union{Some{T},Nothing} is that it enables us to produce nothing values, otherwise use of nothing by the user interferes with the iteration mechanism.

I would imagine that the version that takes a pure function mapping state to value-state can perform better. Maybe we can try implementing that as well with proper typing and test the performance.

I completely understand being aware of the issues that may happen, and giving the user an indication of what should be preferred or not, give it an ugly name inside a module, etc. I think enabling the creation of iterators with do-syntax has a great potential, though, we just need to explore the idea and see where it goes. And I'm glad to see that there's actually hope for #15276. Maybe we should just keep that uncompromising attitude that Julia is known for, and then one day this will be just another barrier that was turned to rubble.

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

Adding a few more thoughts because otherwise I keep ruminating on it the whole day.

First I don't really like the name unfold, because it actually is kind of a fold. Your state in the fold can be a list that you are accumulating values to, like

julia> iterations(n) = Iterators.take(Iterators.repeated(()), n);

julia> myfibs = foldl(iterations(7), init=([], (1,1))) do (list, (a,b)), _
           ([list; b], (a+b,a))
       end[1]
7-element Vector{Any}:
  1
  1
  2
  3
  5
  8
 13

I get it that there's a difference because it produces an iterator, you get the intermediary states as an iterator, not just the final "reduction". But it's still kind of a fold in my opinion, unfold seems to contradict it too strongly. I'm curious if @tkf might agree.

That piece of code contains a weird idea, an infinite generator of "unit" values. This basically makes one piece of the function moot. I find it pretty insightful, actually, to consider what happens when you do that. In fact, there seems to be a sort of taxonomy we can come up with when we iterate over a (pure) function

input           output              name
input, state -> output, newstate    Mealy/Moore machine
input, state -> (), newstate        fold
input, ()    -> output, ()          map
(), state    -> output, newstate    iterator
(), ()       -> (), newstate        closure-based reduction/fold
(), ()       -> output, ()          closure-based iterator

The last two wouldn't make sense with pure functions, but work with argument-less closures modifying some inscrutable state.

What it looks like to me is that Julia currently has great support for foldl and map, especially because of those amazing do blocks. Implementing a full Mealy/Moore machine is perhaps not a very popular demand. And iterators are available, just not exactly in the same fashion as foldl and map.

Anyways, I think Scala's version of this unfold is what's called scanLeft. And this is confusing because in some languages scan is the name for fold. At least it's a more positive name.

A final thought is that looking at iterate with this state machine point of view, what we have is basically that the argument itr defines the transition function along with logic contained in the specific iterate implementation, and state is of course the state. This PR is perhaps about figuring out everything we can do if the itr is not a generic object defining the iterations, but specifically the transition function of the state machine. ie no extra logic contained in iterate apart from the necessary to make the API work.

@Seelengrab
Contributor

Seelengrab commented Apr 7, 2022

The point of using Union{Some{T},Nothing} is that it enables us to produce nothing values, otherwise use of nothing by the user interferes with the iteration mechanism.

I'd much prefer having to unwrap at the use site of the IterableClosure, since that makes it explicit that the e.g. myrange(3,5) can produce nothing values without stopping iteration. Producing nothing values without stopping iteration seems extremely counterintuitive to me and not in line with what I'd expect from an iterator - I'd always expect to get something, since the meaning of nothing is there is no value in julia. If there is no value, why produce it in the first place? What would the meaning of producing nothing in the middle of an iterator be (it certainly shouldn't be a context dependent sentinel value)? In practice you'd have to use something in your user-code anyway, since presumably your iterator does not only produce nothing values.

I'm curious whether you've seen this blog post about the protocol or this post by @tkf (which mentions that fold and iterate are dual) already?

First I don't really like the name unfold, because it actually is kind of a fold. Your state in the fold can be a list that you are accumulating values to, like

Yes it can be, but that doesn't mean it has to or should be. Accumulating into a list is already handled by collect in an efficient manner, taking full advantage of existing iteration capabilities with an extremely minimal iterator. I certainly wouldn't call allocating in a loop over and over and over again very julian code, when the only purpose of that allocation is then to immediately throw it away again when accumulating.

In julia, iterators are lazy by design - they should do the minimal amount of work necessary to get from one state to the next and let the user handle how (or if!) to accumulate. For example, your fib code is imo written much nicer like this:

julia> struct fib end

julia> Base.iterate(::fib, state=(1,1)) = state[1], (state[2], state[1]+state[2])

julia> Base.IteratorSize(::Type{fib}) = Base.IsInfinite()

julia> Base.eltype(::Type{fib}) = Int

julia> foldl(Iterators.take(fib(), 7); init=Int[]) do acc,el
    push!(acc, el)
end
7-element Vector{Int64}:
  1
  1
  2
  3
  5
  8
 13

# or just 
# collect(Iterators.take(fib(), 7))

which keeps the iterator generic & possible for (re-)use in other scenarios as well as keeps performance how it should be: fast. Base should imo be fast & an example for how to write good julia code, not expose potential performance pitfalls for a naive user (and if that's not possible, at least document the caveats). As someone who often helps other people out when they encounter performance problems, I'd hate to tell them "yeah don't use this feature from Base, write it like this to make it fast".

Yes I know, explicitly modifying something is not very functional, but then again julia isn't a pure functional language.

@Seelengrab
Contributor

Seelengrab commented Apr 7, 2022

I would imagine that the version that takes a pure function mapping state to value-state can perform better. Maybe we can try implementing that as well with proper typing and test the performance.

When you say "pure function", what do you mean by that? I'm asking because in the past there has been some confusion about what people mean by that in a julia context.

And I'm glad to see that there's actually hope for #15276. Maybe we should just keep that uncompromising attitude that Julia is known for, and then one day this will be just another barrier that was turned to rubble.

I'm not saying there's hope for that. I'm saying that until that issue is fixed, you have to be very careful with providing an interface in Base that's intended to be used through closures (and some problems just aren't suitable for that approach in julia, if you want to end up with some reasonable performance). People will run into this and then we'll have to say "yeah tough luck, use the iterator interface directly".

@tkf
Member

tkf commented Apr 7, 2022

But it's still kind of a fold in my opinion, unfold seems to contradict it too strongly. I'm curious if @tkf might agree.

Actually, I disagree :) Your table also pretty much shows the duality of foldl ((input, state) -> newstate) and iterate (state -> (output, newstate)). "unfold" is a quite standard name for "iterator" among functional programmers. https://en.wikipedia.org/wiki/Anamorphism
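
One way to picture this duality is a sketch in which a number is unfolded into its digits and the digits are folded back into the number. The Unfold type below is a hypothetical stand-in written for this illustration, not the API proposed in either PR.

```julia
# Hypothetical minimal unfold: `f` maps a state to `(output, newstate)` or `nothing`,
# which is exactly the contract of `iterate`.
struct Unfold{F, S}
    f::F
    init::S
end
Base.iterate(u::Unfold, state=u.init) = u.f(state)
Base.IteratorSize(::Type{<:Unfold}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:Unfold}) = Base.EltypeUnknown()

# Unfold: state -> (output, newstate). Emits the least significant digit first.
digits_of(n) = Unfold(n) do s
    s == 0 ? nothing : (rem(s, 10), div(s, 10))
end

ds = collect(digits_of(1234))

# Fold: combines each input with the running state and returns the new state.
rebuilt = foldr((d, acc) -> 10 * acc + d, ds; init=0)
```

Here the fold happens to invert the unfold exactly. That is a property of this particular pair of functions, not something that holds for every iterator.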

Yes I know, explicitly modifying something is not very functional, but then again julia isn't a pure functional language.

I feel there's a quite strong benefit in using non-mutating API even though Julia is not a "pure functional language." For example, non-mutating API is much easier and more predictable for execution on GPU. The compiler's optimizations for non-mutating local functions are also much better than variable-mutating closures. These are big upsides in JuliaFolds packages that have been pursuing non-mutating implementations IMHO. So, I think #43203 is a better direction for supporting "ad-hoc iterators."

@Seelengrab
Contributor

Seelengrab commented Apr 7, 2022

I feel there's a quite strong benefit in using non-mutating API even though Julia is not a "pure functional language." For example, non-mutating API is much easier and more predictable for execution on GPU. The compiler's optimizations for non-mutating local functions are also much better than variable-mutating closures.

Absolutely! This was more related to requiring the init=Int[] in that example in the first place. As I understand it, the Iterators.take(fib(), 7) part is all that's needed in functional languages anyway, the fold around that is just so we can get it back as an array (with the init serving only as avoiding allocations).

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

Producing nothing values without stopping iteration seems extremely counterintuitive to me and not in line with what I'd expect from an iterator

Here's one example. My favorite function that returns optional values is match. If I'm returning the output of match in my iterator, I need to get the nothings later for processing. The iterator is supposed to just run and provide me every value that my function produced, its meaning in my application is not relevant to the iterator. It must be possible to create an iterator of optionals, such as (match(r"([0-9]+)", x) for x in mystrings).
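
A sketch of why the Some wrapper matters in this case: with a closure that returns nothing to stop and Some(x) to yield x, a failed match can be yielded as an ordinary element. The helpers drain and matcher are hypothetical names used only here.

```julia
# Pull values out of a closure that returns `nothing` to stop
# and `Some(x)` to yield `x` (where `x` may itself be `nothing`).
function drain(f)
    out = Any[]
    while true
        r = f()
        r === nothing && break      # bare `nothing` means "no more elements"
        push!(out, something(r))    # `something(Some(nothing))` is `nothing`
    end
    return out
end

# Yields the result of `match` for every string, including failed matches.
function matcher(strings)
    i = 0
    return () -> begin
        i >= length(strings) && return nothing
        i += 1
        return Some(match(r"[0-9]+", strings[i]))
    end
end

results = drain(matcher(["a1", "b", "c22"]))
```

Without the wrapper, the failed match on "b" would be indistinguishable from the end of iteration and "c22" would never be reached.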

When you say "pure function", what do you mean by that? I'm asking because in the past there has been some confusion about what people mean by that in a julia context.

I mean the general meaning, not @pure. Opposite to closures, etc, pure functions should play very well with compiler optimizations. You are all concerned about this, and I'm trying to say "OK, let's focus on the cases that should not have such problems". My understanding is that Julia should probably do a great job optimizing the code for these functions, so let's focus on these first. How would you call the functions that would not cause problems for this application?

Yes I know, explicitly modifying something is not very functional, but then again julia isn't a pure functional language.

I'm not sure what you mean, your example, as well as moving away from closures that mutate external values, this all looks to me exactly like sticking to pure functions, and going for a more "functional" way of doing things.

The point with this fibo example is just to say that it's not like an iterator produces collections, and a fold produces a value. A fold can produce a collection, the best way to do it is a very minor point. Thanks for alerting me of the allocation, I would hope [A;b] can reuse the same vector, but we could always go with this as well

julia> myfibs = foldl(iterations(7), init=((), (1,1))) do (list, (a,b)), _
                  ((list, b), (a+b,a))
              end[1]
((((((((), 1), 1), 2), 3), 5), 8), 13)

Anyways, here's what I'm hoping now we could do. Forget about the closure stuff. Basing myself on your previous code:

struct Iterator{FuncType, Eltype}
    f::FuncType
    initialstate::Eltype
    Iterator{Eltype}(f::FuncType, initialstate) where {Eltype, FuncType} = new{FuncType, Eltype}(f, initialstate)
end
Iterator(f, initialstate) = Iterator{Any}(f, initialstate)

Base.eltype(::Type{Iterator{F, Eltype}}) where  {F, Eltype} = Eltype
Base.IteratorEltype(::Type{<:Iterator}) = Base.HasEltype()
Base.IteratorSize(::Type{<:Iterator}) = Base.SizeUnknown()

function Base.iterate(it::Iterator{F, Eltype}, state=nothing) where {F, Eltype}
    valuestate = it.f(something(state, it.initialstate))
    if isnothing(valuestate)
        return nothing
    else
        nextvalue, state = valuestate
        return nextvalue::Eltype, state
    end
end

and basing myself on your example:

julia> fib = Iterator((1,1)) do (a,b)
           a+b, (a+b, a)
       end
Iterator{var"#21#22", Any}(var"#21#22"(), (1, 1))

julia> foldl(Iterators.take(fib, 7); init=Int[]) do acc,el
           push!(acc, el)
       end
7-element Vector{Int64}:
  2
  3
  5
  8
 13
 21
 34

Do you see any big problems with that? Isn't it interesting that we can define the same iterator using just an Iterator(initialstate) do ...? I personally find it very interesting.

Thanks for the references, I'll try to check it out later.

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

Actually, I disagree :) Your table also pretty much shows the duality of foldl

Let me try to be clearer, my point is that small changes to how an iterator works transform it into a fold that produces a collection. On the other hand, imagine you have an iterator that takes S and produces [a,b,c,...,z]. The way I see some people talking, is like you could easily produce a related fold that takes [a,b,c,...,z] and then produces S. This is rather unlikely in my opinion. That would mean a stronger "opposition" between the two, that's what "fold, unfold" suggests to me. I suppose I'm just quirky like that!

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

I think #43203 is a better direction for supporting "ad-hoc iterators."

Could you highlight the differences in direction you see right now? I understand talking about closures is probably not a good idea. Forget that. I'm talking about writing a transition function that outputs the same as iterate. What would be the main downfalls with that?

@JeffBezanson
Sponsor Member

Great discussion. I think the Iterator in #44873 (comment) is very good. We can even get fancier: if no element type is specified, we can set that type parameter to nothing, and dispatch on that case to make it EltypeUnknown.
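
A rough sketch of how that dispatch could look, under a few assumptions: the type name StateIterator is hypothetical, the element type is moved to the first type parameter, and the state gets its own parameter S so that it no longer has to share a type with the elements.

```julia
# Hypothetical type; `E === nothing` encodes "no element type was given".
struct StateIterator{E, F, S}
    f::F
    initialstate::S
    StateIterator{E}(f::F, init::S) where {E, F, S} = new{E, F, S}(f, init)
end
StateIterator(f, init) = StateIterator{nothing}(f, init)

# `f` follows the contract of `iterate`: state -> (value, newstate) or nothing.
Base.iterate(it::StateIterator, state=it.initialstate) = it.f(state)
Base.IteratorSize(::Type{<:StateIterator}) = Base.SizeUnknown()

# Dispatch on the `nothing` parameter to report an unknown element type.
Base.IteratorEltype(::Type{<:StateIterator}) = Base.HasEltype()
Base.IteratorEltype(::Type{<:StateIterator{nothing}}) = Base.EltypeUnknown()
Base.eltype(::Type{<:StateIterator{E}}) where {E} = E

# Fibonacci numbers below 50, as a transition function on the state `(a, b)`.
fibstep((a, b)) = a < 50 ? (a, (b, a + b)) : nothing

untyped = StateIterator(fibstep, (1, 1))
typed   = StateIterator{Int}(fibstep, (1, 1))
fibs    = collect(typed)
```

In the untyped case collect relies on inference and on widening as elements arrive, so it does not need to call eltype at all.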

@nlw0 nlw0 changed the title from "IterableClosure is an iterator based on a closure" to "Iterable is an iterator based on a function" on Apr 7, 2022
@Seelengrab
Contributor

Seelengrab commented Apr 7, 2022

I mean the general meaning, not @pure. Opposite to closures, etc, pure functions should play very well with compiler optimizations. You are all concerned about this, and I'm trying to say "OK, let's focus on the cases that should not have such problems". My understanding is that Julia should probably do a great job optimizing the code for these functions, so let's focus on these first. How would you call the functions that would not cause problems for this application?

I wouldn't necessarily call them "pure", because that may be understood as "there's no I/O" or "there's no allocations" as well. Neither of those examples is relevant to this though - with "pure functional languages" I was only referring to languages like Haskell, where "state" is much less of a thing than in julia.

The point with this fibo example is just to say that it's not like an iterator produces collections, and a fold produces a value. A fold can produce a collection, the best way to do it is a very minor point.

I disagree! An iterator produces a number of values, while a fold always produces/reduces to a single value. That may be a list itself, but there's still only one list (or one tuple, one aggregate.. you get the idea). The expectation is different, in that you usually get more than one thing from an iterator & its state(s), while you only get one thing from a fold. In an automaton context you could say that an iterator produces all intermediary values, while a fold only produces a single, final value.
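
The distinction can be seen with functions that already exist in Base: foldl returns only the final state, while Iterators.accumulate (available since Julia 1.5) lazily yields every intermediate state. This is only an illustration of the point, using a running sum as the example.

```julia
xs = 1:5

# A fold reduces the whole input to a single final value.
total = foldl(+, xs)

# The corresponding lazy "scan" yields every intermediate state of the same fold.
partials = collect(Iterators.accumulate(+, xs))
```

The last element of partials equals total, which is the sense in which the fold only exposes the final state of the same automaton.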

Anyways, here's what I'm hoping now we could do. Forget about the closure stuff. Basing myself on your previous code:

This is a version I can get behind and is imo the best version of this so far. It still has the same closure capturing problems, but you can't really get away from them with closures in the first place and it also allows to have a very straightforward "migration path" to an explicit iterator should it become a performance problem:

  1. Create a new struct MyIterator with the variables you capture in the closure
  2. Move your do block into its own iterate(itr::MyIterator, state=init_state) function, while having the captured variables as field accesses of the first argument to iterate instead
  3. Set the second argument to the initial state you passed to Iterable
  4. Define Base.IteratorSize(::Type{MyIterator}) appropriately (SizeUnknown, IsInfinite, HasLength or HasShape{N})
    [5. If you had a type parameter on Iterable, define that as the returned value of Base.IteratorEltype(::Type{MyIterator})]

This requires minimal changes to the function created in the do block! :)
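
As a rough sketch of where those steps might end up, assume a closure-based iterator that captured a `stop` bound and was given `start` as its initial state. `Countup` and its field names are made up for illustration and are not part of the PR.

```julia
# Step 1: a struct holding what the closure used to capture (hypothetical type).
struct Countup
    start::Int
    stop::Int
end

# Steps 2 and 3: the former do-block body becomes `iterate`; captured variables
# turn into field accesses, and the initial state becomes the default argument.
Base.iterate(itr::Countup, state=itr.start) =
    state > itr.stop ? nothing : (state, state + 1)

# Step 4: declare the size trait (HasLength also needs a `length` method).
Base.IteratorSize(::Type{Countup}) = Base.HasLength()
Base.length(itr::Countup) = max(0, itr.stop - itr.start + 1)

# Step 5: declare the element type.
Base.eltype(::Type{Countup}) = Int

@assert collect(Countup(3, 5)) == [3, 4, 5]
@assert isempty(collect(Countup(5, 3)))
```

The body of `iterate` is the same expression a transition function would contain, which is why the move is mostly mechanical.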

> Do you see any big problems with that? Isn't it interesting that we can define the same iterator using just an Iterator(initialstate) do ...? I personally find it very interesting.

No, I like it! It's basically taking the minimal possible interface for writing an iterator, except that you don't use the iterate function via dispatch and instead only have it take the state, giving it to Iterable as an argument. I really like passing that state to the next invocation explicitly, since it keeps the style of writing these two versions of implementing an iterator largely the same while still providing a different interface.
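
A minimal sketch of that interface, assuming a stand-in type called `StateIter` (a hypothetical name; this is not the PR's implementation): the wrapper stores the transition function and the initial state, and `iterate` simply forwards the state to the function.

```julia
# Hypothetical stand-in for the function-based iterator discussed here.
struct StateIter{F,S}
    f::F      # transition function: state -> (value, newstate) or nothing
    init::S   # initial state
end

# The transition function already returns what `iterate` is expected to return.
Base.iterate(it::StateIter, state=it.init) = it.f(state)

# Nothing is known about length or element type in this sketch.
Base.IteratorSize(::Type{<:StateIter}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:StateIter}) = Base.EltypeUnknown()

# The do block becomes the first constructor argument.
fib = StateIter((1, 1)) do (a, b)
    a, (b, a + b)
end

@assert collect(Iterators.take(fib, 7)) == [1, 1, 2, 3, 5, 8, 13]
```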

> We can even get fancier: if no element type is specified, we can set that type parameter to nothing, and dispatch on that case to make it EltypeUnknown.

That sounds like a good idea.

The last two caveats from my POV are:

  • If you use a closure, you can't really specialize IteratorSize based on the function (there's no really good way to get the type & define that method at runtime). You can specialize if you pass a "proper" function that you own since you know the type at time of writing (if you don't own it, you'll be a pirate). This shouldn't be blocking this though, just a note for future people :)
  • I have no idea how the name currently discussed here would interact with proposals like "Abstract iterator type?" (#23429) or "Implement traits Iterable and Callable" (#34535), so maybe some bike shedding for the name is appropriate, to keep Iterable around for a possible interface type/trait later?

@nlw0
Contributor Author

nlw0 commented Apr 7, 2022

> I wouldn't necessarily call them "pure", because that may be understood as "there's no I/O" or "there's no allocations" as well. Neither of those examples is relevant to this, though. With "pure functional languages"

I'm a scruffy FP programmer, you're way more rigorous!

> I disagree! An iterator produces a number of values, while a fold always produces/reduces to a single value. That may be a list itself, but there's still only one list (or one tuple, one aggregate... you get the idea). The expectation is different, in that you usually get more than one thing from an iterator & its state(s), while you only get one thing from a fold. In an automaton context you could say that an iterator produces all intermediary values, while a fold only produces a single, final value.

I actually think we're talking about similar things.

> This is a version I can get behind and is imo the best version of this so far.

Nice! I wish you had made a patch, hope you don't mind I mostly copy-pasted your code.

> This requires minimal changes to the function created in the do block! :)

You have seen the Array{T, 2}...

> I have no idea how the name currently discussed here would interact with proposals like "Abstract iterator type?" (#23429) or "Implement traits Iterable and Callable" (#34535), so maybe some bike shedding for the name is appropriate, to keep Iterable around for a possible interface type/trait later?

Very good point, I think it's definitely good to hear from more people. I went with Iterator in the patch right now, and I like it, but it's of course open for debate.

@nlw0 nlw0 changed the title from "Iterable is an iterator based on a function" to "Iterator() is an iterator based on a function" on Apr 8, 2022
@nlw0
Contributor Author

nlw0 commented Apr 8, 2022

@Seelengrab the original iterate was a little complicated because of the Some logic. Now we are pretty much returning exactly the same thing that iterate wants. Could the definition be now reduced to a one-liner?

@nlw0 nlw0 changed the title from "Iterator() is an iterator based on a function" to "Iterator() is an iterable based on a transition function" on Apr 8, 2022
@nlw0
Contributor Author

nlw0 commented Apr 8, 2022

Made the iterate methods one-liners; seems to be working fine.

@JeffBezanson
Sponsor Member

I think the name will need to be more specific than Iterator; everybody will think that is the abstract type of all iterators. Can discuss on triage.

@JeffBezanson JeffBezanson added the status:triage This should be discussed on a triage call label Apr 8, 2022
Base.eltype(::Type{<:Unfold{nothing}}) = Any
Base.IteratorEltype(::Type{<:Unfold{nothing}}) = EltypeUnknown()
Base.IteratorEltype(::Type{<:Unfold}) = HasEltype()
Base.IteratorSize(::Type{<:Unfold}) = SizeUnknown()
Contributor

Is SizeUnknown() not the default?

Contributor Author

I'm not really familiar with how this works, but there are many similar definitions in this file, including for Filter and TakeWhile.

Contributor

No, the default is HasLength(). This is ok.
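
A quick way to check this in a session, as a sketch: `Bare` is a hypothetical type that defines no iteration traits, while `Iterators.filter` and `Iterators.takewhile` are the existing wrappers mentioned above.

```julia
# Hypothetical type with no iteration traits defined.
struct Bare end

# The generic fallback is HasLength(), so iterators of unknown length must opt out.
@assert Base.IteratorSize(Bare) == Base.HasLength()

# Filter and TakeWhile cannot know how many elements they will yield.
@assert Base.IteratorSize(Iterators.filter(isodd, 1:10)) == Base.SizeUnknown()
@assert Base.IteratorSize(Iterators.takewhile(<(3), 1:10)) == Base.SizeUnknown()
```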

@cmcaine
Contributor

cmcaine commented Jul 9, 2022

I'm fine with my PR being mentioned in the News file. I think the docs I wrote in my PR are maybe better than in this one? Maybe take a look and see if you agree?

In particular, I was careful to include the string unfoldr in the docs so that people would be more likely to find this function with apropos(); and I was quite happy with the explanatory comment explaining how unfold is to while as generators are to for.

No pressure, tho.

My PR stalled cos I never got a clear answer on whether Jeff et al wanted unfold(f) as well as unfold(f, init). Your PR provides Unfold(f) and unfold(f, init), so maybe that will satisfy those who wanted both?

@nlw0
Contributor Author

nlw0 commented Jul 9, 2022

@cmcaine ok thanks for agreeing, and thanks for your work! I think we might take at least your whole Extended help section, would you mind creating a suggestion-patch here, or can I just copy it myself?

@cmcaine
Contributor

cmcaine commented Jul 11, 2022

I'm on holiday now, but I could do the patches later this week when I have access to a computer. Also happy for you to do it.

@nlw0
Contributor Author

nlw0 commented Jul 24, 2022

I went ahead and copied the extended help, only modified the example code to rely on isnothing which I feel is kind of nicer, I don't even know what that operator is called...

@nlw0
Contributor Author

nlw0 commented Oct 17, 2022

Bumping this, what's the next step? @Seelengrab @tkf

@Seelengrab
Contributor

It looks good from my POV, so 👍 from me - I can't merge things though :)

@cmcaine
Contributor

cmcaine commented Mar 13, 2023

I just came across another example where I wanted this function. Can we get this merged by triage if there are no blockers?

@nlw0, you said:

> I went ahead and copied the extended help, only modified the example code to rely on isnothing which I feel is kind of nicer, I don't even know what that operator is called...

about this code: x ≡ nothing. The triple bar is called equivalence, and it's a Unicode alias for ===; you write it with \equiv<tab>. I agree that isnothing is easier and nicer, though.
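
For reference, a tiny sketch showing that the three spellings agree (the variable `x` is only for illustration):

```julia
x = nothing

# `≡` is another name for the very same built-in function as `===`.
@assert (≡) === (===)

# All three checks are equivalent ways of testing for `nothing`.
@assert x ≡ nothing
@assert x === nothing
@assert isnothing(x)
```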

Contributor

@cmcaine cmcaine left a comment

function unfold(f, initialstate, eltype::Type{Eltype}) where {Eltype}

I think eltype should be a keyword argument in case we want to add support for IteratorSize at some later date. A keyword argument will also lead to code that self-documents better.

The tests should also check this version of the function. At the moment only unfold(f, initialstate) is tested.

@oscardssmith oscardssmith added the status:triage This should be discussed on a triage call label May 28, 2023
@oscardssmith
Member

triage added to get eyes on this

@aplavin
Contributor

aplavin commented May 28, 2023

> I think eltype should be a keyword argument in case we want to add support for IteratorSize at some later date

Totally. Also, I find it strange that IteratorEltype and IteratorSize are handled differently: either both should have their own argument, as in unfold(...; eltype, size), or neither should. Fundamentally, they are very similar.

@nlw0
Contributor Author

nlw0 commented May 29, 2023

I have changed eltype to a keyword argument. Regarding size, I'm not sure how the whole Base.IteratorSize thing works in the first place. The priority in my opinion would really be infinite or unknown-length iterators. If we can implement this nicely I'm all for it, but I'd definitely need help. Would this size parameter be merely a boolean indicating a length definition is available, or something like that?

How would we ensure this plays well with Iterators.rest anyways? It seems tricky.

```
julia> aa = 1:11
1:11

julia> length(aa)
11

julia> length(Iterators.rest(aa,5))
ERROR: MethodError: no method matching length(::Base.Iterators.Rest{UnitRange{Int64}, Int64})
```

@cmcaine
Contributor

cmcaine commented May 29, 2023

Here's an implementation of unfold and rest that supports specifying an iterator size or length as unknown, infinite, an integer length or as a shape. I've also included tests at the end. If you like it, please use it in the PR.

I also wrote a curried version of unfold, because I would use that a lot.

module Unfolds

using Base:
    SizeUnknown, HasLength, HasShape, IsInfinite, EltypeUnknown, HasEltype,
    @propagate_inbounds

import Base:
    length, size, eltype, IteratorSize, IteratorEltype, iterate, isdone

const SizeTypes = Union{SizeUnknown, IsInfinite, <:Integer, <:NTuple{N, <:Integer} where {N}}
size_type_to_iteratorsize(T::Type{<:Union{SizeUnknown, IsInfinite}}) = T()
size_type_to_iteratorsize(::Type{<:Integer}) = HasLength()
size_type_to_iteratorsize(::Type{<:NTuple{N, <:Integer}}) where {N} = HasShape{N}()

"""
    unfold(f, initialstate; [eltype], [size])

Iterable object that generates values from an initial state and a transition
function `f(state)`. The function must follow the same rules as `iterate`.
It returns either `(newvalue, newstate)` or `nothing`, in which case the
sequence ends.

The optional parameters `eltype` and `size` specify the element type and size of the iterator.

If `size` is specified it must be one of:

- an integer, representing the length of the iterator
- a tuple of integers, representing the `size` of the iterator (length will be defined as `prod(size)`)
- `Base.IsInfinite()`, meaning that the iterator is of infinite length
- `Base.SizeUnknown()`, if the iterator has an unknown length (this is the default).

See also: [`iterate`](@ref), [the iteration interface](@ref man-interface-iteration)

!!! compat "Julia 1.10"
    This function was added in Julia 1.10.

# Examples

```jldoctest
julia> fib = Iterators.unfold((1,1)) do (a,b)
           a, (b, a+b)
       end;

julia> reduce(hcat, Iterators.take(fib, 7))
1×7 Matrix{Int64}:
 1  1  2  3  5  8  13

julia> frac(c, z=0.0im) = Iterators.unfold((c, z); eltype=ComplexF64) do (c, z)
           if real(z * z') < 4
               z, (c, z^2 + c)
           else
               nothing
           end
       end;

julia> [count(Returns(true), frac(-0.835-0.2321im, (k+j*im)/6)) for j in -4:4, k in -8:8]
9×17 Matrix{Int64}:
  2   2   2   3   3   3   5  41   8   4   3   3   2   2   2   2   1
  2   3   5   4   5   8  20  11  17  23   4   3   3   3   2   2   2
  4  10  17  12   7  56  18  58  33  22   6   5   4   5   4   3   2
 26  56  15  13  18  23  13  14  27  46   8   9  16  12   8   4   3
 10   7  62  17  16  23  11  12  39  12  11  23  16  17  62   7  10
  3   4   8  12  16   9   8  46  27  14  13  23  18  13  15  56  26
  2   3   4   5   4   5   6  22  33  58  18  56   7  12  17  10   4
  2   2   2   3   3   3   4  23  17  11  20   8   5   4   5   3   2
  1   2   2   2   2   3   3   4   8  41   5   3   3   3   2   2   2
```

# Extended help

The interface for `f` is very similar to the interface required by `iterate`, but `unfold` is simpler to use because it does not require you to define a type. You can use this to your advantage when prototyping or writing one-off iterators.

You may want to define an iterator type instead for readability or to dispatch on the type of your iterator.

`unfold` is related to a `while` loop because:
```julia
collect(unfold(f, initialstate))
```
is roughly the same as:
```julia
acc = []
state = initialstate
while true
    x = f(state)
    isnothing(x) && break
    element, state = x
    push!(acc, element)
end
```
But the `unfold` version may produce a more strictly typed vector and can be easily modified to return a lazy collection by removing `collect()`.

In Haskell and some other functional programming environments, this function is known as `unfoldr`.
"""
function unfold(f, initialstate; eltype=nothing, size::SizeTypes=SizeUnknown())
    rest(Unfold(f, eltype), initialstate; size)
end

"""
    unfold(f; [eltype], [size])

Create a function that will return an iterator unfolded by `f` when given an initial state. Equivalent to `initial -> unfold(f, initial; eltype, size)`.

# Example

```jldoctest
julia> const collatz_path = Iterators.unfold() do n
           if isnothing(n)
               n
           elseif isone(n)
               (n, nothing)
           else
               (n, iseven(n) ? n÷2 : 3n+1)
           end
       end
#1 (generic function with 1 method)

julia> collatz_path(3) |> collect
8-element Vector{Int64}:
  3
 10
  5
 16
  8
  4
  2
  1
```
"""
function unfold(f; eltype=nothing, size::SizeTypes=SizeUnknown())
    initial -> unfold(f, initial; eltype, size)
end

struct Unfold{Eltype, FuncType}
    f::FuncType

    Unfold{E, F}(f::F) where {E, F} = new{E, F}(f)
    Unfold(f::F, eltype) where {F} = new{eltype, F}(f)
end
Unfold(f) = Unfold(f, nothing)

eltype(::Type{<:Unfold{Eltype}}) where {Eltype} = Eltype
eltype(::Type{<:Unfold{nothing}}) = Any
IteratorEltype(::Type{<:Unfold{nothing}}) = EltypeUnknown()
IteratorEltype(::Type{<:Unfold}) = HasEltype()

IteratorSize(::Type{<:Unfold}) = SizeUnknown()

@propagate_inbounds iterate(it::Unfold, state) = it.f(state)

# Iterators.Rest, but it can know how big the iterator will be.
struct Rest{I,S,Z<:SizeTypes}
    itr::I
    st::S
    size::Z
end

"""
    rest(iter, state; [size])

An iterator that yields the same elements as `iter`, but starting at the given `state`.

If `size` is specified it must be one of:

- an integer, representing the length of the returned iterator
- a tuple of integers, representing the `size` of the returned iterator (length will be defined as `prod(size)`)
- `Base.IsInfinite()`, meaning that the returned iterator is of infinite length
- `Base.SizeUnknown()`, if the returned iterator has an unknown length

!!! compat "Julia 1.10"
    The `size` parameter was added in Julia 1.10.

See also: [`Iterators.drop`](@ref), [`Iterators.peel`](@ref), [`Base.rest`](@ref).

# Examples
```jldoctest
julia> collect(Iterators.rest([1,2,3,4], 2))
3-element Vector{Int64}:
 2
 3
 4
```
"""
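# Derive the default from the wrapped iterator's size trait, so infinite iterators stay infinite.
rest(itr, state; size=rest_iteratorsize(IteratorSize(itr))) = Rest(itr, state, size)
rest(itr::Rest, state; size=rest_iteratorsize(IteratorSize(itr))) = Rest(itr.itr, state, size)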
rest(itr) = itr

@propagate_inbounds iterate(i::Rest, st=i.st) = iterate(i.itr, st)
isdone(i::Rest, st...) = isdone(i.itr, st...)

eltype(::Type{<:Rest{I}}) where {I} = eltype(I)
IteratorEltype(::Type{<:Rest{I}}) where {I} = IteratorEltype(I)

rest_iteratorsize(a) = SizeUnknown()
rest_iteratorsize(::IsInfinite) = IsInfinite()

IteratorSize(::Type{<:Rest{<:Any, <:Any, Z}}) where {Z} = size_type_to_iteratorsize(Z)
length(u::Rest{<:Any, <:Any, <:Integer}) = u.size
size(u::Rest{<:Any, <:Any, <:NTuple{N, <:Integer}}) where {N} = u.size
length(u::Rest{<:Any, <:Any, <:NTuple{N, <:Integer}}) where {N} = prod(u.size)

end

module UnfoldsTests

using ..Unfolds: Unfolds, unfold
using Test

@testset "unfold" begin
    @testset "eltype" begin
        @test eltype(unfold(x -> nothing, 1; eltype=String)) == String

        function fib_int(x)
            Iterators.take(unfold((1, 1); eltype=Int) do (a, b)
                a, (b, a+b)
            end, x)
        end

        @test eltype(fib_int(1000)) == Int
        @test eltype(collect(fib_int(4))) == Int
        @test collect(fib_int(4)) == [1, 1, 2, 3]
    end

    @testset "size" begin
        bad_one_to(n, size) = Unfolds.unfold(x -> x > n ? nothing : (x, x+1), 1; size)
        @test Base.IteratorSize(bad_one_to(10, 10)) == Base.HasLength()
        @test Base.IteratorSize(bad_one_to(10, (10,))) == Base.HasShape{1}()
        @test Base.IteratorSize(bad_one_to(10, Base.SizeUnknown())) == Base.SizeUnknown()
        @test collect(bad_one_to(10, 10)) == 1:10
        @test collect(bad_one_to(10, (10,))) == 1:10
        @test collect(bad_one_to(10, Base.SizeUnknown())) == 1:10

        infinite_itr = Unfolds.unfold(x -> (x, x), 1; size=Base.IsInfinite())
        @test Base.IteratorSize(infinite_itr) == Base.IsInfinite()
        # collect refuses to try and collect iterators of infinite size
        @test_throws MethodError collect(infinite_itr)

        shaped_itr1 = bad_one_to(9, (3, 3))
        @test collect(shaped_itr1) == reshape(1:9, (3, 3))
    end

    @testset "size and eltype" begin
        itr1 = Unfolds.unfold(x -> x > 9 ? nothing : (x, x+1), 1; eltype=Int, size=9)
        @test collect(itr1) == 1:9

        itr2 = Unfolds.unfold(x -> x > 9 ? nothing : (x, x+1), 1; eltype=Int, size=(3, 3))
        @test collect(itr2) == reshape(1:9, (3, 3))
    end
end

end

Todo: examples of specifying size and eltype; tests for curried unfold; performance review; maybe move the tests of size to be testing rest directly; maybe test that length and size throw a MethodError if size is unknown or infinite.
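
As a starting point for the curried-unfold tests, here is a self-contained sketch. `MiniUnfold`, `mini_unfold` and `collatz_step` are hypothetical stand-ins so that the block runs on its own; real tests would call `Unfolds.unfold(f)` from the module above instead.

```julia
using Test

# Hypothetical minimal stand-in for the iterator type (not the module above).
struct MiniUnfold{F,S}
    f::F
    init::S
end
Base.iterate(it::MiniUnfold, state=it.init) = it.f(state)
Base.IteratorSize(::Type{<:MiniUnfold}) = Base.SizeUnknown()
Base.IteratorEltype(::Type{<:MiniUnfold}) = Base.EltypeUnknown()

mini_unfold(f, init) = MiniUnfold(f, init)
mini_unfold(f) = init -> mini_unfold(f, init)   # curried form

# Same transition function as the Collatz docstring example.
function collatz_step(n)
    if isnothing(n)
        nothing
    elseif isone(n)
        (n, nothing)
    else
        (n, iseven(n) ? n ÷ 2 : 3n + 1)
    end
end

@testset "curried unfold (sketch)" begin
    collatz_path = mini_unfold(collatz_step)
    @test collatz_path isa Function
    @test collect(collatz_path(3)) == [3, 10, 5, 16, 8, 4, 2, 1]
    @test collect(collatz_path(1)) == [1]
    # The curried and two-argument forms should agree.
    @test collect(collatz_path(6)) == collect(mini_unfold(collatz_step, 6))
end
```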

Labels
status:triage This should be discussed on a triage call