Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Widening-based map() to remove return_type #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

c42f
Copy link

@c42f c42f commented Apr 17, 2020

You earn two points https://xkcd.com/356/

It's insane that the compiler removes all the ugly crap I've put in here.

julia> v = SArrayLite{Tuple{3}, Float64, 1, 3}((1,2,3))
3-element SArrayLite{Tuple{3},Float64,1,3}:
 1.0
 2.0
 3.0

julia> @code_native map(x->x^2, v)
	.text
; ┌ @ mapreduce.jl:2 within `map'
	movq	%rdi, %rax
; │ @ mapreduce.jl:6 within `map'
; │┌ @ REPL[11]:1 within `#15'
; ││┌ @ intfuncs.jl:261 within `literal_pow'
; │││┌ @ float.jl:405 within `*'
	vmovupd	(%rsi), %xmm0
	vmulpd	%xmm0, %xmm0, %xmm0
; │└└└
; │ @ mapreduce.jl:8 within `map'
; │┌ @ mapreduce.jl:21 within `_map_widen!'
; ││┌ @ REPL[11]:1 within `#15'
; │││┌ @ intfuncs.jl:261 within `literal_pow'
; ││││┌ @ float.jl:405 within `*'
	vmovsd	16(%rsi), %xmm1         # xmm1 = mem[0],zero
	vmulsd	%xmm1, %xmm1, %xmm1
; │└└└└
	vmovupd	%xmm0, (%rdi)
	vmovsd	%xmm1, 16(%rdi)
	retq
	nop
; └

@c42f
Copy link
Author

c42f commented Apr 17, 2020

lol

julia> v = SArrayLite{Tuple{3}, Float64, 1, 3}((1,2,3))
3-element SArrayLite{Tuple{3},Float64,1,3}:
 1.0
 2.0
 3.0

julia> map(x->Meta.parse(readline()), v)
1
1
1
3-element SArrayLite{Tuple{3},Int64,1,3}:
 1
 1
 1

@andyferris
Copy link
Owner

Lol on both comments!

@c42f
Copy link
Author

c42f commented Apr 17, 2020

This isn't so good though:

julia> map(x->Meta.parse(readline()), v)
1
2.0
ERROR: setindex!() with non-isbitstype eltype is not supported by StaticArrays. Consider using SizedArray.

Basically there's a lot less value in the widening machinery transformation unless we can setindex! for general MArray. Until then we may as well fail outright if the eltype of the first resulting element isn't the same as the others (or I guess we could fall back to Array :-/)

@andyferris
Copy link
Owner

Something like setindex!! is what you want, right?

@c42f
Copy link
Author

c42f commented Apr 17, 2020

True though if only it could be called setindex!?.

Anyway that brings to mind @tkf's package https://github.com/tkf/BangBang.jl which I'd forgotten about. It's clearly related; maybe it even has all the pieces which are needed for the more general version of this.

@andyferris
Copy link
Owner

Oh sorry I just assumed you saw andyferris/Freeze.jl#1

Anyway that’s what I was referring to, yes.

@tkf
Copy link

tkf commented Apr 18, 2020

Thanks for the ping :)

I believe there is a much better approach to implement map and alike. I'd suggest

map(f, xs) = mapfoldl(f, push!!, xs; init = empty(xs, Union{}))
map(f, xs, args...) = mapfoldl(f, push!!, zip(xs, args..); init = empty(xs, Union{}))

(maybe with freeze at the end).

With this approach, you only need to write a good foldl once for each collection type (note: zip-of-your-things is also one collection type). Writing a good function with widening approach is hard. Let's do it once and use it everywhere.

I agree thaw+freeze is a very convenient tool for writing functions with predictable element types (e.g., matrix-matrix multiplication). But, if you want to take composability of collections seriously, I'd strongly recommend building them around foldl, append!! (from which push!! can be derived) and (say) empty. Both of them are "primitive" functions in the sense they have to be implemented by the authors of the collection types (so BangBang.jl is not so useful here; although some traits and internal machinery may be useful). Of course, they are just the most important and composable primitives and I'm not against at all for implementing functions using different routes (e.g., setindex!!) for performance.

@c42f
Copy link
Author

c42f commented Apr 18, 2020

Yes that's cool, thanks!

I do like the idea of foldl as a fundamental abstraction... but as a fundamental unit of implementation I get confused about how it's meant to lower to efficient code.

Or is this missing the point? Are you arguing that

  • For collection users foldl should be the fundamental unit of abstraction
  • Library implementers should write a specialized set of foldl implementations including whatever performance hacks for their container types. For example we might implement mapfoldl(f, ::typeof(push!!), xs::MyContainerType), which essentially is map but with the dispatch inverted?

@andyferris
Copy link
Owner

andyferris commented Apr 18, 2020

I do like the idea of foldl as a fundamental abstraction... but as a fundamental unit of implementation I get confused about how it's meant to lower to efficient code.

I agree. On the implementation level @tkf (ignoring types and just thinking of length information) pre-allocating an Array instead of using push! is significantly faster. For dictionaries it is the same - Dictionaries.jl gets a huge speedup over Base for a lot of things purely out of the equivalent thing (knowing that the keys won't change through a map). And a recursive mapfoldl implementation for building static arrays is at least O(N^2) work for the compiler, which has to be smart enough to elide a lot of code to possibly make the run-time O(N) - I recall the general observations during v0.7 development was for recursion over larger tuples changing lengths seemed to cost O(N^3) in the compiler and generates bad code anyway.

For containers where the output keys are known in advance (like for map), it would seem to me that a combination of similar and setindex!! is the right abstraction. When the length is unknown (like for filter) empty and push!! are the right abstraction. Thoughts?

@andyferris
Copy link
Owner

(Also an aside I'd like to note - the generic code I wrote here not only is as good for AbstractArray as it is for StaticArray, but it's also generic code that would work correctly and efficiently for a Dictionaries.jl AbstractDictionary. That is, I'm even shooting towards the goal of sharing these generic definitions for all indexable containers, and have Julia specialize and make perfectly efficient code in all cases).

Copy link
Owner

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the implementation, is there a way to make widening code more generic, like the idea of using setindex!!? I suppose you actually want to introduce a function barrier whenever the eltype changes, and continue the loop from there.

Actually... are there other functions (like collect or copy) that create a new container and preserve the output keys but aren't somehow equivalent to map? copy is like map(identity), right? I'm questioning my assertion above that there is a need to make this pattern generic and reusable (via setindex!! or something).

@tkf
Copy link

tkf commented Apr 19, 2020

  • For example we might implement mapfoldl(f, ::typeof(push!!), xs::MyContainerType), which essentially is map but with the dispatch inverted?

@c42f OK, let me try to explain this more clearly. No, this is not what I meant [*1]. I did mean to specialize foldl only on the data source collection.

I agree the semantics of map let us assume some invariance for the data sink collection and we must use it for optimizing map. This may mean to use (say) setindex!! instead of push!!. But, for data source collection, I think we still should use foldl because the author of the source collection type knows the best iteration strategy [*2]; not us, the implementer of map.

I think we need an abstraction and a set of primitive functions that cleanly separate the properties of data sink and source collections. I think this point is blurred in map example because the source and sink are the "same type."

[*1] OK, it's a very side-note but there is an interesting possibility for optimization to do the inverse of what I call adjoint transformation (convert an iterator transform to transducer; JuliaLang/julia#33526). For example, a partition transducer can be much more efficient when executed as a partition iterator. This is yet another optimization we can do if we build our iteration facility around foldl.

[*2] foldl allows us to use many more optimization strategies than iterate. I have some examples to demonstrate this for persistent vectors (bit mapped trie) and a bit mask-based implementation of AbstractVector{Union{T,Missing}} that I hope I can share soon. Meanwhile, there is a discussion on CartesianIndices in JuliaLang/julia#9080 (comment)

pre-allocating an Array instead of using push! is significantly faster.

@andyferris Totally agree! I didn't suggest that foldl+push!! would generate the fastest code; it's the most generic and composable set of primitives (AFAIK). For example, I think this is the best [*3] fallback strategy for implementing MyCollection(iterator) when length is not defined for iterator. Using foldl on iterator let us use the best possible iteration strategy for iterator.

Note that foldl and setindex!! compose well. It's perfectly fine [*4] to do

foldl(pairs(1:3); init=resize!(Union{}[], 3)) do ys, (i, x)
    setindex!!(ys, x, i)
end

foldl is a primitive for the source collection and push!! is a primitive for the sink collection. When the sink collection has a special property (indexability), we can use setindex!! instead.

[*3] OK, this may not be the case if MyCollection has some complex inner structures (e.g., block based data storage) and iterate of the source collection is relatively simple. But, I think it's still possible to come up with some general solution (e.g., inverse adjoint I mentioned above). Of course, it's always possible to dispatch to a specialized method.

[*4] As you know, Union{}[] is not a very good initial value. Until it's fixed in Julia, I guess we need something like Empty(Vector). Note also that the definition of setindex!! is to copy all the elements when widening. I think we need to have a better interface to copy only the elements that are already set: JuliaFolds/BangBang.jl#70

For dictionaries it is the same - Dictionaries.jl gets a huge speedup over Base for a lot of things purely out of the equivalent thing (knowing that the keys won't change through a map).

As I just explained, I totally agree that exploiting the property that index set is shared between the data source and sink is important.

But, for composable optimal iteration (not just map), I think it is also beneficial to define foldl for hash table implementations as well. Though I guess you don't need it after andyferris/Dictionaries.jl#13

And a recursive mapfoldl implementation for building static arrays is at least O(N^2) work for the compiler, which has to be smart enough to elide a lot of code to possibly make the run-time O(N)

IIUC the ability for the compiler to do the escape analysis and robustly elide mutable structs is introduced much recently than the ability to elide recursive calls. It seems what is hard for compiler is not a simple question.

Anyway, we don't have a hard choice here because we can just use setindex!! for data sink and use foldl for data source. But I don't think we need return_type as the primary tool here.

For containers where the output keys are known in advance (like for map), it would seem to me that a combination of similar and setindex!! is the right abstraction. When the length is unknown (like for filter) empty and push!! are the right abstraction. Thoughts?

Yes, setindex!! is more appropriate when the source collection has the HasLength/HasShape trait.


Just to clarify what I meant by foldl/push!!/map/filter/etc., here is a self-contained minimal implementation of what I'm talking about:

module Demo

foldl(op, xs; init) = _foldl(op, init, xs)

_foldl(op, acc, ::Tuple{}) = acc
_foldl(op, acc, xs::Tuple) = _foldl(op, op(acc, xs[1]), Base.tail(xs))

function _foldl(op, acc, xs::AbstractVector)
    isempty(xs) && return acc
    acc = op(acc, @inbounds xs[begin])
    T = typeof(acc)
    for i in firstindex(xs)+1:lastindex(xs)
        nxt = op(acc, @inbounds xs[i])
        if !(nxt isa T)
            return _foldl(op, nxt, @inbounds @view(xs[i+1:end]))
        end
        acc = nxt
    end
    return acc
end

push!!(xs::Tuple, x) = (xs..., x)

function push!!(xs::Vector, x)
    T = Base.promote_typejoin(eltype(xs), typeof(x))
    if T <: eltype(xs)
        return push!(xs, x)
    else
        return push!(copyto!(similar(xs, T), xs), x)
    end
end

empty(xs, T) = Base.empty(xs, T)
empty(::Tuple, ::Any) = ()

mapping(f, op) = (acc, x) -> op(acc, f(x))
filtering(f, op) = (acc, x) -> f(x) ? op(acc, x) : acc

map(f, xs) = foldl(mapping(f, push!!), xs; init = empty(xs, Union{}))
filter(f, xs) = foldl(filtering(f, push!!), xs; init = empty(xs, Union{}))

end  # module


using Test
using UnPack

@testset begin
    @unpack map, filter = Demo
    @test map(x -> x + 1, 1:3) == 2:4
    @test map(x -> x + 1, (1, 2, 3)) == (2, 3, 4)
    @test filter(isodd, 1:3) == [1, 3]
    @test filter(isodd, (1, 2, 3)) == (1, 3)
end

@c42f
Copy link
Author

c42f commented Apr 20, 2020

Thanks for the great explanation, especially including the minimal implementation which was really useful to understand what you're getting at.

One thing I really like about the minimal implementation you have here is that the recursion in _foldl is based on the container type of the accumulator. That's really clever and neat, somehow much more satisfying than messing around with eltype as I've done in map.

Note that foldl and setindex!! compose well. It's perfectly fine [*4] to do

foldl(pairs(1:3); init=resize!(Union{}[], 3)) do ys, (i, x)
    setindex!!(ys, x, i)
end

So to summarize, is the following roughly the default implementation you think should be in Base.map?

map(f, xs) = _map(IteratorSize(xs), f, xs)

# Iterators of unknown length
_map(::SizeUnknown, f, xs) = freeze(foldl(mapping(f, push!!), xs; init = empty(xs, Union{})))

# Iterators with length (also for HasSize?)
function _map(::HasLength, f, xs)
    freeze(foldl(pairs(1:3); init=similar(xs, Union{})) do ys, (i, x)
        setindex!!(ys, x, i)
    end)
end

Then in principle we only need to implement setindex!! in StaticArraysLite, provided Base contains the foldl implementation for AbstractArray?

The distinction here between init=empty(xs, Union{} without freeze vs the paired init=similar(xs, Union{}) and freeze is interesting.

@tkf
Copy link

tkf commented Apr 20, 2020

Yeah, something like that. Though I don't think you need to use intermediate mutable collections in the HasLength/HasSize case as setindex!! would work for immutable collections (my definition is that it calls setndex whenever setindex! does not work in a type-stable fashion). Of course, what is better depends on what the compiler at the time is good at. I guess the mutation-based strategy is better for MArray of isbits elements ATM?

BTW, I just posted my attempt to make foldl more "user-friendly": https://discourse.julialang.org/t/37876

@andyferris
Copy link
Owner

andyferris commented Apr 20, 2020

@tkf neat :). foldl is interesting - I always considered foreach to be the higher-order function (note: not functional) version of for. Is that right? Is the problem that foreach doesn't have documented semantics that it follows the order of iterate? (e.g. by allowing for parallelism). Or were you aiming for something strictly functional/pure?

@tkf
Copy link

tkf commented Apr 21, 2020

@andyferris Good to know that you find foldl interesting! Yes, foreach is semantically as powerful as for loop as we have impure functions (i.e., we can just use Ref/Core.Box to keep the state). But, ATM, I don't think you can write a lot of efficient functions using foreach, especially when the type of the state (e.g., output collection in case of map) has to change at the middle of the loop.

I there is a related discussion with @c42f in Discourse:

the usual optimization difficulties with lexical scoping of captured variables

Are you referring to that it’s hard to efficiently mutate variables of the outer scope? I think this is why it is important to use foldl rather than foreach as the lowering target. Since foldl takes the function of the form (state, input) -> state′ , there is no need to mutate the variables of outer scope via the boxing used by the usual lowering of closure. All the changes during the loop can stay in state (a named tuple) and assigned back to the original variables after the loop is finished. OTOH, using foreach would have this problem.

@tkf
Copy link

tkf commented May 21, 2020

I came up with this thin-wrapper that I call BangBang.collector for abstracting out the implementation of collect-like functions for index-based and push!-based approaches. If you use "unsafe" collector, the performance is equivalent to hand-written map (compare :impl => "man" and :impl => "coll" under :comp => "map" JuliaFolds/BangBang.jl#143 (comment)). This lets you write map only using push!!.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants