Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map for Dict #5794

Open
filmackay opened this issue Feb 13, 2014 · 37 comments
Open

map for Dict #5794

filmackay opened this issue Feb 13, 2014 · 37 comments
Assignees
Labels
collections Data structures holding multiple items, e.g. sets

Comments

@filmackay
Copy link

I noticed there wasn't any map methods that generated Dicts. Which function/name - comments please:

map{K,V}(f::Callable, d::Associative{K,V}) = [ f(k,v) for (k,v)=d ]
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(k,v)) for (k,v)=d ] # or this?
map{K,V}(f::Callable, d::Associative{K,V}) = [ (k, f(v)) for (k,v)=d ] # ..or this?

To distinguish the above from the current map (which you still may want to use), they will need new names - suggestions?

I noticed that map for Arrays handles multiple arrays - presumably this is more of a join-by-key semantic for Dicts and therefore not needed in map for Dicts?

@mbauman
Copy link
Member

mbauman commented Feb 13, 2014

Currently, map for dictionary arguments falls back to the general iterable case. I'm not sure I see a difference between the status quo and what you are looking for. Does this not work the way you'd like it to?

julia> f(x) = x;
julia> @which map(f,[1=>"1",2=>"2"])
map(f::Union(Function,DataType),iters...) at abstractarray.jl:1265

julia> map(f,[1=>"1",2=>"2"])
2-element Array{Any,1}:
 (2,"2")
 (1,"1")

This is very nearly your first suggestion, effectively like … = [ f(x) for x in d ]

@StefanKarpinski
Copy link
Member

It would be more convenient if the key and value were provided as separate arguments instead of as a pair. Of course, then mapping with the identity function doesn't produce an equal Dict, but I think that may well be ok. Another issue is what to do about key collisions.

@mbauman
Copy link
Member

mbauman commented Feb 13, 2014

Yeah, I see. And this was a little surprising to me:

julia> f(x1,x2) = (x1,x2);
julia> map(f,1:3,1:2)
ERROR: dimensions must match # as expected
julia> map(f,Dict(1:3,map(string,1:3)),Dict(11:12,map(string,11:12)))
2-element Array{Any,1}:
 ((2,"2"),(11,"11"))
 ((3,"3"),(12,"12"))

Perhaps the iters… map method should check to see if they're all done at the end?

Oh! I see this came from the mailing list: https://groups.google.com/forum/?fromgroups=#!topic/julia-users/naGp9DUdSq8

@filmackay
Copy link
Author

How should we distinguish between a map-I-want-a-Dict-returned and map-just-a-tuple-array-is-fine?

Should map stay returning an Array, and amap return an Association? It seems in some cases you want either.

The more general problem is: what about functions where you want to specify the return type - but the parameter types are identical? Is there a Julian solution for this problem - one being the convert convention of specifying the target type. But in this case you don't necessarily want to verbosely specify the exact Dict{K,V} that it is to return - just distinguish between "array" and "association please".

@kmsquire
Copy link
Member

I believe there is a dictionary constructor from an iterable of tuples now, so while the whole thing could be done more efficiently, constructing a Dict is pretty straightforward if that's what you want.

Might need better docs (I haven't checked).

@filmackay
Copy link
Author

So we definitely need a specific associative method, since currently the type information gets dropped:

function f() # to avoid global-induced Any types
       local a = ["a"=>1, "b"=>2] # Dict{ASCIIString,Int64}
       which(map, kv->kv, a) # map(f::Union(DataType,Function),iters...) at abstractarray.jl:1265
       local m = map(kv->kv, a) # Array{Any,1}
end

I was actually expecting map to return an Array{(ASCIIString,Int64},1}. Why not? Answer: Associative is not an AbstractArray. Why is there not an Iterable abstract type that is a parent of both Associative and AbstractArray Should we have a typed iterator to handle both cases?

(edit: it seems that anonymous functions are another reason the type information gets dropped)

Would Associative support be required in map anyway? We could do:

Dict(map(kv->(k[1], v[2]+1), ["a"=>1, "b"=>2]))
Dict(map(k->k, v->v+1, ["a"=>1, "b"=>2]))
Dict(map(v->v+1, ["a"=>1, "b"=>2]))

@filmackay
Copy link
Author

Is it possible to constrain Function parameters? Looking at doing a map for AssociativeArray I want to be able to constrain the mapping function signature:

Base.map{K,V,KR,VR}(f::((K,V)->(KR,VR)), d::Associative{K,V}) # produces an Associative{KR,VR}

Note: not saying this specific example is a good idea, just asking generally about constraining functions.

@filmackay
Copy link
Author

This function would at least get typed correctly (as opposed to the iter-based function)

map{K,V}(f::Callable, d::Associative{K,V}) = [f(k,v) for (k,v) in d]

Then you wrap the return in a Dict if that's what you want.

Of course anonymous functions have issues with preserving type information in comprehensions, which seems why map is not currently implemented with one.

My leaning is to implement the above with a comprehension, and use inline functions (not anonymous syntax). Then I have a hope of getting correctly types out even when an empty array is fed in. The current map can't do this.

@bryaan
Copy link

bryaan commented Jan 11, 2016

I just ran into this problem. Given the functional nature of Julia and its native collections, this should be implemented ASAP:

mapValues(dict::Dict{T, U}, fun::Function{U}) # should return a new Dict{T, V}.
# also implement mapValues!(...)

map(dict::Dict{T, U}, fun::Function{(T,U)}) # should return a new Array{V}

I don't think the syntax is perfect, but you get the idea. You should look to Scala, as those guys have gotten it down, Map (equivalent to Julia's Dict) in particular. The other available functions that are missing in Julia should also be added, like flatMap which excludes all null value Nullable objects (Scala's equivalent is Option).

http://www.scala-lang.org/api/current/scala/collection/Map.html

http://www.scala-lang.org/api/current/index.html#scala.Option

@kmsquire
Copy link
Member

Thanks for the input, @BryanARivera. Any chance you could start a pull request? That would be the best way for this to get implemented ASAP. Most people working on Julia right now aren't focused on this area, but most are willing to review pull requests.

Also, consider opening an issue (and/or pull request) at DataStructures.jl. This is rather less organized (currently) than Scala's Collections, but it would initially be a more appropriate home for some of the other dictionary types.

Cheers!

@bryaan
Copy link

bryaan commented Jan 15, 2016

@kmsquire Hopefully I will have the chance to delve into it soon. I will try to implement everything I can as Scala's collections are quite central to the language. I envision it would accelerate Julia dev as well.

Should be relatively straightforward. Scala has got the naming and function conventions down, we just need to rebuild it in Julia. Would DataStructures.jl be the equivalent of Collections (meaning this is where PRs should be targeted) ?

@kmsquire
Copy link
Member

@BryanARivera, that sounds fantastic. I'm also a huge fan of Scala's Collections framework.

DataStructures is probably the right place, but if what you want to implement is very different than what is already there, it might turn out that a new package would be better. But let's start in DataStructures.jl and see where that goes. Cheers!

@andyferris
Copy link
Member

@BryanARivera I'm very happy to have read this since I too have been looking for a mapvalues function. The current implementation of map seems quite bizarre after you see the example with filter! which can return an intuitively filtered Dict, but I suppose it may be undesirable to many if we changed map! to behave more like filter!, as in:

x = ["a" =>1, "b"=>2, "c"=>3]
map!((k,v)->v+1, x)  # x = ["a" =>2, "b"=>3, "c"=>4]

since it could break existing code (unless someone like @StefanKarpinski thinks this is a good move?). In any case, functions which return Arrays and functions which return Dicts may both be needed.

We'll probably need both mapvalues and mapvalues! (note no capital 'V') (or possibly mapdict?? probably less generic...). Did you make a start on an implementation?

@bryaan
Copy link

bryaan commented Jan 31, 2016

I would be happy to write a PR. Does this look OK:

extractValue(fun) = (pair) -> fun(pair.second)

mapvalues(fun, dict) = map(extractValue(fun), dict)

It won't have the best performance until Jeff is done with his anon functions improvements.

@tkelman
Copy link
Contributor

tkelman commented Jan 31, 2016

We don't usually camel-case function names in base.

@nalimilan
Copy link
Member

Also, you don't really need that second function. Just do:

mapvalues(f, d::Associative) = map(p->f(p.second), d)

Sounds like a nice addition to me (as well as mapvalues!). I guess you can rely on anonymous functions being fast as this will only be added in 0.5 anyways.

But that wouldn't return a Dict unless you also add a special map method for Associative that does so: you'd have to include both in your PR.

@bryaan
Copy link

bryaan commented Feb 5, 2016

@nalimilan I see - it's on my todo list.

@bryaan
Copy link

bryaan commented Feb 5, 2016

This works, but doesn't look optimized:

mapvalues(f, d::Associative) = Dict( zip( keys(d), map(p->f(p.second), d)) )

@bryaan
Copy link

bryaan commented Feb 6, 2016

Is there any way to specify a function return type so that this will work?

map(p->f(p.second), d)::Array{AbstractString}

@nalimilan
Copy link
Member

As I said, I think you should first write map, and then implement mapvalues on top on that. Something like this seems to work. It's inspired by map, in particular as regards type inference: it computes the element type from the first element.

Base.similar{K,V}(d::Dict, ::Type{Pair{K,V}}) = Dict{K,V}()

function Base.map(f, a::Associative)
    if isempty(a)
        return similar(a)
    end
    st = start(a)
    a1, st = next(a, st)
    first = f(a1)
    dest = similar(a, typeof(first))
    sizehint!(dest, length(a))
    push!(dest, first)
    while !done(a, st)
        el, st = next(a, st)
        push!(dest, f(el))
    end
    dest
end

mapvalues(f, a::Associative) = map(p->p.first=>f(p.second), a)

I'm not sure about the case where the dict is empty: the existing map for arrays has a special-case for when f is a type, but that doesn't work here.

@nalimilan
Copy link
Member

My code would have to be adapted a bit if we want to write map((k,v)->.., d) instead of map(p->..., d).

@rfourquet
Copy link
Member

FWIW, Haskell defines its map (or rather fmap) as what was named mapvalues above, which is more sound if you share @timholy's view on "what an array means":

it's an associative container for which the space of "keys" can be determined from the array type and its dimensions.

In both cases (array or associative), the function takes only the value (in the key-value pairs) and returns a new value for the same key.

@toivoh
Copy link
Contributor

toivoh commented May 22, 2016

I was surprised recently by the current behavior of map on Dict. When you iterate over a Dict you get (key, value) pairs, so I was expecting my function f to receive a single such pair each time, not one argument for the key and another for the value.

So while I agree that the view on arrays as associative containers is nice, I'm not sure if this should take precedence over the differences between how they are iterated. (Or would we want to bring that in line between dicts and arrays? It's a major breaking change)

@rfourquet
Copy link
Member

rfourquet commented May 22, 2016

I agree that consistency with what iteration over a Dict yields may matter more.
For reference, in the context of Haskell, having map working with (key, value) pairs would not satisfy the following functor rule: map(f, map(g, d)) == map(x->f(g(x)), d) (where d::Dict).

@pabloferz
Copy link
Contributor

This is now fixed by fc3cbc9 and #17968 right?

@JeffBezanson JeffBezanson added this to the 1.0 milestone Aug 17, 2017
@JeffBezanson JeffBezanson added the collections Data structures holding multiple items, e.g. sets label Aug 17, 2017
@JeffBezanson JeffBezanson self-assigned this Aug 17, 2017
@JeffBezanson
Copy link
Member

Part of #20402

@JeffBezanson
Copy link
Member

Deprecation is in; a new map method can be added any time post-0.7.

@ndinsmore
Copy link
Contributor

@JeffBezanson as you are the assignee of this, I have reviewed it an I think that it is covered by current functionality so it should likely be closed.
I think that Dict with a generator covers all the use cases of this issue

@oxinabox
Copy link
Contributor

oxinabox commented Oct 24, 2019

We should go back to the old behavour.
I am going to make the claim that the definition of map
is that it takes in an iterator, and returns a Vector.

Dicts iterate as Pairs.
map on dict should be defined and should iterate on pairs,
returning a vector.

If we wanted map to work on values
then we do map(f, values(dict))

if we want a map that returns a dict
then we do Dict(map(f, dict))

I think it is good to be able to thing of map
as the do-blockable form of a Vector comprehansion (assuming no filter)
and this is a case that [f(x) for x in xs] differs from map(f, xs)
And I often want to write:

map(xs) do (key, val)
 ...
end

when [... for (key, val) in xs] would be too long

@StefanKarpinski
Copy link
Member

What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict, etc.

@oxinabox
Copy link
Contributor

What if it was lazy and produced a pair generator? My reasoning is that you really never want a vector of pairs, you always want to do something with that vector of pairs: put it in a Dict, etc.

It is an interesting point,
but I don't know that it works out.
the eltype of the return type of map(f, xs)
is determineed not by the eltype of xs
but of the return type of f(::eltype(xs))

For example

map(dict) do (kk, vv)
   val = round(vv, sigdigits=2)
   return " - $(kk): $val  (2 sig figs)"
end

I don't think that should return anything with the eltype of Pair

@mbauman
Copy link
Member

mbauman commented Oct 28, 2019

If we wanted map to work on values then we do map(f, values(dict))

If we wanted a map to work on pairs then we should be able to do map(f, pairs(dict)) — oh but wait, that just returns a Dict (because dicts do iterate pairs) and still doesn't work!

IMO, we'd ideally just iterate/map/etc over values, and require pairs to do the for (k, v) in thing, but there's no way we're changing iteration in 1.x. Much of this discussion predates pairs.

So now we're stuck in this weird never-never land where map is available to do what I'd want it to do, but iteration isn't doing what I want. So do we just wait for 2.0 to unify everything?

@oxinabox
Copy link
Contributor

oxinabox commented Oct 28, 2019

Options include:

  1. make Pairs not subtype AbstractDict but that is breaking

  2. We could make the Pairs type an exception to the rule that AbstractDict can't be used in map. Maybe confusing, but nonbreaking.

  3. Or we can say that AbstractDict iterate as pairs and thus should map as a pairs

  4. we say that you can map AbstractDict but not Dict. i.e. the generalization of 2. This avoids any weird cases with Dict(map(f, dict)) that can maybe happen with hash collisions and it becoming order sensitive. (I can't remember the full logic but I know it results in some weirdness for Sets)


Any of these options also open us up to having map(f, ::NamedTuple) work on the values (as per the fact that NamedTuples iterate values already.
And then to having map(f, pairs(nt)) for iterating namedtuples as pairs

@andyferris
Copy link
Member

I kind of think we should clean this up in Julia 2.0. It appears to me that a unified solution is likely to involve at least some breaking changes, and adding new functionality to Julia 1.x only increases the impact of any potential changes to the API.

(Also - I haven't been following closely for a while, but was there any thoughts on how many more 1.x releases there will be before we work on 2.0?)

@tkf
Copy link
Member

tkf commented Nov 1, 2019

What if it was lazy and produced a pair generator?

@StefanKarpinski Can we start this by defining

map(f, itrs...) = Base.Generator(f, itrs...)

in Base.Iterators? This way, people can use

using Base.Iterators: map, filter, flatten

as kind of using Future.

If we wanted map to work on values
then we do map(f, values(dict))

@oxinabox Does it return a Vector (or some plain iterable)? Maybe it should produce a ValueIterator so that we can get back a dictionary via Dict(zip(keys(dict), map(f, values(dict)))). Maybe it's nice to have parent(::AbstractDict) s.t. parent(values(dict)) === parent(keys(dict)) === dict. This way, we can do parent(map(f, values(dict))) to iterate over values of a dictionary and get another dictionary sharing keys with the original dictionary.

@goretkin
Copy link
Contributor

I am going to make the claim that the definition of map
is that it takes in an iterator, and returns a Vector.

(Sorry to kick up dust)
I think "take iterator, return Vector" should probably be the definition of collect, not map. See here: #39628

@garrison
Copy link
Member

garrison commented Oct 1, 2021

For others reading this thread, I want to point out what ideally should have been mentioned explicitly in a comment above, namely, that map!(fun, values(dict)) has been implemented beginning with julia 1.2 (#31223).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests