map over a KeySet unexpectedly returns a Set #26359
As with other languages, Dict key iteration order is undefined and unstable (permitted to change from one call to the next). |
Yet: values(a::AbstractDict)
Return an iterator over all values in a collection. collect(values(a))
returns an array of values. Since the values are stored internally in a hash
table, the order in which they are returned may vary. But keys(a) and
values(a) both iterate a and return the elements in the same order.
EDIT: IOW, we're supposed to be able to call Also FWIW the original code in DataFrames did this:
function $funname($(values(membernames)...))
$body
end
$funname($(map(keys(membernames)) do key
:($d[$key]) |
The observed result is from |
Ah, good catch, so it's just due to the fact that calling So I guess there's nothing to change here and DataFramesMeta should just use a comprehension? It's a bit annoying that |
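The comprehension approach suggested above can be sketched as follows. This is only an illustration with a made-up dictionary `d`, not the DataFramesMeta code: a comprehension over `keys(d)` yields a `Vector` in the dictionary's iteration order, so it stays aligned with `values(d)`.

```julia
# Illustrative dictionary; not taken from DataFramesMeta
d = Dict(:a => 1, :b => 2, :c => 3)

# Comprehensions return Vectors that follow the dictionary's iteration order
ks = [k for k in keys(d)]
vs = [v for v in values(d)]

# keys(d) and values(d) are documented to iterate in the same order,
# so the i-th key still corresponds to the i-th value
aligned = all(d[k] == v for (k, v) in zip(ks, vs))
```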
It's not entirely clear to me that map over a set should produce a set. |
Possibly introduced by #15146? I don't think there's any advantage to losing the iteration order either, but that's the way it seems to be going for AbstractSet and AbstractDict. |
Actually part of the issue seems to be that |
That's basically right; I envisaged the change to To clarify what my intentions were (and the behavior I still strongly desire): I want Thus, what I want is that
Exactly - how do we know that the mapped values won't overlap? We didn't end up going that way, which is fine, but it would be nice to have a look over this again and figure out what makes sense. For the record, my recommendations:
d = Dict("key1" => "value1", "key2" => "value2")
keys(d) == Dict("key1" => "key1", "key2" => "key2") # Not necessarily an actual `Dict`, possibly some other dictionary type, analogous to how `OneTo` is an `AbstractArray`, etc.
pairs(d) == Dict("key1" => ("key1" => "value1"), "key2" => ("key2" => "value2")) # Not necessarily an actual `Dict`, just some "view" like we could do for arrays
I'll give an example of what this allows us to do, for example for
# A dictionary matching name and occupation
d = Dict("Alice" => "Physicist", "Eve" => "Spy")
# Map the dictionary values and make a new dictionary with the same keys
map(job -> "I am a $job", d) == Dict("Alice" => "I am a Physicist", "Eve" => "I am a Spy")
# Map the dictionary keys and make a new dictionary with the same keys
map(name -> "My name is $name", keys(d)) == Dict("Alice" => "My name is Alice", "Eve" => "My name is Eve")
# Map the dictionary key-value pairs and make a new dictionary with the same keys
map(pair -> "My name is $(pair.first) and I am a $(pair.second)", pairs(d)) == Dict("Alice" => "My name is Alice and I am a Physicist", "Eve" => "My name is Eve and I am a Spy")
A final example of why I like my proposed behavior is grouping (aka "groupby"). Say I have a table, dictionary, or whatever with people's names, gender and height like this:
I can perform a grouping, discarding the names and collecting the heights by gender. I played with prototypes in SplitApplyCombine.jl and found I could make a nice function to turn this into something like:
gender_vs_heights == Dict("Female" => [1.58, ...], "Male" => [1.78, ...])
I then realized - what then? I couldn't do non-scalar indexing on the resulting container (hence #24019 and Indexing.jl), I couldn't use map in an ergonomic way (hence #25013 with a view to iterate values in the future), I couldn't use broadcast in an ergonomic way (hence #25904). In this example, what I really, desperately, want to do is this:
gender_vs_mean_height = mean.(gender_vs_heights)
I feel this example is literally screaming out for this syntax and behavior, but maybe that's just me. I don't really know where we'll go from here, but I do feel |
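For comparison, here is a minimal sketch of how the per-group mean can be written with a generator, without relying on `map` or broadcast being defined for dictionaries. The height values are hypothetical (the ones in the comment above are elided), and this is not the SplitApplyCombine.jl implementation.

```julia
using Statistics  # standard library; provides mean

# Hypothetical data: the actual heights in the comment are elided
gender_vs_heights = Dict("Female" => [1.58, 1.62], "Male" => [1.78, 1.82])

# Apply mean to each value while keeping the same keys
gender_vs_mean_height = Dict(k => mean(v) for (k, v) in gender_vs_heights)
```

The result has the same keys as the input, which is the "map preserves keys" behavior the comment argues for, only spelled out by hand.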
Thanks for the explanation! I'm not sure I can discuss the overall design right now, but speaking specifically of the present issue, how would you envision fixing the annoying fact that map(name -> "My name is $name", keys(d)) == Dict("Alice" => "My name is Alice", "Eve" => "My name is Eve") then the resulting |
What's missing from
For most cases (including Dict) we can do it, the question is really whether it should be mandated of all containers. The downside I see is that it might be less composable, since we'd be requiring that all iterable containers have indices to be used with map, ruling out the ability to map over a Generator / Zip / Product / other lazy structure (as seen with #18618, the counter-proposal to Andy's above issue numbers) |
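To make the composability concern concrete, here is a small sketch of `map` over lazy iterables that have no indices of their own. The inputs are made up for illustration; the point is only that `map` currently accepts them and returns an array in iteration order.

```julia
# A Generator has no keys or indices, only an iteration order
doubled = map(x -> 2x, (i for i in 1:3))

# Likewise for a Zip of two ranges
sums = map(t -> t[1] + t[2], zip(1:3, 4:6))
```

A rule requiring every mapped container to expose indices would have to say what happens to calls like these.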
One nice thing about having I like the picture painted by @andyferris , but I don't fully understand the thing about index sets mapping values to themselves. For example, an OffsetArray can have |
That's precisely how we represent
|
We don't do that for
We've committed to using the rule that the
|
Isn't Slice representing the inverse operation (remapping 1:N to the parent indices, rather than mapping N:M to 1:(M - N), or are those equivalent since you shouldn't be trying to index into the array axes anyways)?
I don't think that's the relevant issue here. The gotcha from your OP was that
Full grid of these at #25904 (comment) – if I missed a case, would be good to add it, or move that to a gist to extend it more. :) |
We should perhaps have a rule that |
Like mentioned, this is in analogy to arrays (which are particularly awesome in Julia, BTW). Why is I would relate these questions directly to #24019, where the proposed semantics for generalized non-scalar getindex is that the output of Regarding
and
My opinion on these isn't quite as specific. @nalimilan For the example I gave, it is feasible to copy the internal structure of the input dictionary to preserve the ordering (it's faster than reconstructing a new hash map anyway, for example). My vague preference is we can enforce that
Again, this is mostly the analogy to |
As I brought up on slack, here's another example where
Not sure I fully understood all of the back and forth, but this seems like an argument for
|
I still think we should consider the "map preserves indices for collections with indices and preserves iteration order for collections without indices" rule. It seems like the one that works the best to me. |
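The rule quoted above could be sketched like this. `indexed_map` is a hypothetical name used only for illustration; it is not a Base function and not a proposal from the thread.

```julia
# Hypothetical helper illustrating the rule; not part of Base

# Collections with indices: keep the same keys / indices
indexed_map(f, d::AbstractDict) = Dict(k => f(v) for (k, v) in d)
indexed_map(f, a::AbstractArray) = map(f, a)

# Fallback for collections without indices (e.g. a Set):
# keep iteration order and element count by returning a Vector
indexed_map(f, itr) = [f(x) for x in itr]

squares = indexed_map(x -> x^2, Dict(:a => 2, :b => 3))
parities = indexed_map(x -> x % 2, Set([1, 2, 3]))
```

Under this sketch, mapping a `Set` never silently drops elements: `parities` has three entries even though only two distinct values occur.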
Triage thinks that at the very least, we should deprecate map on Set, tagging as 1.0. |
`map` on sets previously returned a `Set`, possibly changing the order or number of elements. This behavior is deprecated and in the future `map` will preserve order and number of elements. Fixes #26359.
Fascinating. I came via #42132 through #5794 through a6c4691 to here. I sympathise with the dream of more array like collections. What I fear might have been delayed along the way is the principle of least surprise that makes composability worth using. I say "delayed" because the paths anticipated by this issue might not be undermined by just patching up what we have now. By that, I mean that currently My bias is a long history of bare-metal type C programming, only really getting into collections and iterators with maybe Rust, then seeing the forest for the trees with Haskell, and finally falling in love with multiple dispatch in Julia. So I find it jarring that The Haskell docs on mapping a set may be illuminating here. For those unfamiliar with the highly unintuitive Haskell type signatures, So given that history, I've come to expect that iterating a vector/list/array gives elements of the array type in a certain order; iterating a set gives elements of the set type in an unpredictable order, and iterating a dict gives (key, value) pairs in an unpredictable order (although often the key is suppressed, which is consistent with iterating an array not returning index/value pairs). Starting with the set case, because the implications are much simpler, this would definitely provide the least surprise for me:
That this may result in a smaller number of elements seems entirely normal to me, since I know that when I use Sets, adding an element may not change the Set size (and may re-order the collection!). The implications of uniqueness are front of mind, and desirable. If I wanted instead the results of the function applied to every element, as described in @kescobo 's example, I would happily make that explicit:
For a dict there's a couple of options, but only those that return a dict seem reasonable to me. Again, it may be useful to check the Haskell docs. Mind the confusion over the fact that the Haskell dict is called a The examples from @andyferris would then look much the same:
So I guess what I'm saying is that restoring the equivalent of:
seems to me to be an improvement right now, that doesn't interrupt progress on other ideas in this issue, provided there's agreement that Edit: argh, just noticed this issue is closed. Happy to pick it up somewhere else? |
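The two set semantics contrasted in the comment above can be written explicitly in current Julia. The snippet below is an illustration with made-up data, not the code originally posted in the comment.

```julia
s = Set([-2, -1, 1, 2])

# Set-to-set: results that coincide collapse, so the size may shrink
as_set = Set(abs(x) for x in s)

# One result per element: a Vector with as many entries as the input set
as_vec = [abs(x) for x in s]
```

Both spellings make the choice visible at the call site, which is the "make that explicit" point made in the comment.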
Apologies, I realize this issue is closed but it seemed most straightforward to reply to @hraftery here. I think it's desirable that users easily have access to both of those semantics (mapping sets to sets, and mapping sets to dictionaries). We now have
Note that the meaning of |
Heh, interesting @andyferris . Funnily enough I did turn to I already find the subtle differences between
Oh I have to admit this is beyond me - I only pulled the example from the changelog, not fully understanding what it does. |
This is a regression from 0.6.2 which affects DataFramesMeta (JuliaData/DataFramesMeta.jl#88). Under some very particular circumstances, expression interpolation reverses the order of keys in a dictionary, compared with the order obtained in other places inside the same function. This is a problem when the function relies on the order of keys matching that of values (as with DataFrames macros).
In the following example, :a appears before :b in the first two lines inside the quoted block, but the order is reversed in the third line.