Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add keys(::Array) and values(::Array) #12811

Closed
wants to merge 2 commits into from
Closed

Conversation

joehuchette
Copy link
Member

This PR adds methods for keys and values for Arrays. The keys implementation is merely a CartesianRange, while the values implementation uses a new lightweight wrapper around the Array. I did it this way because just using values(d::Array) = d means you could change the contents of d by what is returned from values, while values(d::Array) = copy(d) seemed wasteful. This could potentially be widened to AbstractArray, but this doesn't seem like the right way to do it for something like a sparse matrix, for example.

@kmsquire
Copy link
Member

While I can see the analogy with keys and values in dictionaries, I'm wondering what your use case is...?

@joehuchette
Copy link
Member Author

In JuMP, we have a macro that returns either an Array, or a custom type that wraps around an Array or Dict. For the custom types, computing keys can be more expensive than values, so it's nice to be able to decouple them. Ideally we'd like to make it possible to program generically, regardless of what type the macro gives you back, hence the new methods, which didn't seem appropriate to put in a package.

@joehuchette
Copy link
Member Author

Also, the docs are written like

help?> values
search: values

  values(collection)

  Return an iterator over all values in a collection. collect(values(d)) returns an array of
  values.

so if this isn't added to Base the help entries should probably be updated to be more specific.

@kmsquire
Copy link
Member

Makes sense, thanks!

@nalimilan
Copy link
Member

Would that mean that eachindex and keys would be almost always equivalent? (Cf. #11708 about extending eachindex to iterables, including dicts).

@StefanKarpinski
Copy link
Member

I wouldn't hate using keys instead of eachindex. cc @JeffBezanson

@ivarne
Copy link
Member

ivarne commented Aug 26, 2015

I agree that keys and eachindex have kind of overlapping functionality, and that they possibly could be merged.

Defining something like

keys(a::AbstractArray) = eachindex(a)
eachindex(d::Associcative) = keys(d)

Seems rather silly though.

@nalimilan
Copy link
Member

The problem is, neither keys nor eachindex is a natural term in each other's context. Maybe indexes would suit both uses.

@StefanKarpinski
Copy link
Member

indexes and values is a pretty good pair of names.

eltype{T,N}(::Type{ArrayIterator{T,N}}) = T

keys(x::Array) = CartesianRange(size(x))
values(x::Array) = ArrayIterator(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this equivalent to just values(x::Array) = x? (e.g. an array is its own iterator?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is not to allow mutation of the underlying array with something like values(x)[2] = 3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have this at all, I would prefer values(x::AbstractArray) = x; if the main purpose is to use this as an iterator, then I don't understand why you would need to prevent mutation. for x in values(a) or for x in a already doesn't allow mutation (unless x is mutable, of course).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its main purpose is for iteration, but that's certainly not the only way it could be used. Maybe this approach is too paternalistic, though.

@timholy
Copy link
Member

timholy commented Aug 27, 2015

What would keys return for a DataFrame? Column names? Or column indexes?

isempty(v::ArrayIterator) = isempty(v.array)
eltype{T,N}(::Type{ArrayIterator{T,N}}) = T

keys(x::Array) = CartesianRange(size(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keys(x::AbstractArray) = eachindex(x)?

And since arrays are already iterable, why not just values(x::AbstractArray) = x? (discussed below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually wait, eachindex(x) just gives the linear indexing 1:n. I prefer the multidimensional Cartesian range since you can use the individual components directly:

for I in keys(x)
    x[I] += c[I[1]]
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eachindex chooses the most efficient index iterator for the given array. For Array and other contiguous arrays (LinearFast()), that's simply 1:n. But for SparseMatrixCSC and other non-contiguous arrays (LinearSlow()), it's a cartesian range.

@joehuchette
Copy link
Member Author

@timholy it seems keys/values were deprecated from DataFrames in JuliaData/DataFrames.jl@13b4776.

@stevengj
Copy link
Member

So is the proposal to deprecate both keys and eachindex in favor of indexes (or indices?). I'm not sure I see the advantage of that over eachindex, and keys is pretty widespread in a dictionary context.

@jdlangs
Copy link
Contributor

jdlangs commented Aug 27, 2015

It makes sense for eachindex (or indexes) to be the most generic method that covers all collections. I don't think that implies keys should be dropped for associative collections though, with all the API breakage that would entail.

Using keys for non-associative types such as Array feels like it's stretching the meaning a little too much. Collection-agnostic code should probably be encouraged to use the index iterator.

@timholy
Copy link
Member

timholy commented Aug 27, 2015

@joehuchette my point was more that this seems more ambiguous for anything which can be indexed in more than one way. (DataFrames would tend to follow master, of course.)

@johnmyleswhite
Copy link
Member

I think DataFrames can be misleading since it's unclear whether you should care about rows or columns. It's not really amenable to any kind of generic programming.

@joehuchette
Copy link
Member Author

I've updated the PR to just

keys(  x::Array) = CartesianRange(size(x))
values(x::Array) = x

@timholy I take your point about ambiguity, but I'm not sure it's much of an issue for Arrays. Though linear indexing may be possible (and how it's actually represented in memory), the user-facing interface treats it as a multidimensional object (the dimension N parameterizes the type, after all!).

@mbauman
Copy link
Member

mbauman commented Aug 27, 2015

I can see the case for this — there are times when you want an array to serve as a LUT, so you think of the indices more as keys, and want to be able to generically switch to a dictionary if you need. That said, using for i in keys(A) in mathy code seems rather oblique. And keys is definitely the term of art for associative types.

It seems like a failure of dispatch that keys(x) would almost always equal eachindex(x)… unless some custom type has forgotten to define one or the other, in which case you get a method error. If we want to support both, I think one should be considered the "canonical" method, and the other have a fallback for all types, e.g., keys(x) = eachindex(x).

@joehuchette
Copy link
Member Author

@mbauman you did a nice job articulating my particular use case: I want to treat an Array as a look-up table or associative object. This is in DSL, not math-y, code, and being able to destructure the "key" into components is important. Hence my preference for returning CartesianRange over 1:n.

I think the keys(x) = eachindex(x) fallback seems like a reasonable approach, and then custom types can provide specialized methods if there is a more natural associative mapping.

@Keno
Copy link
Member

Keno commented Sep 3, 2017

This will happen as part of #22907.

@Keno Keno closed this Sep 3, 2017
@DilumAluthge DilumAluthge deleted the jh/keys-and-vals branch March 25, 2021 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.