Add keys(::Array) and values(::Array) #12811

joehuchette · 2015-08-26T03:46:22Z

This PR adds methods for keys and values for Arrays. The keys implementation is merely a CartesianRange, while the values implementation uses a new lightweight wrapper around the Array. I did it this way because just using values(d::Array) = d means you could change the contents of d by what is returned from values, while values(d::Array) = copy(d) seemed wasteful. This could potentially be widened to AbstractArray, but this doesn't seem like the right way to do it for something like a sparse matrix, for example.

kmsquire · 2015-08-26T04:15:58Z

While I can see the analogy with keys and values in dictionaries, I'm wondering what your use case is...?

joehuchette · 2015-08-26T04:33:46Z

In JuMP, we have a macro that returns either an Array, or a custom type that wraps around an Array or Dict. For the custom types, computing keys can be more expensive than values, so it's nice to be able to decouple them. Ideally we'd like to make it possible to program generically, regardless of what type the macro gives you back, hence the new methods, which didn't seem appropriate to put in a package.

joehuchette · 2015-08-26T04:36:33Z

Also, the docs are written like

help?> values
search: values

  values(collection)

  Return an iterator over all values in a collection. collect(values(d)) returns an array of
  values.

so if this isn't added to Base the help entries should probably be updated to be more specific.

kmsquire · 2015-08-26T05:05:10Z

Makes sense, thanks!

nalimilan · 2015-08-26T12:57:14Z

Would that mean that eachindex and keys would be almost always equivalent? (Cf. #11708 about extending eachindex to iterables, including dicts).

StefanKarpinski · 2015-08-26T13:34:02Z

I wouldn't hate using keys instead of eachindex. cc @JeffBezanson

ivarne · 2015-08-26T18:46:31Z

I agree that keys and eachindex have kind of overlapping functionality, and that they possibly could be merged.

Defining something like

keys(a::AbstractArray) = eachindex(a)
eachindex(d::Associcative) = keys(d)

Seems rather silly though.

nalimilan · 2015-08-26T19:15:20Z

The problem is, neither keys nor eachindex is a natural term in each other's context. Maybe indexes would suit both uses.

StefanKarpinski · 2015-08-26T20:03:10Z

indexes and values is a pretty good pair of names.

vtjnash · 2015-08-27T00:24:46Z

base/array.jl

+eltype{T,N}(::Type{ArrayIterator{T,N}}) = T
+
+keys(x::Array) = CartesianRange(size(x))
+values(x::Array) = ArrayIterator(x)


isn't this equivalent to just values(x::Array) = x? (e.g. an array is its own iterator?)

The idea is not to allow mutation of the underlying array with something like values(x)[2] = 3

If we have this at all, I would prefer values(x::AbstractArray) = x; if the main purpose is to use this as an iterator, then I don't understand why you would need to prevent mutation. for x in values(a) or for x in a already doesn't allow mutation (unless x is mutable, of course).

Its main purpose is for iteration, but that's certainly not the only way it could be used. Maybe this approach is too paternalistic, though.

timholy · 2015-08-27T09:15:54Z

What would keys return for a DataFrame? Column names? Or column indexes?

stevengj · 2015-08-27T15:59:35Z

base/array.jl

+isempty(v::ArrayIterator) = isempty(v.array)
+eltype{T,N}(::Type{ArrayIterator{T,N}}) = T
+
+keys(x::Array) = CartesianRange(size(x))


Why not keys(x::AbstractArray) = eachindex(x)?

~~And since arrays are already iterable, why not just values(x::AbstractArray) = x?~~ (discussed below)

I'm fine with that as well.

Actually wait, eachindex(x) just gives the linear indexing 1:n. I prefer the multidimensional Cartesian range since you can use the individual components directly:

for I in keys(x) x[I] += c[I[1]] end

eachindex chooses the most efficient index iterator for the given array. For Array and other contiguous arrays (LinearFast()), that's simply 1:n. But for SparseMatrixCSC and other non-contiguous arrays (LinearSlow()), it's a cartesian range.

joehuchette · 2015-08-27T16:33:08Z

@timholy it seems keys/values were deprecated from DataFrames in JuliaData/DataFrames.jl@13b4776.

stevengj · 2015-08-27T16:33:36Z

So is the proposal to deprecate both keys and eachindex in favor of indexes (or indices?). I'm not sure I see the advantage of that over eachindex, and keys is pretty widespread in a dictionary context.

jdlangs · 2015-08-27T16:59:13Z

It makes sense for eachindex (or indexes) to be the most generic method that covers all collections. I don't think that implies keys should be dropped for associative collections though, with all the API breakage that would entail.

Using keys for non-associative types such as Array feels like it's stretching the meaning a little too much. Collection-agnostic code should probably be encouraged to use the index iterator.

timholy · 2015-08-27T18:16:26Z

@joehuchette my point was more that this seems more ambiguous for anything which can be indexed in more than one way. (DataFrames would tend to follow master, of course.)

johnmyleswhite · 2015-08-27T18:27:00Z

I think DataFrames can be misleading since it's unclear whether you should care about rows or columns. It's not really amenable to any kind of generic programming.

joehuchette · 2015-08-27T19:16:22Z

I've updated the PR to just

keys(  x::Array) = CartesianRange(size(x))
values(x::Array) = x

@timholy I take your point about ambiguity, but I'm not sure it's much of an issue for Arrays. Though linear indexing may be possible (and how it's actually represented in memory), the user-facing interface treats it as a multidimensional object (the dimension N parameterizes the type, after all!).

mbauman · 2015-08-27T19:33:13Z

I can see the case for this — there are times when you want an array to serve as a LUT, so you think of the indices more as keys, and want to be able to generically switch to a dictionary if you need. That said, using for i in keys(A) in mathy code seems rather oblique. And keys is definitely the term of art for associative types.

It seems like a failure of dispatch that keys(x) would almost always equal eachindex(x)… unless some custom type has forgotten to define one or the other, in which case you get a method error. If we want to support both, I think one should be considered the "canonical" method, and the other have a fallback for all types, e.g., keys(x) = eachindex(x).

joehuchette · 2015-08-27T19:47:46Z

@mbauman you did a nice job articulating my particular use case: I want to treat an Array as a look-up table or associative object. This is in DSL, not math-y, code, and being able to destructure the "key" into components is important. Hence my preference for returning CartesianRange over 1:n.

I think the keys(x) = eachindex(x) fallback seems like a reasonable approach, and then custom types can provide specialized methods if there is a more natural associative mapping.

Keno · 2017-09-03T00:26:23Z

This will happen as part of #22907.

add keys(::Array) and values(::Array)

ddee557

vtjnash reviewed Aug 27, 2015
View reviewed changes

stevengj reviewed Aug 27, 2015
View reviewed changes

simplify values(::Array) to the identity

2003c88

Keno closed this Sep 3, 2017

DilumAluthge deleted the jh/keys-and-vals branch March 25, 2021 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add keys(::Array) and values(::Array) #12811

Add keys(::Array) and values(::Array) #12811

joehuchette commented Aug 26, 2015

kmsquire commented Aug 26, 2015

joehuchette commented Aug 26, 2015

joehuchette commented Aug 26, 2015

kmsquire commented Aug 26, 2015

nalimilan commented Aug 26, 2015

StefanKarpinski commented Aug 26, 2015

ivarne commented Aug 26, 2015

nalimilan commented Aug 26, 2015

StefanKarpinski commented Aug 26, 2015

vtjnash Aug 27, 2015

joehuchette Aug 27, 2015

stevengj Aug 27, 2015

joehuchette Aug 27, 2015

timholy commented Aug 27, 2015

stevengj Aug 27, 2015

joehuchette Aug 27, 2015

joehuchette Aug 27, 2015

mbauman Aug 27, 2015

joehuchette commented Aug 27, 2015

stevengj commented Aug 27, 2015

jdlangs commented Aug 27, 2015

timholy commented Aug 27, 2015

johnmyleswhite commented Aug 27, 2015

joehuchette commented Aug 27, 2015

mbauman commented Aug 27, 2015

joehuchette commented Aug 27, 2015

Keno commented Sep 3, 2017

Add keys(::Array) and values(::Array) #12811

Add keys(::Array) and values(::Array) #12811

Conversation

joehuchette commented Aug 26, 2015

kmsquire commented Aug 26, 2015

joehuchette commented Aug 26, 2015

joehuchette commented Aug 26, 2015

kmsquire commented Aug 26, 2015

nalimilan commented Aug 26, 2015

StefanKarpinski commented Aug 26, 2015

ivarne commented Aug 26, 2015

nalimilan commented Aug 26, 2015

StefanKarpinski commented Aug 26, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timholy commented Aug 27, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joehuchette commented Aug 27, 2015

stevengj commented Aug 27, 2015

jdlangs commented Aug 27, 2015

timholy commented Aug 27, 2015

johnmyleswhite commented Aug 27, 2015

joehuchette commented Aug 27, 2015

mbauman commented Aug 27, 2015

joehuchette commented Aug 27, 2015

Keno commented Sep 3, 2017