Add range indexing for NamedTuple #31423

eulerkochy · 2019-03-21T10:10:43Z

base/namedtuple.jl

Co-Authored-By: eulerkochy <[email protected]>

andyferris · 2019-03-22T09:23:26Z

This is not what I expected. I feel it makes sense for the output container to match the type and keys of the indexer, for various reasons.

base/namedtuple.jl

andyferris · 2019-03-22T21:59:12Z

Sorry @eulerkochy, I didn't have much time to elaborate yesterday, so I suspect what I wrote was confusing! Basically, I'd like to see us preserve a certain relationship that we currently have with (possibly offset) array indexing as we introduce non-scalar indexing for more containers. Firstly, if c = a[b] then we have c[i] == a[b[i]]. The indices of the output container c come from b, but the values come from a. Actually, this is preserved in this PR, but there is another nuance: the container type of c should match that of b.

There are a few good reasons for that. Perhaps b is a distributed array and can't fit on a single computer; then the output c should also be distributed or the operation is not safe. Perhaps b is a static array and the compiler is able to statically infer the indices of c, so c could be a (faster) static array. Or the other way around: if the output c were a tuple but the indexer b was a vector, the compiler cannot infer the length (and type) of the tuple, making your code slow. (In addition to benchmarking a[b], you also need to consider the performance of using c later on in the same function by checking the accuracy of type inference. Tools like @code_warntype and the Test module's @inferred assertion can help you determine this.)
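The relationship described above can be sketched with plain Base arrays. The names `a`, `b` and `c` follow the comment; the concrete values are illustrative assumptions only.

```julia
a = [10, 20, 30, 40]

# Vector indexer: the result is a Vector whose indices come from `b`
# and whose values come from `a`.
b = [3, 1, 4]
c = a[b]
@assert c isa Vector
@assert all(c[i] == a[b[i]] for i in eachindex(b))

# Matrix indexer: the result takes its shape (axes) from the indexer.
B = [1 2; 3 4]
C = a[B]
@assert axes(C) == axes(B)
@assert all(C[i] == a[B[i]] for i in eachindex(B))
```

In both cases the output mirrors the indexer rather than the container being indexed, which is the property the comment asks to keep for other container types.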

For more context may I point you at #24019 and Indexing.jl for some examples. (I may have written more details elsewhere, I can't remember... I also might have mentioned this at JuliaCon 2018).

Anyway, choosing the output container type is a semantic choice. We should also allow indexing of named tuples with arrays if you want the output to be an array. I feel there should be a full set of indexing relationships in Base for AbstractArray, AbstractDict, Tuple and NamedTuple - what's here is just one of many possible combinations.

Finally, due to some ambiguities in dictionaries (and setindex!) I believe there was some plan to have a "non-scalar indexing" notation or operation or something - perhaps @mbauman knows what the plan is these days. (Matt - I kind-of assumed any new non-scalar indexing might end up there, is that right?)

andyferris · 2019-03-22T22:10:01Z

Regarding the last point, the pertinent issue is #30845

eulerkochy · 2019-03-23T05:10:08Z

> the type of container of c should match that of b

What do you propose for x = NamedTuple{(:a, :b, :c)}([[1, 2], 3, 4]) followed by x[1:2]?

andyferris · 2019-03-23T11:41:20Z

Possibly Any[[1,2], 3] since eltype(x) === Any

eulerkochy · 2019-03-23T13:49:21Z

@andyferris using the map implementation, we can achieve that, but it doesn't seem quite intuitive
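As a rough illustration of the `map`-based behaviour being discussed (not the PR's actual code, which is not shown in this thread), the sketch below uses a hypothetical helper named `index_with_map`. The named tuple is built from a tuple of values here, which is an assumption made to keep the example independent of constructor details.

```julia
# Values taken from the example in the thread.
x = NamedTuple{(:a, :b, :c)}(([1, 2], 3, 4))

# Hypothetical helper (not part of Base): index `x` by position with `map`,
# so the result takes its container type from the indexer.
index_with_map(x, inds) = map(i -> x[i], inds)

c = index_with_map(x, 1:2)     # range indexer gives a Vector
t = index_with_map(x, (1, 2))  # tuple indexer gives a Tuple
```

With a range indexer the elements `[1, 2]` and `3` have no common concrete type, so the resulting vector has to hold heterogeneous elements, which is what the `Any[[1,2], 3]` suggestion refers to.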

andyferris · 2019-03-25T03:44:59Z

What would you intuit, @eulerkochy?

eulerkochy · 2019-03-25T06:46:44Z

I couldn't quite understand the need for the outer container to be of the same type as the inner one. Of course there's no harm in doing that, but returning a Tuple should also have served the purpose. However, benchmarking suggests that keeping the container type the same results in fewer allocations than returning a Tuple, which is the primary reason I changed the implementation.

mbauman · 2019-03-25T13:55:22Z

> This is not what I expected. I feel it makes sense for the output container to match the type and keys of the indexer, for various reasons.

The keys, yes, I agree. But I'm not as certain on the type of the indexer as a general rule.

Should this return an array or a tuple? Currently, indexing a tuple by a vector returns a tuple. It's a bit of a trap as it's type-unstable without using static arrays, but that may improve with constant propagation in the future (#31138). I think I'd expect NamedTuples to behave like Tuples here.
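The tuple behaviour mentioned here can be checked with a small sketch. The wrapper `pick` is a hypothetical name introduced only so that inference can be queried; the values are illustrative.

```julia
t = (1, 2.0, 3)
idx = [1, 2]

# Indexing a tuple with a vector returns a tuple.
r = t[idx]
@assert r isa Tuple
@assert r == (1, 2.0)

# Hypothetical wrapper so we can ask inference about the call.
pick(t, idx) = t[idx]
inferred = Base.return_types(pick, (typeof(t), typeof(idx)))

# The tuple length depends on length(idx), a runtime value, so the
# inferred return type is not expected to be concrete.
@assert !all(isconcretetype, inferred)
```

This is the trap referred to above: the value returned at runtime has a concrete tuple type, but the compiler cannot know it ahead of time when the indexer is an ordinary vector.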

Even if we do decide that (1,2,3)[1:2] should return an array, that'd be a breaking change and would likely have a bigger impact than the NamedTuple behavior we're considering here. So I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

andyferris · 2019-03-25T21:02:06Z

@mbauman I feel there is much to be gained by giving the power to decide to the user. We currently need to support things like

x[::Vector] --> Vector
x[::StaticArray] --> StaticArray
x[::OffsetArray] --> OffsetArray

and it would be useful to generalize the pattern such that

x[::Tuple] --> Tuple
x[::NamedTuple] --> NamedTuple
x[::AbstractDict] --> AbstractDict

in which case the user is able to decide. If the user needs, for some reason, to get a tuple out, they can just index with a tuple, and so on. If they need something else they can use that. For the user to be empowered I think the pattern needs to be consistent. There is loads of cool stuff you can do with indexing: one of my favorite examples is indexing tables with other tables with a foreign key relationship to perform a relational inner join (using structures like NDSparse).
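A minimal sketch of this "output mirrors the indexer" pattern, assuming a hypothetical helper called `nsindex` (it is not a Base function, and the method set is only one possible reading of the proposal):

```julia
# Hypothetical helper: the result container is chosen by the indexer `b`,
# while the values are looked up in `a`.
nsindex(a, b::AbstractArray) = map(i -> a[i], b)
nsindex(a, b::Tuple)         = map(i -> a[i], b)
nsindex(a, b::NamedTuple)    = map(i -> a[i], b)
nsindex(a, b::AbstractDict)  = Dict(k => a[v] for (k, v) in b)

x = (a = 1.5, b = "two", c = 3)

v  = nsindex(x, [:a, :c])          # Vector in, Vector out
t  = nsindex(x, (:a, :c))          # Tuple in, Tuple out
nt = nsindex(x, (p = :a, q = :c))  # NamedTuple in, NamedTuple out
d  = nsindex(x, Dict("k" => :b))   # Dict in, Dict out
```

The caller picks the output container simply by choosing the indexer, which is the consistency argument made in the comment.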

At the moment we support non-scalar indexing with arrays and kind-of, slightly, a little bit with tuples. (IIRC you can do tuple[1:2] but not (1:2)[tuple]...). So mostly this is new behavior. Regarding the breaking change you mention - I actually think we should defer this PR until #30845 is resolved and instead use the syntax a.[b] for all of this.

(In practice for 1D containers we possibly only need the surface syntax a.[b] to lower to broadcast(i -> getindex(a, i), b) which is pretty intuitive and easy to explain and has precisely the consequences I propose, and can participate in fusion and so-on. In my view the remaining work is deciding what to do about multidimensional non-scalar indexing).
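To make the proposed lowering concrete, here is a sketch in which a hypothetical function `dotindex` stands in for what `a.[b]` might lower to. The `a.[b]` syntax itself is only a proposal in this thread and is not used in the code.

```julia
# Hypothetical stand-in for the proposed lowering of `a.[b]`.
dotindex(a, b) = broadcast(i -> getindex(a, i), b)

a = [10, 20, 30, 40]

v = dotindex(a, [4, 1])  # Vector indexer gives a Vector
t = dotindex(a, (4, 1))  # Tuple indexer gives a Tuple
s = dotindex(a, 2)       # scalar indexer gives a scalar

# The same lookup written with dot syntax fuses with further
# elementwise work; `Ref(a)` makes `a` behave as a scalar.
fused = getindex.(Ref(a), [4, 1]) .+ 1
```

Because broadcasting already lets the argument shape decide the result shape, the output container follows the indexer without any extra rules.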

mbauman · 2019-03-25T22:02:40Z

Yes, I agree with the end goal, but I just see a lot of breakage getting there. Thus:

> I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

This is just one more method that'd need to be deprecated and an easy transition (especially compared to everything else).

andyferris · 2019-03-26T00:40:54Z

Maybe I wasn't clear - I really don't want to see any breakage. :)

> I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

My view is that we should avoid adding any new non-scalar indexing behavior to getindex in Base, and that includes this PR. Doing so would just mean there is more user code out there that may need to be updated to whatever resolves #30845 at some point. My preference is to focus efforts on getting #30845 into the next Julia v1.x release.

add range indexing for NamedTuple

3945678

dkarrasch reviewed Mar 21, 2019

base/namedtuple.jl (outdated)

vtjnash reviewed Mar 21, 2019

base/namedtuple.jl (outdated)

Update base/namedtuple.jl

3fec24f

Co-Authored-By: eulerkochy <[email protected]>

andyferris reviewed Mar 22, 2019

base/namedtuple.jl (outdated)

Use map

6a7c986

vtjnash mentioned this pull request Jul 25, 2019

Support vectors as indices to NamedTuple #32664

Open

mcabbott mentioned this pull request Jan 6, 2021

Tuple and NamedTuple indexing corner cases #29417

Open
