Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range indexing for NamedTuple #31423

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

eulerkochy
Copy link
Contributor

Fixes #27021 .

base/namedtuple.jl Outdated Show resolved Hide resolved
base/namedtuple.jl Outdated Show resolved Hide resolved
Co-Authored-By: eulerkochy <[email protected]>
@andyferris
Copy link
Member

This is not what I expected. I feel it makes sense for the output container to match the type and keys of the indexer, for various reasons.

base/namedtuple.jl Outdated Show resolved Hide resolved
@andyferris
Copy link
Member

Sorry @eulerkochy I didn't have much time to elaborate in more detail yesterday, so I suspect what I wrote was confusing! Basically, I'd like to see us preserve a certain relationship that we current have with (possibly offset) array indexing as we introduce non-scalar indexing for more containers. Firstly, if c = a[b] then we have c[i] = a[b[i]]. The indexes of the output container c come from b, but the values come from a. Actually, this is preserved in this PR, but there is another nuance - the type of container of c should match that of b. There are a few good reasons for that. Perhaps b is a distributed array and can't fit on a single computer - then the output c should also be distributed or the operation is not safe. Perhaps b is a static array and the compiler is able to use inference to statically infer the indices of c, so c could be a (faster) static array. Or the other way around - if the output c were a tuple but the indexer b was a vector, the compiler cannot infer the length (and type) of the tuple, making your code slow. (In addition to benchmarking a[b], you also need to consider the performance of using c later on in the same function by checking the accuracy of type inference - there are tools like @code_warntype and the Test module's @inferred assertion that can help you determine this).

For more context may I point you at #24019 and Indexing.jl for some examples. (I may have written more details elsewhere, I can't remember... I also might have mentioned this at JuliaCon 2018).

Anyway, choosing the output container type is a semantic choice. We should also allow indexing of named tuples with arrays if you want the output to be an array. I feel there should be a full set of indexing relationships in Base for AbstractArray, AbstractDict, Tuple and NamedTuple - what's here is just one of many possible combinations.

Finally, due to some ambiguities in dictionaries (and setindex!) I believe there was some plan to have a "non-scalar indexing" notation or operation or something - perhaps @mbauman knows what the plan is these days. (Matt - I kind-of assumed any new non-scalar indexing might end up there, is that right?)

@andyferris
Copy link
Member

Regarding the last point, the pertinent issue is #30845

@eulerkochy
Copy link
Contributor Author

eulerkochy commented Mar 23, 2019

the type of container of c should match that of b

What do you propose for this x = NamedTuple{(:a, :b, :c)}([[1, 2], 3, 4]) and then x[1:2].

@andyferris
Copy link
Member

Possibly Any[[1,2], 3] since eltype(x) === Any

@eulerkochy
Copy link
Contributor Author

@andyferris using the map implementation, we can achieve that, but it doesn't seem quite intuitive

@andyferris
Copy link
Member

What would you intuit, @eulerkochy?

@eulerkochy
Copy link
Contributor Author

I couldn't quite understand the need to make the outer container to be of the same type of the inner one, ofc there's no harm in doing that, but returning a Tuple also should have served the purpose. However, it seems, on benchmarking, making the container type same results in lesser allocations than returning a Tuple, which is the primary reason why I changed the implementation.

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 25, 2019

This is not what I expected. I feel it makes sense for the output container to match the type and keys of the indexer, for various reasons.

The keys, yes, I agree. But I'm not as certain on the type of the indexer as a general rule.

Should this return an array or a tuple? Currently, indexing a tuple by a vector returns a tuple. It's a bit of a trap as it's type-unstable without using static arrays, but that may improve with constant propagation in the future (#31138). I think I'd expect NamedTuples to behave like Tuples here.

Even if we do decide that (1,2,3)[1:2] should return an array, that'd be a breaking change and would likely have a bigger impact than the NamedTuple behavior we're considering here. So I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

@andyferris
Copy link
Member

@mbauman I feel there is much to be gained by giving the power to decide to the user. We currently need to support things like

x[::Vector] --> Vector
x[::StaticArray] --> StaticArray
x[::OffsetArray] --> OffsetArray

and it would be useful to generalize the pattern such that

x[::Tuple] --> Tuple
x[::NamedTuple] --> NamedTuple
x[::AbstractDict] --> AbstractDict

in which case the user is able to decide. If the user requires, for some reason, to get a tuple out they can just index with a tuple. And so-on. If they need something else they can use that. For the user to be empowered I think the pattern needs to be consistent. There are loads of cool stuff you can do with indexing - one of my favorite examples is indexing tables with other tables with a foreign key relationship to perform a relational inner join (using structures like NDSparse).

At the moment we support non-scalar indexing with arrays and kind-of, slightly, a little bit with tuples. (IIRC you can do tuple[1:2] but not (1:2)[tuple]...). So mostly this is new behavior. Regarding the breaking change you mention - I actually think we should defer this PR until #30845 is resolved and instead use the syntax a.[b] for all of this.

(In practice for 1D containers we possibly only need the surface syntax a.[b] to lower to broadcast(i -> getindex(a, i), b) which is pretty intuitive and easy to explain and has precisely the consequences I propose, and can participate in fusion and so-on. In my view the remaining work is deciding what to do about multidimensional non-scalar indexing).

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 25, 2019

Yes, I agree with the end goal, but I just see a lot of breakage getting there. Thus:

I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

This is just one more method that'd need to be deprecated and an easy transition (especially compared to everything else).

@andyferris
Copy link
Member

andyferris commented Mar 26, 2019

Maybe I wasn't clear - I really don't want to see any breakage. :)

I'm leaning to favor a consistency with tuples in the near term (1.x) over the potential breaking change in the long term (2.0+).

My view is that we should avoid adding any new non-scalar indexing behavior to getindex in Base, and that includes this PR. Doing so would just mean there is more user code out there that may need to be updated to whatever resolves #30845 at some point. My preference is to focus efforts on getting #30845 into the next Julia v1.x release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants