Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate conversion to Int for "non-obvious" index types #10458

Merged
merged 1 commit into from
Mar 12, 2015

Conversation

jakebolewski
Copy link
Member

  • Deprecate to_index(::Real) in favor of to_index(::Integer).
  • In the future require index values which are floating point, rationals, etc. to be explicitly converted to Int.
  • Delete tests that currently test for floating point indexing support.

Closes #10154

@jakebolewski jakebolewski changed the title Deprecate conversion of indexes not subtypes of Integer Deprecate conversion to Int for "non-obvious" index types Mar 9, 2015
@@ -451,3 +451,27 @@ function push!(A)
depwarn("push!(A) has been deprecated", :push!)
A
end

typealias DepIdxTypes Union(FloatingPoint,MathConst,Rational)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you don't actually use this alias?

@mbauman
Copy link
Member

mbauman commented Mar 9, 2015

👍

deprecate `to_index(::Real)` in favor of `to_index(::Integer)`
remove tests that exercise floating point indexing
minor doc change, add NEWS item
@mbauman
Copy link
Member

mbauman commented Mar 9, 2015

Also, perhaps you should leave a note in deprecated.jl to "tighten up" the signatures of get/setindex once the Real to_index methods are removed. E.g., ack '(get|set)index.*\bReal\b' shows 52 lines to change.

@timholy
Copy link
Member

timholy commented Mar 9, 2015

I think I like this idea, but I'd ask to hold off on merging it. It may indeed be that this is the right direction to go in, but I'd like to make sure that folks have thought this through very carefully. In particular, I strongly advise against some vague notion of "cleanliness" or "minimalism" in making this decision: the much more important goal is handling ambiguities that arise for package authors when making argument 1 more specialized but argument 2 less specialized. Related: #10440.

In other words, I think we have to consider how do/will package authors overload getindex and setindex!? Can we set up base in a manner that decreases the likelihood of packages needing a half-screenful of

# Avoid ambiguity warning with blah blah
getindex(A::MyArray, ::Int) = ...
# Avoid ambiguity warning with ...
getindex(A::MyArray, ...)
# Avoid ...
...
# Now here is finally the definition I really wanted:
getindex(A::MyArray, ...)

Some of those "interception" definitions will be unavoidable, but requiring a whole heap of them gets pretty annoying.

As one example of where I think this gets complicated, for an InterpolationArray you'd want to be able to define indexing with FloatingPoint types. But you'd also like to allow DualNumbers as a really sweet way of testing your calculation of derivatives, and a DualNumber is not even a Real. So you'd want to define a method

getindex(A::InterpolationArray, x::Number...)

That works fine if Base defines a method

getindex(A::AbstractArray, x::Number...)

because argument 1 is more specialized, and the remaining arguments are "equally specialized", so this one definition suffices to control dispatch. But it will be ambiguous against

getindex(A::AbstractArray, x::Int...)

because the arguments have opposite specialization behavior. You could solve that particular ambiguity with a

getindex(A::InterpolationArray, x::Int...)

but having

getindex(A::AbstractArray, x::Real...)

is probably the worst of all worlds.

What makes this problem somewhat harder is that there is another goal that seems, in practice, to be partially competing. It would be really nice if most package authors only had to define indexing for scalar types, and that Base handles "hard stuff" like this:

getindex(A::AbstractArray, x::Union(ScalarType, AbstractVector)...)   # mixed scalar/vector indexing
getindex(A::AbstractArray, x::Union(ScalarType, CartesianIndex)...)   # fancy scalar indexing, coming soon

automatically for most array types. To disambiguate those, you need to have

getindex(A::AbstractArray, x::ScalarType...)

defined. If we make ScalarType=Int, this is likely to reduce some class of ambiguities but forces authors of an InterpolationArray, because they need to support something broader than Int, to write definitions to handle these "hard" cases. In other words, indexing over regions or lumping multiple axes into a single object are difficult tasks that, if made too narrow, cannot apply broadly to all AbstractArrays.

In other words, I think this is a complicated problem 😄. I've only just begun to think about this seriously, but perhaps someone out there has a clear analysis or argument that points the way to the fewest ambiguities for package authors, without making Base overly fussy for users? If so, please do share 😄.

Again, any change we make here is guaranteed to cause ambiguity warnings for package authors, no matter what kind of change we make. So, let's make the change just once in the 0.4 cycle, because (in my opinion) ambiguities are in practice among the more annoying problems to cope with.

@timholy
Copy link
Member

timholy commented Mar 9, 2015

I should add that my basic intuition is that we want to migrate Base to the extremes: for any methods for a specific array type, the type of the indexes should also be as narrowly-defined as is reasonable. For any method that we'd like to define as a fallback for all AbstractArrays (for example mixed scalar/vector indexing or very fancy scalar indexing), we should make the scalar type be as broad as possible (e.g., Number). That way it seems certain that package authors can overload without fear of ambiguity.

@JeffBezanson
Copy link
Member

JeffBezanson commented Mar 10, 2015 via email

@timholy
Copy link
Member

timholy commented Mar 10, 2015

I think it's worth considering whether we should define

getindex(A::AbstractArray, indexes::Union(ScalarType,AbstractVector)...)

because that would mean users wouldn't have to figure out how to implement indexing over a region---most users would only have to define scalar indexing and the rest would follow automatically from this fallback. But if we do that, we probably need to at least specify a "stub" method

getindex(A::AbstractArray, indexes::ScalarType...) = error("getindex not defined for ", typeof(A))

to avoid ambiguities with

getindex(A::MyArray, indexes::Union(ScalarType, SomeOtherType)...).

Having thought about this a little more: I think the design principle is to minimize the number of nodes in the method graph that possess branches.

@timholy
Copy link
Member

timholy commented Mar 10, 2015

In particular, the definition will soon be

getindex(A::AbstractArray, indexes::Union(ScalarType,AbstractVector)...) = slice(A, indexes...)

(or sub depending on people's taste).

@mbauman
Copy link
Member

mbauman commented Mar 10, 2015

Very sensible and thoughtful, as usual, @timholy. Since packages (and base) have stabilized on ScalarType == Real, defining getindex(::AbstractArray, ::Union(Integer, AbstractVector, Colon)...) yields many pages of ambiguities against just about every single AbstractArray out there since we've just upped the precedence of Arg2 over Real.

Which is why I actually think the course of action here (doing the deprecation within to_index, without sharpening the method signatures) is very sensible.

I also really like the idea of giving AbstractArrays fancy indexing behavior by default, only requiring getindex(x, ::Int) for fast linear indexing arrays, or with the exact number of dimensions otherwise. Perhaps the best way to achieve this would be to only define getindex(A::AbstractArray, idxs...), which would dispatch to an internal function that we can more freely define methods for without worrying about ambiguity.

@timholy
Copy link
Member

timholy commented Mar 10, 2015

I initially assumed the scope of this PR was bigger, which is part of why I was a bit concerned. I agree that doing the deprecation via to_index is great, and that this PR is actually quite minimalistic in what it changes. Now that I've had a close look at the implementation, I have no objections to this being merged. But I also think we may want to take some additional steps.

I'm in the middle of creating a PR that changes scalar indexing to use ScalarIndex, currently typealiased to Real. We can encourage package authors to write their indexing in terms of ScalarIndex, and then during the 0.5 dev period switch it to typealias Integer. For any package that has already made the switch, they shouldn't see any ambiguity warnings.

I'm also trying to implement the fallback fancy indexing methods for a broader class of array types. This might not be very useful if 0.4 returns views from indexing, but to me that's beginning to look a bit unlikely, so perhaps we should at least fix up our current getindex behaviors.

It's an interesting idea to dispatch to functions not named getindex. We do some of that already, but indeed perhaps we should do more. EDIT: on the other hand, that's reminiscent of #10312, which in the end had a better solution.

@timholy
Copy link
Member

timholy commented Mar 11, 2015

@mbauman,

only requiring getindex(x, ::Int) for fast linear indexing arrays, or with the exact number of dimensions otherwise

definitely hit a chord with me---it seems like it might save a lot of duplicated code in trying to handle situations in which the number of indexes does not match the dimensionality of the array. I would be tempted to make the linear part optional, to be called if the user set the trait linearindexing(::MyArrayType) = LinearFast(). Otherwise, base indexing would "hand off" scalar indexing by filling out the dimensions with 1s or ind2sub as needed.

I don't think we have a good way to dispatch splats based on the number of arguments, and this little test:

f1(x...) = f2(x)

@noinline f2{N,T}(x::NTuple{N,T}) = N

function callf1(n, x...)
    s = 0
    for i = 1:n
        s += f1(x...)
    end
    s
end

callf1(1, 2, 5)
@allocated callf1(10^6, 2, 5)

does seem to suggest that life will not be all roses if we use a tuple as an intermediary. (At least, with our current tuples.)

So it almost seems we'd have to establish a new convention, for example that the new functions
getindexN and setindexN! get called only when the number of indexes matches the array dimensionality. People who write their own AbstractArray types might only have to define these two functions, plus size, ndims, and eltype, and yet have indexing that is as rich as that provided by Array, BitArray, and SubArray; they can of course more heavily overload getindex/setindex! when it makes sense.

Would introducing getindexN and setindexN! be reasonable? Or, is there another way to do this?

@jakebolewski
Copy link
Member Author

@timholy I thought that this was the right balance between what we want (constraining the index types to integers for AbstractArray subtypes in Base) but leaves the door open to broader design decisions you talked about.

I am skeptical as well to defining one catchall fallback for AbstractArray's. Ambiguity warnings will eventually be shifted to runtime so making design decisions based on current behavior does not seem to be the best way to go. Whatever default behavior lands in Base will be hugely breaking to change at a later date so I agree that caution is warranted here.

@timholy
Copy link
Member

timholy commented Mar 11, 2015

Yep, I'm fine with merging this; if you're ready, I say go for it. My not yet submitted PR is based on this one; I'll need to think a bit more about whether that PR is a direction we want to go in.

Ambiguity warnings will eventually be shifted to runtime so making design decisions based on current behavior does not seem to be the best way to go.

The ambiguity warnings that would arise here are not the "useless" category like Warning: I have no clue how to add a DataArray{Int} and Image{Int} (not a problem unless I intended to try that very operation). In this case it's a question of precedence of the container type (the AbstractArray) and the index type, and that's a genuine ambiguity that affects dispatch on everyday operations.

There was a "baby" version of this in #10314, where we had to delete a class of conflicting methods, filter(r::Regex, container), because we decided that the container needed to have precedence. @mbauman's suggestion is basically applying the same logic here---don't put any types on the indexes whatsoever. That way getindex is almost infinitely overloadable. I think the key question is whether we can provide some convenient services to package authors and users without typing the index arguments. It would appear that this requires dispatching to internal hidden functions, or one giant stagedfunction.

@mbauman
Copy link
Member

mbauman commented Aug 6, 2015

@ScottPJones moving the discussion here.

I know I was a pushing force for this, but I'm waffling on this a bit more. There are two reasons I liked this change.

  1. It makes errors of the sort A[end/2] much more obvious — you should most likely be using A[end÷2].
  2. I had been thinking it'd be nice to have a greater separation here so custom types could do something different when indexed by a float.

I still think 1 is a pretty good motivation. But I'm coming around to the fact that 42 and 42. are just too darn similar to have very different effects, even in a custom package. And if that's true, then the converse is as well.

@toivoh
Copy link
Contributor

toivoh commented Aug 7, 2015 via email

stevengj added a commit to stevengj/julia that referenced this pull request Nov 18, 2015
stevengj added a commit to stevengj/julia that referenced this pull request Nov 19, 2015
stevengj added a commit to stevengj/julia that referenced this pull request Nov 19, 2015
stevengj added a commit to stevengj/julia that referenced this pull request Nov 19, 2015
zhmz90 pushed a commit to zhmz90/julia that referenced this pull request Nov 21, 2015
stevengj added a commit that referenced this pull request Nov 29, 2015
@mbauman mbauman mentioned this pull request May 24, 2016
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should we expect floating point indexing to be implemented?
5 participants