Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate (then remove) generalized linear indexing #14770

Closed
IainNZ opened this issue Jan 23, 2016 · 109 comments · Fixed by #23628
Closed

deprecate (then remove) generalized linear indexing #14770

IainNZ opened this issue Jan 23, 2016 · 109 comments · Fixed by #23628
Assignees
Labels
arrays [a, r, r, a, y, s] breaking This change will break code help wanted Indicates that a maintainer wants help on an issue or pull request
Milestone

Comments

@IainNZ
Copy link
Member

IainNZ commented Jan 23, 2016

I'm sure this has been discussed elsewhere, but maybe not recently or in a focussed/isolated fashion. If this shouldn't be an issue report, I'll move it to julia-users.

Essentially, I would say this behavior is surprising:

julia> x = reshape(1:3^3, (3,3,3));

julia> size(x)
(3,3,3)

julia> x[3,4]
12

This was a cause of a hard-to-find bug for me recently, where I accidentally forgot an index.
I suppose this is in some way a generalization of the fact that x[12] works, but I'm not really sure why or where that behavior is useful for 2D-or-higher indices. Is there some logic for the current behavior?

EDIT: most similar discussion: #5396

@kshyatt kshyatt added the arrays [a, r, r, a, y, s] label Jan 23, 2016
@timholy
Copy link
Member

timholy commented Jan 23, 2016

The last index is effectively interpreted as a linear index, if the dimensionality is insufficient. (More properly, for LinearFast AbstractArrays, everything gets converted to an linear index in the end, although there is intermediate bounds-checking.) Extra 1s are also dropped. It's been this way "forever," but I won't close this issue if you're hoping for reconsideration. Suffice it to say that I've sometimes found this behavior useful, but I also understand how it could be a source of bugs.

Relevant code:
LinearFast (converts all to a linear index):

unsafe_getindex(A, sub2ind(size(A), J...))

LinearSlow (expands to full dimensionality):

julia/base/abstractarray.jl

Lines 525 to 551 in 0c20e64

elseif N > AN
# Drop trailing ones
Isplat = Expr[:(I[$d]) for d = 1:AN]
Osplat = Expr[:(to_index(I[$d]) == 1) for d = AN+1:N]
quote
$(Expr(:meta, :inline, :propagate_inbounds))
(&)($(Osplat...)) || throw_boundserror(A, I)
getindex(A, $(Isplat...))
end
else
# Expand the last index into the appropriate number of indices
Isplat = Expr[:(I[$d]) for d = 1:N-1]
i = 0
for d=N:AN
push!(Isplat, :(s[$(i+=1)]))
end
sz = Expr(:tuple)
sz.args = Expr[:(size(A, $d)) for d=N:AN]
szcheck = Expr[:(size(A, $d) > 0) for d=N:AN]
quote
$(Expr(:meta, :inline, :propagate_inbounds))
# ind2sub requires all dimensions to be > 0:
(&)($(szcheck...)) || throw_boundserror(A, I)
s = ind2sub($sz, to_index(I[$N]))
getindex(A, $(Isplat...))
end
end

@eschnett
Copy link
Contributor

Whether the functionality is useful is not quite the same question as whether this should be the default behaviour. There could be a function linearindexing with this behaviour, or one could create a subarray...

@johnmyleswhite
Copy link
Member

FWIW, I tend to feel like this behavior is too odd to be the default.

@IainNZ
Copy link
Member Author

IainNZ commented Jan 23, 2016

I would say that I'm looking for reconsideration, yes. I think its got a high surprising-ness to it, relative to its (minimal-but-nonzero) utility. I guess my main initial goal was understanding how much utility it does have, and to whether "fixing it" would have nontrivial performance applications.

In terms of options for "fixing it", after I admit relatively little consideration, it doesn't seem like anything but requiring all dimensions would be consistent. e.g. consider

julia> x = reshape(1:3^3, (3,3,3));
julia> x[3,3]
9

In some ways I find that even more surprising than my other example (but would have found it returning x[3,3,:] less so).

@BobPortmann
Copy link
Contributor

I agree and feel strongly that it should be an error to index an array with fewer indices than the rank of the array. It just makes it too easy to introduce hard to find errors into code when using higher rank arrays (say 4-10 dimensional). It is easy enough to use reshape to make a copy free reference if one wants to use the trailing linear indexing feature (and this leaves the intent clear in the code). And one could easily write some helper functions to make trailing linear indexing easier to use. I would also note that other languages (e.g., IDL and Fortran) make this an error so the present behavior will be surprising to many.

The case of 1-D linear indexing is perhaps OK since indexing as a one dimensional array is visually distinct (compared with, e.g., indexing a 8-D array as 7-D). However, since it is so easy to use vec I am not sure even this is necessary.

This was discussed a bit in #4774 but I don't know how to link to specific comments and that is a long issue.

@rfourquet
Copy link
Member

@BobPortmann you can link to specific comments by right clicking on the timestamp (next to the author name in the header of the comment), and selecting "copy link location" or equivalent.

@toivoh
Copy link
Contributor

toivoh commented Jan 23, 2016 via email

@johnmyleswhite
Copy link
Member

I kind of love the idea of giving linear indexing a special method.

@timholy
Copy link
Member

timholy commented Jan 23, 2016

What about A[i, LinearIndex(j)]?

@johnmyleswhite
Copy link
Member

I'd be ok with that.

@JeffBezanson
Copy link
Member

+1 to dropping this behavior and having separate syntax of some kind for linear indexing. This behavior was just taken as the default choice since it's what some other environments do. I would describe it as an over-generalization of a premature optimization.

@StefanKarpinski
Copy link
Member

I would describe it as an over-generalization of a premature optimization.

Precisely. Let's ditch this for sure.

@tknopp
Copy link
Contributor

tknopp commented Jan 24, 2016

-1 for dropping the 1D version. Its very natural to loop over all elements in a multidimensional array using

for n=1:length(x)
  x[n] = ...
end

I know this can be achieve in other ways (e.g. enumerate) but the above is quite easy to remember.

@lobingera
Copy link

@JeffBezanson, i disagree with the statement about over-generalization of a premature optimization.

Linear indexing is the natural indexing scheme and anything providing multi dimensional access is actually some infrastructure to make it easier for the person writing code or design an algorithm.
Taking linear indexing away completely just feels wrong to me, introducing additional syntax (which means to me, i have to hit more keys on the REPL before getting output) seems to me additional burden; a[i] is just linear, a[i,j] is dimensional; where is the problem?

Now to the interesting problem that started this: Having a a[i,j] access when a is actually defined 3D. I was not aware that this existed, but actually this is something that i'd like to have in a language.

The initial problem for me (and i think the title of the issue is missleading) is rather in the area of a compiler warning or some code checking (like lint).

@toivoh
Copy link
Contributor

toivoh commented Jan 24, 2016

The way I see it, linear indexing is the natural indexing scheme in the same sense that machine code is the natural programming language: it is the fundamental one.

But Julia is in the business of providing abstractions. In fact, it provides abstractions such as subarrays, where linear indexing is not the natural indexing scheme.

I don't think that linear indexing should be removed completely, especially not for the array types where it is the fundamental scheme. But seeing as subarray performance is going to be increasingly important, it might make sense to make it a bit more cumbersome to use linear indexing, in order to nudge people into using eachindex etc to get good performance across a greater number of array types.

@eschnett
Copy link
Contributor

Just thinking out loudly: In addition to eachindex, there could also be an eachelement function that returns a reference to an array element, as in

for xn in eachelement(x)
    xn = ...
end

This reference could be something like a zero-dimensional subarray.

@timholy
Copy link
Member

timholy commented Jan 24, 2016

Wouldn't that have to be xn[] = ...?

To just access the value, we already have that from for x in X....

@eschnett
Copy link
Contributor

Right, should have been xn[].

Yes, this is about avoiding having to handle indices at all if one wants to iterate over a whole array, assigning to elements.

@toivoh
Copy link
Contributor

toivoh commented Jan 24, 2016 via email

@mbauman
Copy link
Member

mbauman commented Jan 24, 2016

👍 to deprecating this. I've called this "partial" linear indexing before — not sure where I picked up that terminology. It's been discussed in #13015 and #5396, and it's a bullet point that I listed as a possibility in #13157.

@JaredCrean2
Copy link
Contributor

That kind of iterator would be very useful for sparse matrices where accessing non-structural locations is not allowed (for example, PETSc matrices).

@lobingera
Copy link

i'm just wondering, how and why this was brought in, anyway? Someone found it helpful and i guess it was happening <v0.2.

Maybe i'm biased because my matlab coding includes very often linear indexing although the data is organized in 2or3D and i have a mental model to deal with it.

@KristofferC
Copy link
Member

Another -1 for dropping the 1D version for same reason as #14770 (comment).

Just keep the 1D version of linear indexing and remove the "partial" linear indexing imo.

@ViralBShah
Copy link
Member

I too would love to have the 1d linear indexing version and getting rid of the partial linear indexing. That is quite a sane solution.

@toivoh
Copy link
Contributor

toivoh commented Jan 25, 2016 via email

@lobingera
Copy link

@toivoh, No, i'm not for getting rid of this partial linear indexing.

@toivoh
Copy link
Contributor

toivoh commented Jan 25, 2016

That's why I said almost everyone :)
Is there anyone else who wants to keep partial linear indexing?

There is of course always a tension between catching bugs and convenience,
personally I just think that this would catch a lot more bugs than the
times the functionality would be needed. I don't even know when I would use
partial linear indexing.

@toivoh
Copy link
Contributor

toivoh commented Jan 25, 2016 via email

@tknopp
Copy link
Contributor

tknopp commented Jan 25, 2016

Well, although I do not want to vote in a particular direction, there are use cases and I even have uses it in some situations. Imagine you have some plotting tool that is capable of displaying slices within a 3D dataset c. So it will display

c[:,:,k]

where k is the slice number. Now I am confronted with 4D data (3D+time) and would like to reuse my existing infrastructure. Currently, when everything is properly duck typed it would be very simple to push a 4D dataset an allow k to be beyond size(c,3).

@StefanKarpinski StefanKarpinski removed the needs decision A decision on this change is needed label Sep 13, 2016
@mbauman
Copy link
Member

mbauman commented Dec 9, 2016

For what it's worth, the most straight-forward implementation here requires #18457 since it fixes a method sorting bug with Varargs of a bound length.

@StefanKarpinski
Copy link
Member

Since #18457 is almost ready (can we merge it yet?), let's wait for that and then do that.

@mbauman
Copy link
Member

mbauman commented Jan 18, 2017

An update is in order here. We've been talking about two completely different things in this thread, with vastly different ramifications:

  1. Deprecate the linearization of any dimension beyond the first. This is what I've done in RFC: Deprecate partial linear indexing #20079. Note that it still allows indexing into an N dimensional array with fewer than N indices. Once this change goes through, we will be able to simplify the lowering of A[i, end] to simply use size(A, 2) — and I think we can ditch Base.trailingsize altogether. Similarly, we'll no longer need to fuss with special cases for trailing :s beyond the single A[:] case. Notably, this only impacted test code that was specifically crafted to test this behavior.
  2. Deprecate indexing into N dimensional arrays with anything but 1 or N indices. This is what I attempted in WIP/RFH: Deprecate generalized linear indexing #20040, and it is massively disruptive because it removes the ability to index into vectors with trailing 1s. This change requires bending over backwards to ensure that we only define indexing with 1 or N dimensions. While I suppose we could theoretically deprecate indexing with 1 < n < N indices but keep trailing singleton dimensions, specifying that with dispatch would require support for a new signature: getindex{N}(A::AbstractArray{T,N} where T, I::Vararg{Int, N}, trailing_singletons::Int…).

I'm for the first change, but against the second.

@Sacha0
Copy link
Member

Sacha0 commented Jan 22, 2017

Re. indexing with trailing singletons: Though indexing with trailing singletons is sometimes convenient when writing code, that style is often confusing / obfuscating when reading code. Code being read more than written, the benefit of disallowing indexing with trailing singletons (clarity) seems worth the cost (minor convenience reduction).

Deprecating indexing with anything other than 1 or N dimensions received broad support in this thread. The primary obstacle to completing this deprecation seems to be the volume of work involved in removing indexing with trailing singletons from existing base code. Particularly, the linear algebra code seems to be the primary consumer of indexing with trailing singletons. (Reading the linear algebra code is what convinced me of the above clarity point :).) If that is correct, and there otherwise remains broad support for deprecating indexing with anything other than 1 or N dimensions, I would be happy to primarily shoulder the necessary linear algebra related work over the next release cycle. Best!

@mbauman
Copy link
Member

mbauman commented Jan 23, 2017

I'm not as certain on the consensus since the thread is a little meandering… and I know I have been imprecise in my language when talking about these two options. It'd be great to hear updated opinions now that we have two very concrete (and implemented) options. You can try them out! CC @andreasnoack, who is conspicuously absent here.

My hesitancy about only-1-or-N-indices isn't just that it requires a lot of changes to existing code, but that it is also more difficult to implement and enforce. Were this more consistently a net simplification (on either side), then I think I'd find it more attractive.

@BobPortmann
Copy link
Contributor

@mbauman Either of your 2 options above will be a big improvement but I think having stricter rules would be better (i.e., deprecating indexing with anything other than 1 or N dimensions). It is not clear to me why this would be "more difficult to implement and enforce". Seems naively that it would be easier, not harder. In any case, in my experience this distinction only becomes an issue when working with higher-dimensional arrays (say larger that 5 or so) where being off by one index is less visually distinctive and getting yelled at by the compiler really helps in the long run. I suppose the linear-algebra folks usually use lower dimensional arrays and thus have no issue.

@mlubin
Copy link
Member

mlubin commented Jan 23, 2017

I can't speak to the difficulty of implementation, but if we're prioritizing user experience, unintentionally providing the wrong number of indices is currently an unfortunate trap. We get complaints about this in JuMP: jump-dev/JuMP.jl#937

@JaredCrean2
Copy link
Contributor

I'm also in in favor of 1 or N. One of the other people in my lab spent nearly a week tracking down a bug caused by this behavior.

@mbauman
Copy link
Member

mbauman commented Jan 25, 2017

Moving milestone now that #20079 is merged.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jul 20, 2017

What remains to be done here? Now is probably a good time to take this out entirely and try to simplify all the array indexing infrastructure.

@mbauman
Copy link
Member

mbauman commented Jul 20, 2017

I need to spend some time to get #21750 through. Then we can talk about the next steps here. I'd be in favor of one final round of bounds check tightening here, such that we only allow omitted trailing dimensions when the size of those omitted dimensions are all 1.

@StefanKarpinski
Copy link
Member

With #21750 merged, I believe there are only a few small indexing cases that still need to be deprecated here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] breaking This change will break code help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

Successfully merging a pull request may close this issue.