Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Drop dimensions indexed by scalars #13612

Merged
merged 2 commits into from
Nov 9, 2015
Merged

RFC: Drop dimensions indexed by scalars #13612

merged 2 commits into from
Nov 9, 2015

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Oct 14, 2015

This is the first step towards APL indexing semantics. It does not yet allow indexing with multiple multi-dimensional arrays (the new feature with the catchy "rank of the result is the sum of the rank of the indices" semantic); it simply drops all scalar dimensions (a major breaking change).

There is an interesting interaction between these array semantics and transposes that I didn't fully appreciate until looking at this more in-depth. The tricky part about APL indexing is that it loses the orientation of row vectors, and so we can no longer define complex matrix multiplication in terms of slices directly:

  • This makes A[1,:] * B[:,1] an illegal (n,) × (n,) operation.
  • Similarly, dot(A[1,:], B[:,1]) and A[1,:]' * B[:,1] erroneously conjugate the row.

Working with complex row slices becomes a little dicier. When do you conjugate? Or (c)transpose?

Fortunately, the behavior here is a strict superset of our current indexing rules. You can explicitly request that the shapes of the slices be maintained in both dimensions: A[1:1,:] * B[:,1:1] if you wish to use the result as a linear algebra construct.

@mbauman mbauman added breaking This change will break code linear algebra Linear algebra labels Oct 14, 2015
@mbauman
Copy link
Member Author

mbauman commented Oct 14, 2015

AppVeyor failure looks like #9572.

@@ -121,7 +121,10 @@ function A_ldiv_B!{T}(D::Diagonal{T}, V::AbstractMatrix{T})
if d == zero(T)
throw(SingularException(i))
end
V[i,:] *= inv(d)
d⁻¹ = inv(d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐨

@andreasnoack
Copy link
Member

@mbauman Would you rebase this one. I think we should consider merging this soon such that people will start using it.

@mbauman
Copy link
Member Author

mbauman commented Nov 4, 2015

Done.

@mschauer
Copy link
Contributor

mschauer commented Nov 5, 2015

Thanks for keeping it rolling here

@andreasnoack
Copy link
Member

Okay to merge this?

@mbauman
Copy link
Member Author

mbauman commented Nov 5, 2015

Sure, let's give it a shot.

@timholy
Copy link
Member

timholy commented Nov 6, 2015

Sorry I've not had time to review this. Looks fine to me. As you noted, it depends on a lot more splatting (esp in checksize), but all that is O(1) and not O(N) in the array elements. So I'm fine with seeing how it goes (and it might be further incentive to fix the splatting penalty), though if you've run any benchmarks it might be interesting to learn more about any performance hits.

andreasnoack added a commit that referenced this pull request Nov 9, 2015
RFC: Drop dimensions indexed by scalars
@andreasnoack andreasnoack merged commit 12dbaab into master Nov 9, 2015
@tkelman tkelman deleted the mb/drop-bear branch November 10, 2015 14:06
@yuyichao
Copy link
Contributor

Should we also drop the scalar dimension for SubArray (especially if we'd like to return SubArray as the result of indexing).

This breaks the equivalence between getindex and sub

julia> a = rand(2, 2)
2x2 Array{Float64,2}:
 0.377862   0.85667 
 0.0177678  0.695366

julia> a[2, 1:2] == sub(a, 2, 1:2)
false

@timholy
Copy link
Member

timholy commented Nov 16, 2015

slice already does that---essentially, now getindex is equivalent to slice rather than sub.

@mschauer
Copy link
Contributor

mschauer commented Jan 7, 2016

Are there plans to adapt mapslices to slice rather than to sub?

mbauman added a commit to mbauman/julia that referenced this pull request Jan 8, 2016
In JuliaLang#13612 I converted checksize from a generated function to a recursive lispy definition, but the methods are too complicated to be automatically inlined. Manually adding the inline annotation fixes this performance regression JuliaLang#14594. Master is now faster than 0.4.0 on most of the array perf tests.
@andreasnoack
Copy link
Member

I don't think it has been discussed. Can you explain further?

@mschauer
Copy link
Contributor

mschauer commented Jan 9, 2016

Currently, mapslices and special cased reductions over dimensions keep the reduced dimension as singleton dimension. As the indexing behavior now changed I thought mapslices and friends should follow soonish. This would mean changing the standard behavior of mapslices to dimension dropping as hinted at by

# TODO: maybe support removing dimensions

@timholy
Copy link
Member

timholy commented Jan 9, 2016

I'm not sure I agree that those two design decisions are coupled. Meaning, I don't think it's inconsistent for getindex to drop dimensions but reductions to retain them: they are different function calls, and reductions are obviously different from indexing.

It's so common to say

Xnorm = X ./ maximum(X, 2)

that it would be a shame to make that any harder. (And dropping dimensions definitely does make that harder.) When you want to drop the dimensions, it's easier than ever to do so.

@StefanKarpinski
Copy link
Member

I agree – I would prefer that reductions keep reduced dimensions for exactly this reason.

@mschauer
Copy link
Contributor

I kind of agree for the monadic case, for a dyadic mapslices function the current scheme does not translate well. I now recall why I noted this, I have even RFC #13317 but I forgot about it. Maybe someone could comment there? Some feedback would be appreciated.

@stevengj
Copy link
Member

stevengj commented Aug 2, 2016

I think this should be moved to the "Breaking Changes" part of the NEWS file. This just caused silently incorrect results in one of my packages (Hadamard.jl)

tkelman added a commit that referenced this pull request Aug 14, 2016
that have non-Int indices, introduced by #13612
and which has caused `make -C test/perf` to fail for the last 9 months

searchsortedfirst does not have methods for general Integer indices
KristofferC pushed a commit that referenced this pull request Aug 17, 2016
that have non-Int indices, introduced by #13612
and which has caused `make -C test/perf` to fail for the last 9 months

searchsortedfirst does not have methods for general Integer indices
@davidavdav
Copy link
Contributor

Hello,

For NamedArrays I need to copy the exact behaviour of the dropping of dimensions (for julia-0.4 and 0.5). The AbstractArray contained in a n::NamedArray is sliced by calling getindex(n.array, index...)---so there I get the same behaviour automatically. But for the names of the dimensions in n, I need to know exactly which types of index... I need to keep.

Is there a Base function that will tell me this? I was looking at Base.index_shape but that isn't defined for all index types (e.g., CartesianIndex). Currently I use a helper function

dimkeepingtype(x) = false
dimkeepingtype(x::Vector) = true
dimkeepingtype(x::Range) = true
dimkeepingtype(x::BitVector) = true

that tells me if an index type x (∈ index) is non-scalar or not. I am not sure this covers all index types in the same way that AbstractArray keeps dimensions, so I'd rather refer similar functionality in Base.

@timholy
Copy link
Member

timholy commented Aug 19, 2016

Can't you add the missing index_shape_dim method(s) to NamedArrays? For julia-0.5 we could add such a method to Base, though---probably a good idea.

tkelman added a commit that referenced this pull request Aug 20, 2016
that have non-Int indices, introduced by #13612
and which has caused `make -C test/perf` to fail for the last 9 months

searchsortedfirst does not have methods for general Integer indices

(cherry picked from commit b1d5321)
ref #18022
@davidavdav
Copy link
Contributor

Thanks. But I do not really understand the logic in index_shape_dim---it seems a recursive helper function for index_shape. It doesn't look like the way towards telling me which dimensions are kept in slices. I need these to know which dimension names I need to select, and from which dimensions I have to slice the names of the indices in the NamedArray.

Somewhat related, I saw that in Julia-0.5 the dimension of a sliced array is the sum of the dimensions of the indices. That is great, but I don't really know yet how to come up with dimension and index names, with cases like NamedArray[Array] and NamedArray[NamedArray]. Does anybody know of a precedence for this case? I only know of R named arrays, but R doesn't have this fancy indexing scheme.

@mbauman
Copy link
Member Author

mbauman commented Aug 21, 2016

I don't know about precedence in other languages, but perhaps you can take some inspiration from AxisArrays:

  • AxisArray[Array1, Array2] returns an AxisArray with axis names row_1, row_2, …, row_$(ndims(Array1)), and col_1, col_2, …, col_$(ndims(Array2)). The left-hand side of the _ comes from the names of the parent AxisArray.
  • AxisArray[AxisArray] is similar, except instead of using numbers on the right-hand side of the _, it uses the axis names of the index. This is where the axis names time_sub and time_rep come from on the README example.

@davidavdav
Copy link
Contributor

Thanks, that is indeed a useful way of naming the dimensions.

For the names of the indexes along the dimensions, though, things are more complicated. I think I will opt for copying the names of the indexes of the indexing array (if this has ndims() > 1), or defaulting to 1..size(index) if the indexing array itself doesn't have names.

Further challenges will occur when indexing a NamedArray with an AxisArray or vice versa. I don't even want to think about this at this moment...

mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016
mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016
that have non-Int indices, introduced by JuliaLang#13612
and which has caused `make -C test/perf` to fail for the last 9 months

searchsortedfirst does not have methods for general Integer indices
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants