
embed tensor-like objects as higher-dimensional objects with trailing singleton dimensions #3262

Closed
lsorber opened this issue May 31, 2013 · 46 comments
Labels: arrays, needs decision

Comments

@lsorber

lsorber commented May 31, 2013

There have been/are a multitude of issues and discussions on dimensionality compatibility between Arrays: #113, #141, #142, #231, #721, #1953, #2472, #2686, #2936, #3253, julia-dev threads one, two, and probably many others.

The core problem is that Julia's rules for interacting with Arrays of different dimensions are just too difficult. In the following, let A be a Matrix and x be a Vector. For example, indexing can drop dimensions (e.g., A[:,i] is a Vector), yet reduction operators such as sum, min, std do not (e.g., sum(A,2) is a Matrix). Numbers are broadcasted, yet length 1 Vectors and Matrices are not (e.g., A+5 works, but A+x'*x does not). A solution to the latter example is to use A+dot(x,x), but this is neither intuitive nor convenient. By now it is clear that the current situation is far from ideal. What Julia needs is a small and consistent set of rules for interacting with multidimensional arrays. Below I propose such a list, but I certainly do not wish to imply that this is a best solution.

First, I am of the opinion that Julia's decision to have each object have a dimensionality (a number of dimensions) is the correct one. In MATLAB, trailing singleton dimensions are ignored, yet there are many cases in which it is important to consider how many dimensions an Array has. So let's assume that each object should definitely keep track of its dimensionality.

Second, let's consider broadcasting. In #1953, it was suggested that making the + and - operators broadcast for things other than Numbers is "a little too magical". I agree with this statement and think that .+ and .- are a good solution (so that + and - can throw dimension mismatch errors). As a side note, there are many other operators that would also benefit from broadcasting such as mod, rem, max and atan2 (in general, most binary functions). The problem here is that both randn() and randn(1) are scalars, yet only the former is a Number. This is why A+randn() works, but A+randn(1) and even A+x'*x do not (but should!).

Third, trailing singleton dimensions are ignored when adding or subtracting arrays, promoting the return type of the result to the maximal dimensionality of the two arguments. This is behaviour a user would expect, and I would like to expand on this idea per #2686. If ones(3) + ones(3)'' works, then ones(3) == ones(3)'' should also be true.

Here is a small set of rules which hopefully represents the behaviour we desire:

  1. Arrays have a dimensionality. I.e., the result of ndims does not depend on the number of trailing singleton dimensions (unlike in MATLAB).
  2. Operating on Arrays cannot decrease the dimensionality. The dimensionality of a broadcasted result is the max of the two dimensionalities. E.g., indexing operators and reduction operators (such as sum) do not change the number of dimensions. Another example is that dot(x,x) and x'*x should both have size 1 x 1. [*]
  3. Arrays have (pseudo)supertypes which ignore trailing singleton dimensions:
    • Scalar-like: a Number or 1 Vector or 1x1 Matrix or 1x1x1 Tensor or ...
    • Vector-like: an N Vector or Nx1 Matrix or Nx1x1 Tensor or ...
    • Matrix-like: an NxM Matrix or NxMx1 Tensor or ...
    • ND-Tensor-like: an I1x...xIN Tensor or an I1x...xINx1 Tensor or ...
  4. Broadcasting is enabled for scalar-likes. Operations on vector-, matrix-, and tensor-likes can be broadcasted with the elementwise version of the operator (e.g., .+ and .-). [**]
  5. Trailing singleton broadcasting is enabled for all *-likes. Only trailing singleton dimensions are broadcasted by default for *-likes. Any admissible operation on *-likes returns an Array with a dimensionality equal to the max of the two dimensionalities, but ignores these trailing singleton dimensions in the actual operation. E.g., ones(3)'*ones(3,1,1) is a 1 x 1 x 1 scalar-like and ones(3) == ones(3)'' is true.

[*] The only exception is if a function explicitly removes dimensions, such as squeeze.
[**] The only exception is that broadcasting is not enabled for equality testing with scalar-likes, e.g., ones(3,3) == 1 should be false.
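As a sketch of what rule 5's equality semantics (together with the footnote exception for scalar-likes) would mean, here is a NumPy analogue with hypothetical helper names — not existing Julia or NumPy behavior:

```python
import numpy as np

def strip_trailing_singletons(shape):
    # drop trailing dimensions of size 1: (3, 1, 1) -> (3,)
    shape = list(shape)
    while shape and shape[-1] == 1:
        shape.pop()
    return tuple(shape)

def eq_like(a, b):
    # Rule-5-style equality: shapes must agree after dropping trailing
    # singletons, then values are compared elementwise. A bare scalar
    # (shape ()) is NOT broadcast against a larger array, per [**].
    a, b = np.asarray(a), np.asarray(b)
    if strip_trailing_singletons(a.shape) != strip_trailing_singletons(b.shape):
        return False
    return bool((a.reshape(-1) == b.reshape(-1)).all())
```

Under these rules, `eq_like(ones(3), ones((3,1,1)))` holds, while `ones((3,3)) == 1` stays false.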

For consistency, I would not even mind going so far as to remove rule 4, requiring people to write A.+5 instead of A+5. The additional required effort is small, and the former makes it clear that the developer is explicitly requesting broadcasting.

@johnmyleswhite
Member

First off, I entirely agree that there are many worrisome quirks (like sum(A, 2)) in how Julia handles Arrays. And I think we need some simple rules for specifying when broadcasting should happen.

But I'm a little unsure whether we should adopt broadcasting of scalar-like arrays. This comes close to violating one of the design principles of Julia: the types of a method's outputs should be uniquely determined by the types of its inputs. While allowing something like A + x'' * x doesn't literally violate this requirement, it leads to the perhaps puzzling situation in which inputs of certain sizes correctly produce outputs, while others produce dimension mismatch errors. That already does happen with Arrays, so it's not a problem per se. But it does mean that Julia could become less intuitive rather than more intuitive.

@lsorber
Author

lsorber commented May 31, 2013

@johnmyleswhite Julia already broadcasts Numbers; rule 4 just extends this to scalar-likes. The only reason I included rule 4 was for consistency with MATLAB, where A+x'*x does not throw an error. I would prefer to remove rule 4, but I'm not against keeping it so that MATLAB users feel more comfortable. In any case, I don't want to jump through hoops to compute A+x'*x.

@johnmyleswhite
Member

By hoops do you mean dot(x, x) rather than x' * x? While I agree it takes a while to get used to, changing it will require a lot of changes throughout the whole language. For example, assignments currently fail:

julia> x = ones(3)
3-element Float64 Array:
 1.0
 1.0
 1.0

julia> y = ones(3)
3-element Float64 Array:
 1.0
 1.0
 1.0

julia> y[1] = x' * x
ERROR: no method convert(Type{Float64},Array{Float64,1})
 in setindex! at array.jl:394

julia> y[2] = dot(x, x)
3.0

I could see an argument for changing that as well, but I'm not sure where things stop before we're universally treating scalar-like tensors as scalars.

@lsorber
Author

lsorber commented May 31, 2013

@johnmyleswhite It's not an easy decision, but the sooner a small and consistent set of rules for working with multidimensional arrays is thought out, the better. In the proposal above, scalar-likes are indeed treated as scalars, except that they have a dimensionality. A Number would be a zero-dimensional scalar-like, a 1-Vector would be a 1D scalar-like and so on. We should be able to do things like y[1] = x'*x. The dimensionality of an Array should not change depending on which function you are calling (dot changes dimensionality, yet sum doesn't).

@StefanKarpinski
Member

It's definitely a non-starter to consider 1-element vectors or 1x1 matrices to be "scalar-like". Multiplying a 10x10 matrix by a scalar is fine but multiplying a 10x10 matrix by a 1x1 matrix is and should be a dimension mismatch error. Likewise, rand(1) is not a scalar at all, it's a single-element vector – they're very different. Making dot(v,w) return a 1x1 matrix – or anything besides a scalar, for that matter – is just wrong. It's the scalar product, after all. In general, I'm unclear on how the shape of what sum returns has any relation to what shape dot should return.

The fact that M[:,1] produces a vector while sum(M,2) produces a matrix may be a slight annoyance, but I'm actually not entirely sure why slicing and reducing should necessarily produce the same shape. Since sum has to produce a matrix in the sum(M,1) case, it also has to in the sum(M,2) case. Thus the only way to make these match would be to make M[:,1] produce a column matrix instead of a vector. So all tensor slices would have the same number of dimensions as the tensor being sliced. I could maybe get on board with that, but it seems potentially pretty annoying, and I really don't know that M[:,1] and sum(M,2) having different shapes is all that big of a problem.

There are only two significant problems above as I see it, both involving transposes of vectors:

  1. v'' != v
  2. v'*v is a vector rather than a scalar

The first issue could be addressed by ignoring trailing singleton dimensions when comparing tensors for numeric equality (== as compared to isequal, which could remain stricter). That, however, doesn't address the second issue and may or may not be a good idea in general.

Introducing transposed tensor types solves both problems:

  1. v' would be a Transposed{Vector} and v'' would be of the same type as v.
  2. define Transposed{Vector}*Vector to return a scalar and thus be the same as dot(v,v).

This would also allow transpose to generalize to tensors by simply reversing the order of indices. The main reason we haven't done this is that I rather want the transpose to be lazy (i.e. share data with the original vector), and that introduces a number of issues.

@lsorber
Author

lsorber commented May 31, 2013

@StefanKarpinski

It's definitely a non-starter to consider 1-element vectors or 1x1 matrices to be "scalar-like". Multiplying a 10x10 matrix by a scalar is fine but multiplying a 10x10 matrix by a 1x1 matrix is and should be a dimension mismatch error.

This is a question of whether scalar-likes should be broadcasted, not whether Numbers or 1-Vectors should be scalar-likes. I would also prefer not to do broadcasting for scalar-likes, so that A*randn(1,1) would throw an error but A.*randn(1,1) would not. Removing rule 4 would solve this problem.

Making dot(v,w) return a 1x1 matrix – or anything besides a scalar, for that matter – is just wrong.

In MATLAB, dot can be used to compute the dot product along a dimension. To me, it is not so clear that the dimensionality of the result should depend on the dimensionality of the function's arguments. If A and B are 3D tensors, dot(A,B,3) would presumably be a Matrix in your view, yet if they are 4D the result would still be 4D. Why disregard trailing singleton dimensions for dot, but not for sum?

@StefanKarpinski
Member

I'm not sure what calling 1x1 matrices "scalar-like" means if it doesn't mean that they can be used like scalars. Regardless of whether you call a 1x1 matrix scalar-like or not, broadcasting is unproblematic: you broadcast along singleton dimensions and absent trailing dimensions and other dimension sizes must match exactly. When you broadcast two arrays, a and b, the result has the following shape:

map(i->max(size(a,i),size(b,i)), 1:max(ndims(a),ndims(b)))
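The shape rule in that one-liner can be spelled out as a plain-Python sketch (a hypothetical helper mirroring the map/max expression above, including the "other sizes must match exactly" check that the expression itself leaves implicit):

```python
def broadcast_shape(a_shape, b_shape):
    # Broadcast along singleton and absent trailing dimensions;
    # any other pair of sizes must match exactly.
    n = max(len(a_shape), len(b_shape))
    out = []
    for i in range(n):
        da = a_shape[i] if i < len(a_shape) else 1
        db = b_shape[i] if i < len(b_shape) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shape mismatch in dimension {i+1}: {da} vs {db}")
        out.append(max(da, db))
    return tuple(out)
```

So broadcasting (5,5) against (1,1) gives (5,5), and (3,) against (3,1,1) gives (3,1,1).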

@toivoh's new broadcasting code does this correctly and efficiently. In particular, multiplication acts as you propose:

julia> rand(5,5)*rand(1,1)
ERROR: *: argument shapes do not match
 in gemm_wrapper at linalg/matmul.jl:289
 in gemm_wrapper at linalg/matmul.jl:280
 in * at linalg/matmul.jl:94

julia> rand(5,5).*rand(1,1)
5x5 Float64 Array:
 0.633873   0.0593671  0.545569  0.438291  0.00419443
 0.590569   0.450254   0.360447  0.415297  0.200254  
 0.461908   0.633632   0.406851  0.432888  0.027472  
 0.109804   0.037443   0.298195  0.428459  0.455538  
 0.0525897  0.554127   0.389851  0.069876  0.427595  

Regardless of whatever other functionality it may have, I know that one thing should be true of the dot product: dot(v,w) where v and w are both vectors is a scalar. How to generalize this operation to higher dimensions is an open matter, but any proposal that starts by suggesting that dot(v,w) should be a matrix is not going to go anywhere. There are no trailing dimensions being discarded in the case of the dot product – it is simply defined to be the sum of the pairwise products of the elements of its vector arguments. I can even see defining dot(a,b) = sum(a.*b) for general tensors. But again, that's an open question because it's pretty unclear what the best generalization of the dot product is.

@StefanKarpinski
Member

The other issue that's come up here is that A[1] = [2] doesn't work. It could be made to do so quite easily – of course, it should only work when the RHS is a tensor with unit length (i.e. containing only a single value).

@lsorber
Author

lsorber commented May 31, 2013

@StefanKarpinski

I'm not sure what calling 1x1 matrices "scalar-like" means if it doesn't mean that they can be used like scalars.

I see a scalar-like as something which has a number of dimensions but behaves as a scalar.

Regardless of whatever other functionality it may have, I know that one thing should be true of the dot product: dot(v,w) where v and w are both vectors is a scalar

The fact that dot can be used to compute the scalar product of two vectors does not mean that other use cases should just take a back seat. I would argue that dot is used just as often to contract two tensors along a given dimension. This just happens to be a scalar if the arguments are two vectors, just as x'*x happens to be a scalar if x is a vector. In fact, for many people the contraction along a dimension is the reason to even have a dot function in the first place; otherwise we would just write x'*x, as anyone who has used MATLAB would be used to. While I do not agree that the result should definitely be a Number in the case of two vectors, I do agree that the result should be treated as one where it makes sense.

The results of the following operations are all mathematically equivalent for a vector x:

  • x^T \vec{1}: x'*ones(size(x))
  • \sum_i x_i: sum(x), and the still different sum(x,1)
  • <x, \vec{1}>: dot(x,ones(size(x)))

It is just wrong that the return types of these four operations differ and that some of them can't be used as scalars. What I would like is a uniform return type dimensionality, and whatever this dimensionality may be, that the result can transparently be used as a scalar. The way things are now, I honestly prefer MATLAB's simple yet crude solution of removing trailing singleton dimensions.

The problem is not limited to scalars. Analogously, I cannot pass x = sum(A,2) as an argument to the function foo(x::Vector) because x is a Matrix. If x were treated as a vector-like, this problem would be solved.

@JeffBezanson
Member

I can see that an array of shape (n,) is perhaps equal to one of shape (n,1) the same way that 2 and 2.0 are equal.

Now, IIUC mathematics does not recognize "type differences" as in "integer 2" vs. "float 2". The types are things we computer scientists put on things for our own purposes. So from a mathematical standpoint, one cannot complain about the types because they don't exist. However, one can complain about the results of functions such as 2==2.0.

So it seems this is about allowing the scalars to be embedded in the set of matrices as 1x1 matrices the same way the reals are embedded in the complexes. Is that a fair description? This is consistent with rand().*rand(2,2) giving the same answer as rand(1,1).*rand(2,2), which indeed we now have, so we are on our way. We could also add conversions from 2D Array to 1D Array (etc.) that check size(A,2)==1, the same way that conversions from Float to Int check that the number is in fact an integer. And so on.
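The checked-conversion idea could look like the following NumPy sketch, where to_vector is a hypothetical helper (not an existing Julia or NumPy API): it succeeds only when every dimension past the first is a trailing singleton, just as a Float-to-Int conversion succeeds only when the value is integral.

```python
import numpy as np

def to_vector(a):
    # Checked embedding in the spirit described above: a (n, 1, 1, ...)
    # array "is" an n-vector, but (3, 2) is rejected, the same way
    # Int(2.0) works while Int(2.5) errors.
    a = np.asarray(a)
    if any(d != 1 for d in a.shape[1:]):
        raise ValueError(f"cannot embed shape {a.shape} as a vector")
    return a.reshape(-1)
```

With this, to_vector(ones((3,1))) yields a length-3 vector, while to_vector(ones((3,2))) raises.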

@lsorber
Author

lsorber commented Jun 3, 2013

@JeffBezanson Yes, the embedding analogy is a good way of describing the problem. I'm not sure conversion is the solution though: if I have foo(x::Vector) and foo(x::Matrix), what will happen when I run foo(sum(randn(5,5),2))?

@lsorber
Author

lsorber commented Jun 3, 2013

Could be of interest: the Tensor Toolbox for MATLAB explicitly stores the tensor's dimensionality, like Julia. From their TOMS paper:

A scalar is a zeroth-order tensor. An n-vector is a first-order tensor of size n. An m × n matrix is a second-order tensor of size m × n. Of course, a single number could be a scalar, a 1-vector, a 1 × 1 matrix, etc. Similarly, an n-vector could be viewed as an n × 1 matrix, or an m × n matrix could be viewed as a m × n × 1 tensor. It depends on the context, and our tensor class explicitly tracks the context, as described in Section 2.2.

Section 2.2 says:

Out of necessity, the tensor class handles sizes differently than MATLAB arrays. Every MATLAB array has at least two dimensions; for example, a scalar is an object of size 1 × 1 and a column vector is an object of size n × 1. On the other hand, MATLAB drops trailing singleton dimensions for any object of order greater than two. Thus, a 4 × 3 × 1 object has a reported size of 4 × 3.

The paper doesn't say why explicitly storing the dimensionality is a necessity. In other words, why not just drop trailing singleton dimensions, as MATLAB does (or more precisely, have an implicit infinite number of trailing singleton dimensions)? Speaking from my experience as lead developer of Tensorlab, also a MATLAB toolbox for tensors, I don't mind MATLAB's convention so much -- it has never been a hindrance in the development of Tensorlab. There is only one operation which comes to mind right now where the actual number of trailing singleton dimensions matters (and is perhaps the reason the authors of the Tensor Toolbox had in mind): the outer product. If A and B are two n x n matrices, their outer product is an n x n x n x n tensor. If A and B are two n x n x 1 tensors, their outer product should be an n x n x 1 x n x n x 1 tensor. An optimal solution would, at least in my view, keep track of the dimensionality of tensors, yet treat things that look like scalars as scalars, and things that look like vectors as vectors, and so on. The former Julia already does, the latter implies disregarding trailing singleton dimensions for most operations.
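The outer-product point can be checked directly in NumPy, where ufunc.outer concatenates the two input shapes, so an explicit trailing singleton ends up interleaved in the middle of the result:

```python
import numpy as np

A = np.ones((2, 2))
B = np.ones((2, 2, 1))   # same values, one explicit trailing singleton

# result shape is the concatenation of the input shapes
assert np.multiply.outer(A, A).shape == (2, 2, 2, 2)
assert np.multiply.outer(B, B).shape == (2, 2, 1, 2, 2, 1)
```

This is exactly the operation for which the number of trailing singletons is observable, and hence why a tensor class may need to track dimensionality explicitly.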

@wlbksy
Contributor

wlbksy commented Jun 3, 2013

@StefanKarpinski I'm curious about the issues raised by introducing a transposed vector type: what would they be?

Could we make a vector a type containing both value and direction fields? The direction field would default to column vector, with row vector as the alternative value. Would this solve the laziness problem, since the value field would be independent?

@StefanKarpinski
Member

@lsorber, from the way you're talking about treating things as scalar-like or vector-like, it seems that you may be thinking of some kind of automatic conversion or identification of types in Julia's type system. This isn't something Julia does or will do. We had initially considered it – in particular identifying zero-tensors with scalars – but concluded that it's not a good idea for a variety of reasons, and that decision has turned out to be a sound one. Thus zero-tensors, while they can be conceptually identified with scalar numbers, are not the same thing in Julia. Rather, they are just made to behave like scalars wherever possible, but they still have a different type and different representation. Similarly, we are never going to have foo(x::Vector) automatically apply when x is anything other than an object of type Array{T,1} for some T. This is part of a general anti-magic philosophy in Julia: things don't happen automatically, they only ever happen because you asked for them to. While that may seem a bit limiting, Julia's dispatch system is sufficiently powerful that it's usually easy to add fallbacks that seem like magic. For example, it's easy to say something like foo(x) = foo(convert(Vector,x)) and then allow the conversion mechanism to handle attempting to turn various objects into vectors for you.

[Taking a step back, the identification of vectors with column matrices is not a unique mathematical phenomenon – the integers are embedded in the rationals, which are embedded in the reals, which are embedded in the complex numbers. If you pay attention to their construction, however, that's really a sleight of hand. The integers can't really be a subset of the rationals because the very definition of the rationals depends on the existence of integers. What's really happening is that we identify the integers with a particular subset of rational numbers which happen to look and act just like the integers. The human brain is so good at dealing with ambiguous situations that this issue never really comes up – it just works. The way Julia handles this is similar: there is a subset of the Rationals that generally behave a lot like the Integers; however, we only identify them behaviorally, we do not try to make them actually the same type.]

@StefanKarpinski
Member

@wlbksy, the main issue is whether to copy the data or not. However, there's also the issue that we may want to just do something more general: #3224. I'm not sure what the full design looks like, but I suspect that transposed vectors might be subsumed by some kind of generalized array view type.

@JeffBezanson
Member

I can accept that more operations should tolerate trailing singleton dimensions. Yes, conversion is not a "solution", just another example of the embedding. Although in some cases it can be a solution by adding definitions, as Stefan points out.

The mere existence of both foo(x::Vector) and foo(x::Matrix) is not enough to cause a problem. For example the matrix version could call the vector version in the Nx1 case, or the other way around. Or one of them might return an array of shape N, and the other Nx1, which is fine if you consider them mathematically equal. We do this kind of thing all the time, for example in the large number of definitions for matrix multiply. There are many cases, but they're all matrix multiply.

@ggggggggg
Contributor

I don't understand why one would try to handle only trailing singleton dimensions. Perhaps there is something I'm missing, but take

a = zeros((4,4,4,4))
a[1,:,1,:] = ones((4,4))

for example. This is an assignment that currently doesn't work, and wouldn't work when handling only trailing singletons, yet it is unambiguous in intent and the array sizes differ only by singletons.

@JeffBezanson
Member

That is a separate issue (which shapes are compatible for assignment) covered by #4048. This issue is about a formal embedding; i.e. the sense in which a 4x4 array and a 4x4x1 array are equivalent. I don't think anybody considers a 4x4 array and a 4x1x4 array equivalent in a general sense.

@wenxgwen

Now I see the reason why Julia treats x'*x as a 1x1 Array and not as a scalar.

Let A be a matrix and x be a vector (column vector). If we treat x'*x as a scalar, A + x'*x is OK but A*x'*x is not, since (A*x')*x and A*(x'*x) are different. So it is a good idea that x'*x is a 1x1 matrix while dot(x,x) is a scalar.
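NumPy makes the same distinction, which is easy to check: a 1x1 matrix multiplies fine as a scalar would elementwise, but is a dimension mismatch under the matrix product.

```python
import numpy as np

A = np.eye(3)
x = np.ones((3, 1))      # a column vector stored as a 3x1 matrix
s = x.T @ x              # 1x1 matrix with value 3.0

assert s.shape == (1, 1)
assert np.allclose(A * s.item(), 3.0 * np.eye(3))   # used as a scalar: fine
try:
    A @ s                # used as a matrix: 3x3 times 1x1 is a mismatch
    raised = False
except ValueError:
    raised = True
assert raised
```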

@wenxgwen

Let A have size n x n x n. I feel that A[1,:,:] should have size n x n, while A[1:1,:,:] should have size 1 x n x n. This is much more consistent and uniform (since 1:1 is a range while 1 is not).

Currently, A[1,:,:] has size 1 x n x n in Julia. This is not intuitive and is inconsistent (since currently in Julia, A[1,1,1] is a scalar, not a 1 x 1 x 1 Array).

I think A[1,1,1] being a scalar is the right behaviour, while A[1,:,:] being an Array of size 1 x n x n is not.

@pao
Member

pao commented Nov 14, 2013

Currently, A[1,:,:] has a size 1 x n x n in Julia. This is not intuitive...

Intuition is subjective: if I slice the top layer off a cube (three-dimensional array) and don't rotate it, it has the appearance (shape) of a layer of a cube which still has depth, not the front face of the cube (a matrix).

@hayd
Member

hayd commented Dec 17, 2013

@wenxgwen I also think that is inconsistent. In numpy these would be different shapes:

In [11]: A[1, :, :].shape
Out[11]: (2, 2)

In [12]: A[1:2, :, :].shape
Out[12]: (1, 2, 2)

@pao I think it's hard to argue that this is intuitive: if the range is the last argument, they have different shapes!

julia> A[:,1,1:1]
2x1x1 Array{Float64,3}:

julia> A[:,1,1]
2-element Array{Float64,1}:

julia> A[1,1,:]
1x1x2 Array{Float64,3}:

julia> A[1:1,1,:]
1x1x2 Array{Float64,3}:
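For comparison, NumPy applies the scalar-drops/range-keeps rule uniformly in every index position, trailing or not:

```python
import numpy as np

A = np.zeros((2, 2, 2))

assert A[1, :, :].shape == (2, 2)       # scalar index drops the dimension
assert A[1:2, :, :].shape == (1, 2, 2)  # length-1 range keeps it
assert A[:, 1, 1:2].shape == (2, 1)     # same rule in any position,
assert A[1, 1, :].shape == (2,)         # including trailing ones
```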

@pao
Member

pao commented Dec 18, 2013

I'm not arguing that Julia's doing the right thing, or doing it consistently, just that I don't believe that "intuition" is useful guidance here.

@BobPortmann
Contributor

I think that this is one of the most critical issues to be decided before Julia gets too far along (certainly well before 1.0). I think some of the present rules are overly magic. I have slowly come to the belief that a simpler array indexing system is preferable.

I would propose the following 4 rules for array indexing, reduction operations, and broadcasting:

  1. Dimensions indexed with a scalar are dropped and those indexed with arrays or ranges are not. This is in line with what several other people are suggesting, but not what Julia presently does (see drop dimensions indexed with a scalar? #5949 as well). Note that this rule holds for trailing singleton dimensions as well as interior ones (i.e., trailing dimensions that are indexed with a one-element array or range are not dropped). The implicit dropping of trailing singleton dimensions can be a real pain when you are doing serious dimension juggling.
  2. In general the element-wise array operators should only work if the ranks and dimensions of both arrays are equal (and singleton dimensions count in this comparison). Broadcasting singleton dimensions is an exception to this (see below). This rule makes reasoning about code simpler and reduces the edge cases one must guard against. This rule applies to the assignment operator, with the exception that treating a higher-rank array as one-dimensional in memory order, in addition to its strided version, is allowed (Julia already does this and it's too handy not to have).
  3. Reduction functions should reduce the rank of the array by default. That's why they are called reduction functions. If you leave the singleton dimension and don't want to do a broadcasting operation then you have to squeeze it out anyway. Reduction functions should have a kw flag to leave the singleton dimension (see below also). I strongly believe that if you agree with rule 1 then rule 3 follows.
  4. Broadcasting of singleton dimensions works similarly to how it works now, with the important exception that trailing singletons are not special-cased (i.e., they must be explicitly present, just like interior singletons, for broadcasting to take place). The broadcasting of implicit trailing singletons is the thing I most dislike about Julia.

I believe the above system would make it easier to reason about code and be less error-prone than the present system. Since it makes it easier to drop dimensions, uses of squeeze would go down drastically. However, the need to add dimensions will go up, and so a simpler system for adding singleton dimensions would be helpful. There are array languages that have syntax for this, e.g. pseudo-indexes in Yorick, which uses '-' for this (though '!', '*', or '$' may be better choices). Thus:

a = reshape(1:12, 4, 3)

size(a[:,:])    # (4,3)
size(a[:,-,:])    # (4,1,3)
size(a[:,-,:,-])    # (4,1,3,1)
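NumPy spells the same pseudo-index idea as np.newaxis (equivalently, None) in the index tuple:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

assert a[:, :].shape == (4, 3)
assert a[:, np.newaxis, :].shape == (4, 1, 3)           # the '-' pseudo-index
assert a[:, np.newaxis, :, np.newaxis].shape == (4, 1, 3, 1)
```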

This syntax would make broadcasting operations easy again and would make the intent clearer than the present system. Let me know if I should move this comment into its own issue since it addresses possible new syntax and goes beyond the present issue.

@StefanKarpinski
Member

Any proposal of dropping singleton slices really needs to address @toivoh's associativity problem, which no one has yet done.

@malmaud
Contributor

malmaud commented Feb 25, 2014

@StefanKarpinski Are you addressing @BobPortmann? I'm not sure I understand the connection between dropping dimensions with integer indexes and reduction operations (which I fully agree with) and the dimensionality that results from matrix multiplication.

@timholy What was your objection to reduction functions dropping a dimension? I know you've expressed skepticism about that, but I'm not sure why. I have a lot of code of the form squeeze(f(x,n), n) that I would love to eliminate. If it's about making broadcasting easier, numpy semantics allow for things like

x=randn(3,5) 
col_means = mean(x,0)
z = x-col_means

where col_means is a one-dimensional array. That seems like the best of all worlds.

@timholy
Member

timholy commented Feb 25, 2014

Like #5949, this isn't a proposal to drop all singleton slices, merely a proposal to use scalar/range to encode drop/not drop. I.e, A[3,:] drops and A[3:3,:] doesn't. So it's a bit orthogonal to @toivoh's point (which is excellent, as always).

@BobPortmann, perhaps there's a more technical meaning to "reduction" than I'm aware of, but I'd assert that "reduce" can apply at least as well to the size of an array (I've "reduced the amount of data in the array") as to its dimensionality. Personally, I'd prefer to keep the dimensions lined up when one does reductions.

@timholy
Member

timholy commented Feb 25, 2014

@malmaud, what does mean(x,0) do? (What in particular is the 0 for?)

Currently both X .- mean(X,1) and X .- mean(X,2) work with no fanfare. For 2d X, the first mean gives you a row vector, the second mean gives you a column vector. At least in my own code, this kind of thing is far more common than squeeze. Inserting - at the right spot in a list of colons might work fine if the dimension you're reducing along is hardcoded (3), but when it's stored in a variable this seems like it would be a major pain in the neck. Compare:

m = mean(X, dim)
msq = squeeze(m, dim)

to

m = mean_drop_singletons(X, dim)
indexes = Any[colon() for i = 1:ndims(m)]
insert!(indexes, dim, '-')
mnosq = m[indexes...]
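For reference, NumPy's keepdims flag addresses exactly this tension: the reduced axis is kept as size 1, so the X .- mean pattern broadcasts correctly even when the dimension is stored in a variable.

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)
for dim in (0, 1):
    m = X.mean(axis=dim, keepdims=True)   # reduced axis kept as size 1
    centered = X - m                      # broadcasts along whichever axis was reduced
    assert centered.shape == X.shape
    assert np.allclose(centered.mean(axis=dim), 0.0)
```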

@BobPortmann
Contributor

@timholy The opposite of

msq = squeeze(m, dim)

would be

msd = add_unit_dim(m, dim)   # or maybe add_dim/adddim

which would do the right thing. I don't see how it is any harder. And if there was

msq = mean(X, dim, unit_dim=true)

you wouldn't need it.

@JeffBezanson
Member

What worries me about dropping dimensions all the time is that dimensions
can have meanings. An RGB image might be NxNx3, and after selecting one row
or column it should still have 2 spatial and 1 color dimension(s).

@lindahua
Contributor

As I argued in #5949, dropping dimensions in reduction would make it particularly difficult to write things like x .- mean(x, dim). I think @timholy agrees with this, and his argument above is convincing. The issue of whether to drop dimension in a[3, :] is not as obvious.

When it comes to indexed views, I am not completely sure why the current way is less desirable than @BobPortmann's proposal. I think it is consistent and easy to understand, and it just works in many practical cases without extra functions like squeeze and add_new_dimension, etc.

@malmaud
Contributor

malmaud commented Feb 25, 2014

The '0' is the dimension of the reduction (since Python is 0-based, the first dimension). My point was just that in Python, x - mean(x,0) (which would be x .- mean(x,1) in Julia) works just as it does in Julia, but the Python version has the added benefit that mean(x,0) is a 1-dimensional vector, which I think is generally preferable to having mean(x,n) always return a sometimes-annoying singleton dimension.

edit:
My point here is exaggerated, because the Python example only works since it is a column-wise operation. The equivalent row-wise operation x - mean(x,1) would not work in Python, while the Julia x .- mean(x,2) does.
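As a hedged aside, both directions can be checked in Julia precisely because the reduced dimension is kept as a singleton; the sketch below uses the later dims= keyword spelling rather than the positional argument used elsewhere in this thread.

```julia
X = rand(3, 4)
colmean = sum(X, dims=1) / size(X, 1)   # 1x4 matrix: reduced dim kept as singleton
rowmean = sum(X, dims=2) / size(X, 2)   # 3x1 matrix
Y1 = X .- colmean                       # column-wise de-meaning broadcasts cleanly
Y2 = X .- rowmean                       # row-wise de-meaning does too
size(Y1), size(Y2)                      # ((3, 4), (3, 4))
```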

@lindahua
Contributor

If we drop the dimension in the first place only to find that we have to add it back in order for the ensuing interactions with other arrays to work properly, then the question is why we would want to drop the dimension initially.

@BobPortmann
Contributor

@StefanKarpinski It seems to me that the associativity problem you reference is different. I think that if you want v'M*w to work with v,w vectors and M matrix then there needs to be a new transposed vector type, as @toivoh says in that post.

@BobPortmann
Contributor

@lindahua What if x .- mean(x, dim) only had to become x .- mean(x, dim, true). That doesn't seem particularly difficult.

@StefanKarpinski
Member

Yes, but it's an extra argument to every single reduction function.

@BobPortmann
Contributor

@lindahua "then there is a question why we would want to drop the dimension initially"

I dislike the special-casing of trailing dimensions (both Julia and IDL do this). Let x be a 3D array. x[3,2,:] is a 3D array but x[:,3,2] is a 1D array. I think they should be consistent. I prefer to think of them both as 1-dimensional slices (i.e., both are 1D). I guess I wouldn't know exactly how it feels to use such a system until it was tried, since none of the languages I use does this (Fortran 90 probably comes closest, but it's not dynamic). I definitely think x[:,:,[2]] should be a 3D array, and Julia presently gets this right (IDL does not, which can cause problems).

I do think that if v is a 1D array and x is 3D array then v .+ x should give an error. Arrays should be the same rank to consider broadcasting. However, this is somewhat independent of the dropping dimension issue.

@lindahua
Contributor

In my view, consistency means there is a simple principle/rule that everything conforms to. From this perspective, @BobPortmann your proposal is consistent; treating x[3,2,:] and x[:,3,2] the same is also consistent (based on a different rule).

The current Julian way is also a consistent framework: ndims(a[I1, I2, .., In]) equals the position of the last non-scalar index. This invariant is very nice and has no special cases; it even applies to a[i1, i2, ..., in] where all indices i1, i2, ..., in are integers.

Generally, there is more than one consistent way to define things. We have to carefully choose the one that provides the most practical benefit. I think the current Julia approach is a reasonable and sane one.

@BobPortmann
Contributor

@lindahua It may be consistent, but trailing dimensions still get treated differently. In any case, I won't argue semantics; I think you know what I'm trying to say. I hope people give some thought to my proposal and don't just casually dismiss it because it is a big change. Many of the ideas in this issue are big changes, so I cannot understand why it is labeled milestone 1.0.

@malmaud
Contributor

malmaud commented Feb 26, 2014

I mean, Matlab is also 'consistent' in that limited sense: the simple principle is that all singleton dimensions are dropped when squeeze is explicitly or implicitly called. But I think we all agree that's bad. I find it disturbing that

function getslice(myarray::Array{Int,2}, dim, idx)
    indexers = Any[1:size(myarray, 1), 1:size(myarray, 2)]
    indexers[dim] = idx
    getindex(myarray, indexers...)
end

is not type-stable, since the return type depends on whether dim==1 or dim==2. I can't really think of any reason why the right time to drop dimensions is only when indexing with trailing scalars. It seems that, depending on whether you agree or disagree with @JeffBezanson's point above, integer indexing should either never drop or always drop a dimension.

@lindahua
Contributor

My point is not actually against @BobPortmann's proposal. Just that many different ways can be seen as consistent, and therefore we may need stronger reasons as to why we prefer one to another.

@timholy
Member

timholy commented Feb 26, 2014

@BobPortmann:

add_unit_dim(m, dim)

Certainly you're right. The question is, which happens more often: the current objectionable call to squeeze, or the potential future objectionable call to add_unit_dim. In my own code I can promise you the latter would happen more often, and therefore be more annoying. (I somehow also find it more intuitive that squeeze(A, (2,4)) affects the current dimensions 2 and 4, whereas with add_unit_dim I find myself wondering how the 2 and 4 intercalate into the current dimensions. But this is probably just me, and I would soon find it perfectly clear.)

I agree with Stefan that the extra argument to reduction functions is also annoying. On balance I don't find the arguments about slicing out reduced dimensions very convincing.

I also agree with @lindahua that the current scheme is consistent, and that there are multiple consistent solutions. It does, however, seem to me that the rules are simplest if you just drop scalar-indexed dimensions everywhere.

@JeffBezanson, I agree that dimensions have meanings, but you can keep them using range indexing rather than scalar indexing. However, one annoying consequence of this proposal is that for any algorithm that accepts indexes but wants to keep the meaning of all the dimensions, one will need a call to something like rangeify_scalars:

function crop(img, indexes)
    I = rangeify_scalars(indexes)
    img[I...]
end

where

rangeify_scalars(indexes) = map(x->isa(x,Real) ? (x:x) : x, indexes)

Remembering to insert this call will surely be a source of bugs, at least at first.
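To make the rangeify_scalars helper above concrete, here is what it produces for a mixed tuple of scalars and ranges, and how that keeps scalar-indexed dimensions from being dropped (the helper is restated here for self-containment):

```julia
# Promote scalar indices to 1-element ranges so no dimension is dropped.
rangeify_scalars(indexes) = map(x -> isa(x, Real) ? (x:x) : x, indexes)

rangeify_scalars((2, 1:3, 5))                 # (2:2, 1:3, 5:5)

A = reshape(1:24, 2, 3, 4)
ndims(A[rangeify_scalars((1, 1:3, 2))...])    # 3 -- all dimensions preserved
```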

@StefanKarpinski
Member

@StefanKarpinski It seems to me that the associativity problem you reference is different. I think that if you want v'M*w to work with v,w vectors and M matrix then there needs to be a new transposed vector type, as @toivoh says in that post.

If you have a transposed vector type then that type is precisely what taking a row slice of a matrix should give you. Which brings us back to exactly where we are right now.

I think the best proposal aside from we currently do is @toivoh's "absent" dimensions idea. This could be represented with a dimension size of -1, although that's an implementation detail. Dimensions sliced with scalars become "absent" rather than being removed entirely. Then a row vector is -1 x n and a column vector is m x -1. All dimensions are treated the same – the trailing ones are not special. Something about this bothers me, however, although I'm not entirely sure what it is. The scheme can be generalized by allowing negative dimensions of arbitrary size. Then we get something very much like "up" and "down" dimensions. But then it starts to get very complicated. Should a matrix have an up and a down dimension? Should it have two up dimensions if it's a bilinear form?

@timholy
Member

timholy commented Mar 1, 2014

Re dropping dimensions in reduction operations, here's another reason: if we dropped dimensions, mean(x, [1,2]) would not be type-stable.

@andreasnoack
Member

@timholy That doesn't have to be the case. It could behave as squeeze does:

julia> squeeze(mean(eye(2),(1,2)),(1,2))
0-dimensional Array{Float64,0}:
0.5

@timholy
Member

timholy commented Mar 1, 2014

Sure, if you pass the dims as a tuple, but there are good reasons to allow people to use vectors.
