Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array type hierarchy #987

Closed
JeffBezanson opened this issue Jun 28, 2012 · 66 comments
Closed

array type hierarchy #987

JeffBezanson opened this issue Jun 28, 2012 · 66 comments
Labels
kind:breaking This change will break code needs decision A decision on this change is needed

Comments

@JeffBezanson
Copy link
Sponsor Member

We have discussed the need for more types above Array, e.g. in #750. Candidates include DenseArray, UniformDenseArray, ContiguousArray, UniformAccessArray, etc.

@timholy
Copy link
Sponsor Member

timholy commented Jun 28, 2012

While the topic is up for consideration: it also might be useful to be able to test whether an array is stored in-memory or is mmapped. There are times when the best algorithm will depend upon access patterns.

I don't know whether this issue can be solved by the type hierarchy, though---seems like a multiple inheritance problem?

@carlobaldassi
Copy link
Member

I originally proposed ContiguousArray but I vote for UniformAccessArray, it seems the most descriptive to me.

On a related note: in #750 I also proposed to make StridedArray a part of the abstract hierarchy rather than a Union, but that would cause problems in capturing the A<:Array tag which is part of the current definition.
However: as things are defined now, SubArrays would not fit in UniformAccessArray (or whatever it's going to be called), since they can be associated in general with AbstractArrays. But StridedArrays restrict that to Arrays, so StridedArrays should be legitimately thought of as UniformAccessArrays. I don't think it's possible to do that with a Union though.

Wouldn't it make sense to restrict SubArrays to be derived from UniormAccessArray only, once we have that? It seems to me that they already are implemented with that assumption implicitly. If so, then perhaps the A<:Array tag could be dropped in StridedArray and it could be made part of the hierarchy.

@JeffBezanson
Copy link
Sponsor Member Author

SubArray is a bit of a problem here; basically, it is a UniformAccessArray if and only if its underlying array is. SubArrays of DArrays work fairly well, but mostly because I explicitly wrote DArray functions to handle them (using the type SubOrDArray). It seems to me the same thing would have to be done for SubArrays of sparse.

The non-scalar indexing functions for SubArray are pretty generic, since they just transform indexes.

We might have to use a Union of UniformAccessArray and SubArray{T,N,<:UniformAccessArray} in most places.

@carlobaldassi
Copy link
Member

I see, you're definitely right of course. Speaking about Unions, it would probably be nice to have some common convention for those things, e.g. that used for DArrays seems nice (the only problem would be that SubOrUniformAccessArray is really too long, and so would be SubOrSparseMatrixCSC; this calls for some kind of abbreviation, or for choosing a different name).
As an alternative convention, it may also make sense to give the UniformAccessArray name to the Union itself, since that's the type which is going to be used in most places and the name fits anyway, and have BaseUniformAccessArray and SubUniformAccessArray (or some better/shorter names) as the base and sub types, respectively.

@JeffBezanson
Copy link
Sponsor Member Author

Given that we have AbstractSparseMatrix, it seems to make sense to add AbstractDenseArray. A distributed array might manage (distribute) any kind of array, and so is neither inherently dense nor sparse. The type system is not able to express that a distributed array is dense iff its component arrays are.

Note: not a breaking change, so not urgent.

@ViralBShah
Copy link
Member

This would also help address #1276.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 2, 2013

related: #1412

@goretkin
Copy link
Contributor

goretkin commented Dec 5, 2013

I'm not sure where to put this, but I want to call attention to some comments by Prof. Stephen Boyd regarding a system that would keep track of special structure of arrays. He suggests a name "disciplined linear algebra".

https://www.youtube.com/watch?v=Ap8LGbCVx4I&t=49m50s

I actually think that, since you asked, I'll actually say a little bit about it. I think the situation here is like disciplined convex programming. So when you first do convex optimization, what you want to do is to write a modeling language where you can write down anything you want, and then some program will analyze it, come back, and say, "Oh it's your lucky day. It's convex."
....
I think maybe someone should make something that's disciplined linear algebra, where the goal is not to discover some structure that you or one of your fellow researchers obscured my multiplying this out. This doesn't apply to raw data, by the way, of course. There it's not that somebody obscured the sample returns, this is a model. But in the case of working with stuff, you would actually want something like disciplined linear algebra, where you have an interesting data structure. You would store things as diagonal plus low rank, and that would actually be trapped and taken care of. That's actually what you'd really want, to tell you the truth, something that works that way. Not something that kind of discovers structure.

The same way with convex programming, I think it's better to be given a set of atoms, a set of rules that come straight from the math, and you just follow those and everything's cool."

@JeffBezanson
Copy link
Sponsor Member Author

@jiahao, @alanedelman and I discussed this today. Our current thinking is:

abstract AbstractTensor{N}
abstract StructuredArray{T,N} <: AbstractTensor{N}
abstract StoredArray{T,N} <: StructuredArray{T,N}
abstract MatrixFactorization{T} <: AbstractTensor{2}

type Array{T,N} <: StoredArray{T,N}; ...; end

typealias LinearOperator AbstractTensor{2}
typealias StructuredMatrix{T} StructuredArray{T,2}

Basically, StoredArray captures the idea of an array that works like a memory --- you have general read/write access. Another definition is "arrays that would be familiar to a Fortran programmer". NumPyArray would be another subtype of it.

A StructuredArray can be read from with getindex, but you might not be able to mutate it, or you might be able to mutate only certain indexes. This would include Tridiagonal and such. A MatrixFactorization is similar but slightly more opaque.

An AbstractTensor is a very general thing that you can imagine occupying a rectangular region of some (abstract) space, or a thing with indexes that form a product space. This includes both conventional arrays and general linear operators (and potentially multilinear operators).

The goals are

  • fix StridedArray should be an abstract class, not a union #2345 (AbstractArray is not cutting it)
  • allow people who know about general linear operators to write methods for them, but have them apply to normal Arrays
  • separate not-quite-arrays (all the matix factorization types), while still giving them a place to live

We found AbstractTensor the hardest thing to name, and we aren't fully happy with it. What means "like either an array, or an opaque linear operator"? Or what means "like an array, but more abstract"? Another possibility was AbstractMap. Or possibly something involving the word "cartesian".

cc @StefanKarpinski

@jiahao
Copy link
Member

jiahao commented Dec 21, 2013

In the final round of conclave voting, AbstractTensor narrowly beat out SuperTensor, MultilinearOperator, AbstractMap, AbstractRectangle, Arrayable, PotentialTensor, and Fred to be the least annoying abstract name.

On a more serious note, what we wanted was a way to unify the notions of Array{T,N} as data container and "MultilinearOperator{N}" as an N-dimensional map. In particular, the thing we didn't want to do is to distinguish Matrix{T} from Array{T,2}. (Numpy, to its great woe, does, and we think it's a Bad Idea.)

Also, I wouldn't be sad to banish AbstractMatrix; the name is misleading.

@lindahua
Copy link
Contributor

This is a problem where the decision would have profound impact to the future evolution of Julia.

I read the proposal outlined by @JeffBezanson above carefully, and I am quite happy (actually excited) with that.

@lindahua
Copy link
Contributor

I still have a question though. Where is sparse matrix going? I guess it is under StructuredArray, but not StoredArray?

@jiahao
Copy link
Member

jiahao commented Dec 21, 2013

We envisioned it to be a subtype of StoredArray. The distinction we had in mind between StoredArray and StructuredArray is that setindex! on arbitrary indices (in range) is guaranteed to work for the former but not the latter.

More generally, we thought that the hierarchies should reflect also the computational cost of indexing. For example, one could consider a matrix Factorization as some representation of a matrix with a nontrivial cost of getindex and a highly nontrivial cost of setindex! (imagine for example a matrix represented as a long chain of Givens rotations multiplied together and postmultiplied by a Tridiagonal). So we had in mind a classification that went something like this:

getindex setindex! Type
O(1) O(1), always possible within bounds Matrix, SparseMatrixCSC, ... <: StoredArray
O(1) O(1), sometimes impossible Tridiagonal, Triangular, ... <: StructuredArray
>=O(1) >=O(1), sometimes impossible Factorization, QRPackedQ, ... <: AbstractTensor{2}

@lindahua
Copy link
Contributor

Are you sure getindex and setindex! are O(1) for sparse matrices?

I think even figuring out whether an element is non-zero in a sparse matrix may require O(nnz) in worst case.

@jiahao
Copy link
Member

jiahao commented Dec 21, 2013

Are you sure getindex and setindex! are O(1) for sparse matrices?

Not really. In @alanedelman's terms, we distinguish between cases where getindex/setindex! should take 0 flops (pure memory offset calculation), and when they don't (i.e. some nontrivial computation is needed to simulate a memory offset calculation). Perhaps sparse matrices are somewhere in between, but should be O(1) with respect to matrix dimension.

@jiahao
Copy link
Member

jiahao commented Dec 21, 2013

Maybe a better way to express the difference is to differentiate between things that behave like matrices and have matrix elements stored somewhere in memory, and things that behave like matrices but don't have explicitly stored values that correspond to matrix elements and have to be reconstructed by some kind of vector or matrix arithmetic.

@ViralBShah
Copy link
Member

Our Sparse indexing is O(1) to find the column and binary search in the column for the row.

@JeffBezanson
Copy link
Sponsor Member Author

Yes, the complexity issue is more of a guideline --- if getindex is only possible but very inefficient, then you probably don't have a StoredArray.

I wonder where AbstractArray fits in. It could maybe be an alias for StructuredArray for backwards compatibility.

We might need finer distinctions within StoredArray --- a subtype like LocalDenseArray would still make sense.

@jiahao
Copy link
Member

jiahao commented Dec 21, 2013

We also didn't discuss where DArrays fit in.

@alanedelman
Copy link
Contributor

I for one would love a world where I can type

full( StoredArray( cumsum, 5)) or some such and obtain
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
Also
One approximate distinction
that divides StoredArrays from non-StoredArray's is that aStoredArray` is convenient for Balacing for eigenvalues

@timholy
Copy link
Sponsor Member

timholy commented Dec 21, 2013

Does size work efficiently on LinearOperator? I was recently wrestling with this in IterativeSolvers, and decided to declare that it should. But it was a close call; the alternative is simply to calculate A'*b and see what size object comes out.

@alanedelman
Copy link
Contributor

My previous note written before coffee had the right idea but the wrong details..

Would it make sense to create a linear operator out of cumsum by insisting on a size?
At present cumsum knows its size by the argument

Thus we get a linear operator by creating

A = LinearOperator(cumsum, 5)

and then one could type size(A) and lookup the answer (5,5)

one could then make sense of

A*eye(5) which would would give the same answer as full(A)

5x5 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0
 1.0  1.0  0.0  0.0  0.0
 1.0  1.0  1.0  0.0  0.0
 1.0  1.0  1.0  1.0  0.0
 1.0  1.0  1.0  1.0  1.0

I think what I was missing until yesterday is that to me at the highest level of abstraction, I always
thought that a linear function really acts like a matrix, but it's very nice to explicitly turn a function
into a LinearOperator and then let it join the matrix wannabe club.

Stencils would fit nicely here.

@lindahua
Copy link
Contributor

Another family of arrays that need to be taken into account when designing such a hierarchy are those from remote machines/devices -- such arrays (e.g. DArray, Arrays on GPU, etc) are increasingly used in practice.

For these arrays, their elements are probably stored somewhere (but not in the memory that CPU can directly access), so you are not expected to directly do getindex or setindex! on them. However, you can fetch their content to make an ordinary array.

@alanedelman
Copy link
Contributor

Star-P went very far with this , but somehow one wishes this could be done without types.

@JeffBezanson
Copy link
Sponsor Member Author

I think a DArray is a StoredArray. But we should identify a class of algorithms that usefully separates DArray from other StoredArrays. My guess is that the key class is "code that naively loops over all indexes". You can only do that for dense arrays with uniform access, so the type would be UniformDenseArray or LocalDenseArray. This is a pretty important type so the length of those names is regrettable.

@lindahua
Copy link
Contributor

The algorithmic pattern of looping over arrays using linear index consists of a very large part of the current code base (in both Base and packages). I think it is important to give such class of arrays a short and easy-to-remember name. In my opinion, DenseArray is good enough.

@JeffBezanson
Copy link
Sponsor Member Author

We actually don't have a very good way to document abstract types.

@timholy
Copy link
Sponsor Member

timholy commented Feb 21, 2014

How about laying out the meaning of the types here?

@lindahua
Copy link
Contributor

I reopen this as there remains some debates in the array type system (#5810).

The key questions needed to be answered: we need clarification of APIs for these array types.

Related discussion: #6212.

@lindahua lindahua reopened this Mar 19, 2014
@lindahua lindahua removed this from the 0.3 milestone Mar 19, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 24, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 24, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 31, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 31, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 31, 2014
stevengj added a commit to stevengj/julia that referenced this issue Mar 31, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests