eliminate the StoredArray/AbstractArray distinction #6212
See my comment in #5810. There is a useful distinction to draw between arrays whose matrix elements are explicitly stored (…
Here are two questions:
The crux of the matter, it seems, is whether the type system should only delineate what is permitted (a descriptive approach), or whether it should also encode preferences about what should or should not be done (a prescriptive approach). In the former approach, cost doesn't really enter into the design decision. In the latter, structuring the type hierarchy in a particular way can signal the underlying computational complexity. Should the design decision be "we'll be general enough to let you do it automatically, but you have no guarantee on cost in practice," or "we know that certain operations on certain types will be terribly slow, so if you want to write this method, you should know what you are getting into"?
I generally favor not providing default implementations that we know are going to have terrible performance. This is a bit of a judgement call, but I feel that it more often than not just leads to people trying something and filing issues about the performance.
If getindex is not reasonably fast, then I don't think it should be an array. "Array" in computer science implies random access. On a more practical note, if you can't assume that getindex is reasonably fast, how could you implement any nontrivial method for AbstractArrays? And if you can't write a nontrivial method for it, why bother having that type in the first place?
But isn't the point that …
You can define …
@jiahao, an … The point is to define a useful generalization, not just "some sort" of generalization. It has to be a generalization that is still specific enough that I can write a non-trivial … Whether you can define a …
Technically, you can define …
You could make exactly the same arguments with …
@StefanKarpinski, technically, you cannot define … Furthermore, this is not just "my" point of view. It is the point of view already adopted in … @jiahao, you can only make the same arguments about … For example, random access allows us to distinguish between …
I suspect that we won't be implementing linear operators on uncountable spaces any time soon and as long as you can enumerate your basis, that approach works. I had two points:
I'm not really arguing for or against elimination – I don't know – I'm just making some points.
And you still haven't answered my question: with your suggested definitions, what non-trivial … (With regards to …)
@stevengj Do you consider sparse matrices to be arrays? Random access into a sparse matrix/vector is no better than random access into a linked list.
Lookup of an arbitrary (i,j) index on an MxN CSC sparse matrix with at most K elements per column can be done in O(log K) time, which is close enough to random access, and is far better than the O(N) of a linked list of N elements. But if you have a sparse matrix in IJ format with no ordering, e.g. just a linked list of (i,j,value), then no, I would not think of that as an array data structure. Not all arrays are matrices, and not all matrices are stored in arrays.
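To make the O(log K) claim concrete, here is a minimal sketch in current Julia (the SparseArrays standard library, not the 2014 API) of looking up entry (i, j) in a CSC matrix by binary search over the sorted row indices stored for column j. The helper name csc_lookup is hypothetical; it relies only on the exported accessors rowvals, nonzeros and nzrange.

```julia
using SparseArrays

# Hypothetical helper: look up A[i, j] in a CSC matrix without a dense copy.
# Within each column, SparseMatrixCSC keeps its stored row indices sorted,
# so a binary search over that column's slice costs O(log K) for K entries.
function csc_lookup(A::SparseMatrixCSC{T}, i::Integer, j::Integer) where {T}
    checkbounds(A, i, j)
    rows = rowvals(A)          # row index of every stored entry
    vals = nonzeros(A)         # value of every stored entry
    r = nzrange(A, j)          # positions of column j's entries in rows/vals
    k = searchsortedfirst(view(rows, r), i)   # binary search within column j
    if k <= length(r) && rows[first(r) + k - 1] == i
        return vals[first(r) + k - 1]
    end
    return zero(T)             # (i, j) is not stored, so it is zero
end

A = sparse([1, 3, 2], [1, 1, 2], [10.0, 30.0, 20.0], 3, 3)
csc_lookup(A, 3, 1)   # 30.0
csc_lookup(A, 2, 1)   # 0.0, not stored
```

By contrast, an unordered list of (i, j, value) triples offers no such shortcut: every lookup has to scan the list.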
So maybe logarithmic or better (with a low constant factor) is the right definition? That means that, e.g., binary search is admissible. There's not much besides direct indexing that is actually O(1), so if we required O(1), then we would almost necessarily be limiting ourselves to plain old dense arrays.
Yes, I would say that we should recommend Õ(1) access for …
For many arrays (or array-like objects), even … Take sum, for example. We can write:

```julia
function sum(a::DenseArray)
    # ... default implementation ...
end

function sum(a::AbstractArray)
    tmp = copy!(similar(a), a)
    return sum(tmp)
end
```

Even if …
@lindahua, it seems like you have an infinite dispatch loop, since …
but in this case there is no point in restricting it to …
Maybe I should write it this way to make the example clearer:

```julia
function sum{T}(a::AbstractArray{T}, region)
    # copy into a freshly allocated dense Array, then sum that copy
    tmp = copy!(Array(T, size(a)), a)
    return sum(tmp, region)
end
```

I don't think …
@lindahua, your function would still be O(N^2) for a linked list. I'm fine if you want to set the bar for … (But for Õ(1) amortized access, I don't see the point in making a copy. Why not just call …)
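The O(N^2) cost is easy to see with a toy type. The sketch below, in current Julia syntax, defines a hypothetical ListVector, an AbstractVector backed by a singly linked list, and counts pointer hops. Copying it element by element through getindex costs N(N-1)/2 hops, whereas sequential iteration costs one hop per element, which is the kind of Õ(1) amortized access discussed here. All names (Node, ListVector, hops, copy_by_index, copy_by_iteration) are illustrative.

```julia
# Hypothetical toy type: an AbstractVector backed by a singly linked list.
struct Node{T}
    value::T
    next::Union{Node{T},Nothing}
end

struct ListVector{T} <: AbstractVector{T}
    head::Union{Node{T},Nothing}
    len::Int
    hops::Base.RefValue{Int}    # counts pointer hops, for illustration only
end

function ListVector(xs::AbstractVector{T}) where {T}
    head = nothing
    for x in reverse(xs)
        head = Node{T}(x, head)
    end
    return ListVector{T}(head, length(xs), Ref(0))
end

Base.size(v::ListVector) = (v.len,)
Base.IndexStyle(::Type{<:ListVector}) = IndexLinear()

# Random access must walk from the head: i - 1 hops for element i.
function Base.getindex(v::ListVector, i::Int)
    @boundscheck checkbounds(v, i)
    node = v.head
    for _ in 2:i
        node = node.next
        v.hops[] += 1
    end
    return node.value
end

# Sequential traversal: one hop per element returned.
function Base.iterate(v::ListVector, node = v.head)
    node === nothing && return nothing
    v.hops[] += 1
    return (node.value, node.next)
end

# Copy via repeated random access (what a getindex-based fallback does).
function copy_by_index(v::AbstractVector{T}) where {T}
    out = Vector{T}(undef, length(v))
    for i in eachindex(v)
        out[i] = v[i]
    end
    return out
end

# Copy via iteration, walking the list once.
function copy_by_iteration(v::AbstractVector{T}) where {T}
    out = T[]
    for x in v
        push!(out, x)
    end
    return out
end

v = ListVector(collect(1:100))
copy_by_index(v) == 1:100       # true
v.hops[]                        # 4950 = 100 * 99 / 2
v.hops[] = 0
copy_by_iteration(v) == 1:100   # true
v.hops[]                        # 100, one hop per element
```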
I suggest the following requirement for …
My … The …
This seems like a very fine distinction. Why not just require Õ(1) amortized access per element, assuming many elements are accessed? Otherwise, pretty much all non-trivial … If you want a …
My point is that something qualifies as an instance of … The threshold can be discussed. The distinction between …
In this way, we can define

```julia
function sum(a::StoredArray)
    # using getindex
end

function sum(a::AbstractArray)
    # first copy/convert it to an ordinary array, and then do sum
end
```
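A runnable sketch of this fallback pattern in current Julia follows. StoredArray no longer exists, so DenseArray stands in for the "cheap getindex" category here (an assumption for illustration), and mysum is a hypothetical name chosen to avoid clashing with Base.sum. Materializing with Array(a) rather than similar(a) guarantees that the fallback lands on the DenseArray method, which avoids the dispatch-loop concern raised earlier in the thread.

```julia
# Fast path: element access through getindex is assumed to be cheap.
function mysum(a::DenseArray)
    s = zero(eltype(a))
    for i in eachindex(a)
        s += a[i]
    end
    return s
end

# Fallback for any other AbstractArray: copy into a plain Array first.
# Array(a) is always a DenseArray, so this call cannot recurse back here.
mysum(a::AbstractArray) = mysum(Array(a))

mysum([1, 2, 3])   # 6, uses the DenseArray method directly
mysum(1:4)         # 10, the UnitRange is copied into a Vector first
```

Concrete types with a faster specialized algorithm can still add their own mysum method, which dispatch will prefer over the copying fallback.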
@lindahua, I agree that you can define an … It seems very unpleasant to me to have an "array" type such that all non-trivial methods on it first make a copy. As a user, I wouldn't expect that … At the very least, we shouldn't call it an …
That's just a default fallback. One can always specialize methods for one's own array types when more efficient specialized implementations exist.
@lindahua, I agree, but that is not the documented meaning of …
We should update the documentation to be something like your definition: subtypes must define …
But probably your stronger definition, specifying the memory layout, is best. Otherwise, there is no way to know the memory layout that is pointed to by …
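For a plain Array, current Julia exposes the layout behind pointer(A) through strides: a column-major matrix has stride 1 along the first dimension and the number of rows along the second. The sketch below reads one element straight from memory; it is illustrative only and assumes a dense, column-major Array.

```julia
# A 3x4 column-major matrix holding 1.0, 2.0, ..., 12.0.
A = reshape(collect(1.0:12.0), 3, 4)
st = strides(A)          # (1, 3): step sizes, in elements, per dimension

i, j = 2, 3
# Zero-based element offset of A[i, j] from pointer(A).
offset = (i - 1) * st[1] + (j - 1) * st[2]

# GC.@preserve keeps A alive while its raw pointer is in use.
x = GC.@preserve A unsafe_load(pointer(A), offset + 1)
x == A[i, j]             # true; both are 8.0
```

Without a guaranteed layout, the same offset arithmetic would be meaningless, which is why a pointer alone says little about an arbitrary array type.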
OK. That definition is a fully compatible refinement of the current description of …
Thinking about it more, I agree that the distinction between … A reasonable way would be to maintain two levels of abstract array types in Base: …
Additional notes: …
I like it.
Or perhaps the other way round: … Anyway, this is a minor issue. I am fine if we keep the current names as they are. I think we should make the decision sooner rather than later (preferably before the 0.3 release). Then we will have a clear guideline for cleaning things up.
Looking at concrete examples currently in …
I wonder how the …
The point of …
Hence, I think the solution is pretty straightforward: …
One of the original goals of revising the array hierarchy was to avoid code duplication between Arrays and BitArrays. It is true, however, that at the moment no methods use … As far as the name goes, on the other hand, it's hard to think of something denser than a BitArray...
Perhaps call it … To me, the choice of these names is a relatively minor issue. It is more important to give clear meaning/semantics to these types and stick to them in practice.
Marking this for 0.3, since we really shouldn't release 0.3 with all the new …
+1 for …
My search through the codebase shows that … This code uses nothing more than …
I like the latest conclusion. Just to complicate matters a bit, the recently introduced UniformScaling <: AbstractArray, but it defines neither size nor getindex.
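For reference, in current Julia, where (per the change referenced in this thread) UniformScaling is no longer an AbstractArray, I lives in the LinearAlgebra standard library. It supports scalar indexing and multiplication but carries no dimensions of its own, so it must be given explicit dimensions to become an actual matrix. A short check, assuming a recent Julia 1.x:

```julia
using LinearAlgebra

I isa AbstractArray        # false: UniformScaling sits outside the array hierarchy
(2I)[1, 1], (2I)[1, 2]     # (2, 0): scalar indexing still works
(2I) * [1, 2]              # [2, 4]: acts like 2 times an identity of matching size
Matrix(1.0I, 2, 2)         # a dense 2x2 identity, dimensions supplied explicitly
```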
If we settle on this, then …
+1 for the proposal, though about the naming …
…ng#6212, JuliaLang#987); UniformScaling is no longer an AbstractArray (JuliaLang#5810)
I don't see what useful purpose is served by having this distinction in the type system, introduced by #987 (and see also the discussion in #5810). According to the description in the manual, the main difference between a StoredArray and an AbstractArray is that in the latter, the elements are computed or requested on the fly, whereas in the former they are "stored". But this seems like an implementation detail. Why do we want to have this distinction in the type system? Under what circumstances would you dispatch on the difference between computed and stored values?

The other difference is that, in an AbstractArray, not all elements may be accessible. (But in this case, shouldn't it be an Associative type?) If you can't even access all elements, what non-trivial array-like methods could possibly operate on an AbstractArray? And if you can't write a non-trivial method for it, why even have the type at all?

My suggestion would be to just drop this distinction, and make StoredArray === AbstractArray. And suggest that every concrete AbstractArray type should, at minimum, provide size and getindex (and setindex! if it is mutable), including a single-index variant.
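The suggested minimum (size plus single-index getindex, and setindex! when mutable) closely matches what current Julia documents as the AbstractArray interface. The sketch below uses today's syntax; Squares and Wrapped are hypothetical names. One type computes its elements on the fly and the other stores them, yet generic functions such as sum treat both the same way, which is the sense in which stored versus computed becomes an implementation detail.

```julia
# Computed on the fly: nothing is stored except the length.
struct Squares <: AbstractVector{Int}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.IndexStyle(::Type{Squares}) = IndexLinear()   # single-index access is primitive
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)
    return i * i
end

# Stored and mutable: a thin wrapper around a Vector.
struct Wrapped{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(w::Wrapped) = size(w.data)
Base.IndexStyle(::Type{<:Wrapped}) = IndexLinear()
Base.getindex(w::Wrapped, i::Int) = w.data[i]
Base.setindex!(w::Wrapped, x, i::Int) = (w.data[i] = x; w)

sum(Squares(4))          # 30 = 1 + 4 + 9 + 16
w = Wrapped([1, 2, 3])
w[2] = 10                # uses the setindex! method above
sum(w)                   # 14
```

Declaring IndexLinear lets Base derive multi-index access, iteration and most generic algorithms from these few methods.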