Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce iscontiguous function or trait #10889

Open
lindahua opened this issue Apr 19, 2015 · 29 comments
Open

Introduce iscontiguous function or trait #10889

lindahua opened this issue Apr 19, 2015 · 29 comments
Labels
design Design of APIs or of the language itself domain:arrays [a, r, r, a, y, s]

Comments

@lindahua
Copy link
Contributor

Many C libraries have functions like the following:

void fastadd(size_t len, const double *a, const double *b, double *c)

These functions assume that the arrays have contiguous memory layout.

Now for safety, packages like Yeppp and VML etc restrict the arguments to Array. It would be very useful to generalize the fast computation functions there to arbitrary array types with contiguous array layouts (e.g. some SubArray or array views).

A iscontiguous function can be useful in such cases

function fastadd(a::StridedArray{Float64}, b::StridedArray{Float64}, c::StridedArray{Float64})
    (iscontiguous(a) && iscontiguous(b) && iscontiguous(c)) || 
        error("a, b, c must be contiguous")
    # invoke the fastadd c-function
end

cc: @timholy @ViralBShah @simonster

@lindahua
Copy link
Contributor Author

I can make a PR if people think it is worth it.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 19, 2015

I feel that this should already verified by any implementation of convert(Ptr, AbstractArray), so that the extra effort here is not an improvement over duck-typing

@lindahua
Copy link
Contributor Author

There are certain cases where we want to get a ptr, but doesn't require the underlying memory to be contiguous (e.g. gemv and gemm only requires the matrix to be strided but not necessarily contiguous).

The iscontiguous function is used where the ccall requires the underlying memory to be strictly contiguous.

@lindahua
Copy link
Contributor Author

The BLAS functions now use pointer to obtain the pointer to the base address, which does not require the array to be contiguous. For example,

a = rand(3, 4)
b = sub(a, 1:2, :)
p = pointer(b)   # this is fine, and rightly so

However, supplying b to a function that relies on contiguous memory block is dangerous.

Hence, being able to get the pointer and testing whether an array is contiguous are two different things, and need to be addressed in different functions.

@simonster
Copy link
Member

DenseArrays are necessarily contiguous, right? So the problem is effectively that there are some parameters that make for contiguous SubArrays and some that don't. I see two other possible ways to handle this:

  • Split contiguous SubArrays and non-contiguous SubArrays into two separate types, so that we can make contiguous SubArrays <: DenseArray and then use DenseArray as intended. I think I would prefer this, although I'm not sure I'm imagining all the complications.
  • Add a ContiguousArray (or something) union type. StridedArray is also a union type, so I'm not sure this makes things less flexible than they are now. This requires being able to define the set of SubArrays that are contiguous within the type system (although this is trivial if we were to add another parameter during SubArray construction). This is kind of ugly, because ContiguousArray doesn't really mean anything different from DenseArray.

You might actually want to dispatch on contiguity, e.g. if you are picking between an optimized algorithm from an external library that only supports contiguous arrays and pure Julia code that is slower but can handle anything, so I think I'd prefer one of the above two approaches to iscontiguous.

@lindahua
Copy link
Contributor Author

DenseArrays only need to be strided, not contiguous.

Adding ContiguousArray is one approach, but it needs to be an abstract type not a union. Otherwise, the contiguous view in ArrayViews can not be incorporated here.

If we take this approach, we may need some revision of the type hierarchy of arrays. While the iscontsguous function seems to be less elegant, it is much easier to implement.

@lindahua
Copy link
Contributor Author

The current type hierarchy is as follows:

-- AbstractArray
   |-- DenseArray
        |-- Array
        |-- ArrayViews.ContiguousView
        |-- ArrayViews.StridedView
   |-- SubArray

Then, we have a union type StridedArray defined as a union of certain subarray types and dense array.

Union(SubArray{T,N,A<:DenseArray{T,N},I<:Tuple{Vararg{Union(Int64,Colon,Range{Int64})}},LD},DenseArray{T,N})

@lindahua
Copy link
Contributor Author

Related to the discussion #10385.

@lindahua
Copy link
Contributor Author

Another way is to use array traits, as suggested by @timholy

type Contiguous end
type NotContiguous end

contiguousness(a::AbstractArray) = NotContiguous()
contiguousness(a::Array) = Contiguous()
contiguousness(a::SubArray) = # more sophisticated stuff

function fastsum(a::StridedArray)
    _fastsum(a, contiguousness(a))
end

_fastsum(a::StridedArray, ::Contiguous) = # ... call the underlying C functions

@lindahua lindahua changed the title Introduce iscontiguous function Introduce iscontiguous function or trait Apr 19, 2015
@timholy
Copy link
Sponsor Member

timholy commented Apr 19, 2015

iscontiguous already exists:

julia/base/subarray.jl

Lines 396 to 412 in 8f69307

# This might be conservative, but better safe than sorry
function iscontiguous{T,N,P,I,LD}(::Type{SubArray{T,N,P,I,LD}})
Ip = I.parameters
LD == length(Ip) || return false
length(Ip) < 1 && return true
Ip[1] == Colon && return true
if Ip[1] <: UnitRange
# It might be stride1 == 1, or this might be because `sub` was
# used with an integer for the first index
for j = 2:length(Ip)
(Ip[j] == Colon || (Ip[j] <: AbstractVector)) && return false
end
return true
end
false
end
iscontiguous(S::SubArray) = iscontiguous(typeof(S))

Right now it's intended for internal use in stagedfunctions, but we could change it to return Val{true} or Val{false} so it could be used in type inference.

@lindahua
Copy link
Contributor Author

Great. I think you may also add

iscontiguous{T,N}(::Type{Array{T,N}}) = true
iscontiguous(::Array) = true

I think this is useful for interfacing with C libraries, as mentioned above.

@stevengj
Copy link
Member

See also the (slightly more general, perhaps, because it doesn't assume the dimesions are sorted in a particular order?) iscontiguous function that I wrote for Blosc.jl.

@timholy
Copy link
Sponsor Member

timholy commented Apr 19, 2015

There seem to be two conflicting requirements: something that works on types so you can use it during type inference, and something that you can evaluate at runtime and call out to C libraries. This:

A = rand(3,5)
B1 = sub(A, :, 2:3)    # inference: yes, runtime: yes
B2 = sub(A, 1:2, 2:3)  # inference: no, runtime: no
B3 = sub(A, 1:3, 2:3)  # inference: no, runtime: yes

illustrates the conflict.

@Jutho
Copy link
Contributor

Jutho commented Apr 19, 2015

Instead of Val{true} why not a new type as with linear indexing. Then we can make storage itself a trait with a useful hierarchy: Strided, UnitStrided ( for BLAS matrices), Contiguous, ...

That way storage does no longer need no longer be reflected in the AbstractArray hierarchy itself, where it often leads to conflicts.

@lindahua
Copy link
Contributor Author

I think using specific types to indicate different memory layout is a good idea.

Below is an example that demonstrates how this may be used:

# memory layout traits

abstract MemLayout
type Contiguous <: MemLayout end
type UnitStrided <: MemLayout end
type Strided <: MemLayout end

memlayout(a::StridedArray) = Strided
memlayout(a::Array) = Contiguous
memlayout(a::SubArray) = ...   # probably need a staged function to figure out

# suppose we have a highly optimized C functions to sum elements
# and we want to port it to Julia, naming it to `fastsum`

fastsum(a::StridedArray{Float64}) = fastsum(a, memlayout(a))

fastsum(a::StridedArray{Float64}, ::Type{Contiguous}) = # call out the C function

# it is possible to write it for general arrays, 
# using matrix here to make the illustration simple
function fastsum(a::StridedMatrix{Float64}, ::Type{UnitStrided})
    s = 0.0
    for i = 1:size(a,2)
        s += fastsum(slice(a, :, i), Contiguous)
    end
    return s
end

# fallbacks to the builtin implementation for other layouts
fastsum{L<:MemLayout}(a::StridedArray, ::Type{L}) = sum(a)  

@timholy
Copy link
Sponsor Member

timholy commented Apr 20, 2015

Sure, we could definitely introduce those types. An alternative approach would be Val{:LinearFast} etc (or Val{:Contiguous} in this case). I'm not actually sure which is the better approach, but I had the impression that proliferation of types is more of a problem than proliferation of parameters. Others surely know more.

(For the record, the linear indexing traits were introduced before Val existed, if my memory serves.)

@Jutho
Copy link
Contributor

Jutho commented Apr 20, 2015

The point of using types would be that you could use them like this:

abstract MemLayout
abstract Strided <: MemLayout
abstract UnitStrided <: Strided
abstract Contiguous <: UnitStrided

(the last one can probably be a type or immutable). Then some methods can be defined for all strided storage (e.g. permutedims), others only for unit strided storage (BLAS methods), and yet others only for contiguous storage.

@timholy
Copy link
Sponsor Member

timholy commented Apr 20, 2015

Works for me.

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 14, 2016
@JeffBezanson JeffBezanson modified the milestones: 1.0, 0.6.0 Jan 3, 2017
@StefanKarpinski
Copy link
Sponsor Member

Still relevant/necessary?

@Jutho
Copy link
Contributor

Jutho commented Jul 20, 2017

I think it is. It would be good to have a storage/memlayout trait, rather than trying to reflect this property in the type hierarchy, which makes it less flexible and more difficult to extend (as in the SubArray example).

@mbauman
Copy link
Sponsor Member

mbauman commented Jul 20, 2017

Yes — the StridedArray typealias is still a major wart here. I'd like to see that replaced with a more trait-like interface.

@timholy
Copy link
Sponsor Member

timholy commented Jul 21, 2017

For anyone who wants to jump on this, it mostly consists of replacing

qr(A::StridedArray) = ...

with

qr(A::AbstractArray) = _qr(MemLayout(A), A)
_qr(::Strided, A) = ...

once the MemLayout trait is defined (for which one can mimic the IndexStyle trait).

@mbauman mbauman added the domain:arrays [a, r, r, a, y, s] label Aug 29, 2017
@StefanKarpinski
Copy link
Sponsor Member

The first question to answer here is if adding this is a breaking change in any way. If not, we should move this off of the 1.0 milestone. If not, we should figure out the minimal change we'd need to allow this in the future without breaking anything.

@StefanKarpinski StefanKarpinski added the status:triage This should be discussed on a triage call label Sep 21, 2017
@StefanKarpinski
Copy link
Sponsor Member

Still unclear if there's anything breaking here and no one has chimed in since I asked.

@stevengj
Copy link
Member

stevengj commented Sep 21, 2017

I'm not convinced that traits are needed here. If it is a DenseArray or a subarray thereof, then it should have strides, and from strides we can implement functions to check whether it is contiguous (and if so in what dimension order). That approach would not be breaking.

(And if traits are needed, they can be added on top of that. A trait would not be breaking either.)

@mbauman
Copy link
Sponsor Member

mbauman commented Sep 21, 2017

If it is a DenseArray

Well, that's the problem. SubArray only defines strides for some of its parameterizations and is not a DenseArray in general. Similar issues exist for Sparse arrays and views thereof.

That said, I don't think this is breaking, but it can be hard to think through all the ramifications of changing informal interfaces that get extended by user code. This is really largely an addition of a new feature — the ability to ask an array about its storage type — and a large refactoring of functions in Base to take advantage of that fact.

We can follow the pattern that we use for IndexStyle optimizations (as Tim proposes above for qr via the internal _qr). For some functions, we'd be removing the provided svdfact(::StridedMatrix) in favor of a more general svdfact(::AbstractMatrix), which I suppose could affect the precedence of some pathological and type-pirating user-code (e.g., defining svdfact(::Any)).

Type-Piracy absolutely can and should be excluded from stability guarantees. Am I missing any other places where this would be breaking?

@timholy
Copy link
Sponsor Member

timholy commented Sep 22, 2017

@stevengj, you want to have AbstractArray types that endow arrays with all kinds of new properties: you want to reshape them (ReshapedArray and RowVector), modify their indices (OffsetArray), reinterpret/map their values (ReinterpretedArray or MappedArray), give their axes names (AxisArray and similar), make their indices periodic (FFTViews), and so on. These wrappers can often work just fine for either dense/contiguous or non-dense/non-contiguous parent arrays. Consequently, none of these "behavior-endowing" AbstractArrays can be a subtype of DenseArray. So for most purposes, if you need to know whether data are packed contiguously, it's almost completely useless to rely on inheritance. This is a corner of the design space that simply requires traits.

@timholy
Copy link
Sponsor Member

timholy commented Sep 22, 2017

@StefanKarpinski, the breaking part is that in an ideal world we'd simultaneously ditch StridedArray from Julia 1.0.

@mbauman
Copy link
Sponsor Member

mbauman commented Sep 22, 2017

Eh, it's a fairly harmless typealias. We can always deprecate it alongside the new feature on a minor release (even if it won't be completely removed until 2.0). The important part isn't that it goes away, the important part is that folks start using the alternative. And a deprecation alone is sufficient for that.

@mbauman mbauman modified the milestones: 1.0, 1.x Sep 22, 2017
@mbauman mbauman removed the status:triage This should be discussed on a triage call label Sep 22, 2017
@DilumAluthge DilumAluthge removed this from the 1.x milestone Mar 13, 2022
@nsajko nsajko added the design Design of APIs or of the language itself label May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself domain:arrays [a, r, r, a, y, s]
Projects
None yet
Development

No branches or pull requests