- 1 The MGL-MAT ASDF System
- 2 Links
- 3 Introduction
- 4 Tutorial
- 5 Basics
- 6 Element types
- 7 Printing
- 8 Shaping
- 9 Assembling
- 10 Caching
- 11 BLAS Operations
- 12 Destructive API
- 13 Non-destructive API
- 14 Mappings
- 15 Random numbers
- 16 I/O
- 17 Debugging
- 18 Facet API
- 19 Writing Extensions
- Version: 0.1.0
- Description: MAT is a library for working with multi-dimensional arrays which supports efficient interfacing to foreign and CUDA code. BLAS and CUBLAS bindings are available.
- Licence: MIT, see COPYING.
- Author: Gábor Melis [email protected]
- Mailto: [email protected]
- Homepage: http://melisgl.github.io/mgl-mat
- Bug tracker: https://github.com/melisgl/mgl-mat/issues
- Source control: GIT
Here is the official repository and the HTML documentation for the latest version.
MGL-MAT is a library for working with multi-dimensional arrays which supports efficient interfacing to foreign and CUDA code, with automatic translation between CUDA, foreign and lisp storage. BLAS and CUBLAS bindings are available.
Currently only row-major single and double float matrices are supported, but it would be easy to add single and double precision complex types too. Other numeric types, such as byte and native integer, can be added too, but they are not supported by CUBLAS. There are no restrictions on the number of dimensions, and reshaping is possible. All functions operate on the visible portion of the matrix (which is subject to displacement and shaping); invisible elements are not affected.
All dependencies are in quicklisp except for CL-CUDA, which needs to be fetched from github. Just clone CL-CUDA and MGL-MAT into quicklisp/local-projects/ and you are set. MGL-MAT itself lives at github, too.
Prettier-than-markdown HTML documentation cross-linked with other libraries is available as part of PAX World.
We are going to see how to create matrices and access their contents.
Creating matrices is just like creating lisp arrays:
(make-mat '6)
==> #<MAT 6 A #(0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0)>
(make-mat '(2 3) :ctype :float :initial-contents '((1 2 3) (4 5 6)))
==> #<MAT 2x3 AB #2A((1.0 2.0 3.0) (4.0 5.0 6.0))>
(make-mat '(2 3 4) :initial-element 1)
==> #<MAT 2x3x4 A #3A(((1.0d0 1.0d0 1.0d0 1.0d0)
--> (1.0d0 1.0d0 1.0d0 1.0d0)
--> (1.0d0 1.0d0 1.0d0 1.0d0))
--> ((1.0d0 1.0d0 1.0d0 1.0d0)
--> (1.0d0 1.0d0 1.0d0 1.0d0)
--> (1.0d0 1.0d0 1.0d0 1.0d0)))>
The most prominent difference from lisp arrays is that MATs are always numeric and their element type (called CTYPE here) defaults to :DOUBLE.
Individual elements can be accessed or set with MREF:
(let ((m (make-mat '(2 3))))
(setf (mref m 0 0) 1)
(setf (mref m 0 1) (* 2 (mref m 0 0)))
(incf (mref m 0 2) 4)
m)
==> #<MAT 2x3 AB #2A((1.0d0 2.0d0 4.0d0) (0.0d0 0.0d0 0.0d0))>
Compared to AREF, MREF is a very expensive operation and it's best used sparingly. Instead, typical code relies much more on matrix-level operations:
(princ (scal! 2 (fill! 3 (make-mat 4))))
.. #<MAT 4 BF #(6.0d0 6.0d0 6.0d0 6.0d0)>
==> #<MAT 4 ABF #(6.0d0 6.0d0 6.0d0 6.0d0)>
The content of a matrix can be accessed in various representations called facets. MGL-MAT takes care of synchronizing these facets by copying data around lazily, but doing its best to share storage for facets that allow it.
Notice the ABF in the printed results. It illustrates that behind the scenes FILL! worked on the BACKING-ARRAY facet (hence the B) that's basically a 1d lisp array. SCAL!, on the other hand, made a foreign call to the BLAS dscal function for which it needed the FOREIGN-ARRAY facet (F). Finally, the A stands for the ARRAY facet that was created when the array was printed. All facets are up-to-date (else some of the characters would be lowercase). This is possible because these three facets actually share storage, which is never the case for the CUDA-ARRAY facet. Now if we have a CUDA-capable GPU, CUDA can be enabled with WITH-CUDA*:
(with-cuda* ()
(princ (scal! 2 (fill! 3 (make-mat 4)))))
.. #<MAT 4 C #(6.0d0 6.0d0 6.0d0 6.0d0)>
==> #<MAT 4 A #(6.0d0 6.0d0 6.0d0 6.0d0)>
Note the lonely C showing that only the CUDA-ARRAY facet was used for both FILL! and SCAL!. When WITH-CUDA* exits and destroys the CUDA context, it destroys all CUDA facets, moving their data to the ARRAY facet, so the returned MAT only has that facet.
When there is no high-level operation that does what we want, we may need to add new operations. This is usually best accomplished by accessing one of the facets directly, as in the following example:
(defun logdet (mat)
"Logarithm of the determinant of MAT. Return -1, 1 or 0 (or
equivalent) to correct for the sign, as the second value."
(with-facets ((array (mat 'array :direction :input)))
(lla:logdet array)))
Notice that LOGDET doesn't know about CUDA at all. WITH-FACETS gives it the content of the matrix as a normal multidimensional lisp array, copying the data from the GPU or elsewhere if necessary. This allows new representations (FACETs) to be added easily and it also avoids copying if the facet is already up-to-date. Of course, adding CUDA support to LOGDET could make it more efficient.
Adding support for matrices that, for instance, live on a remote machine is thus possible with a new facet type and existing code would continue to work (albeit possibly slowly). Then one could optimize the bottleneck operations by sending commands over the network instead of copying data.
It is a bad idea to conflate resource management policy and algorithms. MGL-MAT does its best to keep them separate.
- [class] MAT CUBE

    A MAT is a data CUBE that is much like a lisp array: it supports DISPLACEMENT, arbitrary DIMENSIONS and INITIAL-ELEMENT with the usual semantics. However, a MAT supports different representations of the same data. See Tutorial for an introduction.
- [reader] MAT-CTYPE MAT (:CTYPE = *DEFAULT-MAT-CTYPE*)

    One of *SUPPORTED-CTYPES*. The matrix can hold only values of this type.

- [reader] MAT-DISPLACEMENT MAT (:DISPLACEMENT = 0)

    A value in the [0,MAX-SIZE] interval. This is like the DISPLACED-INDEX-OFFSET of a lisp array, but displacement is relative to the start of the underlying storage vector.
- [reader] MAT-DIMENSIONS MAT (:DIMENSIONS)

    Like ARRAY-DIMENSIONS. It holds a list of dimensions, but it is allowed to pass in scalars too.

- [function] MAT-DIMENSION MAT AXIS-NUMBER

    Return the dimension along AXIS-NUMBER. Similar to ARRAY-DIMENSION.
- [reader] MAT-INITIAL-ELEMENT MAT (:INITIAL-ELEMENT = 0)

    If non-nil, then when a facet is created, it is filled with INITIAL-ELEMENT coerced to the appropriate numeric type. If NIL, then no initialization is performed.

- [reader] MAT-SIZE MAT

    The number of elements in the visible portion of the array. This is always the product of the elements of MAT-DIMENSIONS and is similar to ARRAY-TOTAL-SIZE.

- [reader] MAT-MAX-SIZE MAT (:MAX-SIZE)

    The number of elements for which storage may be allocated. This is DISPLACEMENT + MAT-SIZE + SLACK where SLACK is the number of trailing invisible elements.
- [function] MAKE-MAT DIMENSIONS &REST ARGS &KEY (CTYPE *DEFAULT-MAT-CTYPE*) (DISPLACEMENT 0) MAX-SIZE INITIAL-ELEMENT INITIAL-CONTENTS (SYNCHRONIZATION *DEFAULT-SYNCHRONIZATION*) DISPLACED-TO (CUDA-ENABLED *DEFAULT-MAT-CUDA-ENABLED*)

    Return a new MAT object. If INITIAL-CONTENTS is given, then the matrix contents are initialized with REPLACE!. See class MAT for the description of the rest of the parameters. This is exactly what (MAKE-INSTANCE 'MAT ...) does except DIMENSIONS is not a keyword argument, so that MAKE-MAT looks more like MAKE-ARRAY. The semantics of SYNCHRONIZATION are described in the Synchronization section.

    If specified, DISPLACED-TO must be a MAT object large enough (in the sense of its MAT-SIZE) to hold DISPLACEMENT plus (REDUCE #'* DIMENSIONS) elements. Just like with MAKE-ARRAY, INITIAL-ELEMENT and INITIAL-CONTENTS must not be supplied together with DISPLACED-TO. See Shaping for more.
- [function] ARRAY-TO-MAT ARRAY &KEY CTYPE (SYNCHRONIZATION *DEFAULT-SYNCHRONIZATION*)

    Create a MAT that's equivalent to ARRAY. Displacement of the created array will be 0 and the size will be equal to ARRAY-TOTAL-SIZE. If CTYPE is non-nil, then it will be the ctype of the new matrix. Else ARRAY's type is converted to a ctype. If there is no corresponding ctype, then *DEFAULT-MAT-CTYPE* is used. Elements of ARRAY are coerced to CTYPE. Also see Synchronization.
- [function] MAT-TO-ARRAY MAT
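    For illustration, a small example (using only functions documented in this manual) that round-trips a lisp array through a MAT; the expected results are noted in comments:

        (let* ((a #2A((1 2) (3 4)))
               (m (array-to-mat a :ctype :double)))
          ;; MAT-TO-ARRAY returns a lisp array with the MAT's dimensions,
          ;; with elements coerced to the MAT's ctype.
          (values (mat-to-array m)          ; => #2A((1.0d0 2.0d0) (3.0d0 4.0d0))
                  (m= m (array-to-mat a)))) ; => T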
- [function] REPLACE! MAT SEQ-OF-SEQS

    Replace the contents of MAT with the elements of SEQ-OF-SEQS. SEQ-OF-SEQS is a nested sequence of sequences similar to the INITIAL-CONTENTS argument of MAKE-ARRAY. The total number of elements must match the size of MAT. Returns MAT.

    SEQ-OF-SEQS may contain multi-dimensional arrays as leafs, so the following is legal:

        (replace! (make-mat '(1 2 3)) '(#2A((1 2 3) (4 5 6))))
        ==> #<MAT 1x2x3 AB #3A(((1.0d0 2.0d0 3.0d0) (4.0d0 5.0d0 6.0d0)))>
- [function] MREF MAT &REST INDICES

    Like AREF for arrays. Don't use this if you care about performance at all. SETFable. When set, the value is coerced to the ctype of MAT with COERCE-TO-CTYPE. Note that currently MREF always operates on the BACKING-ARRAY facet, so it can trigger copying of facets. When it's SETF'ed, however, it will update the CUDA-ARRAY if cuda is enabled and it is up-to-date or there are no facets at all.

- [function] ROW-MAJOR-MREF MAT INDEX

    Like ROW-MAJOR-AREF for arrays. Don't use this if you care about performance at all. SETFable. When set, the value is coerced to the ctype of MAT with COERCE-TO-CTYPE. Note that currently ROW-MAJOR-MREF always operates on the BACKING-ARRAY facet, so it can trigger copying of facets. When it's SETF'ed, however, it will update the CUDA-ARRAY if cuda is enabled and it is up-to-date or there are no facets at all.
- [function] MAT-ROW-MAJOR-INDEX MAT &REST SUBSCRIPTS

    Like ARRAY-ROW-MAJOR-INDEX for arrays.
- [variable] *SUPPORTED-CTYPES* (:FLOAT :DOUBLE)
- [type] CTYPE

    This is basically (MEMBER :FLOAT :DOUBLE).

- [variable] *DEFAULT-MAT-CTYPE* :DOUBLE

    By default, MATs are created with this ctype. One of :FLOAT or :DOUBLE.

- [function] COERCE-TO-CTYPE X &KEY (CTYPE *DEFAULT-MAT-CTYPE*)

    Coerce the scalar X to the lisp type corresponding to CTYPE.

- [variable] *PRINT-MAT* T

    Controls whether the contents of a MAT object are printed as an array (subject to the standard printer control variables).
- [variable] *PRINT-MAT-FACETS* T

    Controls whether a summary of existing and up-to-date facets is printed when a MAT object is printed. A summary that looks like ABcfh indicates that all five facets (ARRAY, BACKING-ARRAY, CUDA-ARRAY, FOREIGN-ARRAY, CUDA-HOST-ARRAY) are present and the first two are up-to-date. A summary of a single #- indicates that there are no facets.
We are going to discuss various ways to change the visible portion and dimensions of matrices. Conceptually a matrix has an underlying non-displaced storage vector. For (MAKE-MAT 10 :DISPLACEMENT 7 :MAX-SIZE 21) this underlying vector looks like this:

    displacement | visible elements | slack
    . . . . . . . 0 0 0 0 0 0 0 0 0 0 . . . .

Whenever a matrix is reshaped (or displaced to, in lisp terminology), its displacement and dimensions change but the underlying vector does not.

The rules for accessing displaced matrices are the same as always: multiple readers can run in parallel, but attempts to write will result in an error if there are either readers or writers on any of the matrices that share the same underlying vector.
One way to reshape and displace MAT objects is with MAKE-MAT and its DISPLACED-TO argument whose semantics are similar to that of MAKE-ARRAY in that the displacement is relative to the displacement of DISPLACED-TO.
(let* ((base (make-mat 10 :initial-element 5 :displacement 1))
(mat (make-mat 6 :displaced-to base :displacement 2)))
(fill! 1 mat)
(values base mat))
==> #<MAT 1+10+0 A #(5.0d0 5.0d0 1.0d0 1.0d0 1.0d0 1.0d0 1.0d0 1.0d0 5.0d0
--> 5.0d0)>
==> #<MAT 3+6+2 AB #(1.0d0 1.0d0 1.0d0 1.0d0 1.0d0 1.0d0)>
There are important semantic differences compared to lisp arrays, all of which follow from the fact that displacement operates on the underlying conceptual non-displaced vector.
- Matrices can be displaced and have slack even without DISPLACED-TO, just like BASE in the above example.

- It's legal to alias invisible elements of DISPLACED-TO as long as the new matrix fits into the underlying storage.

- Negative displacements are allowed with DISPLACED-TO as long as the adjusted displacement is non-negative.

- Further shaping operations can make invisible portions of the DISPLACED-TO matrix visible by changing the displacement.

- In contrast to ARRAY-DISPLACEMENT, MAT-DISPLACEMENT only returns an offset into the underlying storage vector.
The following functions are collectively called the functional shaping operations, since they don't alter their arguments in any way. Still, since storage is aliased, modifications to the returned matrix will affect the original.
- [function] RESHAPE-AND-DISPLACE MAT DIMENSIONS DISPLACEMENT

    Return a new matrix of DIMENSIONS that aliases MAT's storage at offset DISPLACEMENT. DISPLACEMENT 0 is equivalent to the start of the storage of MAT regardless of MAT's displacement.

- [function] RESHAPE MAT DIMENSIONS

    Return a new matrix of DIMENSIONS whose displacement is the same as the displacement of MAT.

- [function] DISPLACE MAT DISPLACEMENT

    Return a new matrix that aliases MAT's storage at offset DISPLACEMENT. DISPLACEMENT 0 is equivalent to the start of the storage of MAT regardless of MAT's displacement. The returned matrix has the same dimensions as MAT.
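A short example of how these functional shaping operations alias the same storage (expected contents in comments):

    (let* ((base (make-mat 6 :initial-contents '(0 1 2 3 4 5)))
           ;; View the same 6 elements as a 2x3 matrix.
           (m2x3 (reshape base '(2 3)))
           ;; A 2x2 window starting at element 1 of the underlying storage.
           (window (reshape-and-displace base '(2 2) 1)))
      (setf (mref window 0 0) 100)
      ;; The write is visible through every alias because storage is shared.
      (values (mat-to-array base)    ; => #(0.0d0 100.0d0 2.0d0 3.0d0 4.0d0 5.0d0)
              (mat-to-array m2x3)))  ; => #2A((0.0d0 100.0d0 2.0d0) (3.0d0 4.0d0 5.0d0))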
The following destructive operations don't alter the contents of the matrix, but change what is visible. ADJUST! is the odd one out: it may create a new MAT.
- [function] RESHAPE-AND-DISPLACE! MAT DIMENSIONS DISPLACEMENT

    Change the visible (or active) portion of MAT by altering its displacement offset and dimensions. Future operations will only affect this visible portion as if the rest of the elements were not there. Return MAT.

    DISPLACEMENT + the new size must not exceed MAT-MAX-SIZE. Furthermore, there must be no facets being viewed (with WITH-FACETS) when calling this function as the identity of the facets is not stable.
- [function] RESHAPE! MAT DIMENSIONS

    Like RESHAPE-AND-DISPLACE! but only alters the dimensions.

- [function] DISPLACE! MAT DISPLACEMENT

    Like RESHAPE-AND-DISPLACE! but only alters the displacement.

- [function] RESHAPE-TO-ROW-MATRIX! MAT ROW

    Reshape the 2d MAT to make only a single ROW visible. This is made possible by the row-major layout, hence no column counterpart. Return MAT.
- [macro] WITH-SHAPE-AND-DISPLACEMENT (MAT &OPTIONAL (DIMENSIONS NIL) (DISPLACEMENT NIL)) &BODY BODY

    Reshape and displace MAT if DIMENSIONS and/or DISPLACEMENT is given and restore the original shape and displacement after BODY is executed. If neither is specified, then nothing will be changed, but BODY is still allowed to alter the shape and displacement.
- [function] ADJUST! MAT DIMENSIONS DISPLACEMENT &KEY (DESTROY-OLD-P T)

    Like RESHAPE-AND-DISPLACE! but creates a new matrix if MAT isn't large enough. If a new matrix is created, the contents are not copied over and the old matrix is destroyed with DESTROY-CUBE if DESTROY-OLD-P.
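To illustrate the destructive counterparts, a small sketch that uses WITH-SHAPE-AND-DISPLACEMENT to temporarily restrict the visible portion to a single row and then restores the original shape:

    (let ((m (make-mat '(3 4))))
      (with-shape-and-displacement (m)
        ;; Make only row 1 visible and fill it; rows 0 and 2 are untouched.
        (reshape-to-row-matrix! m 1)
        (fill! 5 m))
      ;; The original 3x4 shape is restored on exit.
      ;; Expected contents: row 1 is all 5s, rows 0 and 2 remain 0.
      m)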
The functions here assemble a single MAT from a number of MATs.
- [function] STACK! AXIS MATS MAT

    Stack MATS along AXIS into MAT and return MAT. If AXIS is 0, place MATS into MAT below each other starting from the top. If AXIS is 1, place MATS side by side starting from the left. Higher AXIS are also supported. All dimensions except for AXIS must be the same for all MATS.
- [function] STACK AXIS MATS &KEY (CTYPE *DEFAULT-MAT-CTYPE*)

    Like STACK! but return a new MAT of CTYPE.

        (stack 1 (list (make-mat '(3 2) :initial-element 0)
                       (make-mat '(3 1) :initial-element 1)))
        ==> #<MAT 3x3 B #2A((0.0d0 0.0d0 1.0d0)
        -->                 (0.0d0 0.0d0 1.0d0)
        -->                 (0.0d0 0.0d0 1.0d0))>
Allocating and initializing a MAT object and its necessary facets can be expensive. The following macros remember the previous value of a binding in the same thread and /place/. Only weak references are constructed so the cached objects can be garbage collected.
While the cache is global, thread safety is guaranteed by having separate subcaches per thread. Each subcache is keyed by a /place/ object that's either explicitly specified or else is unique to each invocation of the caching macro, so different occurrences of caching macros in the source never share data. Still, recursion could lead to data sharing between different invocations of the same function. To prevent this, the cached object is removed from the cache while it is used so other invocations will create a fresh one which isn't particularly efficient but at least it's safe.
- [macro] WITH-THREAD-CACHED-MAT (VAR DIMENSIONS &REST ARGS &KEY (PLACE :SCRATCH) (CTYPE '*DEFAULT-MAT-CTYPE*) (DISPLACEMENT 0) MAX-SIZE (INITIAL-ELEMENT 0) INITIAL-CONTENTS) &BODY BODY

    Bind VAR to a matrix of DIMENSIONS, CTYPE, etc. Cache this matrix, and possibly reuse it later by reshaping it. When BODY exits, the cached object is updated with the binding of VAR which BODY may change.

    There is a separate cache for each thread and each PLACE (under EQ). Since every cache holds exactly one MAT per CTYPE, nested WITH-THREAD-CACHED-MATs often want to use different PLACEs. By convention, these places are called :SCRATCH-1, :SCRATCH-2, etc.
- [macro] WITH-THREAD-CACHED-MATS SPECS &BODY BODY

    A shorthand for writing nested WITH-THREAD-CACHED-MAT calls.

        (with-thread-cached-mat (a ...)
          (with-thread-cached-mat (b ...)
            ...))

    is equivalent to:

        (with-thread-cached-mats ((a ...) (b ...))
          ...)
- [macro] WITH-ONES (VAR DIMENSIONS &KEY (CTYPE '*DEFAULT-MAT-CTYPE*) (PLACE :ONES)) &BODY BODY

    Bind VAR to a matrix of DIMENSIONS whose every element is 1. The matrix is cached for efficiency.
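As a usage sketch, a function that needs a temporary matrix on each call can avoid repeated allocation with WITH-THREAD-CACHED-MAT. COLUMN-MEANS below is a hypothetical helper, not part of the library:

    (defun column-means (a)
      "Return a new 1d matrix holding the mean of each column of the 2d matrix A."
      (destructuring-bind (n-rows n-columns) (mat-dimensions a)
        (with-thread-cached-mat (sums n-columns :ctype (mat-ctype a)
                                      :place :scratch-1)
          ;; SUM! with AXIS 0 sums the columns of A into SUMS.
          (sum! a sums :axis 0)
          (scal! (/ 1.0 n-rows) sums)
          ;; Copy out of the cached matrix before it goes back to the cache.
          (copy-mat sums))))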
Only some BLAS functions are implemented, but it should be easy to add more as needed. All of them default to using CUDA, if it is initialized and enabled (see USE-CUDA-P).
Level 1 BLAS operations
- [function] ASUM X &KEY (N (MAT-SIZE X)) (INCX 1)

    Return the l1 norm of X, that is, the sum of the absolute values of its elements.

- [function] AXPY! ALPHA X Y &KEY (N (MAT-SIZE X)) (INCX 1) (INCY 1)

    Set Y to ALPHA * X + Y. Return Y.

- [function] COPY! X Y &KEY (N (MAT-SIZE X)) (INCX 1) (INCY 1)

    Copy X into Y. Return Y.

- [function] DOT X Y &KEY (N (MAT-SIZE X)) (INCX 1) (INCY 1)

    Return the dot product of X and Y.

- [function] NRM2 X &KEY (N (MAT-SIZE X)) (INCX 1)

    Return the l2 norm of X, which is the square root of the sum of the squares of its elements.

- [function] SCAL! ALPHA X &KEY (N (MAT-SIZE X)) (INCX 1)

    Set X to ALPHA * X. Return X.
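A small example combining a few of these operations (expected results in comments):

    (let ((x (make-mat 3 :initial-contents '(1 2 3)))
          (y (make-mat 3 :initial-contents '(10 20 30))))
      (axpy! 2 x y)              ; y := 2*x + y, so y is now #(12 24 36)
      (values (dot x y)          ; => 168.0d0
              (asum x)           ; => 6.0d0
              (nrm2 x)))         ; => sqrt(14), about 3.742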
Level 3 BLAS operations
- [function] GEMM! ALPHA A B BETA C &KEY TRANSPOSE-A? TRANSPOSE-B? M N K LDA LDB LDC

    Basically C = ALPHA * A' * B' + BETA * C. A' is A or its transpose depending on TRANSPOSE-A?. B' is B or its transpose depending on TRANSPOSE-B?. Returns C.

    A' is an MxK matrix. B' is a KxN matrix. C is an MxN matrix.

    LDA is the width of the matrix A (not of A'). If A is not transposed, then K <= LDA, if it's transposed then M <= LDA.

    LDB is the width of the matrix B (not of B'). If B is not transposed, then N <= LDB, if it's transposed then K <= LDB.

    In the example below M=3, N=2, K=5, LDA=6, LDB=3, LDC=4. The cells marked with + do not feature in the calculation.
                   N
                  --+
                  --+
               K -B+
                  --+
                  --+
                  +++
            K
          -----+  --++
        M --A--+  -C++
          -----+  --++
          ++++++  ++++
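    As an illustration, a plain matrix multiplication with the leading dimensions left at their defaults (the expected result is noted in a comment):

        (let ((a (make-mat '(2 3) :initial-contents '((1 2 3) (4 5 6))))
              (b (make-mat '(3 2) :initial-contents '((1 0) (0 1) (1 1))))
              (c (make-mat '(2 2))))
          ;; C := 1 * A * B + 0 * C
          (gemm! 1 a b 0 c)
          c)   ; expected contents: #2A((4 5) (10 11))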
- [function] .SQUARE! X &KEY (N (MAT-SIZE X))

    Set X to its elementwise square. Return X.

- [function] .SQRT! X &KEY (N (MAT-SIZE X))

    Set X to its elementwise square root. Return X.

- [function] .LOG! X &KEY (N (MAT-SIZE X))

    Set X to its elementwise natural logarithm. Return X.

- [function] .EXP! X &KEY (N (MAT-SIZE X))

    Apply EXP elementwise to X in a destructive manner. Return X.

- [function] .EXPT! X POWER

    Raise matrix X to POWER in an elementwise manner. Return X. Note that CUDA and non-CUDA implementations may disagree on the treatment of NaNs, infinities and complex results. In particular, the lisp implementation always computes the REALPART of the results while CUDA's pow() returns NaNs instead of complex numbers.

- [function] .INV! X &KEY (N (MAT-SIZE X))

    Set X to its elementwise inverse (/ 1 X). Return X.

- [function] .LOGISTIC! X &KEY (N (MAT-SIZE X))

    Destructively apply the logistic function to X in an elementwise manner. Return X.

- [function] .+! ALPHA X

    Add the scalar ALPHA to each element of X, destructively modifying X. Return X.
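    For illustration, a few of the destructive elementwise operations chained on the same matrix:

        (let ((x (make-mat 4 :initial-contents '(1 2 3 4))))
          (.square! x)    ; x is now #(1 4 9 16)
          (.+! 0.5 x)     ; add a scalar to every element
          (.sqrt! x)      ; elementwise square root
          x)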
- [function] .*! X Y
- [function] GEEM! ALPHA A B BETA C

    Like GEMM!, but multiplication is elementwise. This is not a standard BLAS routine.

- [function] GEERV! ALPHA A X BETA B

    GEneric Elementwise Row-Vector multiplication: B = beta * B + alpha * A .* X* where X* is a matrix of the same shape as A whose every row is X. Perform elementwise multiplication on each row of A with the vector X and add the scaled result to the corresponding row of B. Return B. This is not a standard BLAS routine.
- [function] .<! X Y

    For each element of X and Y, set Y to 1 if the element in Y is greater than the element in X, and to 0 otherwise. Return Y.

- [function] .MIN! ALPHA X

    Set each element of X to ALPHA if it's greater than ALPHA. Return X.

- [function] .MAX! ALPHA X

    Set each element of X to ALPHA if it's less than ALPHA. Return X.
- [function] ADD-SIGN! ALPHA A BETA B

    Add the elementwise sign (-1, 0 or 1 for negative, zero and positive numbers respectively) of A times ALPHA to BETA * B. Return B.

- [function] FILL! ALPHA X &KEY (N (MAT-SIZE X))

    Fill matrix X with ALPHA. Return X.
- [function] SUM! X Y &KEY AXIS (ALPHA 1) (BETA 0)

    Sum matrix X along AXIS and add ALPHA * SUMS to BETA * Y, destructively modifying Y. Return Y. On a 2d matrix (nothing else is supported currently), if AXIS is 0, then columns are summed, if AXIS is 1 then rows are summed.
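    For example (expected results in comments):

        (let ((a (make-mat '(2 3) :initial-contents '((1 2 3) (4 5 6))))
              (column-sums (make-mat 3))
              (row-sums (make-mat 2)))
          (sum! a column-sums :axis 0)   ; column-sums is now #(5 7 9)
          (sum! a row-sums :axis 1)      ; row-sums is now #(6 15)
          (values column-sums row-sums))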
- [function] SCALE-ROWS! SCALES A &KEY (RESULT A)

    Set RESULT to DIAG(SCALES)*A and return it. A is an MxN matrix, SCALES is treated as a length M vector.

- [function] SCALE-COLUMNS! SCALES A &KEY (RESULT A)

    Set RESULT to A*DIAG(SCALES) and return it. A is an MxN matrix, SCALES is treated as a length N vector.
- [function] .SIN! X &KEY (N (MAT-SIZE X))

    Apply SIN elementwise to X in a destructive manner. Return X.

- [function] .COS! X &KEY (N (MAT-SIZE X))

    Apply COS elementwise to X in a destructive manner. Return X.

- [function] .TAN! X &KEY (N (MAT-SIZE X))

    Apply TAN elementwise to X in a destructive manner. Return X.

- [function] .SINH! X &KEY (N (MAT-SIZE X))

    Apply SINH elementwise to X in a destructive manner. Return X.

- [function] .COSH! X &KEY (N (MAT-SIZE X))

    Apply COSH elementwise to X in a destructive manner. Return X.

- [function] .TANH! X &KEY (N (MAT-SIZE X))

    Apply TANH elementwise to X in a destructive manner. Return X.
Finally, some neural network operations.
- [function] CONVOLVE! X W Y &KEY START STRIDE ANCHOR BATCHED

    Y = Y + conv(X, W) and return Y. If BATCHED, then the first dimension of X and Y is the number of elements in the batch (B), else B is assumed to be 1. The rest of the dimensions encode the input (X) and output (Y) N dimensional feature maps.

    START, STRIDE and ANCHOR are lists of length N. START is the multi-dimensional index of the first element of the input feature map (for each element in the batch) for which the convolution must be computed. Then (ELT STRIDE (- N 1)) is added to the last element of START and so on until (ARRAY-DIMENSION X 1) is reached. Then the last element of START is reset, (ELT STRIDE (- N 2)) is added to the first but last element of START and we scan the last dimension again. Take a 2d example: START is (0 0), STRIDE is (1 2), and X is a B*2x7 matrix.

    W is:

        1 2 1
        2 4 2
        1 2 1

    and ANCHOR is (1 1) which refers to the element of W whose value is 4. This anchor point of W is placed over elements of X whose multi-dimensional indices are the numbers in this figure (only one element in the batch is shown):

        0,0 . 0,2 . 0,4 . 0,6
        1,0 . 1,2 . 1,4 . 1,6

    When applying W at position P of X, the convolution is the sum of the products of overlapping elements of X and W when W's ANCHOR is placed at P. Elements of W over the edges of X are multiplied with 0, so are effectively ignored. The order of application of W to positions defined by START, STRIDE and ANCHOR is undefined.

    Y must be a B*2x4 (or 2x4 if not BATCHED) matrix in this example, just large enough to hold the results of the convolutions.
- [function] DERIVE-CONVOLVE! X XD W WD YD &KEY START STRIDE ANCHOR BATCHED

    Add dF/dX to XD and dF/dW to WD where YD is dF/dY for some function F where Y is the result of convolution with the same arguments.

- [function] MAX-POOL! X Y &KEY START STRIDE ANCHOR BATCHED POOL-DIMENSIONS

- [function] DERIVE-MAX-POOL! X XD Y YD &KEY START STRIDE ANCHOR BATCHED POOL-DIMENSIONS

    Add dF/dX to XD and dF/dW to WD where YD is dF/dY for some function F where Y is the result of MAX-POOL! with the same arguments.
- [function] COPY-MAT A

    Return a copy of the active portion with regards to displacement and shape of A.

- [function] COPY-ROW A ROW

    Return ROW of A as a new 1d matrix.

- [function] COPY-COLUMN A COLUMN

    Return COLUMN of A as a new 1d matrix.
- [function] MAT-AS-SCALAR A

    Return the first element of A. A must be of size 1.

- [function] SCALAR-AS-MAT X &KEY (CTYPE (LISP->CTYPE (TYPE-OF X)))

    Return a matrix of one dimension and one element: X. CTYPE, the type of the matrix, defaults to the ctype corresponding to the type of X.

- [function] M= A B

    Check whether A and B, which must be matrices of the same size, are elementwise equal.

- [function] TRANSPOSE A

    Return the transpose of A.
- [function] M* A B &KEY TRANSPOSE-A? TRANSPOSE-B?

    Compute op(A) * op(B), where op is either the identity or the transpose operation depending on TRANSPOSE-A? and TRANSPOSE-B?.

- [function] MM* M &REST ARGS

    Convenience function to multiply several matrices.

        (mm* a b c) => a * b * c

- [function] M- A B

    Return A - B.

- [function] M+ A B

    Return A + B.

- [function] INVERT A

    Return the inverse of A.

- [function] LOGDET MAT

    Logarithm of the determinant of MAT. Return -1, 1 or 0 (or equivalent) to correct for the sign, as the second value.
- [function] MAP-CONCAT FN MATS MAT &KEY KEY PASS-RAW-P

    Call FN with each element of MATS and MAT temporarily reshaped to the dimensions of the current element of MATS, and return MAT. For the next element the displacement is increased so that there is no overlap.

    MATS is keyed by KEY just like the CL sequence functions. Normally, FN is called with the matrix returned by KEY. However, if PASS-RAW-P, then the matrix returned by KEY is only used to calculate dimensions and the element of MATS that was passed to KEY is passed to FN, too.

        (map-concat #'copy! (list (make-mat 2) (make-mat 4 :initial-element 1))
                    (make-mat '(2 3)))
        ==> #<MAT 2x3 AB #2A((0.0d0 0.0d0 1.0d0) (1.0d0 1.0d0 1.0d0))>
- [function] MAP-DISPLACEMENTS FN MAT DIMENSIONS &KEY (DISPLACEMENT-START 0) DISPLACEMENT-STEP

    Call FN with MAT reshaped to DIMENSIONS, first displaced by DISPLACEMENT-START that's incremented by DISPLACEMENT-STEP each iteration while there are enough elements left for DIMENSIONS at the current displacement. Returns MAT.

        (let ((mat (make-mat 14 :initial-contents '(-1 0 1 2 3 4 5 6 7 8 9 10 11 12))))
          (reshape-and-displace! mat '(4 3) 1)
          (map-displacements #'print mat 4))
        ..
        .. #<MAT 1+4+9 B #(0.0d0 1.0d0 2.0d0 3.0d0)>
        .. #<MAT 5+4+5 B #(4.0d0 5.0d0 6.0d0 7.0d0)>
        .. #<MAT 9+4+1 B #(8.0d0 9.0d0 10.0d0 11.0d0)>
- [function] MAP-MATS-INTO RESULT-MAT FN &REST MATS

    Like CL:MAP-INTO but for MAT objects. Destructively modifies RESULT-MAT to contain the results of applying FN to each element in the argument MATS in turn.
Unless noted, these work efficiently with CUDA.

- [generic-function] COPY-RANDOM-STATE STATE

    Return a copy of STATE, be it a lisp or cuda random state.

- [function] UNIFORM-RANDOM! MAT &KEY (LIMIT 1)

    Fill MAT with random numbers sampled uniformly from the [0,LIMIT) interval of MAT's type.

- [function] GAUSSIAN-RANDOM! MAT &KEY (MEAN 0) (STDDEV 1)

    Fill MAT with independent normally distributed random numbers with MEAN and STDDEV.
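    For illustration:

        (let ((m (make-mat '(2 2))))
          (uniform-random! m :limit 10)           ; elements drawn from [0, 10)
          (gaussian-random! m :mean 0 :stddev 2)  ; overwritten with N(0, 2^2) samples
          m)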
- [function] MV-GAUSSIAN-RANDOM &KEY MEANS COVARIANCES

    Return a column vector of samples from the multivariate normal distribution defined by MEANS (Nx1) and COVARIANCES (NxN). No CUDA implementation.

- [function] ORTHOGONAL-RANDOM! M &KEY (SCALE 1)

    Fill the matrix M with random values in such a way that M^T * M is the identity matrix (or something close if M is wide). Return M.
- [variable] *MAT-HEADERS* T

    If true, a header with MAT-CTYPE and MAT-SIZE is written by WRITE-MAT before the contents, and READ-MAT checks that these match the matrix into which it is reading.

- [generic-function] WRITE-MAT MAT STREAM

    Write MAT to binary STREAM in portable binary format. Return MAT. Displacement and size are taken into account: only visible elements are written. Also see *MAT-HEADERS*.

- [generic-function] READ-MAT MAT STREAM

    Destructively modify the visible portion (with regards to displacement and shape) of MAT by reading MAT-SIZE number of elements from binary STREAM. Return MAT. Also see *MAT-HEADERS*.
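A sketch of a round trip through a file; the stream element type and the file name are assumptions here, any binary stream should do:

    (let ((x (make-mat 3 :initial-contents '(1 2 3)))
          (y (make-mat 3)))
      (with-open-file (stream "mat-example.bin" :direction :output
                              :if-exists :supersede
                              :element-type '(unsigned-byte 8))
        (write-mat x stream))
      (with-open-file (stream "mat-example.bin"
                              :element-type '(unsigned-byte 8))
        (read-mat y stream))
      (m= x y))   ; => T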
The largest class of bugs has to do with synchronization of facets being broken. This is almost always caused by an operation that misspecifies the DIRECTION argument of WITH-FACET. For example, the matrix argument of SCAL! should be accessed with direction :IO. But if it's :INPUT instead, then subsequent access to the ARRAY facet will not see the changes made by SCAL!, and if it's :OUTPUT, then any changes made to the ARRAY facet since the last update of the CUDA-ARRAY facet will not be copied, and from the wrong input SCAL! will compute the wrong result.
Using the SLIME inspector or trying to access the CUDA-ARRAY facet from threads other than the one in which the corresponding CUDA context was initialized will fail. For now, the easy way out is to debug the code with CUDA disabled (see *CUDA-ENABLED*).
Another thing that tends to come up is figuring out where memory is used.
- [function] MAT-ROOM &KEY (STREAM *STANDARD-OUTPUT*) (VERBOSE T)

    Calls FOREIGN-ROOM and CUDA-ROOM.
- [macro] WITH-MAT-COUNTERS (&KEY COUNT N-BYTES) &BODY BODY

    Count all MAT allocations and also the number of bytes they may require. May require here really means an upper bound, because (MAKE-MAT (EXPT 2 60)) doesn't actually use memory until one of its facets is accessed (don't simply evaluate it though, printing the result will access the ARRAY facet if *PRINT-MAT*). Also, while facets today all require the same number of bytes, this may change in the future. This is a debugging tool, don't use it in production.

        (with-mat-counters (:count count :n-bytes n-bytes)
          (assert (= count 0))
          (assert (= n-bytes 0))
          (make-mat '(2 3) :ctype :double)
          (assert (= count 1))
          (assert (= n-bytes (* 2 3 8)))
          (with-mat-counters (:n-bytes n-bytes-1 :count count-1)
            (make-mat '7 :ctype :float)
            (assert (= count-1 1))
            (assert (= n-bytes-1 (* 7 4))))
          (assert (= n-bytes (+ (* 2 3 8) (* 7 4))))
          (assert (= count 2)))
A MAT is a CUBE (see Cube Manual) whose facets are different representations of numeric arrays. These facets can be accessed with WITH-FACETS with one of the following FACET-NAME locatives:
- [facet-name] BACKING-ARRAY

    The corresponding facet's value is a one dimensional lisp array or a static vector that also looks exactly like a lisp array but is allocated in foreign memory. See *FOREIGN-ARRAY-STRATEGY*.

- [facet-name] ARRAY

    Same as BACKING-ARRAY if the matrix is one-dimensional and all elements are visible (see Shaping), else it's a lisp array displaced to the backing array.

- [facet-name] FOREIGN-ARRAY

    The facet's value is a FOREIGN-ARRAY which is an OFFSET-POINTER wrapping a CFFI pointer. See *FOREIGN-ARRAY-STRATEGY*.

- [facet-name] CUDA-HOST-ARRAY

    This facet's value is basically the same as that of FOREIGN-ARRAY. In fact, they share storage. The difference is that accessing CUDA-HOST-ARRAY ensures that the foreign memory region is page-locked and registered with the CUDA Driver API function cuMemHostRegister(). Copying between GPU memory (CUDA-ARRAY) and registered memory is significantly faster than with non-registered memory and also allows overlapping copying with computation. See WITH-SYNCING-CUDA-FACETS.

- [facet-name] CUDA-ARRAY

    The facet's value is a CUDA-ARRAY, which is an OFFSET-POINTER wrapping a CL-CUDA.DRIVER-API:CU-DEVICE-PTR, allocated with CL-CUDA.DRIVER-API:CU-MEM-ALLOC and freed automatically.
Facets bound by WITH-FACETS are to be treated as dynamic extent: it is not allowed to keep a reference to them beyond the dynamic scope of WITH-FACETS.

For example, to implement the FILL! operation using only the BACKING-ARRAY facet, one could do this:
(let ((displacement (mat-displacement x))
(size (mat-size x)))
(with-facets ((x* (x 'backing-array :direction :output)))
(fill x* 1 :start displacement :end (+ displacement size))))
DIRECTION is :OUTPUT because we clobber all values in X. Armed with this knowledge about the direction, WITH-FACETS will not copy data from another facet if the backing array is not up-to-date.

To transpose a 2d matrix with the ARRAY facet:
(destructuring-bind (n-rows n-columns) (mat-dimensions x)
  (assert (= n-rows n-columns))
  (with-facets ((x* (x 'array :direction :io)))
    (dotimes (row n-rows)
      (loop for column from (1+ row) below n-columns
            do (rotatef (aref x* row column) (aref x* column row))))))
Note that DIRECTION is :IO, because we need the data in this facet to be up-to-date (that's the input part) and we are invalidating all other facets by changing values (that's the output part).

To sum the values of a matrix using the FOREIGN-ARRAY facet:
(let ((sum 0))
(with-facets ((x* (x 'foreign-array :direction :input)))
(let ((pointer (offset-pointer x*)))
(loop for index below (mat-size x)
do (incf sum (cffi:mem-aref pointer (mat-ctype x) index)))))
sum)
See DIRECTION for a complete description of :INPUT, :OUTPUT and :IO.

For MAT objects, that needs to be refined. If a MAT is reshaped and/or displaced in a way that not all elements are visible, then those elements are always kept intact and copied around. This is accomplished by turning :OUTPUT into :IO automatically on such MATs.
We have finished our introduction to the various facets. It must be said, though, that one can do anything without ever accessing a facet directly or even being aware of them, as most operations on MATs take care of choosing the most appropriate facet behind the scenes. In particular, most operations automatically use CUDA, if available and initialized. See WITH-CUDA* for detail.

One facet of MAT objects is FOREIGN-ARRAY, which is backed by a memory area that can be a pinned lisp array or is allocated in foreign memory depending on *FOREIGN-ARRAY-STRATEGY*.
- [class] FOREIGN-ARRAY

    FOREIGN-ARRAY wraps a foreign pointer (in the sense of CFFI:POINTERP). That is, both OFFSET-POINTER and BASE-POINTER return a foreign pointer. There are no other public operations that work with FOREIGN-ARRAY objects; their sole purpose is to represent facets of MAT objects.
- [variable] *FOREIGN-ARRAY-STRATEGY* "-see below-"

    One of :PINNED, :STATIC and :CUDA-HOST (see type FOREIGN-ARRAY-STRATEGY). This variable controls how foreign arrays are handled and it can be changed at any time.

    If it's :PINNED (only supported if (PINNING-SUPPORTED-P)), then no separate storage is allocated for the foreign array. Instead, it aliases the lisp array (via the BACKING-ARRAY facet).

    If it's :STATIC, then the lisp backing arrays are allocated statically via the static-vectors library. On some implementations, explicit freeing of static vectors is necessary; this is taken care of by finalizers or can be controlled with WITH-FACET-BARRIER. DESTROY-CUBE and DESTROY-FACET may also be of help.

    :CUDA-HOST is the same as :STATIC, but any copies to/from the GPU (i.e. the CUDA-ARRAY facet) will be done via the CUDA-HOST-ARRAY facet whose memory pages will also be locked and registered with cuMemHostRegister, which allows quicker and asynchronous copying to and from CUDA land.

    The default is :PINNED if available, because it's the most efficient. If pinning is not available, then it's :STATIC.
-
[type] FOREIGN-ARRAY-STRATEGY
One of
:PINNED
,:STATIC
and:CUDA-HOST
. See*FOREIGN-ARRAY-STRATEGY*
for their semantics.
-
[function] PINNING-SUPPORTED-P
Return true iff the lisp implementation efficiently supports pinning lisp arrays. Pinning ensures that the garbage collector doesn't move the array in memory. Currently this is only supported on SBCL gencgc platforms.
- [function] FOREIGN-ROOM &KEY (STREAM *STANDARD-OUTPUT*) (VERBOSE T)

    Print a summary of foreign memory usage to STREAM. If VERBOSE, make the output easily readable by humans, else try to present it in a very concise way. Sample output with VERBOSE:

        Foreign memory usage:
        foreign arrays: 450 (used bytes: 3,386,295,808)

    The same data presented with VERBOSE false:

        f: 450 (3,386,295,808)
- [function] CUDA-AVAILABLE-P &KEY (DEVICE-ID 0)

    Check that a cuda context is already initialized in the current thread or that a device with DEVICE-ID is available.
- [macro] WITH-CUDA* (&KEY (ENABLED '*CUDA-ENABLED*) (DEVICE-ID '*CUDA-DEFAULT-DEVICE-ID*) (RANDOM-SEED '*CUDA-DEFAULT-RANDOM-SEED*) (N-RANDOM-STATES '*CUDA-DEFAULT-N-RANDOM-STATES*) N-POOL-BYTES) &BODY BODY

    Initializes CUDA with all bells and whistles before BODY and deinitializes it after. Simply wrapping WITH-CUDA* around a piece of code is enough to make use of the first available CUDA device or fall back on blas and lisp kernels if there is none.

    If CUDA is already initialized, then it sets up a facet barrier which destroys CUDA-ARRAY and CUDA-HOST-ARRAY facets after ensuring that the ARRAY facet is up-to-date.

    Else, if CUDA is available and ENABLED, then in addition to the facet barrier, a CUDA context is set up, *N-MEMCPY-HOST-TO-DEVICE* and *N-MEMCPY-DEVICE-TO-HOST* are bound to zero, a cublas handle is created, *CURAND-STATE* is bound to a CURAND-XORWOW-STATE with N-RANDOM-STATES, seeded with RANDOM-SEED, and allocation of device memory is limited to N-POOL-BYTES (NIL means no limit, see CUDA Memory Management).

    Else - that is, if CUDA is not available - BODY is simply executed.
- [function] CALL-WITH-CUDA FN &KEY ((:ENABLED *CUDA-ENABLED*) *CUDA-ENABLED*) (DEVICE-ID *CUDA-DEFAULT-DEVICE-ID*) (RANDOM-SEED *CUDA-DEFAULT-RANDOM-SEED*) (N-RANDOM-STATES *CUDA-DEFAULT-N-RANDOM-STATES*) N-POOL-BYTES

    Like WITH-CUDA*, but takes a no-argument function instead of the macro's BODY.

- [variable] *CUDA-ENABLED* T

    Set or bind this to false to disable all use of cuda. If this is done from within WITH-CUDA*, then cuda becomes temporarily disabled. If this is done from outside WITH-CUDA*, then it changes the default value of the ENABLED argument of any future WITH-CUDA*s, which turns off cuda initialization entirely.
- [accessor] CUDA-ENABLED MAT (:CUDA-ENABLED = *DEFAULT-MAT-CUDA-ENABLED*)

    The control provided by *CUDA-ENABLED* can be too coarse. This flag provides a per-object mechanism to turn cuda off. If it is set to NIL, then any operation that pays attention to this flag will not create or access the CUDA-ARRAY facet. Implementationally speaking, this is easily accomplished by using USE-CUDA-P.

- [variable] *DEFAULT-MAT-CUDA-ENABLED* T

    The default for CUDA-ENABLED.
-
[variable] *N-MEMCPY-HOST-TO-DEVICE* 0
Incremented each time a host to device copy is performed. Bound to 0 by
WITH-CUDA*
. Useful for tracking down performance problems.
-
[variable] *N-MEMCPY-DEVICE-TO-HOST* 0
Incremented each time a device to host copy is performed. Bound to 0 by
WITH-CUDA*
. Useful for tracking down performance problems.
-
[variable] *CUDA-DEFAULT-DEVICE-ID* 0
The default value of
WITH-CUDA*
's:DEVICE-ID
argument.
-
[variable] *CUDA-DEFAULT-RANDOM-SEED* 1234
The default value of
WITH-CUDA*
's:RANDOM-SEED
argument.
-
[variable] *CUDA-DEFAULT-N-RANDOM-STATES* 4096
The default value of
WITH-CUDA*
's:N-RANDOM-STATES
argument.
The GPU (called device in CUDA terminology) has its own memory and it can only perform computation on data in this device memory so there is some copying involved to and from main memory. Efficient algorithms often allocate device memory up front and minimize the amount of copying that has to be done by computing as much as possible on the GPU.
MGL-MAT reduces the cost of device memory allocations by maintaining a cache of currently unused allocations from which it first tries to satisfy allocation requests. The total size of all the allocated device memory regions (be they in use or currently unused but cached) is never more than N-POOL-BYTES as specified in WITH-CUDA*. N-POOL-BYTES being NIL means no limit.
- [condition] CUDA-OUT-OF-MEMORY STORAGE-CONDITION

    If an allocation request cannot be satisfied (either because of N-POOL-BYTES or physical device memory limits being reached), then CUDA-OUT-OF-MEMORY is signalled.

- [function] CUDA-ROOM &KEY (STREAM *STANDARD-OUTPUT*) (VERBOSE T)

    When CUDA is in use (see USE-CUDA-P), print a summary of memory usage in the current CUDA context to STREAM. If VERBOSE, make the output easily readable by humans, else try to present it in a very concise way. Sample output with VERBOSE:

        CUDA memory usage:
        device arrays: 450 (used bytes: 3,386,295,808, pooled bytes: 1,816,657,920)
        host arrays: 14640 (used bytes: 17,380,147,200)
        host->device copies: 154,102,488, device->host copies: 117,136,434

    The same data presented with VERBOSE false:

        d: 450 (3,386,295,808 + 1,816,657,920), h: 14640 (17,380,147,200)
        h->d: 154,102,488, d->h: 117,136,434
That's it about reducing the cost of allocations. The other important performance consideration, minimizing the amount of copying done, is very hard to do if the data doesn't fit in device memory, which is often a very limited resource. In this case the next best thing is to do the copying concurrently with computation.
- [macro] WITH-SYNCING-CUDA-FACETS (MATS-TO-CUDA MATS-TO-CUDA-HOST &KEY (SAFEP '*SYNCING-CUDA-FACETS-SAFE-P*)) &BODY BODY

    Update CUDA facets in a possibly asynchronous way while BODY executes. Behind the scenes, a separate CUDA stream is used to copy between registered host memory and device memory. When WITH-SYNCING-CUDA-FACETS finishes either by returning normally or by performing a non-local exit, the following are true:

    - All MATs in MATS-TO-CUDA have an up-to-date CUDA-ARRAY facet.

    - All MATs in MATS-TO-CUDA-HOST have an up-to-date CUDA-HOST-ARRAY facet and no CUDA-ARRAY.

    It is an error if the same matrix appears in both MATS-TO-CUDA and MATS-TO-CUDA-HOST, but the same matrix may appear any number of times in one of them.

    If SAFEP is true, then all matrices in either of the two lists are effectively locked for output until WITH-SYNCING-CUDA-FACETS finishes. With SAFEP NIL, unsafe accesses to facets of these matrices are not detected, but the whole operation has a bit less overhead.
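    A rough usage sketch only (TRAIN-ON-BATCH is a hypothetical function standing in for real work on the GPU, and the matrix shapes are arbitrary): while the current batch is being processed, the next one is copied to the device and a previous one is moved back to host memory.

        (with-cuda* ()
          (let ((current (make-mat '(128 784)))
                (next (make-mat '(128 784)))
                (previous (make-mat '(128 784))))
            (with-syncing-cuda-facets ((list next) (list previous))
              ;; Overlaps with the copies issued on the separate stream.
              (train-on-batch current))))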
- [variable] *SYNCING-CUDA-FACETS-SAFE-P* T

    The default value of the SAFEP argument of WITH-SYNCING-CUDA-FACETS.
Also note that often the easiest thing to do is to prevent the use of CUDA (and consequently the creation of CUDA-ARRAY facets and allocations). This can be done either by binding *CUDA-ENABLED* to NIL or by setting CUDA-ENABLED to NIL on specific matrices.
New operations are usually implemented in lisp, CUDA, or by calling a foreign function in, for instance, BLAS, CUBLAS, CURAND.
- [macro] DEFINE-LISP-KERNEL (NAME &KEY (CTYPES '(:FLOAT :DOUBLE))) (&REST PARAMS) &BODY BODY

    This is very much like DEFINE-CUDA-KERNEL but for normal lisp code. It knows how to deal with MAT objects and can define the same function for multiple CTYPES. Example:

        (define-lisp-kernel (lisp-.+!)
            ((alpha single-float) (x :mat :input) (start-x index) (n index))
          (loop for xi of-type index upfrom start-x
                  below (the! index (+ start-x n))
                do (incf (aref x xi) alpha)))

    Parameters are either of the form (<NAME> <LISP-TYPE>) or (<NAME> :MAT <DIRECTION>). In the latter case, the appropriate CFFI pointer is passed to the kernel. <DIRECTION> is passed on to the WITH-FACET that's used to acquire the foreign array. Note that the return type is not declared.

    Both the signature and the body are written as if for single floats, but one function is defined for each ctype in CTYPES by transforming types, constants and code by substituting them with their ctype equivalents. Currently this means that one needs to write only one kernel for SINGLE-FLOAT and DOUBLE-FLOAT. All such functions get the declaration from *DEFAULT-LISP-KERNEL-DECLARATIONS*.

    Finally, a dispatcher function with NAME is defined which determines the ctype of the MAT objects passed for :MAT typed parameters. It's an error if they are not of the same type. Scalars declared SINGLE-FLOAT are coerced to that type and the appropriate kernel is called.
-
[variable] *DEFAULT-LISP-KERNEL-DECLARATIONS* ((OPTIMIZE SPEED (SB-C:INSERT-ARRAY-BOUNDS-CHECKS 0)))
These declarations are added automatically to kernel functions.
- [function] USE-CUDA-P &REST MATS

    Return true if cuda is enabled (*CUDA-ENABLED*), it's initialized and all MATS have CUDA-ENABLED. Operations on matrices use this to decide whether to go for the CUDA implementation or BLAS/Lisp. It's provided for implementing new operations.
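    For instance, a new operation might dispatch like the following sketch. CUDA-.+! and LISP-.+! refer to the kernels defined in the DEFINE-CUDA-KERNEL and DEFINE-LISP-KERNEL examples in this section, MY-.+! is a hypothetical name, and the :GRID-DIM/:BLOCK-DIM keywords follow the usual CL-CUDA kernel calling convention; this is an illustration, not the library's actual code:

        (defun my-.+! (alpha x)
          (let ((n (mat-size x)))
            (if (use-cuda-p x)
                (multiple-value-bind (block-dim grid-dim)
                    (choose-1d-block-and-grid n 4)
                  ;; CUDA path: the dispatcher acquires the CUDA-ARRAY facet.
                  (cuda-.+! alpha x n :grid-dim grid-dim :block-dim block-dim))
                ;; Lisp path: the kernel works on the backing array.
                (lisp-.+! alpha x (mat-displacement x) n))
            x))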
- [function] CHOOSE-1D-BLOCK-AND-GRID N MAX-N-WARPS-PER-BLOCK

    Return two values, one suitable as the :BLOCK-DIM, the other as the :GRID-DIM argument for a cuda kernel call where both are one-dimensional (only the first element may be different from 1).

    The number of threads in a block is a multiple of *CUDA-WARP-SIZE*. The number of blocks is between 1 and *CUDA-MAX-N-BLOCKS*. This means that the kernel must be able to handle any number of elements in each thread. For example, a strided kernel that adds a constant to each element of a length N vector looks like this:

        (let ((stride (* block-dim-x grid-dim-x)))
          (do ((i (+ (* block-dim-x block-idx-x) thread-idx-x)
                  (+ i stride)))
              ((>= i n))
            (set (aref x i) (+ (aref x i) alpha))))

    It is often the most efficient to have MAX-N-WARPS-PER-BLOCK around 4. Note that the maximum number of threads per block is limited by hardware (512 for compute capability < 2.0, 1024 for later versions), so *CUDA-MAX-N-BLOCKS* times MAX-N-WARPS-PER-BLOCK must not exceed that limit.
- [function] CHOOSE-2D-BLOCK-AND-GRID DIMENSIONS MAX-N-WARPS-PER-BLOCK

    Return two values, one suitable as the :BLOCK-DIM, the other as the :GRID-DIM argument for a cuda kernel call where both are two-dimensional (only the first two elements may be different from 1).

    The number of threads in a block is a multiple of *CUDA-WARP-SIZE*. The number of blocks is between 1 and *CUDA-MAX-N-BLOCKS*. Currently - but this may change - the BLOCK-DIM-X is always *CUDA-WARP-SIZE* and GRID-DIM-X is always 1.

    This means that the kernel must be able to handle any number of elements in each thread. For example, a strided kernel that adds a constant to each element of a HEIGHT*WIDTH matrix looks like this:
(let ((id-x (+ (* block-dim-x block-idx-x) thread-idx-x)) (id-y (+ (* block-dim-y block-idx-y) thread-idx-y)) (stride-x (* block-dim-x grid-dim-x)) (stride-y (* block-dim-y grid-dim-y))) (do ((row id-y (+ row stride-y))) ((>= row height)) (let ((i (* row width))) (do ((column id-x (+ column stride-x))) ((>= column width)) (set (aref x i) (+ (aref x i) alpha)) (incf i stride-x)))))
- [function] CHOOSE-3D-BLOCK-AND-GRID DIMENSIONS MAX-N-WARPS-PER-BLOCK

    Return two values, one suitable as the :BLOCK-DIM, the other as the :GRID-DIM argument for a cuda kernel call where both are three-dimensional (only the first three elements may be different from 1).

    The number of threads in a block is a multiple of *CUDA-WARP-SIZE*. The number of blocks is between 1 and *CUDA-MAX-N-BLOCKS*. Currently - but this may change - the BLOCK-DIM-X is always *CUDA-WARP-SIZE* and GRID-DIM-X is always 1.

    This means that the kernel must be able to handle any number of elements in each thread. For example, a strided kernel that adds a constant to each element of a THICKNESS*HEIGHT*WIDTH
3d array looks like this:(let ((id-x (+ (* block-dim-x block-idx-x) thread-idx-x)) (id-y (+ (* block-dim-y block-idx-y) thread-idx-y)) (id-z (+ (* block-dim-z block-idx-z) thread-idx-z)) (stride-x (* block-dim-x grid-dim-x)) (stride-y (* block-dim-y grid-dim-y)) (stride-z (* block-dim-z grid-dim-z))) (do ((plane id-z (+ plane stride-z))) ((>= plane thickness)) (do ((row id-y (+ row stride-y))) ((>= row height)) (let ((i (* (+ (* plane height) row) width))) (do ((column id-x (+ column stride-x))) ((>= column width)) (set (aref x i) (+ (aref x i) alpha)) (incf i stride-x))))))
- [macro] DEFINE-CUDA-KERNEL (NAME &KEY (CTYPES '(:FLOAT :DOUBLE))) (RETURN-TYPE PARAMS) &BODY BODY

    This is an extended CL-CUDA:DEFKERNEL macro. It knows how to deal with MAT objects and can define the same function for multiple CTYPES. Example:

        (define-cuda-kernel (cuda-.+!)
            (void ((alpha float) (x :mat :input) (n int)))
          (let ((stride (* block-dim-x grid-dim-x)))
            (do ((i (+ (* block-dim-x block-idx-x) thread-idx-x)
                    (+ i stride)))
                ((>= i n))
              (set (aref x i) (+ (aref x i) alpha)))))

    The signature looks pretty much like in CL-CUDA:DEFKERNEL, but parameters can take the form of (<NAME> :MAT <DIRECTION>) too, in which case the appropriate CL-CUDA.DRIVER-API:CU-DEVICE-PTR is passed to the kernel. <DIRECTION> is passed on to the WITH-FACET that's used to acquire the cuda array.

    Both the signature and the body are written as if for single floats, but one function is defined for each ctype in CTYPES by transforming types, constants and code by substituting them with their ctype equivalents. Currently this means that one needs to write only one kernel for FLOAT and DOUBLE.

    Finally, a dispatcher function with NAME is defined which determines the ctype of the MAT objects passed for :MAT typed parameters. It's an error if they are not of the same type. Scalars declared FLOAT are coerced to that type and the appropriate kernel is called.
Inside WITH-CUDA*, BLAS Operations will automatically use CUBLAS. There is no need to use these functions directly.
- [condition] CUBLAS-ERROR ERROR
- [reader] CUBLAS-ERROR-FUNCTION-NAME CUBLAS-ERROR (:FUNCTION-NAME)
- [reader] CUBLAS-ERROR-STATUS CUBLAS-ERROR (:STATUS)
- [variable] *CUBLAS-HANDLE*
- [function] CUBLAS-CREATE HANDLE
- [function] CUBLAS-DESTROY &KEY (HANDLE *CUBLAS-HANDLE*)
- [macro] WITH-CUBLAS-HANDLE NIL &BODY BODY
- [function] CUBLAS-GET-VERSION VERSION &KEY (HANDLE *CUBLAS-HANDLE*)
This is the low-level CURAND API. You probably want Random numbers instead.
- [macro] WITH-CURAND-STATE (STATE) &BODY BODY
- [variable] *CURAND-STATE*
- [class] CURAND-XORWOW-STATE
- [reader] N-STATES CURAND-XORWOW-STATE (:N-STATES)
- 1 Links
- 2 Introduction
- 3 Basics
- 4 Synchronization
- 5 Facets
- 6 Facet Extension API
- 7 The Default Implementation of CALL-WITH-FACET*
- 8 Lifetime
Here is the official repository and the HTML documentation for the latest version.
This is the library on which MGL-MAT (see MAT Manual) is built. The idea of automatically translating between various representations may be useful for other applications, so this got its own package and all ties to MGL-MAT have been severed.
This package defines CUBE, an abstract base class that provides a framework for automatic conversion between various representations of the same data. To define a cube, CUBE needs to be subclassed and the Facet Extension API implemented.
If you are only interested in how to use cubes in general, read Basics, Lifetime and Facet Barriers.
If you want to implement a new cube datatype, then see Facets, Facet Extension API, and The Default Implementation of CALL-WITH-FACET*.

Here we learn what a CUBE is and how to access the data in it with WITH-FACET.
- [class] CUBE

    A datacube that has various representations of the same stuff. These representations go by the name `facet'. Clients must use WITH-FACET to acquire a dynamic extent reference to a facet. With the information provided in the DIRECTION argument of WITH-FACET, the cube keeps track of which facets are up-to-date and copies data between them as necessary.

    The cube is an abstract class, it does not provide useful behavior in itself. One must subclass it and implement the Facet Extension API.

    Also see Lifetime and Facet Barriers.
- [macro] WITH-FACET (VAR (CUBE FACET-NAME &KEY (DIRECTION :IO) TYPE)) &BODY BODY

    Find or create the facet with FACET-NAME in CUBE and bind VAR to the representation of CUBE's data provided by that facet. This representation is called the facet's value. The value is to be treated as dynamic extent: it is not allowed to keep a reference to it. For the description of the DIRECTION parameter, see the type DIRECTION.

    If TYPE is specified, then VAR is declared to be of that type.
- [type] DIRECTION

    Used by WITH-FACET, DIRECTION can be :INPUT, :OUTPUT or :IO.

    - :INPUT promises that the facet will only be read and never written. Other up-to-date facets of the same cube remain up-to-date. If the facet in question is not up-to-date, then data is copied to it from one of the up-to-date facets (see SELECT-COPY-SOURCE-FOR-FACET*).

    - :OUTPUT promises that all data will be overwritten without reading any data. All up-to-date facets become non-up-to-date, while this facet is marked as up-to-date. No copying of data takes place.

    - :IO promises nothing about the type of access. All up-to-date facets become non-up-to-date, while this facet is marked as up-to-date. If the facet in question is not up-to-date, then data is copied to it from one of the up-to-date facets (see SELECT-COPY-SOURCE-FOR-FACET*).

    Any number of WITH-FACETs with direction :INPUT may be active at the same time, but :IO and :OUTPUT cannot coexist with another WITH-FACET regardless of the direction. The exception to this rule is that an inner WITH-FACET does not conflict with an enclosing WITH-FACET if they are for the same facet (but inner WITH-FACETs for another facet or for the same facet from another thread do).

    See CHECK-NO-WRITERS and CHECK-NO-WATCHERS called by The Default Implementation of CALL-WITH-FACET*.
- [macro] WITH-FACETS (&REST FACET-BINDING-SPECS) &BODY BODY

    A shorthand for writing nested WITH-FACET calls.

        (with-facet (f1 (c1 'name1 :direction :input))
          (with-facet (f2 (c2 'name2 :direction :output))
            ...))

    is equivalent to:

        (with-facets ((f1 (c1 'name1 :direction :input))
                      (f2 (c2 'name2 :direction :output)))
          ...)
Cubes keep track of which facets are used and which are up-to-date in order to be able to perform automatic translation between facets. WITH-FACET and other operations access and make changes to this metadata, so thread safety is a concern. In this section, we detail how to relax the default thread safety guarantees.
A related concern is async signal safety which arises most often when C-c'ing or killing a thread or when the extremely nasty WITH-TIMEOUT macro is used. In a nutshell, changes to cube metadata are always made with interrupts disabled so things should be async signal safe.
- [accessor] SYNCHRONIZATION CUBE (:SYNCHRONIZATION = *DEFAULT-SYNCHRONIZATION*)

    By default, setup and teardown of facets by WITH-FACET is performed in a thread safe way. Corrupting internal data structures of cubes is not fun, but in the name of performance, synchronization can be turned off, either dynamically or on a per-instance basis.

    If T, then access to cube metadata is always synchronized. If NIL, then never. If :MAYBE, then whether access is synchronized is determined by *MAYBE-SYNCHRONIZE-CUBE* that's true by default.

    The default is the value of *DEFAULT-SYNCHRONIZATION* that's :MAYBE by default.

    Note that the body of a WITH-FACET is never synchronized with anyone, apart from the implicit reader/writer conflict (see DIRECTION).
- [variable] *DEFAULT-SYNCHRONIZATION* :MAYBE

    The default value for SYNCHRONIZATION of new cubes.

- [variable] *MAYBE-SYNCHRONIZE-CUBE* T

    Determines whether access to the cube metadata is synchronized for cubes with SYNCHRONIZATION :MAYBE.
The basic currency for implementing new cube types is the FACET. Simply using a cube only involves facet names and values, never facets themselves.
- [function] FACETS CUBE

    Return the facets of CUBE.

- [function] FIND-FACET CUBE FACET-NAME

    Return the facet of CUBE with FACET-NAME or NIL if no such facet exists.

- [class] FACET STRUCTURE-OBJECT

    A cube has facets, as we discussed in Basics. A facet holds the data in a particular representation; this is called the value of the facet. A facet holds one such value and some metadata pertaining to it: its FACET-NAME, whether it's up-to-date (FACET-UP-TO-DATE-P), etc. FACET objects are never seen when simply using a cube, they are for implementing the Facet Extension API.
-
[structure-accessor] FACET-NAME
A symbol that uniquely identifies the facet within a cube.
-
[structure-accessor] FACET-VALUE
This is what's normally exposed by
WITH-FACET
.
-
[structure-accessor] FACET-DESCRIPTION
Returned by
MAKE-FACET*
as its second value, this is an arbitrary object in which additional information can be stored.
-
[structure-accessor] FACET-UP-TO-DATE-P
Whether the cube has changed since this facet has been last updated. See
FACET-UP-TO-DATE-P*
.
-
[structure-accessor] FACET-N-WATCHERS
The number of active
WITH-FACET
s. Updated by WATCH-FACET and UNWATCH-FACET.
-
[structure-accessor] FACET-WATCHER-THREADS
The threads (one for each watcher) that have active
WITH-FACET
s.
-
[structure-accessor] FACET-DIRECTION
The direction of the last
WITH-FACET
on this facet.
Many of the generic functions in this section take FACET
arguments.
FACET
is a structure and is not intended to be subclassed. To be
able to add specialized methods, the name of the
facet (FACET-NAME
) is also passed as the
argument right in front of the corresponding facet argument.
In summary, define EQL specializers on facet name arguments, and use
FACET-DESCRIPTION
to associate arbitrary information with facets.
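For example, a method that frees a foreign buffer held in a facet's value could be written as the following sketch, where MY-FOREIGN-BUFFER and FREE-MY-BUFFER are hypothetical names, not part of MGL-MAT:

;; Dispatch on the facet name with an EQL specializer. The facet name
;; argument comes right before the corresponding FACET argument.
(defmethod destroy-facet* ((facet-name (eql 'my-foreign-buffer)) facet)
  (free-my-buffer (facet-value facet)))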
-
[generic-function] MAKE-FACET* CUBE FACET-NAME
Called by WITH-FACET (or more directly WATCH-FACET) when there is no facet with FACET-NAME. As the first value, return a new object capable of storing CUBE's data in the facet with FACET-NAME. As the second value, return a facet description which will be available as FACET-DESCRIPTION. As the third value, return a generalized boolean indicating whether this facet must be explicitly destroyed (in which case a finalizer will be added to CUBE).
-
[generic-function] DESTROY-FACET* FACET-NAME FACET
Free the resources associated with
FACET
with FACET-NAME
. The cube this facet belongs to is not among the parameters because this method can be called from a finalizer on the cube (so we can't have a reference to the cube portably) which also means that it may run in an unpredictable thread.
-
[generic-function] COPY-FACET* CUBE FROM-FACET-NAME FROM-FACET TO-FACET-NAME TO-FACET
Copy CUBE's data from FROM-FACET with FROM-FACET-NAME to TO-FACET with TO-FACET-NAME. Called by WITH-FACET (or more directly WATCH-FACET) when necessary. FROM-FACET is what SELECT-COPY-SOURCE-FOR-FACET*
-
[generic-function] CALL-WITH-FACET* CUBE FACET-NAME DIRECTION FN
Call FN with an up-to-date FACET-VALUE that belongs to FACET-NAME of CUBE. WITH-FACET is directly implemented in terms of this function. See The Default Implementation of CALL-WITH-FACET* for the gory details.

Specializations will most likely want to call the default implementation (with CALL-NEXT-METHOD) but with a lambda that transforms FACET-VALUE before passing it on to FN.
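A sketch of such a specialization follows; MY-CUBE, MY-FACET and WRAP-AS-VIEW are hypothetical names used only for illustration:

;; Let the default implementation handle facet creation, copying and
;; bookkeeping, but hand FN a transformed view of the facet's value.
(defmethod call-with-facet* ((cube my-cube) (facet-name (eql 'my-facet))
                             direction fn)
  (call-next-method cube facet-name direction
                    (lambda (facet-value)
                      (funcall fn (wrap-as-view facet-value)))))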
-
[generic-function] FACET-UP-TO-DATE-P* CUBE FACET-NAME FACET
Check if FACET with FACET-NAME has been updated since the latest change to CUBE (that is, since the access to other facets with DIRECTION of :IO or :OUTPUT). The default method simply calls FACET-UP-TO-DATE-P on FACET.

One reason to specialize this is when some facets actually share common storage, so updating one makes the other up-to-date as well.
-
[generic-function] SELECT-COPY-SOURCE-FOR-FACET* CUBE TO-NAME TO-FACET
Called when
TO-FACET
with TO-NAME is about to be updated by copying data from an up-to-date facet. Return the facet (or its name) from which data shall be copied. Note that if the returned facet is not FACET-UP-TO-DATE-P, then it will be updated first and another SELECT-COPY-SOURCE-FOR-FACET* will take place, so be careful not to get into endless recursion. The default method simply returns the first up-to-date facet.
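To see how these generic functions fit together, here is a minimal, purely illustrative extension: a hypothetical SIMPLE-CUBE with two facets, LIST-FACET and VECTOR-FACET, that leaves everything else to the default machinery (none of these names are part of MGL-MAT):

(defclass simple-cube (cube)
  ((size :initarg :size :reader size)))

;; Facet values are created lazily on first use. The second return value
;; is the FACET-DESCRIPTION, the third says whether DESTROY-FACET* must be
;; called (NIL here, lisp objects need no explicit teardown).
(defmethod make-facet* ((cube simple-cube) (name (eql 'list-facet)))
  (values (make-list (size cube) :initial-element 0) nil nil))

(defmethod make-facet* ((cube simple-cube) (name (eql 'vector-facet)))
  (values (make-array (size cube) :initial-element 0) nil nil))

;; Translate between the two representations when an up-to-date facet is
;; needed. REPLACE works on any pair of sequences.
(defmethod copy-facet* ((cube simple-cube)
                        (from-name (eql 'list-facet)) from-facet
                        (to-name (eql 'vector-facet)) to-facet)
  (replace (facet-value to-facet) (facet-value from-facet)))

(defmethod copy-facet* ((cube simple-cube)
                        (from-name (eql 'vector-facet)) from-facet
                        (to-name (eql 'list-facet)) to-facet)
  (replace (facet-value to-facet) (facet-value from-facet)))

With just these methods, WITH-FACET on a SIMPLE-CUBE already works: writing through one facet and then reading through the other triggers MAKE-FACET* and COPY-FACET* behind the scenes.

(let ((c (make-instance 'simple-cube :size 3)))
  (with-facet (l (c 'list-facet :direction :output))
    (setf (first l) 1 (second l) 2 (third l) 3))
  (with-facet (v (c 'vector-facet :direction :input))
    (aref v 2)))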
PAX integration follows; don't worry about it if you don't use PAX, but you really should (see MGL-PAX::@PAX-MANUAL).
-
[locative] FACET-NAME
The
FACET-NAME
locative is to refer to stuff defined with DEFINE-FACET-NAME
.
-
[macro] DEFINE-FACET-NAME SYMBOL LAMBDA-LIST &BODY DOCSTRING
Just a macro to document that
SYMBOL refers to a facet name (as in the FACET-NAME locative). This is totally confusing, so here is an example of how MGL-MAT (see the MAT Manual) documents the MGL-MAT:BACKING-ARRAY facet:

(define-facet-name backing-array ()
  "The corresponding facet is a one dimensional lisp array.")
Which makes it possible to refer to this definition (refer as in link and M-. to) the MGL-MAT:BACKING-ARRAY facet-name. See MGL-PAX::@PAX-MANUAL for more.
Also see The Default Implementation of CALL-WITH-FACET*
.
-
[method] CALL-WITH-FACET* (CUBE CUBE) FACET-NAME DIRECTION FN
The default implementation of
CALL-WITH-FACET*
is defined in terms of the WATCH-FACET and the UNWATCH-FACET
generic functions. These can be considered part of the Facet Extension API.
-
[generic-function] WATCH-FACET CUBE FACET-NAME DIRECTION
This is what the default
CALL-WITH-FACET*
method, in terms of which WITH-FACET is implemented, calls first. The default method takes care of creating facets, copying and tracking up-to-dateness.

Calls CHECK-NO-WRITERS (unless *LET-INPUT-THROUGH-P*) and CHECK-NO-WATCHERS (unless *LET-OUTPUT-THROUGH-P*), depending on DIRECTION, to detect situations with a writer being concurrent to readers/writers, because that would screw up the tracking of up-to-dateness.

The default implementation should suffice most of the time. MGL-MAT specializes it to override the DIRECTION arg if it's :OUTPUT but not all elements are visible due to reshaping, so that invisible elements are still copied over.
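The following sketch shows the shape of such a specialization; it is not MGL-MAT's actual code, and MY-RESHAPED-CUBE and FULLY-VISIBLE-P are hypothetical:

;; Widen :OUTPUT to :IO when not all elements are visible, so the
;; destination facet is brought up to date first and invisible elements
;; are preserved.
(defmethod watch-facet :around ((cube my-reshaped-cube) facet-name direction)
  (if (and (eq direction :output)
           (not (fully-visible-p cube)))
      (call-next-method cube facet-name :io)
      (call-next-method)))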
-
[generic-function] UNWATCH-FACET CUBE FACET-NAME
This is what the default
CALL-WITH-FACET*
method, in terms of which WITH-FACET
is implemented, calls last. The default method takes care of taking down facets. External resource managers may want to hook into this to handle unused facets.
-
[variable] *LET-INPUT-THROUGH-P* NIL
If true,
WITH-FACETS (more precisely, the default implementation of CALL-WITH-FACET*) with :DIRECTION :INPUT does not call CHECK-NO-WRITERS. This knob is intended to be bound locally for debugging purposes.
-
[variable] *LET-OUTPUT-THROUGH-P* NIL
If true,
WITH-FACETS (more precisely, the default implementation of CALL-WITH-FACET*) with :DIRECTION :IO or :OUTPUT does not call CHECK-NO-WATCHERS. This knob is intended to be bound locally for debugging purposes.
-
[function] CHECK-NO-WRITERS CUBE FACET-NAME MESSAGE-FORMAT &REST MESSAGE-ARGS
Signal an error if
CUBE
has facets (with names other than FACET-NAME) being written (i.e. direction is :IO or :OUTPUT).
-
[function] CHECK-NO-WATCHERS CUBE FACET-NAME MESSAGE-FORMAT &REST MESSAGE-ARGS
Signal an error if
CUBE
has facets (with names other than FACET-NAME) being watched, regardless of the direction.
Lifetime management of facets is manual (but facets of garbage
cubes are freed automatically by a finalizer, see MAKE-FACET*
). One
may destroy a single facet or all facets of a cube with
DESTROY-FACET
and DESTROY-CUBE
, respectively. Also see
Facet Barriers.
-
[function] DESTROY-FACET CUBE FACET-NAME
Free resources associated with the facet with
FACET-NAME
and remove it fromFACETS
(0
1
) ofCUBE
.
-
[function] DESTROY-CUBE CUBE
Destroy all facets of
CUBE
withDESTROY-FACET
.
In some cases it is useful to declare the intent to use a facet in
the future to prevent its destruction. Hence, every facet has
a reference count which starts from 0. The reference count is
incremented and decremented by ADD-FACET-REFERENCE-BY-NAME
and
REMOVE-FACET-REFERENCE-BY-NAME
, respectively. If it is positive,
then the facet will not be destroyed by explicit DESTROY-FACET
and
DESTROY-CUBE
calls, but it will still be destroyed by the finalizer
to prevent resource leaks caused by stray references.
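For example, a facet can be pinned so that a blanket DESTROY-CUBE leaves it alone (a sketch; BACKING-ARRAY is the lisp-array facet of MAT, and any facet name works the same way):

;; Protect one facet from DESTROY-CUBE, then release and free it.
(let ((m (make-mat 4)))
  (add-facet-reference-by-name m 'backing-array) ; creates the facet if needed
  (destroy-cube m)                   ; destroys every facet except BACKING-ARRAY
  (remove-facet-reference-by-name m 'backing-array)
  (destroy-facet m 'backing-array))  ; now it can be freed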
-
[function] ADD-FACET-REFERENCE-BY-NAME CUBE FACET-NAME
Make sure
FACET-NAME
exists on CUBE and increment its reference count. Return the FACET behind FACET-NAME.
-
[function] REMOVE-FACET-REFERENCE-BY-NAME CUBE FACET-NAME
Decrement the reference count of the facet with
FACET-NAME
of CUBE. It is an error if the facet does not exist or if the reference count becomes negative.
-
[function] REMOVE-FACET-REFERENCE FACET
Decrement the reference count of
FACET
. It is an error if the facet is already destroyed or if the reference count becomes negative. This function has the same purpose as REMOVE-FACET-REFERENCE-BY-NAME, but by having a single FACET argument, it's more suited for use in finalizers because it does not keep the whole CUBE alive.
A facility to control the lifetime of facets tied to a dynamic extent. Also see Lifetime.
-
[macro] WITH-FACET-BARRIER (CUBE-TYPE ENSURES DESTROYS) &BODY BODY
When
BODY
exits, destroy facets which:

-
are of cubes with
CUBE-TYPE
-
have a facet name among
DESTROYS
-
were created in the dynamic extent of
BODY
Before destroying the facets, it is ensured that facets with names among
ENSURES
are up-to-date.

WITH-FACET-BARRIERs can be nested; in case of multiple barriers matching the cube's type and the created facet's name, the innermost one takes precedence.

The purpose of this macro is twofold. First, it makes it easy to temporarily work with a certain facet of many cubes without leaving newly created facets around. Second, it can be used to make sure that facets whose extent is tied to some dynamic boundary (such as the thread in which they were created) are destroyed.
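For instance, to avoid leaving CUDA facets of MATs behind after a batch of work, one might write something like the sketch below. It assumes, as in the MGL-MAT sources, that the three arguments are evaluated (hence the quotes), and it uses the BACKING-ARRAY and CUDA-ARRAY facet names from the MAT manual:

;; On exit, CUDA-ARRAY facets created inside the barrier are destroyed,
;; after making sure the BACKING-ARRAY facets are up-to-date.
(with-facet-barrier ('mat '(backing-array) '(cuda-array))
  (let ((m (make-mat 16)))
    (fill! 7 m)))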
-
[function] COUNT-BARRED-FACETS FACET-NAME &KEY (TYPE 'CUBE)
Count facets with
FACET-NAME
of cubes of TYPE
which will be destroyed by a facet barrier.