Save/Load Extensions for Neanderthal
Neanderthal is a very fast clojure matrix library but if you need to run longer computations it might not be enough to keep everything in memory. Often you need to save some results, resume when ready and continue from that point on. Or you might want to free up some memory but you don't want to discard anything computed so far.
That's where neanderthal-stick comes to help. It allows you to take your vector or a matrix and stick it to somewhere - a disk file, a network stream, somewhere durable.
The API is simple yet flexible enough to support some perks Neanderthal gives you for free - like automatically working with your GPU just by swapping a factory.
Add the necessary dependency to your project:
[neanderthal-stick "0.4.0]
The basic usage is just to
save!
something to a stream and
load!
it back later. That's it!.
For reference documentation refer to the project github pages.
Note: The API is very early stage and subject to change thus it is
in the expertimental
namespace.
If the final destination is a standard file and you only need to save or load
one thing you can use the convenience functions save-to-file!
and
load-from-file!
respectively to handle opening and closing the file for you.
(require '[uncomplicate.commons.core :refer [let-release with-release release]]
'[uncomplicate.neanderthal.native :refer [native-double dv dge dtr]]
'[uncomplicate.neanderthal.core :as core :refer [transfer!]]
'[neanderthal-stick.core :as stick]
'[neanderthal-stick.experimental :as exp])
; setup an array of 100 million one-by-one increasing numbers
(def src-vctr ^doubles (double-array (long 1e8)))
(dotimes [i (int 1e8)]
(aset ^doubles src-vctr i (double (inc i))))
; and fill a proper neanderthal vector
(def dnative (transfer! src-vctr (dv (long 1e8))))
(exp/save-to-file! dnative "/tmp/dnative.bin")
;=> nil
(exp/load-from-file! "/tmp/dnative.bin")
;=> #RealBlockVector[double, n:100000000, offset: 0, stride:1]
;[1.0 2.0 3.0 ⋯ 1.00E+8 1.00E+8 ]
If you need to handle vectors or matrices that reside in the GPU memory
you have to require the proper namespace (opencl
or cuda
) to let
it initialize.
When saving there is no other setup to do:
(require '[uncomplicate.clojurecl.core :as clojurecl :refer [*context* *command-queue* finish!]]
'[uncomplicate.neanderthal.opencl :refer [with-default-engine opencl-double]]
'[neanderthal-stick.opencl])
; use ancient OpenCL 1.x to be safe
(clojurecl/with-default-1
(with-default-engine
(let [factory (opencl-double *context* *command-queue*)]
(with-release [clge (core/ge factory 10000 10000 src-vctr)]
(exp/save-to-file! clge "/tmp/ge.bin")))))
;=> nil
and you can load the saved structure normally:
(exp/load-from-file! "/tmp/ge.bin")
;=> #RealGEMatrix[double, mxn:10000x10000, layout:column, offset:0]
; ▥ ↓ ↓ ↓ ↓ ↓ ┓
; → 1.0 1.00E+4 ⁙ 1.00E+8 1.00E+8
; → 2.0 1.00E+4 ⁙ 1.00E+8 1.00E+8
; → ⁙ ⁙ ⁙ ⁙ ⁙
; → 1.00E+4 2.00E+4 ⁙ 1.00E+8 1.00E+8
; → 1.00E+4 2.00E+4 ⁙ 1.00E+8 1.00E+8
; ┗ ┛
Note that you should handle loaded objects the same way you normally
do with neanderthal structures. For instance you should release
the ge
matrix above to free the associated direct buffers
(outside the JVM memory heap).
Loading back to host memory isn't often enough. Neanderthal-stick allows
you to switch contexts as needed (even from OpenCL to CUDA) by providing
a factory to the load function - the factory used when saving isn't
important. The entry data type (double
or float
) must match though.
(require '[uncomplicate.clojurecuda.core :as clojurecuda :refer [current-context with-context default-stream]]
'[uncomplicate.neanderthal.cuda :refer [cuda-float cuda-double]]
'[neanderthal-stick.cuda])
(clojurecuda/with-default
(let [factory (cuda-double (current-context) default-stream)]
(with-release [cuge (exp/load-from-file! factory "/tmp/ge.bin")]
(prn cuge))))
;#CUGEMatrix[double, mxn:10000x10000, layout:column, offset:0]
; ▥ ↓ ↓ ↓ ↓ ↓ ┓
; → 1.0 1.00E+4 ⁙ 1.00E+8 1.00E+8
; → 2.0 1.00E+4 ⁙ 1.00E+8 1.00E+8
; → ⁙ ⁙ ⁙ ⁙ ⁙
; → 1.00E+4 2.00E+4 ⁙ 1.00E+8 1.00E+8
; → 1.00E+4 2.00E+4 ⁙ 1.00E+8 1.00E+8
; ┗ ┛
Neanderthal-stick defines optional custom type extensions to Nippy for user-facing neanderthal types.
To use it it is sufficient to require Nippy's main namespace as well as the nippy-ext
namespace. From that point on you can use it with Nippy as any other built-in Clojure type.
(require '[taoensso.nippy :as nippy]
'[neanderthal-stick.nippy-ext :refer [with-real-factory]])
(def mytr (core/tr native-double 10 (range 50) {:diag :unit}))
(def mytr-frozen-bytes (nippy/freeze mytr))
(def mytr-copy (nippy/thaw mytr-frozen-bytes))
mytr-copy
;=> #RealUploMatrix[double, type:tr, mxn:10x10, layout:column, offset:0]
; ▥ ↓ ↓ ↓ ↓ ↓ ─
; → ·1· * * * * ⋯
; → 1.00 ·1· * * *
; → 2.00 10.00 ·1· * *
; → 3.00 11.00 18.00 ·1· *
; → 4.00 12.00 19.00 25.00 ·1·
; | ⋮ ⋱
(= mytr mytr-copy)
;=> true
You can use the Nippy extension including composite structures
with the caveat that you still need to manuallyrelease
neanderthal types
when finished.
(def composite {:single (dv 1.0)
:double (dv 1.0 2.0)
:hexa (dge 3 2 (range 1 7) (:layout :row))})
(def frozen-composite (nippy/freeze composite))
(nippy/thaw frozen-composite)
;=> {:single #RealBlockVector[double, n:1, offset: 0, stride:1]
;[ 1.00 ]
;, :double #RealBlockVector[double, n:2, offset: 0, stride:1]
;[ 1.00 2.00 ]
;, :hexa #RealGEMatrix[double, mxn:3x2, layout:column, offset:0]
; ▥ ↓ ↓ ┓
; → 1.00 4.00
; → 2.00 5.00
; → 3.00 6.00
; ┗ ┛
;}
There is no way how to pass a parameter through the Nippy's serialization engine
so if you need a different factory to create neanderthal types you need to
set a dynamic binding with the help of macro with-real-factory
.
The same caveat regarding release
applies. If you really need to
save and load composite types it is probably easier to create a new
type that also implements uncomplicate.commons.core/Releaseable
and Nippy's protocols.
(clojurecuda/with-default
(with-real-factory (cuda-double (current-context) default-stream)
(let [{:keys [single double hexa] :as composite} (nippy/thaw frozen-composite)]
(prn composite)
(release single)
(release double)
(release hexa))))
;{:single #CUBlockVector[double, n:1, offset:0 stride:1]
;[ 1.00 ]
;, :double #CUBlockVector[double, n:2, offset:0 stride:1]
;[ 1.00 2.00 ]
;, :hexa #CUGEMatrix[double, mxn:3x2, layout:column, offset:0]
; ▥ ↓ ↓ ┓
; → 1.00 4.00
; → 2.00 5.00
; → 3.00 6.00
; ┗ ┛
;}
;=> true
For performance reasons you can only save structures that provide a dense vector view. The rule of the thumb is that if its stride is bigger than one you have to make a copy.
(with-release [bigm (dge 10000 10000 src-vctr)
small-view (core/submatrix bigm 3 2)]
(exp/save-to-file! small-view "/tmp/small_ge_view.bin"))
;Execution error (ExceptionInfo) at uncomplicate.neanderthal.internal.host.buffer_block.RealGEMatrix/view_vctr (buffer_block.clj:848).
;Strided GE matrix cannot be viewed as a dense vector.
(with-release [bigm (dge 10000 10000 src-vctr)
small-view (core/submatrix bigm 3 2)
small-copy (dge 3 2 small-view)]
(exp/save-to-file! small-copy "/tmp/small_ge_view.bin"))
=> nil
(with-release [small-view (exp/load-from-file!"/tmp/small_ge_view.bin" )]
(prn small-view))
;#RealGEMatrix[double, mxn:3x2, layout:column, offset:0]
; ▥ ↓ ↓ ┓
; → 1.0 1.00E+4
; → 2.0 1.00E+4
; → 3.0 1.00E+4
; ┗ ┛
;=> nil
Split-saving is an advanced feature. It allows you save the data descriptor (shape) into one place and the actual contents into another.
When loading a split-saved structure its empty skeleton is created
(with the correct parameters) and you need to transfer!
the data
back to it. As a source you can use a different object or even a data
stream. The transmission size is adjusted automatically according
to the destination capacity.
(require '[clojure.java.io :as io])
(import '[java.io DataOutputStream DataInputStream])
(def myge (dge 3 2 src-vctr {:layout :row}))
;=> #'user/myge
myge
;=> #RealGEMatrix[double, mxn:3x2, layout:row, offset:0]
; ▤ ↓ ↓ ┓
; → 1.00 2.00
; → 3.00 4.00
; → 5.00 6.00
; ┗ ┛
(stick/describe myge)
;=> {:entry-type :double, :matrix-type :ge, :n 2, :m 3, :options {:layout :row}}
(exp/save-to-file! myge "/tmp/ge_desc_only.bin" {:omit-data true})
;=> nil
(with-open [out-data (DataOutputStream. (io/output-stream (io/file "/tmp/data.bin")))]
(transfer! myge out-data))
;=> #object[java.io.DataOutputStream 0x1379d050 "java.io.DataOutputStream@1379d050"]
(def empty-ge-copy (exp/load-from-file! "/tmp/ge_desc_only.bin"))
;=> #'user/myge-copy
empty-ge-copy
;=> #RealGEMatrix[double, mxn:3x2, layout:row, offset:0]
; ▤ ↓ ↓ ┓
; → 0.00 0.00
; → 0.00 0.00
; → 0.00 0.00
; ┗ ┛
(with-open [in-data (DataInputStream. (io/input-stream (io/file "/tmp/data.bin")))]
(transfer! in-data empty-ge-copy))
;=> #RealGEMatrix[double, mxn:3x2, layout:row, offset:0]
; ▤ ↓ ↓ ┓
; → 1.00 2.00
; → 3.00 4.00
; → 5.00 6.00
; ┗ ┛
If you need to define your own structures (new structure kinds) you can still have it supported by neanderthal-stick but it is a little bit more involved.
There is an example in the repository: examples/def-new-kind.
TLDR; you need to define a new self-description ability, a constructor
from that descriptor and the actual content save and load transfer!
multimethods.
Copyright © 2019-2020 Kamil Toman
This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.
This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.