Releases: nlpodyssey/spago
v1.1.0
Changed
- Replace `sync.Cond` with channel closing in the broadcasting mechanism of the autograd (`ag`) package
- Refactor the `Matrix` interface to extend from `Tensor`
- Replace `ag.Node` and `ag.DualValue` with `mat.Tensor`
- Breaking change: a new single `NewDense` function accepting functional options (`WithShape`, `WithBacking`, `WithGrad`); see the sketch after this list
- Refactor the `BaseParam` struct to embed `mat.Matrix`
- Remove `Variable` and enable gradient accumulation directly in `Matrix`
- Revise the backward method and eliminate truncated backpropagation
- The `Backward` function now requires explicit output gradients to be accumulated beforehand
- Move package `fn` out of package `ag`, renaming it `gradfn` as a subpackage of `mat`
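A minimal sketch of the new constructor call, assuming the options keep the obvious meaning of their names and that `NewDense` still takes the element type as a type parameter; the exact signatures may differ, so treat this as illustrative only:

```go
package main

import (
	"fmt"

	"github.com/nlpodyssey/spago/mat"
)

func main() {
	// A 2x3 float64 matrix built with the functional options named above
	// (WithShape, WithBacking); exact signatures are assumptions.
	a := mat.NewDense[float64](
		mat.WithShape(2, 3),
		mat.WithBacking([]float64{1, 2, 3, 4, 5, 6}),
	)

	// A float32 vector that also accumulates gradients (WithGrad),
	// now that gradient accumulation lives directly in Matrix.
	v := mat.NewDense[float32](
		mat.WithBacking([]float32{0.5, -0.5}),
		mat.WithGrad(true),
	)

	fmt.Println(a, v)
}
```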
Removed
- Remove the parameter name to simplify the code and reduce unnecessary complexity (e.g., `Introspect()`)
- Remove concurrent matrix-vector multiplication, as it requires more careful consideration
- Remove the export feature to Graphviz DOT format
- Remove "spago" tag annotations on embeddings
v1.0.1
Added
- Method `embeddings.Model.EmbeddingFast`.
v1.0.0
First stable release!
Fixed
- Fix bug preventing the embeddings model from being traversed on `nn.Apply`.
- Fix incorrect use of self-attention cache when used for cross-attention.
Changed
- Optimize implementation of some `Dense` matrix functions, especially on amd64 with AVX.
v1.0.0-alpha
With this release we introduce breaking changes that bring significant
improvements to the project's structure, API and performance.
It would be difficult and confusing to list every single API change. Instead,
the following sections will broadly describe the most relevant changes,
arranged by topic.
Project structure
Until this release, the project was essentially a monorepo in disguise: the
core packages for handling matrices and computational graphs were accompanied
by many model implementations (from the very simple up to the most
sophisticated ones) and commands (model management utilities and servers).
We now prefer to keep only the core components of spaGO in this repository,
enriched with an (opinionated) set of popular models and functionalities.
Bigger sub-packages and related commands are moved to separate repositories.
The moved content includes, most notably, code related to Transformers
and Flair.
Please refer to the section Projects using Spago from the README for an
updated list of references to separate projects (note: some of them are still
work in progress).
If you have the feeling that something big is missing in spaGO, chances are
it was moved to one of these separate projects: just have a look there first.
The arrangement of packages has been simplified: there's no need anymore to
distinguish between `cmd` and `pkg`; all the main subpackages are located in
the project's root path. Similarly, many packages, previously nested under
`pkg/ml`, can now be found at root level too.
Go version and dependencies
The minimum required Go version is 1.18, primarily needed for the
introduction of type parameters (generics).
Thanks to the creation of separate projects, discussed above, and further
refactoring, the main set of required dependencies is limited to the ones
needed for testing.
Only the subpackage `embeddings/store/diskstore` requires something more, so
we defined it as an "opt-in" submodule, with its own dependencies.
float32 vs. float64
Instead of separate packages `mat32` and `mat64`, there is now a single
unified package `mat`. Many parts of the implementation make use of type
parameters (generics), however the package's public API makes rather narrow
use of them. In particular, we abstained from adding type parameters to
widely used types, such as the `Matrix` interface. Where suitable, we simply
favor `float64` values, the de-facto preferred floating point type in Go
(just think of Go's `math` package). For other situations, we introduced a
new subpackage `mat/float`. It provides simple types, holding either
`float32` or `float64` values, as scalars or slices, and makes it easy to
convert values between different precisions, all without making explicit use
of generics.
This design prevents the excessive spreading of type arguments to tons of
other types that need to manipulate matrices, both from other spaGO packages
and from your own code.
Matrices
- The type `mat.Matrix` is the primary interface for matrices and vectors
  throughout the project.
- The type `mat.Dense` is the concrete implementation for a dense matrix.
  Unlike the interface, it has a type argument to distinguish between
  `float32` and `float64`.
- We removed implementation and support for sparse matrices, since their
  efficacy and utility were marginal. A better implementation might come back
  in the future.
- A new dense matrix can be created "from scratch" by calling one of the
  several `mat.New***` functions (`NewDense`, `NewVecDense`, ...). Here you
  must choose which data type to use, specifying it as a type parameter
  (unless implicit); see the sketch after this list.
- Once you have an existing matrix, you can create new instances preserving
  the same data type of the initial one: simply use one of the `New***`
  methods on the matrix instance itself, rather than their top-level function
  counterparts.
- Any other operation performed on a matrix that creates a new instance will
  operate with the same data type as the receiver, and return an instance of
  that type too.
- Operations with matrices of different underlying data types are allowed;
  just beware the memory and computation overheads introduced by the
  necessary conversions.
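As a quick illustration of the points above, here is a minimal sketch of creating and combining dense matrices; the constructor signatures (rows, columns, backing slice) and the `Mul` method are assumed from the descriptions in this section, so check the package documentation for the precise API:

```go
package main

import (
	"fmt"

	"github.com/nlpodyssey/spago/mat"
)

func main() {
	// Top-level constructor: the data type is chosen via the type parameter
	// (signature assumed: rows, cols, backing slice).
	a := mat.NewDense[float32](2, 2, []float32{
		1, 2,
		3, 4,
	})

	// The type parameter can be inferred from the backing slice.
	v := mat.NewVecDense([]float32{5, 6})

	// Operations that create new instances keep the receiver's data type:
	// multiplying float32 operands yields a float32 matrix.
	p := a.Mul(v)

	fmt.Println(p)
}
```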
Auto-grad package
- The package `ag` now implicitly works in "define-by-run" mode only.
  It's way more performant compared to the previous releases, and there would
  be no significant advantage in re-using a pre-defined graph
  ("define-and-run").
- There is no `Graph` anymore! At least, not as a first-class citizen: an
  implicit "virtual" graph is progressively formed each time an operation
  over some nodes is applied. The virtual graph can be observed by simply
  walking the tree of operations. Most methods of the former Graph are now
  simple functions in the `ag` package.
- We still provide a way to explicitly "free" some resources after use, both
  for helping the garbage collector and for returning some objects to their
  `sync.Pool`. The function `ag.ReleaseGraph` operates on the virtual graph
  described above, usually starting from the given output nodes.
- Forward operations are executed concurrently. As soon as an Operator is
  created (usually by calling one of the functions in `ag`, such as `Add`,
  `Prod`, etc.), the related Function's `Forward` procedure is performed on a
  new goroutine. Nevertheless, it's always safe to ask for the Operator's
  `Value` without worries: if it's called too soon, the function will block
  until the result is computed, and only then return the value.
- To maximize performance, we removed the possibility to set a custom limit
  for concurrent computations. Thanks to the new design, we now let the Go
  runtime itself manage this problem for us, so you can still limit and
  fine-tune concurrency with the `GOMAXPROCS` variable.
- The implementation of backpropagation is also redesigned and improved.
  Instead of invoking the backward procedure on an explicit Graph, you can
  call `ag.Backward` or `ag.BackwardMany`, specifying the output node (or
  nodes) of your computation (such as loss values, in traditional scenarios).
  The backward functions traverse the virtual graph and propagate the
  gradients, leveraging concurrency and making use of goroutines and locks in
  a way that's very similar to the forward procedure. The backward functions
  will block and wait until the whole gradient propagation is complete before
  returning. The locking mechanism implemented in the nodes' `Grad` methods
  will still prevent troubles in case your own code reads the gradients
  concurrently (that would be very uncommon).
- We also modified the implementation of time-step handling and truncated
  backpropagation. Since we don't have the support of a concrete Graph
  structure anymore, we introduced a new dedicated type `ag.TimeStepHandler`,
  and related functions, such as `NodeTimeStep`. For performing a truncated
  backpropagation, we provide the functions `ag.BackwardT` and
  `ag.BackwardManyT`: they work similarly to the normal backpropagation
  functions described above, only additionally requiring a time-step handler
  and the desired amount of back steps.
- We simplified and polished the API for creating new node-variables. Instead
  of having multiple functions for simple variables, scalars, constants,
  with/without name or grads, and various combinations of those, you can now
  create any new variable with `ag.Var`, which accepts a Matrix value and
  creates a new node-variable with gradient accumulation disabled by default.
  To enable gradient propagation, or to set an explicit name (useful for
  model params or constants), you can use the Variable's chainable methods
  `WithGrad` and `WithName`. As a shortcut to create a scalar-matrix variable
  you can use `ag.Scalar`. A combined sketch of variables, forward and
  backward follows this list.
- The package `ag/encoding` provides generic structures and functions to
  obtain a sort of view of a virtual graph, with the goal of facilitating the
  encoding/marshaling of a graph in various formats.
  The package `ag/encoding/dot` is a rewriting of the former
  `pkg/ml/graphviz`, and uses the `ag/encoding` structures to represent a
  virtual graph in Graphviz DOT format.
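Putting the pieces of this section together, here is a minimal end-to-end sketch; it uses only functions named above plus `ag.ReduceSum`, and the exact signatures of `ag.Var`, `WithGrad`, `ag.Backward` and `ag.ReleaseGraph` are assumptions, so treat it as illustrative rather than definitive:

```go
package main

import (
	"fmt"

	"github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
)

func main() {
	// Node-variables: gradient accumulation is disabled by default,
	// so it is enabled explicitly with the chainable WithGrad method.
	x := ag.Var(mat.NewVecDense([]float64{1, 2, 3})).WithGrad(true)
	w := ag.Var(mat.NewVecDense([]float64{0.5, -1, 2})).WithGrad(true)

	// Each operator runs its Forward pass on its own goroutine; reading
	// Value() blocks until the result has been computed.
	loss := ag.ReduceSum(ag.Prod(x, w)) // ReduceSum is assumed to exist in ag
	fmt.Println(loss.Value())

	// Backpropagation over the virtual graph, starting from the output node.
	ag.Backward(loss)
	fmt.Println(x.Grad())

	// Release the resources held by the virtual graph (sync.Pool objects, etc.).
	ag.ReleaseGraph(loss)
}
```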
Models
- As before, package `nn` provides types and functions for defining and
  handling models. Its subpackages are implementations of the most common
  models. The set of built-in models has been remarkably revisited, moving
  some of them to separate projects, as previously explained.
- The `Model` interface has been extremely simplified: it only requires the
  special empty struct `Module` to be embedded in a model type. This is
  necessary only to distinguish an actual model from any other struct, which
  is especially useful for parameter traversal, or other similar operations.
  A minimal example follows this list.
- Since the Graph has been removed from `ag`, the models clearly don't need
  to hold a reference to it anymore. Similarly, there is no need for any
  other model-specific field, like the ones available from the former
  `BaseModel`. This implies the elimination of some seldom-used properties.
  Notable examples are the "processing mode" (from the old Graph) and the
  time step (from the old BaseModel).
  In situations where a removed value or feature is still needed, we suggest
  either reintroducing the missing elements on the models that need them, or
  extracting them to separate types and functions. An example of extracted
  behavior is the handling of time steps, already mentioned in the previous
  section.
- There is no distinction anymore between "pure" models and processors,
  making "reification" no longer necessary: once a model is created (or
  loaded), it can be immediately used, even for multiple concurrent
  inferences.
- A side effect of removing processor instances is that it's not possible
  to...
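To make the simplified `Model` interface concrete, here is a minimal sketch of a toy model; embedding the empty `nn.Module` struct is the only requirement stated above, while `nn.Param`, `nn.NewParam` and the `mat.NewEmpty***` constructors are assumptions about the surrounding API:

```go
package model

import (
	"github.com/nlpodyssey/spago/ag"
	"github.com/nlpodyssey/spago/mat"
	"github.com/nlpodyssey/spago/nn"
)

// Linear is a toy model: embedding the empty nn.Module struct is all that is
// needed to mark it as a model (e.g. for parameter traversal).
type Linear struct {
	nn.Module
	W nn.Param
	B nn.Param
}

// NewLinear allocates the parameters (constructor names assumed from the
// Matrices section above).
func NewLinear(in, out int) *Linear {
	return &Linear{
		W: nn.NewParam(mat.NewEmptyDense[float64](out, in)),
		B: nn.NewParam(mat.NewEmptyVecDense[float64](out)),
	}
}

// Forward can be called directly on the model, even concurrently:
// there is no processor or reification step anymore.
func (m *Linear) Forward(x ag.Node) ag.Node {
	return ag.Add(ag.Mul(m.W, x), m.B)
}
```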
v0.7.0
Added
- New package `ml/ag/encoding/dot`, for simple serialization of a Graph to
  DOT (Graphviz) format.
- New package `ml/nn/sgu`, implementing a Spatial Gating Unit (SGU) model.
- New package `ml/nn/conv1x1`, implementing a simple 1-dimensional
  1-sized-kernel convolution model.
- New package `ml/nn/gmlp`, implementing a gMLP model.
Changed
- `ml/nn/activation/Model.Forward` now simply returns the input as it is if
  the activation function is the identity.
v0.6.0
Added
- `ml/losses.WeightedCrossEntropy()`
- `ml/losses.FocalLoss()`
- `ml/losses.WeightedFocalLoss()`
- `nlp/sequencelabeler.LoadModel()` (it replaces `Load()` and `LoadEmbeddings()`)
- `nlp/charlm.LoadModel()`
- `nlp/transformers/bert.Model.PredictMLM()`
- `nlp/transformers/bart/tasks` package
- `nlp/transformers/bert.Model.Vectorize()`
- `ml/ag.Graph.Nodes()` and `ml/ag.Nodes()`
- `ml/nn.Model.Close()`
- `ml/nn.ReifyForTraining()` and `ml/nn.ReifyForInference()`
- `ml/ag.Graph.Backward()` now panics if it is executed with nodes belonging
  to different graphs.
- The new `ml/graphviz` package allows exporting a Graph to Graphviz DOT
  format. To make it possible, we introduced a new go-mod dependency
  gographviz.
- A custom name can be optionally set to a Graph's Variables. This can be
  useful for debugging purposes and visual graph representation.
  You can now use `Graph.NewVariableWithName()` and `Graph.NewScalarWithName()`
  to create named Variables, and get the name of a Variable with
  `Variable.Name()`.
Changed
- All `UnaryElementwise` functions provided by the package `ag/fn` have been
  promoted to separate dedicated structs. This improves debuggability and you
  can get appropriate function names when using reflection. Here is the full
  list of the modified functions: `Tan`, `Tanh`, `Sigmoid`, `HardSigmoid`,
  `HardTanh`, `ReLU`, `Softsign`, `Cos`, `Sin`, `Exp`, `Log`, `Neg`,
  `Reciprocal`, `Abs`, `Mish`, `GELU`, `Sqrt`, `Swish`.
  For the same reason, a dedicated `Square` function is introduced, replacing
  `Prod` with both operands set to the same value.
- `ml/ag` types `Operator`, `Variable`, `Wrapper` are now public.
- `ml/nn.Reify()` now expects Graph and Processing Mode arguments instead of
  a `Context` object (removed).
- `ml/nn.BaseModel` has been modified, replacing the field `Ctx Context` with
  a direct reference to the model's Graph and the Processing Mode (fields `G`
  and `ProcessingMode`).
- Refactor the server implementation of `nlp/sequencelabeler`,
  `nlp/transformers/bert`, and `nlp/transformers/bart`.
- Upgrade various dependencies.
- Regenerate protocol buffers files (with `protoc-gen-go` v1.26.0 and
  `protoc` v3.16.0).
Removed
- `nlp/sequencelabeler.Load()` and `LoadEmbeddings()` (now replaced by
  `nlp/sequencelabeler.LoadModel()`)
- `ml/nn.Context` (see related changes on `Reify()` and `BaseModel`)
v0.5.2
Added
- Handle multiple BERT pooling strategies (i.e. `CLS_TOKEN`, `REDUCE_MEAN`,
  `REDUCE_MAX`) in `nlp.transformers.bert.server_encode.go`.
v0.5.1
Added
- Add `nlp.charlm.flair_converter.go` to import Flair character language
  models.
Changed
- Improve `nlp.transformer.generation` algorithms:
  - optimize `Generator.getTopKScoredTokens()`.
  - optimize `Generator.updateTokensScores()`.
- Simplify `mat32.Dense.Mul` when doing Matrix-Vector multiplication.
- Refactor `math32` functions using chewxy/math32 functions.
- Improve `ag.Graph` efficiency:
  - Use pre-computed cache doing `ag.Graph.groupNodesByHeight()`.
  - Use `sync.Pool` to reduce allocations of graph's operators.
Fixed
- Fix past key-values usage on self-attention and cross-attention
v0.5.0
Added
- Implement a beam-search algorithm for conditional generation:
  `nlp.transformer.generation` package.
- Add implementation of the Sentence-Piece tokenizer:
  `nlp.tokenizers.sentencepiece` package.
- BART improvements:
  - gRPC and HTTP API to perform Text Generation.
  - Add support for "Marian" architecture (used for translation tasks).
  - Add sinusoidal positional encoder (used by Marian).
  - Add "head" for conditional generation:
    `nlp.transformers.bart.head.conditionalgeneration` package.
- Add `nn.Closer` interface (e.g. `embeddings.Model` needs to close the
  underlying key-value store).
- Add Swish act. function without trainable parameters.
- Add SiLU act. function (it is just an alias for Swish).
- New `pe.SinusoidalPositionalEncoder` (this implementation replaces the
  unused `pe.PositionalEncoder` and `pe.AxialPositionalEncoder`)
Changed
- Update urfave/cli to v2.
- Update dgraph-io/badger to v3.
- Make the BART positional encoder an interface to support various encodings
  (i.e. trainable vs static).
- Rename `fn.NewSwish` to `fn.NewSwishB`, as this was the Swish variant with
  trainable parameters (B).
- Relax `ag.GetOpName` to match operator names in lower-case.
- Allow arbitrary activation functions on BART encoder/decoder layers.
- Use precomputed "keys" and "values" in self-attention, multi-head attention
  and the BART decoder.
Removed
- In relation to the aforementioned positional encoding changes:
  - `pe.PositionalEncoder` and related functions
  - `pe.AxialPositionalEncoder` and related functions
Fixed
- Fix causal-mask used by `nn.ScaledDotProductAttention`
v0.4.1
Added
- New function `ReleaseMatrix` in packages `mat32` and `mat64`.
- New methods on the `Matrix` interface, from `mat32` and `mat64`: `Minimum`,
  `Maximum`, `MulT`, `Inverse`, `DoNonZero`. However, these are not yet
  implemented for sparse matrices (they always panic).
Changed
- Prefer handling `Matrix` interface values over specific `Dense` or `Sparse`
  matrices, also avoiding unnecessary type casts. Relevant changes to the
  public API are listed below.
  - `mat(32|64).Stack` function's arguments and returned value are now
    `Matrix` interfaces, instead of explicit `Dense` matrices.
  - `Dense.Minimum` and `Dense.Maximum`, from packages `mat32` and `mat64`,
    return a `Matrix` interface, instead of a specific `Dense` type.
  - The return values of `fofe.EncodeDense`, `fofe.Encode`, and
    `fofe.BiEncode` are slices of `Matrix` values, instead of `Dense` or
    `Sparse`.
  - The `z` argument of the function `fofe.Decode` is of type `Matrix`,
    instead of `Dense`.
  - The `ml.optimizers.de` (Differential Evolution optimizer) API was changed
    to handle `Matrix` values, instead of specific `Dense` matrices. Changes
    include: `Member.TargetVector`, `Member.DonorVector`,
    `ScoredVector.Vector`, the `vector` argument of the `NewMember` function,
    and the `solution` argument of the `score` and `validate` functions
    passed to `NewOptimizer`.
  - `PositionalEncoder.Cache` and `AxialPositionalEncoder.Cache` are slices
    of `Matrix`, instead of slices of `Dense`.
  - `AxialPositionalEncoder.EncodingAt` returns a `Matrix` value, instead of
    `Dense`.
  - `nn.DumpParamsVector` returns a `Matrix` value, instead of `Dense`.
  - The `vector` argument of the function `nn.LoadParamsVector` is a
    `Matrix`, instead of `Dense`.
  - The `value` argument of the method `embeddings.Model.SetEmbedding` is of
    type `Matrix`, instead of `Dense`.
  - The type of the struct field `evolvingembeddings.WordVectorPair.Vector`
    is `Matrix`, instead of `Dense`.