10 changes: 5 additions & 5 deletions doc/KMeans.rst
@@ -1,8 +1,8 @@
:digest: Cluster data points with K-Means
:species: data
:sc-categories: FluidManipulation
:sc-related: Classes/FluidDataSet, Classes/FluidLabelSet, Classes/FluidKNNClassifier, Classes/FluidKNNRegressor
:see-also: KNNClassifier, MLPClassifier, DataSet
:sc-related: Classes/FluidDataSet, Classes/FluidLabelSet, Classes/FluidKNNClassifier, Classes/FluidKNNRegressor, Classes/FluidSKMeans
:see-also: SKMeans, KNNClassifier, MLPClassifier, DataSet
:description:

Uses the K-means algorithm to learn clusters from a :fluid-obj:`DataSet`.
@@ -63,7 +63,7 @@

:arg action: A function to run when the server responds.

Given a trained object, return for each item of a provided :fluid-obj:`DataSet` its distance to each cluster as an array, often reffered to as the cluster-distance space.
Given a trained object, return for each item of a provided :fluid-obj:`DataSet` its distance to each cluster as an array, often referred to as the cluster-distance space.

:message fitTransform:

@@ -87,15 +87,15 @@

:message getMeans:

:arg dataSet: A :fluid-obj:`DataSet` of clusers with a mean per column.
:arg dataSet: A :fluid-obj:`DataSet` of clusters with a mean per column.

:arg action: A function to run when complete.

Given a trained object, retrieve the means (centroids) of each cluster as a :fluid-obj:`DataSet`

:message setMeans:

:arg dataSet: A :fluid-obj:`DataSet` of clusers with a mean per column.
:arg dataSet: A :fluid-obj:`DataSet` of clusters with a mean per column.

:arg action: A function to run when complete.

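A minimal Python sketch of the cluster-distance space that the KMeans transform text describes. It assumes plain Euclidean distance from each item to each cluster mean, as in standard K-means. The `transform` and `cluster_distances` names, and the `{id: vector}` dictionary standing in for a DataSet, are hypothetical illustrations and not the FluCoMa API.

```python
import math

def cluster_distances(point, means):
    """Euclidean distance from one point to each cluster mean: one value per cluster."""
    return [math.dist(point, mean) for mean in means]

def transform(dataset, means):
    """Map every item of an {id: vector} dict into cluster-distance space."""
    return {key: cluster_distances(vec, means) for key, vec in dataset.items()}

# Hypothetical means, one per quadrant, and two query points.
means = [[0.5, 0.5], [-0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
data = {"a": [0.5, 0.5], "b": [0.0, 0.0]}
print(transform(data, means))  # each 2D item becomes a 4-dimensional vector
```

The output has one column per cluster regardless of the input dimensionality, which is why this space can also serve as a new feature representation.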
113 changes: 113 additions & 0 deletions doc/SKMeans.rst
@@ -0,0 +1,113 @@
:digest: K-Means with Spherical Distances
:species: data
:sc-categories: FluidManipulation
:sc-related: Classes/FluidDataSet, Classes/FluidLabelSet, Classes/FluidKNNClassifier, Classes/FluidKNNRegressor, Classes/FluidKMeans
:see-also: KMeans, KNNClassifier, MLPClassifier, DataSet
:description:

Uses the K-means algorithm with cosine similarity to learn clusters and features from a :fluid-obj:`DataSet`.

:discussion:

:fluid-obj:`SKMeans` is an implementation of K-means based on cosine distances instead of Euclidean ones: it measures the angles between normalised vectors, so the magnitude of each data point is ignored.
One common application of spherical K-means is to learn features directly from input data (via a :fluid-obj:`DataSet`) without supervision. See https://machinelearningcatalogue.com/algorithm/alg_spherical-k-means.html for a more technical explanation, and https://www-cs.stanford.edu/~acoates/papers/coatesng_nntot2012.pdf for its use in feature extraction.
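To make the difference from Euclidean K-means concrete, here is a hedged Python sketch of a cosine distance computed on normalised vectors. It assumes the common definition 1 - cos(angle); FluCoMa's internal formulation may differ, and the helper names `normalise` and `cosine_distance` are hypothetical.

```python
import math

def normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.hypot(*v)
    return list(v) if n == 0 else [x / n for x in v]

def cosine_distance(a, b):
    """1 - cos(angle), i.e. 1 minus the dot product of the normalised vectors."""
    ua, ub = normalise(a), normalise(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))

# Same direction, very different magnitudes.
p, q = [0.1, 0.1], [2.0, 2.0]
print("euclidean:", math.dist(p, q))        # about 2.69
print("cosine:   ", cosine_distance(p, q))  # approximately 0

# Orthogonal directions give 1, opposite directions give 2.
print(cosine_distance([1, 0], [0, 1]), cosine_distance([1, 0], [-1, 0]))
```

Because only direction matters under this measure, two points on the same ray from the origin are treated as identical, however far apart they are in Euclidean terms.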

:control numClusters:

The number of clusters to partition data into.

:control encodingThreshold:

The encoding threshold (also known as the alpha parameter). When used for feature learning, it can produce sparser output features by setting the least active output dimensions to 0.
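The exact encoding is not spelled out here. The sketch below assumes the soft-threshold encoding described in the Coates & Ng paper linked in the discussion, where alpha is subtracted from each cosine similarity and negative results are clipped to 0. The `encode` helper and the example means are hypothetical.

```python
import math

def normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.hypot(*v)
    return list(v) if n == 0 else [x / n for x in v]

def encode(point, means, alpha):
    """Assumed soft-threshold encoding: max(0, cos_similarity(mean_j, point) - alpha) per cluster."""
    x = normalise(point)
    return [max(0.0, sum(m * xi for m, xi in zip(normalise(mean), x)) - alpha)
            for mean in means]

# Hypothetical unit means pointing along +x, +y, -x and -y.
means = [[1, 0], [0, 1], [-1, 0], [0, -1]]
point = [3, 4]  # normalises to [0.6, 0.8]

print(encode(point, means, alpha=0.0))  # two active dimensions
print(encode(point, means, alpha=0.7))  # only the +y dimension survives
```

Raising alpha zeroes out every cluster whose similarity falls below it, which is one way to obtain the sparser features mentioned above.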

:control maxIter:

The maximum number of iterations the algorithm will use whilst fitting.

:message fit:

:arg dataSet: A :fluid-obj:`DataSet` of data points.

:arg action: A function to run when fitting is complete, taking as its argument an array with the number of data points for each cluster.

Identify ``numClusters`` clusters in a :fluid-obj:`DataSet`. It will optimise until no improvement is possible, or up to ``maxIter``, whichever comes first. Subsequent calls will continue training from the stopping point with the same conditions.
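A minimal sketch of a spherical K-means fitting loop. It assumes random initialisation from the data and the textbook update, where each mean becomes the normalised sum of its members. It stops when the assignments stop changing or after `max_iter` passes, mirroring the ``maxIter`` behaviour described above. This is not FluCoMa's implementation, and `skmeans_fit` is a hypothetical name.

```python
import math
import random

def normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.hypot(*v)
    return list(v) if n == 0 else [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skmeans_fit(points, num_clusters, max_iter=100, seed=0):
    """Minimal spherical K-means. Returns (unit-length means, cluster index per point)."""
    rng = random.Random(seed)
    data = [normalise(p) for p in points]
    means = [list(m) for m in rng.sample(data, num_clusters)]  # initialise from the data
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins the mean with the highest cosine similarity.
        new_assignments = [max(range(num_clusters), key=lambda j: dot(means[j], x))
                           for x in data]
        if new_assignments == assignments:
            break  # assignments are stable, so the loop has converged
        assignments = new_assignments
        # Update step: each mean becomes the normalised sum of its members.
        for j in range(num_clusters):
            members = [x for x, a in zip(data, assignments) if a == j]
            if members:
                means[j] = normalise([sum(col) for col in zip(*members)])
    return means, assignments

# Two groups of directions: roughly along +x and roughly along +y.
points = [[1, 0.1], [1, -0.1], [2, 0.05], [-0.1, 1], [0.1, 1], [0.05, 3]]
means, labels = skmeans_fit(points, num_clusters=2)
print(labels)
```

Normalising the summed members keeps every mean on the unit sphere, which is what makes the cosine comparison in the assignment step consistent from one iteration to the next.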

:message predict:

:arg dataSet: A :fluid-obj:`DataSet` containing the data to predict.

:arg labelSet: A :fluid-obj:`LabelSet` to receive the predicted clusters.

:arg action: A function to run when the server responds.

Given a trained object, write the cluster ID of each data point in a :fluid-obj:`DataSet` to a :fluid-obj:`LabelSet`.
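Prediction in spherical K-means amounts to picking the mean with the highest cosine similarity. This hypothetical Python sketch assumes that rule. It uses the same four quadrant centroids that the SuperCollider example seeds with `setMeans`, with an `{id: vector}` dict standing in for the DataSet and the returned dict standing in for the LabelSet.

```python
import math

def normalise(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    n = math.hypot(*v)
    return list(v) if n == 0 else [x / n for x in v]

def predict_point(point, means):
    """Index of the mean with the highest cosine similarity to `point`."""
    x = normalise(point)
    sims = [sum(m * xi for m, xi in zip(normalise(mean), x)) for mean in means]
    return max(range(len(means)), key=sims.__getitem__)

def predict(dataset, means):
    """Label every item of an {id: vector} dict with its cluster index."""
    return {key: predict_point(vec, means) for key, vec in dataset.items()}

# The four quadrant centroids seeded in the SuperCollider example.
means = [[0.5, 0.5], [-0.5, 0.5], [0.5, -0.5], [-0.5, -0.5]]
queries = {"pp": [0.5, 0.5], "np": [-0.5, 0.5], "pn": [0.5, -0.5],
           "nn": [-0.5, -0.5], "far": [3.0, 2.0]}
print(predict(queries, means))
```

The "far" point lies much further from the origin than any mean, but it still falls into cluster 0 because only its direction is compared.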

:message fitPredict:

:arg dataSet: A :fluid-obj:`DataSet` containing the data to fit and predict.

:arg labelSet: A :fluid-obj:`LabelSet` to receive the predicted clusters.

:arg action: A function to run when the server responds.

Run :fluid-obj:`SKMeans#*fit` and :fluid-obj:`SKMeans#*predict` in a single pass: i.e. train the model on the incoming :fluid-obj:`DataSet` and then write the learned clustering to the passed :fluid-obj:`LabelSet`.

:message predictPoint:

:arg buffer: A |buffer| containing a data point.

:arg action: A function to run when the server responds, taking the ID of the cluster as its argument.

Given a trained object, return the cluster ID for a data point in a |buffer|.

:message encode:

:arg srcDataSet: A :fluid-obj:`DataSet` containing the data to encode.

:arg dstDataSet: A :fluid-obj:`DataSet` to contain the new cluster-activation space.

:arg action: A function to run when the server responds.

Given a trained object, return, for each item of a provided :fluid-obj:`DataSet`, an array of its encoded activations for each cluster, often referred to as the cluster-activation space.

:message fitEncode:

:arg srcDataSet: A :fluid-obj:`DataSet` containing the data to fit and encode.

:arg dstDataSet: A :fluid-obj:`DataSet` to contain the new cluster-activation space.

:arg action: A function to run when the server responds.

Run :fluid-obj:`SKMeans#*fit` and :fluid-obj:`SKMeans#*encode` in a single pass: i.e. train the model on the incoming :fluid-obj:`DataSet` and then return its encoded cluster-activation space in the destination :fluid-obj:`DataSet`.

:message encodePoint:

:arg sourceBuffer: A |buffer| containing a data point.

:arg targetBuffer: A |buffer| to receive the activation for each cluster centroid.

:arg action: A function to run when complete.

Given a trained object, return the encoded activation of the provided point for each cluster centroid. Both the input point and the output activations are passed as |buffer| objects.

:message getMeans:

:arg dataSet: A :fluid-obj:`DataSet` of clusters with a mean per column.

:arg action: A function to run when complete.

Given a trained object, retrieve the means (centroids) of each cluster as a :fluid-obj:`DataSet`.

:message setMeans:

:arg dataSet: A :fluid-obj:`DataSet` of clusters with a mean per column.

:arg action: A function to run when complete.

Overwrites the means (centroids) of each cluster, and declares the object trained.

:message clear:

:arg action: A function to run when complete.

Reset the object to an untrained state.
140 changes: 140 additions & 0 deletions example-code/sc/SKMeans.scd
@@ -0,0 +1,140 @@
code::

(
//Make some clumped 2D points and place into a DataSet
~points = (4.collect{
64.collect{(1.sum3rand) + [1,-1].choose}.clump(2)
}).flatten(1) * 0.5;
fork{
~dataSet = FluidDataSet(s);
d = Dictionary.with(
*[\cols -> 2,\data -> Dictionary.newFrom(
~points.collect{|x, i| [i, x]}.flatten)]);
s.sync;
~dataSet.load(d, {~dataSet.print});
}
)


// Create an SKMeans instance and a LabelSet for the cluster labels in the server
~clusters = FluidLabelSet(s);
~skmeans = FluidSKMeans(s);

// Fit into 4 clusters
(
~skmeans.fitPredict(~dataSet,~clusters,action: {|c|
"Fitted.\n # Points in each cluster:".postln;
c.do{|x,i|
("Cluster" + i + "->" + x.asInteger + "points").postln;
}
});
)

// The cols of SKMeans should match those of the DataSet; size is the number of clusters

~skmeans.cols;
~skmeans.size;
~skmeans.dump;

// Retrieve labels of clustered points by sorting the IDs
~clusters.dump{|x|~assignments = x.at("data").atAll(x.at("data").keys.asArray.sort{|a,b|a.asInteger < b.asInteger}).flatten.postln;}

//Visualise: we're hoping to see colours neatly mapped to quadrants...
(
d = ((~points + 1) * 0.5).flatten(1).unlace;
w = Window("scatter", Rect(128, 64, 200, 200));
~colours = [Color.blue,Color.red,Color.green,Color.magenta];
w.drawFunc = {
Pen.use {
d[0].size.do{|i|
var x = (d[0][i]*200);
var y = (d[1][i]*200);
var r = Rect(x,y,5,5);
Pen.fillColor = ~colours[~assignments[i].asInteger];
Pen.fillOval(r);
}
}
};
w.refresh;
w.front;
)

// single point query on arbitrary value
~inbuf = Buffer.loadCollection(s,0.5.dup);
~skmeans.predictPoint(~inbuf,{|x|x.postln;});
::

subsection:: Accessing the means

We can get and set the means (centroids) of each cluster.

code::
// with the dataset and skmeans generated and trained in the code above
~centroids = FluidDataSet(s);
~skmeans.getMeans(~centroids, {~centroids.print});

// We can also set them to arbitrary values to seed the process
~centroids.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\0, [0.5,0.5], \1, [-0.5,0.5], \2, [0.5,-0.5], \3, [-0.5,-0.5]])]));
~centroids.print
(
// set the seeded means, then count how many points are assigned to each cluster
~skmeans.setMeans(~centroids, {
	~skmeans.predict(~dataSet, ~clusters, {
		~clusters.dump{|x|
			var count = 0.dup(4);
			x["data"].keysValuesDo{|k, v| count[v[0].asInteger] = count[v[0].asInteger] + 1};
			count.postln;
		}
	})
});
)

// We can further fit from the seeded means
~skmeans.fit(~dataSet)
// then retrieve the improved means
~skmeans.getMeans(~centroids, {~centroids.print});
// The change is subtle in this case, but each quadrant stays where we seeded it.
::

subsection:: Cluster-distance Space

We can get the spherical distance of a given point to each cluster. SKMeans differs from KMeans in that it uses the angular (cosine) distance between vectors rather than the Euclidean distance. This is often referred to as the cluster-distance space, as each point is described by new dimensions, one distance per cluster.

code::
// with the dataset and skmeans generated and trained in the code above
b = Buffer.sendCollection(s,[0.5,0.5])
c = Buffer(s)

// get the distance of our given point (b) to each cluster, thus giving us 4 dimensions in our cluster-distance space
~skmeans.encodePoint(b,c,{|x|x.query;x.getn(0,x.numFrames,{|y|y.postln})})

// we can also encode a full dataset
~srcDS = FluidDataSet(s)
~cdspace = FluidDataSet(s)
// make a new dataset with 4 points
~srcDS.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\pp, [0.5,0.5], \np, [-0.5,0.5], \pn, [0.5,-0.5], \nn, [-0.5,-0.5]])]));
~skmeans.encode(~srcDS, ~cdspace, {~cdspace.print})
::

subsection:: Queries in a Synth

This is the equivalent of predictPoint, but runs entirely on the server.

code::
(
{
var trig = Impulse.kr(5);
var point = WhiteNoise.kr(1.dup);
var inputPoint = LocalBuf(2);
var outputPoint = LocalBuf(1);
Poll.kr(trig, point, [\pointX,\pointY]);
point.collect{ |p,i| BufWr.kr([p],inputPoint,i)};
~skmeans.kr(trig,inputPoint,outputPoint);
Poll.kr(trig,BufRd.kr(1,outputPoint,0,interpolation:0),\cluster);
}.play;
)

// to sonify the output, here are random points that step through the four quadrants in turn, generated faster as the mouse moves to the right
(
{
var trig = Impulse.kr(MouseX.kr(0,1).exprange(0.5,ControlRate.ir / 2));
var step = Stepper.kr(trig,max:3);
var point = TRand.kr(-0.1, [0.1, 0.1], trig) + [step.mod(2).linlin(0,1,-0.6,0.6),step.div(2).linlin(0,1,-0.6,0.6)] ;
var inputPoint = LocalBuf(2);
var outputPoint = LocalBuf(1);
point.collect{|p,i| BufWr.kr([p],inputPoint,i)};
~skmeans.kr(trig,inputPoint,outputPoint);
SinOsc.ar((BufRd.kr(1,outputPoint,0,interpolation:0) + 69).midicps,mul: 0.1);
}.play;
)

::