diff --git a/doc/BufMFCC.rst b/doc/BufMFCC.rst index e8e4184..700b721 100644 --- a/doc/BufMFCC.rst +++ b/doc/BufMFCC.rst @@ -4,59 +4,64 @@ :sc-related: Guides/FluidCorpusManipulationToolkit, Guides/FluidBufMultiThreading, Classes/FluidBufMelBands :see-also: :description: - This class implements a classic spectral descriptor, the Mel-Frequency Cepstral Coefficients (https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). The input is first filtered in to **numBands** perceptually-spaced bands, as in Classes/FluidMelBands. It is then analysed into **numCoeffs** number of cepstral coefficients. It has the advantage of being amplitude invariant, except for the first coefficient. It is part of the Guides/FluidCorpusManipulationToolkit of Guides/FluidCorpusManipulationToolkit. For more explanations, learning material, and discussions on its musicianly uses, visit http://www.flucoma.org/ - - The process will return a single multichannel buffer of **numCoeffs** per input channel. Each frame represents a value, which is every hopSize. + + MFCC stands for Mel-Frequency Cepstral Coefficients ("cepstral" is pronounced like "kepstral"). This analysis is often used for timbral description and timbral comparison. It compresses the overall spectrum into a smaller number of coefficients that, when taken together, describe the general contour of the spectrum. + The MFCC values are derived by first computing a mel-frequency spectrum, just as in :fluid-obj:`MelBands`. ``numCoeffs`` coefficients are then calculated by using that mel-frequency spectrum as input to the discrete cosine transform. This means that the shape of the mel-frequency spectrum is compared to a number of cosine wave shapes (different cosine shapes created from different frequencies). Each MFCC value (i.e., "coefficient") represents how similar the mel-frequency spectrum is to one of these cosine shapes.
+ Other than the 0th coefficient, MFCCs are unchanged by differences in the overall energy of the spectrum (which relates to how we perceive loudness). This means that timbres with similar spectral contours, but different volumes, will still have similar MFCC values, other than MFCC 0. To remove any indication of loudness but keep the information about timbre, we can ignore MFCC 0 by setting the parameter ``startCoeff`` to 1. + For more information visit https://learn.flucoma.org/reference/mfcc/. + + For an interactive explanation of this relationship, visit https://learn.flucoma.org/reference/mfcc/explain. + :control source: - The index of the buffer to use as the source material to be described through the various descriptors. The different channels of multichannel buffers will be processing sequentially. + The index of the buffer to use as the source material to be analysed. The different channels of multichannel buffers will be processed sequentially. :control startFrame: - Where in the srcBuf should the process start, in sample. + Where in the ``srcBuf`` the analysis should start, in samples. The default is 0. :control numFrames: - How many frames should be processed. + How many frames should be analysed. The default of -1 indicates to analyse to the end of the buffer. :control startChan: - For multichannel srcBuf, which channel should be processed first. + For a multichannel ``srcBuf``, which channel should be processed first. The default is 0. :control numChans: - For multichannel srcBuf, how many channel should be processed. + For a multichannel ``srcBuf``, how many channels should be processed. The default of -1 indicates to analyse through the last channel. :control features: - The destination buffer for the numCoeffs coefficients describing the spectral shape. + The destination buffer to write the MFCC analysis into. :control numCoeffs: - The number of cepstral coefficients to be outputed.
It will decide how many channels are produce per channel of the source. + The number of cepstral coefficients to return. The default is 13. :control numBands: - The number of bands that will be perceptually equally distributed between **minFreq** and **maxFreq**. + The number of bands that will be perceptually equally distributed between ``minFreq`` and ``maxFreq``. The default is 40. :control startCoeff: - The lowest index of the output cepstral coefficient, zero-counting. + The lowest index of the output cepstral coefficients to return, zero-counting. This can be useful to skip over the 0th coefficient (by indicating ``startCoeff`` = 1), because the 0th coefficient is representative of the overall energy in spectrum, while the rest of the coefficients are not affected by overall energy, only the mel-frequency spectral contour. The default is 0. :control minFreq: - The lower boundary of the lowest band of the model, in Hz. + The lower bound of the frequency band to use in analysis, in Hz. The default is 20. :control maxFreq: - The highest boundary of the highest band of the model, in Hz. + The upper bound of the frequency band to use in analysis, in Hz. The default is 20000. :control windowSize: - The window size. As MFCC computation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty + The window size. As MFCC computation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty. The default is 1024. :control hopSize: @@ -68,13 +73,4 @@ :control padding: - Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). 
Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. Mode 1 has the effect of centering the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material. - -:control maxNumCoeffs: - - The maximum number of cepstral coefficients that can be computed. This sets the number of channels of the output, and therefore cannot be modulated. - -:control maxFFTSize: - - How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated. - + Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. Mode 1 has the effect of centring the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material. 
diff --git a/doc/MFCC.rst b/doc/MFCC.rst index 8cdee16..e36579a 100644 --- a/doc/MFCC.rst +++ b/doc/MFCC.rst @@ -5,12 +5,25 @@ :see-also: BufMFCC, Pitch, MelBands, Loudness, SpectralShape :description: This class implements a classic spectral descriptor, the Mel-Frequency Cepstral Coefficients (MFCCs) :discussion: - See https://en.wikipedia.org/wiki/Mel-frequency_cepstrum. The input is first decomposed into perceptually spaced bands (the number of bands specified by numBands), just as in the MelBands object. It is then analysed in numCoefs number of cepstral coefficients. It has the avantage to be amplitude invarient, except for the first coefficient. - The process will return a multichannel control steam of maxNumCoeffs, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the host vector size. + MFCC stands for Mel-Frequency Cepstral Coefficients ("cepstral" is pronounced like "kepstral"). This analysis is often used for timbral description and timbral comparison. It compresses the overall spectrum into a smaller number of coefficients that, when taken together, describe the general contour of the spectrum. + + The MFCC values are derived by first computing a mel-frequency spectrum, just as in :fluid-obj:`MelBands`. ``numCoeffs`` coefficients are then calculated by using that mel-frequency spectrum as input to the discrete cosine transform. This means that the shape of the mel-frequency spectrum is compared to a number of cosine wave shapes (different cosine shapes created from different frequencies). Each MFCC value (i.e., "coefficient") represents how similar the mel-frequency spectrum is to one of these cosine shapes. + + Other than the 0th coefficient, MFCCs are unchanged by differences in the overall energy of the spectrum (which relates to how we perceive loudness).
This means that timbres with similar spectral contours, but different volumes, will still have similar MFCC values, other than MFCC 0. To remove any indication of loudness but keep the information about timbre, we can ignore MFCC 0 by setting the parameter ``startCoeff`` to 1. + + .. only_in:: sc + + When ``numCoeffs`` is less than ``maxNumCoeffs`` the result will be zero-padded on the right so the control stream returned by this object is always ``maxNumCoeffs`` channels. + + For more information visit https://learn.flucoma.org/reference/mfcc/. + + For an interactive explanation of this relationship, visit https://learn.flucoma.org/reference/mfcc/explain. :process: The audio rate in, control rate out version of the object. -:output: A KR signal of STRONG::maxNumCoefs:: channels. The latency is windowSize. +:output: + + The process will return a stream of ``maxNumCoeffs`` MFCCs, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the host vector size. When ``numCoeffs`` is less than ``maxNumCoeffs`` the result will be zero-padded on the right so the control stream returned by this object is always ``maxNumCoeffs`` channels. Latency is ``windowSize`` samples. :control in: @@ -19,23 +32,23 @@ :control numCoeffs: - The number of cepstral coefficients to be outputed. It is limited by the maxNumCoefs parameter. When the number is smaller than the maximum, the output is zero-padded. + The number of cepstral coefficients to output. It is limited by the ``maxNumCoeffs`` parameter. When the number is smaller than the maximum, the output is zero-padded. :control numBands: - The number of bands that will be perceptually equally distributed between minFreq and maxFreq to describe the spectral shape before it is converted to cepstral coefficients. 
+ The number of mel-bands that will be perceptually equally distributed between ``minFreq`` and ``maxFreq`` to describe the spectral shape before the cepstral coefficients are computed. :control startCoeff: - The lowest index of the output cepstral coefficient, zero-counting. + The lowest index of the output cepstral coefficients to return, zero-counting. This can be useful to skip over the 0th coefficient (by indicating ``startCoeff`` = 1), because the 0th coefficient is representative of the overall energy in spectrum, while the rest of the coefficients are not affected by overall energy, only the mel-frequency spectral contour. :control minFreq: - The lower boundary of the lowest band of the model, in Hz. + The lower bound of the frequency band to use in analysis, in Hz. :control maxFreq: - The highest boundary of the highest band of the model, in Hz. + The upper bound of the frequency band to use in analysis, in Hz. :control maxNumCoeffs: @@ -56,4 +69,3 @@ :control maxFFTSize: How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated. 
- diff --git a/example-code/sc/BufMFCC.scd b/example-code/sc/BufMFCC.scd index 8f80582..694b371 100644 --- a/example-code/sc/BufMFCC.scd +++ b/example-code/sc/BufMFCC.scd @@ -1,50 +1,49 @@ - code:: -// create some buffers -( -b = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); -c = Buffer.new(s); -) -// run the process with basic parameters +// load a sound +~buf = Buffer.read(s,FluidFilesPath("Harker-DS-TenOboeMultiphonics-M.wav")); + +// do a FluidBufMFCC analysis ( -Routine{ - t = Main.elapsedTime; - FluidBufMFCC.process(s, b, features: c).wait; - (Main.elapsedTime - t).postln; -}.play +~mfccs = Buffer(s); +FluidBufMFCC.processBlocking(s,~buf,features:~mfccs,action:{"done".postln;}) ) -// listen to the source and look at the buffer -b.play; -c.plot(separately:true) +// plot it in FluidWaveform -- it's not *too* informative, but may be useful to get a sense of what these MFCC curves look like +FluidWaveform(~buf,featuresBuffer:~mfccs,stackFeatures:true,bounds:Rect(0,0,1600,400)); + :: +strong::Load a lot of MFCC analyses to a data set for later data processing:: +code:: -STRONG::A stereo buffer example.:: -CODE:: +// load a sound +~buf = Buffer.readChannel(s,FluidFilesPath("Tremblay-CF-ChurchBells.wav"),channels:[0]); -// load two very different files +// do a FluidBufMFCC analysis ( -b = Buffer.read(s,FluidFilesPath("Tremblay-SA-UprightPianoPedalWide.wav")); -c = Buffer.read(s,FluidFilesPath("Tremblay-AaS-AcousticStrums-M.wav")); +~mfccs = Buffer(s); +FluidBufMFCC.processBlocking(s,~buf,features:~mfccs,action:{"done".postln;}) ) -// composite one on left one on right as test signals -FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b) -b.play - -// create a buffer as destinations -c = Buffer.new(s); - -//run the process on them +// create a dataset to put the mfccs in ( -Routine{ - t = Main.elapsedTime; - FluidBufMFCC.process(s, b, numCoeffs:5, features: c).wait; - (Main.elapsedTime - t).postln; -}.play +~ds 
= FluidDataSet(s).fromBuffer(~mfccs); +~ds.print; ) -// look at the buffer: 5 coefs for left, then 5 coefs for right (the first of each is linked to the loudness) -c.plot(separately:true) -:: +// dimensionally reduce the 13 MFCCs into 2D space +( // this will take a bit of time to process! +fork{ + ~umap = FluidUMAP(s); + ~norm = FluidNormalize(s); + s.sync; + ~umap.fitTransform(~ds,~ds); + ~norm.fitTransform(~ds,~ds); + ~dict = ~ds.dump({ + arg dict; + ~plotter = FluidPlotter(bounds:Rect(0,0,800,800),dict:dict); + }); +}; +) + +:: \ No newline at end of file diff --git a/example-code/sc/MFCC.scd b/example-code/sc/MFCC.scd index d571653..37d6a88 100644 --- a/example-code/sc/MFCC.scd +++ b/example-code/sc/MFCC.scd @@ -1,239 +1,155 @@ code:: -//create a monitoring window for the values +// a window to watch the MFCC analyses in real-time ( -b = Bus.new(\control,0,13); -w = Window("MFCCs Monitor", Rect(10, 10, 420, 320)).front; -a = MultiSliderView(w,Rect(10, 10, 400, 300)).elasticMode_(1).isFilled_(1); -a.reference_(Array.fill(13,{0.5})); //make a center line to show 0 -) - -//run the window updating routine. 
-( -~winRange = 20; - -r = Routine { - { - b.get({ arg val; - { - if(w.isClosed.not) { - //val.postln; - a.value = val.linlin(~winRange.neg,~winRange,0,1); - } - }.defer - }); - 0.01.wait; - }.loop -}.play +~win = Window("MFCCs Monitor",Rect(0,0,800,400)).front; +~ms = MultiSliderView(~win,Rect(0,0,~win.bounds.width,~win.bounds.height)).elasticMode_(1).isFilled_(1); +~ms.reference_(Array.fill(13,{0.5})); //make a center line to show 0 ) //play a simple sound to observe the values ( -x = {arg type = 0; +~synth = { + arg type = 0; var source = Select.ar(type,[SinOsc.ar(220),Saw.ar(220),Pulse.ar(220)]) * LFTri.kr(0.1).exprange(0.01,0.1); - Out.kr(b,FluidMFCC.kr(source,maxNumCoeffs:13)); + var mfccs = FluidMFCC.kr(source,numCoeffs:13,startCoeff:0,maxNumCoeffs:13); + SendReply.kr(Impulse.kr(30),"/mfccs",mfccs); source.dup; }.play; + +~mfccRange = 40; +OSCdef(\mfccs,{ + arg msg; + {~ms.value_(msg[3..].linlin(~mfccRange.neg,~mfccRange,0,1))}.defer; +},"/mfccs"); ) -// change the wave types, observe the amplitude invariance of the descriptors, apart from the leftmost coefficient -x.set(\type, 1) -~winRange = 40; //adjust the range above and below 0 to zoom in or out on the MFCC -x.set(\type, 2) -x.set(\type, 0) -// free this source -x.free +// change the wave types, observe that, apart from the 0th coefficient, different loudness does not change the values +~synth.set(\type, 1) // sawtooth wave +~synth.set(\type, 2) // pulse wave +~synth.set(\type, 0) // sine wave -// load a more exciting one -c = Buffer.read(s,FluidFilesPath("Tremblay-AaS-SynthTwoVoices-M.wav")); +~synth.free; -// analyse with parameters to be changed +// load a more complex sound +~tbone = Buffer.read(s,FluidFilesPath("Olencki-TenTromboneLongTones-M.wav")); + +// notice now that all these trombone sounds look relatively similar because they're the same timbre, even when the trombone changes pitches ( -x = {arg bands = 40, low = 20, high = 20000; - var source = PlayBuf.ar(1,c,loop:1); -
Out.kr(b,FluidMFCC.kr(source, numCoeffs: 13, numBands: bands, minFreq: low, maxFreq: high, maxNumCoeffs: 13) / 10); +x = {arg bands = 40; + var source = PlayBuf.ar(1,~tbone,loop:1); + var mfccs = FluidMFCC.kr(source, numCoeffs: 13, numBands: bands, maxNumCoeffs: 13); + SendReply.kr(Impulse.kr(30),"/mfccs",mfccs); source.dup; }.play; -) -~winRange = 10; //adjust the range above and below 0 to zoom in or out on the MFCC -// observe the number of bands. The unused ones at the top are not updated -x.set(\bands,20) - -// back to the full range -x.set(\bands,40) +~mfccRange = 70; +OSCdef(\mfccs,{ + arg msg; + {~ms.value_(msg[3..].linlin(~mfccRange.neg,~mfccRange,0,1))}.defer; +},"/mfccs"); +) -// focus all the bands on a mid range -x.set(\low,320, \high, 800) +// compare with the timbres of oboe multiphonics +~oboe = Buffer.read(s,FluidFilesPath("Harker-DS-TenOboeMultiphonics-M.wav")); -// focusing on the low end shows the fft resolution issue. One could restart the analysis with a larger fft to show more precision -x.set(\low,20, \high, 160) +( +x = { + arg bands = 40; + var source = PlayBuf.ar(1,~oboe,loop:1); + var mfccs = FluidMFCC.kr(source, numCoeffs: 13, numBands: bands, maxNumCoeffs: 13); + SendReply.kr(Impulse.kr(30),"/mfccs",mfccs); + source.dup; +}.play; -// back to full range -x.set(\low,20, \high, 20000) +~mfccRange = 70; +OSCdef(\mfccs,{ + arg msg; + {~ms.value_(msg[3..].linlin(~mfccRange.neg,~mfccRange,0,1))}.defer; +},"/mfccs"); +) -// free everything -x.free;b.free;c.free;r.stop; :: - -STRONG::A musical example:: - +STRONG::Comparing MFCC Analyses in real-time:: CODE:: -//program that freezes mfcc spectra, then looks for matches between two frozen spectra -( -SynthDef("MFCCJamz", {arg freq=220, source = 0, buffer, mfccBus, distBus, t_freeze0=0, t_freeze1=0, onsetsOn0=0, onsetsOn1=0, restart = 1; - var sound, mfcc, mfccFreeze0, mfccFreeze1, dist0, dist1, closest, slice; - - sound = SelectX.ar(source, [ - SinOsc.ar(freq, 0, 0.1), - LFTri.ar(freq, 0, 0.1), 
- LFSaw.ar(freq, 0, 0.1), - Pulse.ar(freq, 0.5, 0.1), - WhiteNoise.ar(0.1), - PinkNoise.ar(0.1), - PlayBuf.ar(1, buffer, 1, loop:1, trigger:restart) - ]); - slice = FluidOnsetSlice.ar(sound); //onset detection for mfcc freeze on onset +// we'll compare trombone to trombone (but at different playback rates to fake 2 different players +~buf = Buffer.read(s,FluidFilesPath("Olencki-TenTromboneLongTones-M.wav")); - mfcc = FluidMFCC.kr(sound,maxNumCoeffs:13); - mfccFreeze0 = Latch.kr(mfcc, t_freeze0+(slice*onsetsOn0)); - mfccFreeze1 = Latch.kr(mfcc, t_freeze1+(slice*onsetsOn1)); - - Out.kr(mfccBus,mfcc.addAll(mfccFreeze0).addAll(mfccFreeze1)); +// the more similar the timbres of the "2" trombonists, the lower the measured "distance" between them will be +// here, MFCC's timbre measure captures differences in timbre across the trombone range +// watch for how the more similar the trombone sound, the smaller distance measurement between the 2 analyses +( +{ + var sigA = PlayBuf.ar(1,~buf,BufRateScale.ir(~buf) * 0.9,loop:1); + var sigB = PlayBuf.ar(1,~buf,BufRateScale.ir(~buf),loop:1); + + var mfccA = FluidMFCC.kr(sigA,startCoeff:1); + var mfccB = FluidMFCC.kr(sigB,startCoeff:1); + var dist = Mix((mfccA - mfccB).squared).sqrt; + SendReply.kr(Impulse.kr(30),"/dists",dist); + [sigA,sigB]; +}.play; - //distance calculations +OSCdef(\dists,{ + arg msg; + "\ndistance:\t%\t".format(msg[3].round).post; + {"*".post} ! (msg[3] / 5); +},"/dists"); +) - dist0 = Mix((mfcc.copyRange(1,12) - mfccFreeze0.copyRange(1,12)).squared).sqrt; - dist1 = Mix((mfcc.copyRange(1,12) - mfccFreeze1.copyRange(1,12)).squared).sqrt; +:: +STRONG::Using Dimensionality Reduction to plot MFCCs in 2D Space:: - Out.kr(distBus, [dist0, dist1]); +CODE:: - //sends a trigger when the item with a closer euclidean distance changes - SendTrig.kr(Trig1.kr(dist1-dist0, 0.001)+Trig1.kr(dist0-dist1, 0.001), 0, dist1