diff --git a/doc/BufMelBands.rst b/doc/BufMelBands.rst index 21b33e6..a311be5 100644 --- a/doc/BufMelBands.rst +++ b/doc/BufMelBands.rst @@ -3,65 +3,64 @@ :sc-categories: Libraries>FluidDecomposition :sc-related: Guides/FluidCorpusManipulationToolkit, Classes/FluidBufMFCC :see-also: MelBands, BufPitch, BufLoudness, BufMFCC, BufSpectralShape, BufStats -:description: A spectral shape descriptor where the amplitude is given for a number of equally spread perceptual bands. +:description: Magnitudes for a number of perceptually-evenly spaced bands. :discussion: - The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which is one of the first attempt to mimic pitch perception scientifically. This implementation allows to select the range and number of bands dynamically. - The process will return a single multichannel buffer of ``numBands`` per input channel. Each frame represents a value, which is every hopSize. + :fluid-obj:`BufMelBands` returns a Mel-Frequency Spectrum comprised of the user-defined ``numBands``. The Mel-Frequency Spectrum is a histogram of FFT bins bundled according to their relationship to the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which represents frequency space logarithmically, mimicking how humans perceive pitch distance. The name "Mel" derives from the word "melody". The Hz-to-Mel conversion used by :fluid-obj:`BufMelBands` is ``mel = 1127.01048 * log(hz / 700.0 + 1.0)``. + + This implementation allows the range and number of bands to be selected dynamically. The ``numBands`` MelBands will be perceptually equally distributed between ``minFreq`` and ``maxFreq``. - When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands. 
+ When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands. + + Visit https://learn.flucoma.org/reference/melbands to learn more. -:process: This is the method that calls for the spectral shape descriptors to be calculated on a given source buffer. -:output: Nothing, as the destination buffer is declared in the function call. +:process: This is the method that calls for the analysis to be calculated on a given source buffer. +:output: Nothing, as the ``features`` buffer is declared in the function call. :control source: - The index of the buffer to use as the source material to be described through the various descriptors. The different channels of multichannel buffers will be processing sequentially. + The index of the buffer to use as the source material to be analysed. The different channels of multichannel buffers will be processed sequentially. :control startFrame: - Where in the srcBuf should the process start, in sample. + Where in the ``source`` to begin the analysis, in samples. The default is 0. :control numFrames: - How many frames should be processed. + How many frames should be analysed, in samples. The default of -1 indicates to analyse to the end of the buffer. :control startChan: - For multichannel srcBuf, which channel should be processed first. + For a multichannel ``source``, which channel to begin analysis from. The default is 0. :control numChans: - For multichannel srcBuf, how many channel should be processed. + For a multichannel ``source``, how many channels should be processed, starting from ``startChan`` and counting up. The default of -1 indicates to analyse through the last channel in the ``source``. 
:control features: - The destination buffer for the STRONG::numBands:: amplitudes describing the spectral shape. + The buffer to write the MelBands magnitudes into. :control numBands: - The number of bands that will be perceptually equally distributed between STRONG::minFreq:: and STRONG::maxFreq::. It will decide how many channels are produce per channel of the source. + The number of bands that will be returned. This determines how many channels are in the ``features`` buffer (``numBands`` * ``numChans``). The default is 40. :control minFreq: - The lower boundary of the lowest band of the model, in Hz. + The lower bound of the frequency band to use in analysis, in Hz. The default is 20. :control maxFreq: - The highest boundary of the highest band of the model, in Hz. - -:control maxNumBands: - - The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated. + The upper bound of the frequency band to use in analysis, in Hz. The default is 20000. :control normalize: - This flag enables the scaling of the output to preserve the energy of the window. It is on (1) by default. + This flag indicates whether to use normalized triangle filters, which account for the number of FFT magnitudes used to calculate the MelBands. When normalization is off (``normalize`` = 0) the higher MelBands tend to be disproportionately large because they are summing more FFT magnitudes. The default is to have normalization on (``normalize`` = 1). :control scale: - This flag sets the scaling of the output value. It is either linear (0, by default) or in dB (1). + This flag sets the scaling of the output value. It is either linear (0, by default) or in dB (1). :control windowSize: @@ -69,21 +68,16 @@ :control hopSize: - The window hop size. As spectral description relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. + The window hop size. 
As this analysis relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2). :control fftSize: - The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. + The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize. :control padding: - Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. Mode 1 has the effect of centering the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material. - -:control maxFFTSize: - - How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated. + Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. 
Mode 1 has the effect of centring the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material. :control action: A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [features] as an argument. - diff --git a/doc/MelBands.rst b/doc/MelBands.rst index 1815800..5ae8d0b 100644 --- a/doc/MelBands.rst +++ b/doc/MelBands.rst @@ -1,19 +1,19 @@ -:digest: A Perceptually Spread Spectral Contour Descriptor in Real-Time +:digest: A Perceptually Spread Spectral Contour Descriptor :species: descriptor :sc-categories: Libraries>FluidDecomposition :sc-related: Guides/FluidCorpusManipulationToolkit, Classes/FluidMFCC :see-also: BufMelBands, Pitch, Loudness, MFCC, SpectralShape -:description: Amplitude for a number of equally spread perceptual bands. +:description: Magnitudes for a number of perceptually-evenly spaced bands. :discussion: - The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which was one of the first attempts to mimic pitch perception scientifically. This implementation allows to select the range and number of bands dynamically. - The process will return a multichannel control steam of size maxNumBands, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the signal vector size. + :fluid-obj:`MelBands` returns a Mel-Frequency Spectrum comprised of the user-defined ``numBands``. 
The Mel-Frequency Spectrum is a histogram of FFT bins bundled according to their relationship to the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which represents frequency space logarithmically, mimicking how humans perceive pitch distance. The name "Mel" derives from the word "melody". The Hz-to-Mel conversion used by :fluid-obj:`MelBands` is ``mel = 1127.01048 * log(hz / 700.0 + 1.0)``. This implementation allows to select the range and number of bands dynamically. When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands. + Visit https://learn.flucoma.org/reference/melbands to learn more. + :process: The audio rate in, control rate out version of the object. -:output: A KR signal of maxNumBands channels, giving the measure amplitudes for each band. The latency is windowSize. - +:output: A KR signal of ``maxNumBands`` channels, giving the measured magnitudes for each band. The latency is windowSize. :control in: @@ -21,23 +21,23 @@ :control numBands: - The number of bands that will be perceptually equally distributed between minFreq and maxFreq. It is limited by the maxNumBands parameter. When the number is smaller than the maximum, the output is zero-padded. + The number of bands that will be perceptually equally distributed between ``minFreq`` and ``maxFreq``. It is limited by the ``maxNumBands`` parameter. When the number is smaller than the maximum, the output is zero-padded. :control minFreq: - The lower boundary of the lowest band of the model, in Hz. + The lower bound of the frequency band to use in analysis, in Hz. The default is 20. :control maxFreq: - The highest boundary of the highest band of the model, in Hz. + The upper bound of the frequency band to use in analysis, in Hz. The default is 20000. 
:control maxNumBands: - The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated. + The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated. The default is 120. :control normalize: - This flag enables the scaling of the output to preserve the energy of the window. It is on (1) by default. + This flag indicates whether to use normalized triangle filters, which account for the number of FFT magnitudes used to calculate the MelBands. When normalization is off (``normalize`` = 0) the higher MelBands tend to be disproportionately large because they are summing more FFT magnitudes. The default is to have normalization on (``normalize`` = 1). :control scale: @@ -57,5 +57,4 @@ :control maxFFTSize: - How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated. - + How large the FFT can be, by allocating memory at instantiation time. This cannot be modulated. 
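Note on the zero-filled bands mentioned in both documents above: a high ``numBands`` can leave empty channels unless ``fftSize`` is increased. The following Python sketch illustrates one plausible reason. It assumes conventional triangular filters whose edges are ``numBands + 2`` points equally spaced in mels, and a 44100 Hz sample rate. Both are assumptions, since the library's actual filter construction is not shown in this diff. `count_empty_bands` is a hypothetical helper, not part of FluCoMa.

```python
import math

def hz_to_mel(hz):
    # Conversion quoted in the documentation (natural logarithm).
    return 1127.01048 * math.log(hz / 700.0 + 1.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (math.exp(mel / 1127.01048) - 1.0)

def count_empty_bands(num_bands, fft_size, min_freq=20.0,
                      max_freq=20000.0, sample_rate=44100.0):
    """Count bands whose (assumed) triangle contains no FFT bin centre."""
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (hi - lo) / (num_bands + 1)
    # num_bands + 2 edge frequencies, equally spaced in mels (assumption).
    edges = [mel_to_hz(lo + step * i) for i in range(num_bands + 2)]
    bin_hz = sample_rate / fft_size
    bin_freqs = [k * bin_hz for k in range(fft_size // 2 + 1)]
    empty = 0
    for b in range(num_bands):
        lower, upper = edges[b], edges[b + 2]
        # A band is empty if no bin centre lies strictly inside its support.
        if not any(lower < f < upper for f in bin_freqs):
            empty += 1
    return empty

for bands, fft in [(40, 1024), (400, 1024), (400, 4096)]:
    print(bands, fft, count_empty_bands(bands, fft))
```

Under these assumptions, empty bands appear at the low end first, because Mel bands are narrowest there while FFT bins are evenly spaced in Hz. A larger ``fftSize`` narrows the bin spacing and fills them in.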
diff --git a/example-code/sc/BufMelBands.scd b/example-code/sc/BufMelBands.scd index 2f0e57f..f3e0381 100644 --- a/example-code/sc/BufMelBands.scd +++ b/example-code/sc/BufMelBands.scd @@ -1,50 +1,64 @@ - +STRONG::Use a buffer of MelBands to drive a bank of oscillators:: code:: -// create some buffers ( -b = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); -c = Buffer.new(s); +~bells = Buffer.readChannel(s,FluidFilesPath("Tremblay-CF-ChurchBells.wav"),channels:[0]); +~melBands = Buffer(s); +~numBands = 100; ) -// run the process with basic parameters +// listen to the original +~bells.play; + +// analyse +FluidBufMelBands.processBlocking(s,~bells,features:~melBands,numBands:~numBands,action:{"done".postln}); + +// playback ( -Routine{ - t = Main.elapsedTime; - FluidBufMelBands.process(s, b, features: c, numBands:10).wait; - (Main.elapsedTime - t).postln; -}.play +x = { + arg rate = 0.1, freqMul = 1, freqAdd = 0; + var phs = Phasor.kr(0,rate,0,BufFrames.ir(~melBands)); + var melBands = BufRd.kr(~numBands,~melBands,phs,1,4); + var lowMel = 1127.010498 * ((20/700) + 1).log; // convert from hz to mels + var highMel = 1127.010498 * ((20000/700) + 1).log; // convert from hz to mels + var rangeMel = highMel - lowMel; + var stepMel = rangeMel / (~numBands+1); + var freqMel = Array.fill(~numBands,{arg i; (stepMel * (i+1)) + lowMel}); + var freqHz = ((freqMel/ 1127.01048).exp - 1) * 700; // convert from mel to hz + var sig = SinOsc.ar((freqHz * freqMul) + freqAdd,0,melBands); + Splay.ar(sig) * 24.dbamp; +}.play; ) -// listen to the source and look at the buffer -b.play; -c.plot -:: - -STRONG::A stereo buffer example.:: -CODE:: +// manipulate the oscillator bank +x.set(\rate,0.3); +x.set(\rate,0.04); +x.set(\freqMul,0.5); +x.set(\freqAdd,-2000); -// load two very different files +:: +STRONG::Look at the MelBands in FluidWaveform (as "features"):: +code:: +// create some buffers ( -b = Buffer.read(s,FluidFilesPath("Tremblay-SA-UprightPianoPedalWide.wav")); -c = 
Buffer.read(s,FluidFilesPath("Tremblay-AaS-AcousticStrums-M.wav")); +~src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); +~melBands = Buffer.new(s); ) -// composite one on left one on right as test signals -FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b) -b.play - -// create a buffer as destinations -c = Buffer.new(s); +// run the process with basic parameters +FluidBufMelBands.processBlocking(s,~src,features:~melBands,action:{"done".postln}); -//run the process on them +// look at the mel bands as feature curves (a bit messy...) +FluidWaveform(~src,featuresBuffer:~melBands,bounds:Rect(0,0,1600,400),stackFeatures:true,normalizeFeaturesIndependently:false); +:: +STRONG::Do a higher resolution analysis and plot it as an image in FluidWaveform:: +code:: +// create some buffers ( -Routine{ - t = Main.elapsedTime; - FluidBufMelBands.process(s, b, features: c, numBands:10).wait; - (Main.elapsedTime - t).postln; -}.play +~src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); +~melBands = Buffer.new(s); ) -// look at the buffer: 10 bands for left, then 10 bands for right -c.plot(separately:true) -:: +FluidBufMelBands.processBlocking(s,~src,features:~melBands,numBands:400,fftSize:4096,action:{"done".postln}); + +FluidWaveform(imageBuffer:~melBands,bounds:Rect(0,0,1600,400),imageColorScheme:1,imageColorScaling:1); +:: \ No newline at end of file diff --git a/example-code/sc/MelBands.scd b/example-code/sc/MelBands.scd index 0c64cdc..e40f4c4 100644 --- a/example-code/sc/MelBands.scd +++ b/example-code/sc/MelBands.scd @@ -1,37 +1,104 @@ - +STRONG::Use the magnitudes of the melbands analysis to drive a bank of sine oscillators to "resynthesize" the drum loop:: code:: -//create a monitoring bus for the descriptors -b = Bus.new(\control,0,40); +//load a source +~drums = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); -//create a monitoring window for the values +( +x = { + arg mix = 0.5; + var source = 
PlayBuf.ar(1,~drums,BufRateScale.ir(~drums),loop:1); + var numBands = 40; + var windowSize = 1024; + var hopSize = windowSize / 2; + var melBands = FluidMelBands.kr( + source, + numBands, + maxNumBands:numBands, + windowSize:windowSize, + hopSize:hopSize + ); + var lowMel = 1127.010498 * ((20/700) + 1).log; // convert from hz to mels + var highMel = 1127.010498 * ((20000/700) + 1).log; // convert from hz to mels + var rangeMel = highMel - lowMel; + var stepMel = rangeMel / (numBands+1); + var freqMel = Array.fill(numBands,{arg i; (stepMel * (i+1)) + lowMel}); + var freqHz = ((freqMel/ 1127.01048).exp - 1) * 700; // convert from mel to hz + var sines = SinOsc.ar(freqHz,0,melBands.lag(hopSize*SampleDur.ir)).sum; + var sig = [ + DelayN.ar(source,delaytime:windowSize*SampleDur.ir), // compensate for latency + sines + ]; + sig = sig * [1-mix,mix]; + sig; +}.play; +) + +x.set(\mix,1); + +:: +STRONG::Use the mouse to select a different band of analysis:: +code:: +~drums = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); ( -w = Window("Mel Bands Monitor", Rect(10, 10, 620, 320)).front; -a = MultiSliderView(w,Rect(10, 10, 600, 300)).elasticMode_(1).isFilled_(1); +x = { + arg mix = 0.5; + var source = PlayBuf.ar(1,~drums,BufRateScale.ir(~drums),loop:1); + var numBands = 40; + var windowSize = 1024; + var hopSize = windowSize / 2; + var melBands = FluidMelBands.kr( + source, + numBands, + minFreq:MouseX.kr.exprange(20,600), + maxFreq:MouseY.kr.exprange(650,20000), + maxNumBands:numBands, + windowSize:windowSize, + hopSize:hopSize + ); + var lowMel = 1127.010498 * ((20/700) + 1).log; // convert from hz to mels + var highMel = 1127.010498 * ((20000/700) + 1).log; // convert from hz to mels + var rangeMel = highMel - lowMel; + var stepMel = rangeMel / (numBands+1); + var freqMel = Array.fill(numBands,{arg i; (stepMel * (i+1)) + lowMel}); + var freqHz = ((freqMel/ 1127.01048).exp - 1) * 700; // convert from mel to hz + var sines = 
SinOsc.ar(freqHz,0,melBands.lag(hopSize*SampleDur.ir)).sum; + var sig = [ + DelayN.ar(source,delaytime:windowSize*SampleDur.ir), // compensate for latency + sines + ]; + sig = sig * [1-mix,mix]; + sig; +}.play; ) -//run the window updating routine. +x.set(\mix,1); +:: +STRONG::Display a chart to see the MelBands:: +code:: + +//create a monitoring window for the values ( -~winRange = 0.1; -r = Routine { - { - b.get({ arg val; - { - if(w.isClosed.not) { - a.value = val/~winRange; - } - }.defer - }); - 0.01.wait; - }.loop -}.play +~win = Window("Mel Bands Monitor", Rect(10, 10, 620, 320)).front; +~ms = MultiSliderView(~win, + Rect(0,0,~win.bounds.width,~win.bounds.height) +).elasticMode_(1).isFilled_(1); ) //play a simple sound to observe the values ( -x = { +OSCdef(\melBands,{ + arg msg; + var melBands, numBands = msg[3]; + melBands = msg[4..(4+(numBands-1)).asInteger]; + defer{~ms.value_(melBands)}; +},"/melBands"); + +x = { + arg numBands = 40; var source = SinOsc.ar(LFTri.kr(0.1).exprange(80,800),0,0.1); - Out.kr(b,FluidMelBands.kr(source,maxNumBands:40)); + var melBands = FluidMelBands.kr(source,numBands:numBands,maxNumBands:40); + SendReply.kr(Impulse.kr(30),"/melBands",[numBands] ++ melBands); source.dup; }.play; ) @@ -44,111 +111,37 @@ c = Buffer.read(s,FluidFilesPath("Tremblay-AaS-SynthTwoVoices-M.wav")); // analyse with parameters to be changed ( -x = {arg bands = 40, low = 20, high = 20000; +OSCdef(\melBands,{ + arg msg; + var melBands, numBands = msg[3]; + melBands = msg[4..(4+(numBands-1)).asInteger]; + defer{~ms.value_(melBands)}; +},"/melBands"); + +x = { + arg numBands = 40, minFreq = 20, maxFreq = 20000; var source = PlayBuf.ar(1,c,loop:1); - Out.kr(b,FluidMelBands.kr(source, bands, low, high, 40) / 10); + var melBands = FluidMelBands.kr(source,numBands,minFreq,maxFreq); + SendReply.kr(Impulse.kr(30),"/melBands",[numBands] ++ melBands); source.dup; }.play; ) -//set the winRange to a more informative value -~winRange = 0.05; - -// observe the number of 
bands. The unused ones at the top are not updated -x.set(\bands,20) +// observe the number of bands +x.set(\numBands,10); // back to the full range -x.set(\bands,40) +x.set(\numBands,40); -// focus all the bands on a mid range: nothing to see! -x.set(\low,320, \high, 800) +// focus all the bands on a mid range +x.set(\minFreq,320, \maxFreq, 800); // focusing on the low end shows the fft resolution issue. One could restart the analysis with a larger fft to show more precision -x.set(\low,20, \high, 160) +x.set(\minFreq,20, \maxFreq, 160); // back to full range -x.set(\low,20, \high, 20000) +x.set(\minFreq,20, \maxFreq, 20000); // free everything -x.free;b.free;c.free;r.stop; -:: - -STRONG::A musical example: a perceptually spread vocoder:: - -CODE:: -//load a source and define control bus for the resynthesis cluster -( -b = Bus.control(s,40); -c = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); -d = Group.new; -) - -//play the source and send the analysis on the -( -x = { - arg dry = 0.2; - var source = PlayBuf.ar(1,c,loop:1); - Out.kr(b,FluidMelBands.kr(source,maxNumBands:40)); - Out.ar(0, DelayN.ar(source,delaytime:1024*SampleDur.ir,mul:dry)); -}.play; -) - -// set the dry playback volume -x.set(\dry, 0.5) - -// create a cluster of sines tuned on each MelBand center frequency, as a sort of vocoder. 
-( -var lowMel = 1127.010498 * ((20/700) + 1).log; -var highMel = 1127.010498 * ((20000/700) + 1).log; -var rangeMel = highMel - lowMel; -var stepMel = rangeMel / 41; -40.do({ - arg i; - var freqMel = (stepMel * (i +1)) + lowMel; - var freq = ((freqMel/ 1127.01048).exp - 1 ) * 700; - {SinOsc.ar(freq,mul:Lag.kr(In.kr(b,40)[i],512*SampleDur.ir,0.5))}.play(d,1,addAction:\addToTail); -}); -) - -// free all -d.free; x.free; b.free; c.free; - -///////////////////////////////////// -// instantiate a more dynamic vocoder: -// MouseX defines the bottom frequency and MouseY define the top frequency, between which the 40 bands of analysis and synthesis are perceptually equally spred - -// the bus, source and group -( -b = Bus.control(s,40); -c = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav")); -d = Group.new; -) - -// the modified source -( -x = { - arg dry = 0.2; - var source = PlayBuf.ar(1,c,loop:1); - Out.kr(b,FluidMelBands.kr(source,maxNumBands:40,minFreq:MouseX.kr().exprange(20,600),maxFreq:MouseY.kr().exprange(650,20000))); - Out.ar(0, DelayN.ar(source,delaytime:1024*SampleDur.ir,mul:dry)); -}.play; -) - -// the modified vocoder -( -40.do({ - arg i; - { - var lowMel = 1127.010498 * ((MouseX.kr().exprange(20,600)/700) + 1).log; - var highMel = 1127.010498 * ((MouseY.kr().exprange(650,20000)/700) + 1).log; - var rangeMel = highMel - lowMel; - var stepMel = rangeMel / 41; - var freqMel = (stepMel * (i +1)) + lowMel; - var freq = ((freqMel/ 1127.01048).exp - 1 ) * 700; - SinOsc.ar(freq,mul:Lag.kr(In.kr(b,40)[i],512*SampleDur.ir,0.5))}.play(d,1,addAction:\addToTail); -}); -) - -// free all -d.free; x.free; b.free; c.free; -:: +x.free; b.free; c.free; r.stop; +:: \ No newline at end of file
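Note on the constants in the example code above: the oscillator frequencies are computed with 1127.010498 for Hz-to-Mel and 1127.01048 for Mel-to-Hz. The Python sketch below repeats that arithmetic to estimate how much the mismatch matters. The function names are illustrative and do not come from the patch.

```python
import math

HZ_TO_MEL_CONST = 1127.010498  # constant the examples use for Hz -> Mel
MEL_TO_HZ_CONST = 1127.01048   # constant the examples use for Mel -> Hz

def band_centres(num_bands=40, min_freq=20.0, max_freq=20000.0):
    """Centre frequencies in Hz, mirroring lowMel/highMel/stepMel/freqHz."""
    low_mel = HZ_TO_MEL_CONST * math.log(min_freq / 700.0 + 1.0)
    high_mel = HZ_TO_MEL_CONST * math.log(max_freq / 700.0 + 1.0)
    step_mel = (high_mel - low_mel) / (num_bands + 1)
    return [
        700.0 * (math.exp((low_mel + step_mel * (i + 1)) / MEL_TO_HZ_CONST) - 1.0)
        for i in range(num_bands)
    ]

def round_trip_error(hz):
    """Absolute error in Hz after converting to mels and back with the two constants."""
    mel = HZ_TO_MEL_CONST * math.log(hz / 700.0 + 1.0)
    return abs(700.0 * (math.exp(mel / MEL_TO_HZ_CONST) - 1.0) - hz)

centres = band_centres()
print(len(centres), centres[0], centres[-1])
print(round_trip_error(20000.0))
```

The round-trip error stays far below one hertz across the audible range, so the differing constants should be inaudible in the oscillator bank. Using a single constant throughout would still be tidier.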