From 78222726a74bca8b5aa97e51593c573ea1d5291a Mon Sep 17 00:00:00 2001 From: towsey Date: Tue, 2 Mar 2021 13:38:13 +1100 Subject: [PATCH] Changes to code and documentation requested by Anthony Issue #451 Changes to documentation involved shifted some documentation to other guides and to class summaries. --- docs/guides/generic_recognizers.md | 219 +++++++++++------- docs/theory/spectrograms.md | 42 ++-- .../Extensions/EnumerableExtensions.cs | 5 +- src/AudioAnalysisTools/Events/EventFilters.cs | 11 +- .../Events/Types/EventPostProcessing.cs | 14 +- 5 files changed, 183 insertions(+), 108 deletions(-) diff --git a/docs/guides/generic_recognizers.md b/docs/guides/generic_recognizers.md index 588026f63..cec59fb41 100644 --- a/docs/guides/generic_recognizers.md +++ b/docs/guides/generic_recognizers.md @@ -78,17 +78,19 @@ These ideas are summarized in the following table: To summarize, the advantages of a hand-crafted DIY call recognizer are: 1. You can do it yourself! -2. You can start with just one or two calls -3. Allows you to collect a larger dataset (and refine it) for machine learning purposes -4. Exposes the variability of the target call as you go + +2. You can start with just one or two calls. + +3. Allows you to collect a larger dataset (and refine it) for machine learning purposes. + +4. Exposes the variability of the target call as you go. ## 2. Calls, syllables, harmonics -The algorithmic approach of **DIY Call Recognizer** makes particular assumptions about animals calls and how they are -structured. A *call* is taken to be any sound of animal origin (whether for communication purposes or not) and include -bird songs/calls, animal vocalizations of any kind, the stridulation of insects, the wingbeats of birds and bats and the -various sounds produced by aquatic animals. Calls typically have temporal and spectral structure. 
For example they may -consist of a temporal sequence of two or more *syllables* (with "gaps" in between) or a set of simultaneous *harmonics* +The algorithmic approach of **DIY Call Recognizer** makes particular assumptions about animal calls and how they are structured. +A *call* is taken to be any sound of animal origin (whether for communication purposes or not) and includes +bird songs/calls, animal vocalizations of any kind, the stridulation of insects, the wingbeats of birds and bats and the various sounds produced by aquatic animals. Calls typically have temporal and spectral structure. +For example they may consist of a temporal sequence of two or more *syllables* (with "gaps" in between) or a set of simultaneous *harmonics* or *formants*. (The distinction between harmonics and formants does not concern us here.) @@ -106,16 +108,22 @@ There are seven types of acoustic events: 1. [Shrieks](xref:theory-acoustic-events#shrieks): diffuse events treated as "blobs" of acoustic energy. A typical example is a parrot shriek. + 2. [Whistles](xref:theory-acoustic-events#whistles): - "pure" tones (often imperfect) appearing as horizontal lines on a spectrogram + "pure" tones (often imperfect) appearing as horizontal lines on a spectrogram. + 3. [Chirps](xref:theory-acoustic-events#chirps): whistle like events that increases in frequency over time. Appears like a sloping line in a spectrogram. + 4. [Whips](xref:theory-acoustic-events#whips): sound like a "whip crack". They appear as steeply ascending or descending *spectral track* in the spectrogram. + 5. [Clicks](xref:theory-acoustic-events#clicks): appear as a single vertical line in a spectrogram and sounds, like the name suggests, as a very brief click. + 6. [Oscillations](xref:theory-acoustic-events#oscillations): An oscillation is the same (or nearly the same) syllable (typically whips or clicks) repeated at a fixed periodicity over several to many time-frames. + 7.
[Harmonics](xref:theory-acoustic-events#harmonics): Harmonics are the same/similar shaped *whistle* or *chirp* repeated simultaneously at multiple intervals of frequency. Typically, the frequency intervals are similar as one ascends the stack of harmonics. @@ -135,12 +143,15 @@ A **DIY Call Recognizer** attempts to recognize calls in a noise-reduced [spectr 1. Preprocessing—steps to prepare the recording for subsequent analysis. 1. Input audio is broken up into 1-minute chunks 2. Audio resampling + 2. Processing—steps to identify target syllables as _"generic"_ acoustic events 1. Spectrogram preparation 1. Call syllable detection + 3. Postprocessing—steps which simplify the output combining related acoustic events and filtering events to remove false-positives 1. Combining syllable events into calls 1. Syllable/call filtering + 4. Saving Results To execute these detection steps, suitable _parameter values_ must be placed into a [_configuration file_](xref:basics-config-files). @@ -171,18 +182,22 @@ Config files contain a list of parameters, each of which is written as a name-va ResampleRate: 22050 ``` -Changing these parameters allows for the construction of a generic recognizer. This guide will explain the various -parameters than can be changed and their typical values. However, this guide will not produce a functional recognizer; +Changing these parameters allows for the construction of a generic recognizer. This guide will explain the various parameters that can be changed and their typical values. +However, this guide will not produce a functional recognizer; each recognizer has to be "tuned" to the target syllables for species to be recognized. Only you can do that. There are many parameters available. To make config files easier to read we order these parameters roughly in the order that they are applied. This aligns with the [basic recognition](#4-detecting-acoustic-events) steps from above. 1. Parameters for preprocessing + 2. 
Parameters for processing + 3. Parameters for postprocessing + 4. Parameters for saving Results + ### Profiles [Profiles](xref:basics-config-files#profiles) are a list of acoustic event detection algorithms to use in our processing stage. @@ -190,16 +205,18 @@ order that they are applied. This aligns with the [basic recognition](#4-detecti > [!TIP] > For an introduction to profiles see the page. -Each algorithm is designed to detect a syllable type. Thus to make a generic recognizer there should be at least one (1) -profile in the `Profiles` list. A config file may target more than one syllable or acoustic event, in which case there will be profile for each target syllable or acoustic event. +Each algorithm is designed to detect a syllable type. Thus to make a generic recognizer there should be at least one (1) profile in the `Profiles` list. +A config file may target more than one syllable or acoustic event, in which case there will be a profile for each target syllable or acoustic event. The `Profiles` list contains one or more profile items, and each profile has several parameters. So we have a three level hierarchy: 1. The key-word `Profiles` that heads the list. + 2. One or more _profile_ declarations. - There are two parts to each profile declaration: 1. A user defined name 2. And the algorithm type to use with this profile (prefixed with an exclamation mark (`!`)) + 3. The profile _parameters_ consisting of a list of name:value pairs Here is an (abbreviated) example: @@ -255,10 +272,9 @@ the names of the algorithms (used to find those events) describe how the algorit | Oscillation | `Oscillation` | `!OscillationParameters` | | Harmonic | `Harmonic` | `!HarmonicParameters` | -Each of these detection algorithms has some common parameters because all "generic" events are characterized by -common properties, such as their minimum and maximum temporal duration, their minimum and maximum frequencies, and their decibel intensity. 
In fact, every -acoustic event is bounded by an _implicit_ rectangle or marquee whose height represents the bandwidth of the event and -whose width represents the duration of the event. Even a _chirp_ or _whip_ which consists only of a single sloping +Each of these detection algorithms has some common parameters because all "generic" events are characterized by common properties, such as their minimum and maximum temporal duration, their minimum and maximum frequencies, and their decibel intensity. +In fact, every acoustic event is bounded by an _implicit_ rectangle or marquee whose height represents the bandwidth of the event and whose width represents the duration of the event. +Even a _chirp_ or _whip_ which consists only of a single sloping *spectral track*, is enclosed by a rectangle, two of whose vertices sit at the start and end of the track. See for more details. @@ -319,42 +335,39 @@ All algorithms have some [common parameters](xref:AnalysisPrograms.Recognizers.B - Spectrogram settings - Noise removal settings -- and basic limits for the allowed length and bandwidth of an event +- Parameters that set basic limits to the allowed duration and bandwidth of an event -Each algorithm has its own spectrogram settings, so parameters such as _window size_ can be varied for _each_ type of acoustic event you want to detect. +Each algorithm has its own spectrogram settings, so parameters such as `WindowSize` can be varied for _each_ type of acoustic event you want to detect. -#### [Common Parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters): Spectrogram preparation +### [Common Parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters): Spectrogram preparation By convention, we list the spectrogram parameters first (after the species name) in each algorithm entry: [!code-yaml[spectrogram](./Ecosounds.NinoxBoobook.yml#L11-L19 "Spectrogram parameters")] -- `FrameSize` is the size of the FFT window used to make the spectrogram. 
Use this to control the resolution tradeoff - between the time and frequency domains. Must be a power of 2, a good default is `512` and `1024` is also common. -- `FrameStep` sets the number of samples between the start of one frame and the next. Therefore it controls frame overlap. `FrameStep` must be less than `FrameSize` and need not be a power of 2. By default `FrameStep` equals `FrameSize`. -- `WindowFunction` can be one of the values from . `Hanning` is the default because we find it the most versatile. -- `BgNoiseThreshold` stands for _background noise threshold_ and controls the amount of noise removal. - - The units are in decibels - - `0` is the least severe and is a good default. - - Increasing the value to `3`–`4` decibels increases the likelihood that you will lose some important components of your target calls -For a discussion on these parameters, refer to the document. +- `FrameSize` sets the size of the FFT window. + +- `FrameStep` sets the number of samples between frame starts. + +- `WindowFunction` sets the FFT window function. -#### [Common Parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters): Call syllable limits +- `BgNoiseThreshold` sets the degree of background noise removal. + +Since these parameters are so important for the success of call detection, you are strongly advised to refer to the document for more information about setting their values. + +### [Common Parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters): Call syllable limits A complete definition of the `BoobookSyllable` follows. [!code-yaml[full_profile](./Ecosounds.NinoxBoobook.yml#L11-L30 "A complete profile")] -The extra parameters direct the actual search for target syllables in the spectrogram. +The additional parameters direct the actual search for target syllables in the spectrogram. -`MinHertz` and `MaxHertz` define the frequency band in which a search is to be made for the target event. 
Note that -these parameters define the bounds of the search band _not_ the bounds of the event itself. +- `MinHertz` and `MaxHertz` set the frequency band in which to search for the target event. Note that these parameters define the bounds of the search band, _not_ the bounds of the event itself. These limits are hard bounds. -`MinDuration` and `MaxDuration` set the minimum and maximum time duration (in seconds) of the target event. - -Each of these limits are are hard bounds. +- `MinDuration` and `MaxDuration` set the minimum and maximum time duration (in seconds) of the target event. These limits are hard bounds.
@@ -365,11 +378,9 @@ Each of these limits are are hard bounds. ### Adding profiles with algorithms -If your target syllable is not a chirp, you'll want to use a different algorithm. - For brevity, we've broken up the descriptions of each algorithm to their own pages. Some of these algorithms have extra parameters, some do not, but all do have the -[common parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters) we've previously discussed. +[common parameters](xref:AnalysisPrograms.Recognizers.Base.CommonParameters) we've previously described. | I want to find a | I'll use this algorithm | |------------------|------------------------------------------------------------------------------------------| @@ -385,22 +396,40 @@ Some of these algorithms have extra parameters, some do not, but all do have the ### [PostProcessing](xref:AudioAnalysisTools.Events.Types.EventPostProcessing.PostProcessingConfig) -Post-processing of events is performed after event detection. Post-processing is performed once for each of the DecibelThresholds. As an example: suppose you have three decibel thresholds (6, 9 and 12 dB is a typical set of values) in each of two profiles. There will be three rounds of post-processing: +Post-processing of events is performed after event detection. Post-processing is performed once for each of the DecibelThresholds. +As an example: suppose you have three decibel thresholds (6, 9 and 12 dB is a typical set of values) in each of two profiles. +There will be three rounds of post-processing: + - All the events detected at threshold 6 dB (by both profiles) will be collected together and subjected to the post-processing steps. Typically some or all of the events may fail to be accepted as "true" events based on your post-processing parameters. + - Next all the events detected at 9 dB will be collected and independently subjected to post-processing. + - Next all events detected at the 12 dB threshold will be post-processed. 
-This sequence of multiple post-processing steps gives rise to one or more temporally nested events. Think of them as Russion doll events! The final post-processing step is to remove all but the longest duration event in any nested set of events. +This sequence of multiple post-processing steps gives rise to one or more temporally nested events. Think of them as Russian doll events! +The final post-processing step is to remove all but the longest duration event in any nested set of events. [!code-yaml[post_processing](./Ecosounds.NinoxBoobook.yml#L34-L34 "Post Processing")] -Post processing is optional - you may decide to combine or filter the "raw" events using code you have written yourself. To add a post-processing section to your config file, insert the `PostProcessing` keyword and indent its parameters. There are five post-processing possibilities, each of which you may choose to use or not. Note that the post-processing steps are performed in the following order which cannot be changed by the user: - - Combine events having temporal _and_ spectral overlap. - - Combine possible sequences of events that constitute a "call". - - Remove (filter) events whose duration is outside an acceptable range. - - Remove (filter) events whose bandwidth is outside an acceptable range. - - Remove (filter) events having excessive acoustic activity in their sidebands. +Post processing is optional - you may decide to combine or filter the "raw" events using code you have written yourself. +To add a post-processing section to your config file, insert the `PostProcessing` keyword and indent its parameters. +There are five post-processing possibilities, each of which is optional. +However the order in which these steps are performed _cannot_ be changed by the user. The post-processing sequence is: + +> 1. Combine events having temporal _and_ spectral overlap. + +> 2. Combine possible sequences of events that constitute a "call". + +> 3. 
Remove (filter) events whose duration is outside an acceptable range. + +> 4. Remove (filter) events whose bandwidth is outside an acceptable range. + +> 5. Remove (filter) events having excessive acoustic activity in their sidebands. + > [!NOTE]: + If you do not wish to include a post-processing step, delete or comment out the key word and all its component parameters using a `#` at the start of each relevant line in the config file. + The only exception to this is to set boolean parameters to `false` where this option exists. + Removing a post-processing filter means that all events are accepted for that step. ### Combine events having temporal _and_ spectral overlap @@ -409,6 +438,7 @@ Post processing is optional - you may decide to combine or filter the "raw" even The `CombineOverlappingEvents` parameter is typically set to `true`, but it depends on the target call. You would typically set this to true for two reasons: - the target call is composed of two or more overlapping syllables that you want to join as one event. + - whistle events often require this step to unite whistle fragment detections into one event. @@ -420,68 +450,73 @@ Unlike overlapping events, if you want to combine a group of events (like syllab `SyllableStartDifference` and `SyllableHertzGap` set the allowed tolerances when combining events into sequences -- `SyllableStartDifference` sets the maximum allowed time difference (in seconds) between the starts of two events -- `SyllableHertzGap` sets the maximum allowed frequency difference (in Hertz) between the minimum frequencies of two events. +- `SyllableStartDifference` sets the maximum allowed time difference (in seconds) between the starts of two events. -> NOTE: In order to disable the combining of event sequences, "comment out" the `SyllableSequence` keyword by inserting the `#` symbol before it. You must also do the same for each of its five component parameters. 
+- `SyllableHertzGap` sets the maximum allowed frequency difference (in Hertz) between the minimum frequencies of two events. Once you have combined possible sequences, you may wish to remove sequences that do not satisfy the periodicity constraints for your target call, that is, the maximum number of syllables permitted in a sequence and the average time gap between syllables. To enable filtering on syllable periodicity, set `FilterSyllableSequence` to true and assign values to `SyllableMaxCount` and `ExpectedPeriod`. - `SyllableMaxCount` sets an upper limit on the number of events that constitute an allowed sequence. -- `ExpectedPeriod` sets an expectation value for the average period (in seconds) of an allowed combination of events. - > NOTE: When setting `ExpectedPeriod`, you are actually setting a permissible range of values for the period. The maximum permitted period will be the value assigned to `SyllableStartDifference` and the minimum period will be the `ExpectedPeriod` minus (`SyllableStartDifference` - `ExpectedPeriod`). For example: if `SyllableStartDifference` = 3 seconds and `ExpectedPeriod` = 2.5 seconds, then the minimum allowed period will be 2 seconds. +- `ExpectedPeriod` sets an expectation value for the average period (in seconds) of an allowed combination of events. - > NOTE: If you do not want to filter events on their periodicity, set `FilterSyllableSequence` to false. In this case, all events are accepted regardless of the periodicity of their component syllables. + > [!NOTE]: + > This property interacts with `SyllableStartDifference`. Refer to the following documentation for more information: + . -See the document for more information. ### Remove events whose duration is outside an acceptable range [!code-yaml[post_processing_filtering](./Ecosounds.NinoxBoobook.yml?start=34&end=62&highlight=20- "Post Processing: filtering")] Use the parameter `Duration` to filter out events that are too long or short. 
-This filter removes events whose duration lies outside three standard deviations (SDs) of an expected value. There are two parameters: +There are two parameters: + +- `ExpectedDuration` defines the _expected_ or _average_ duration (in seconds) for the target events. -- `ExpectedDuration` defines the _expected_ or _average_ duration (in seconds) for the target events -- `DurationStandardDeviation` defines _one_ SD of the assumed distribution. Assuming the duration is normally distributed, three SDs sets hard upper and lower duration bounds that includes 99.7% of instances. The filtering algorithm calculates these hard (3 SD) bounds and removes acoustic events that fall outside the bounds. +- `DurationStandardDeviation` defines _one_ SD of the assumed distribution. Refer to the following documentation for more information: . - > NOTE: If you do not want to filter events on their duration, comment out the `Duration` keyword _and both_ parameters with a `#`. Once `Duration` is commented out, all events are accepted regardless of their duration. ### Remove events whose bandwidth is outside an acceptable range Use the parameter `Bandwidth` to filter out events whose bandwidth is too small or large. -This filter removes events whose bandwidth lies outside three standard deviations (SDs) of an expected value. As with `Duration`, commenting out the `Bandwidth` keyword and its parameters will allow all events to be accepted regardless of their bandwidth. There are two parameters: +There are two parameters: -- `ExpectedBandwidth` defines the _expected_ or _average_ bandwidth (in Hertz) for the target events -- `BandwidthStandardDeviation` defines one SD of the assumed distribution. Assuming the bandwidth is normally distributed, three SDs sets hard upper and lower bandwidth bounds that includes 99.7% of instances. The filtering algorithm calculates these hard bounds and removes acoustic events that fall outside the bounds. 
+- `ExpectedBandwidth` defines the _expected_ or _average_ bandwidth (in Hertz) for the target events. + +- `BandwidthStandardDeviation` defines one SD of the assumed distribution. Refer to the following documentation for more information: . ### Remove events that have excessive noise or acoustic activity in their side-bands [!code-yaml[post_processing_sideband](./Ecosounds.NinoxBoobook.yml?start=34&end=69&highlight=30- "Post Processing: sideband noise removal")] The intuition of this filter is that an unambiguous event (representing a call or syllable) should have an "acoustic-free zone" above and below it. -This filter removes an event that has "excessive" acoustic activity spilling into its sidebands. Such events are likely to be _broadband_ events unrelated to the target event. Since this is a common occurrence, a sideband filter is useful. +This filter removes an event that has "excessive" acoustic activity spilling into its sidebands. +Such events are likely to be _broadband_ events unrelated to the target event. +Since this is a common occurrence, a sideband filter is useful. Use the keyword `SidebandAcousticActivity` to enable sideband filtering. There are four parameters, the first two set the width of the sidebands and the second two set decibel thresholds for the amount acoustic noise/activity in those sidebands. 1. `LowerSidebandWidth` sets the width of the desired sideband "zone" below the target event. + 2. `UpperSidebandWidth` sets the width of the desired sideband "zone" above the target event. There are two tests for determining if the acoustic activity in a sideband is excessive, each having a single parameter: 3. `MaxBackgroundDecibels` sets a threshold value for the maximum permitted background or average decibel value in each sideband. The average is taken over all spectrogram cells included in a sideband, excluding those adjacent to the event. -4. 
`MaxActivityDecibels` sets a threshold value for the maximum permitted average decibel value in any one frequency bin or timeframe of a sideband. The averages are over all relevant spectrogram cells in a frame or bin, excluding the cell adjacant to the event. This test covers the possibility that there is an acoustic event concentrated in a few frequency bins or timeframes within a sideband. Only one sideband bin or frame is allowed to contain acoustic activity exceeding the threshold. +4. `MaxActivityDecibels` sets a threshold value for the maximum permitted average decibel value in any one frequency bin or timeframe of a sideband. The averages are over all relevant spectrogram cells in a frame or bin, excluding the cell adjacent to the event. +This test covers the possibility that there is an acoustic event concentrated in a few frequency bins or timeframes within a sideband. +Only one sideband bin or frame is allowed to contain acoustic activity exceeding the threshold. > [!TIP] - > If you want to exclude a sideband or not perform a test, comment out its parameter with a `#`. In the example config file for _Ninox boobook_, two of the four parameters are commented as follows: + > To exclude a sideband or not perform a test, comment out its parameter with a `#`. In the example config file for _Ninox boobook_, two of the four parameters are commented out as follows: ```yaml LowerSidebandWidth: 150 #UpperSidebandWidth: 200 MaxBackgroundDecibels: 12 #MaxActivityDecibels: 12 ``` - > This ensures that only one test (for background noise) will be performed on only one sideband (the lower). If you comment out the keyword `SidebandAcousticActivity` and its four parameters, all events will be accepted with no sideband tests performed. + > In this example, only one test (for background noise) will be performed on only one sideband (the lower). If no sideband tests are performed, all events will be accepted regardless of the acoustic activity in their sidebands. 
### Parameters for saving results @@ -496,13 +531,16 @@ Each of the parameters controls whether extra diagnostic files are saved while d > that are in total larger than the input audio data—you will fill your harddrive quickly! - `SaveSonogramImages` will save a spectrogram for each analysed segment (typically one-minute) + - `SaveIntermediateWavFiles` will save the converted WAVE file used to analyze each segment Both parameters accept three values: - `Never`: disables the output. + - `WhenEventsDetected`: only outputs the spectrogram/WAVE file when an event is found in the current segment. This choice is the most useful for debugging a new recognizer. + - `Always`: always save the diagnostic files. Don't use this option if you're going to analyze a lot of files ### The completed example @@ -518,26 +556,36 @@ tune parameters in the sequence in which they appear in the config file, keeping or "unrestrictive" as possible. Here is a suggested tuning strategy: 1. Turn off all post-processing steps. That is, comment out all post-processing keywords/parameters AND set all post-processing booleans to false. + 2. Initially set all profile parameters so as to catch the maximum possible number of target calls/syllables. - 1. Set the array of decibel thresholds to cover the expected range of call amplitudes from minimum to maximum decibels. - 2. Set the minimum and maximum duration values to catch every target call by a wide margin. At this stage, do not + >a. Set the array of decibel thresholds to cover the expected range of call amplitudes from minimum to maximum decibels. + + >b. Set the minimum and maximum duration values to catch every target call by a wide margin. At this stage, do not worry that you are also catching a lot of false-positive events. - 3. Set the minimum and maximum frequency bounds to catch every target call by a wide margin. Once again, do not + + >c. Set the minimum and maximum frequency bounds to catch every target call by a wide margin. 
Once again, do not worry that you are also catching a lot of false-positive events. - 4. Set other parameters to their least "restrictive" values in order to catch maximum possible target events. + + >d. Set other parameters to their least "restrictive" values in order to catch maximum possible target events. At this point you should have "captured" all the target calls/syllables (i.e. there should be minimal false-negatives), _but_ you are likely to have many false-positives. 3. Gradually constrain the parameter bounds (i.e. increase minimum values and decrease maximum values) until you start to lose obvious target calls/syllables. Then back off so that once again you just capture all the target events—but you will still have several to many false-positives. + 4. Event combining: You are now ready to set parameters that determine the *post-processing* of events. The first post-processing steps combine events that are likely to be *syllables* that are part of the same *call*. + 5. Event Filtering: Now add in the event filters in the same sequence as they appear in the config file. This sequence cannot currently be changed because it is determined by the underlying code. There are event filters for duration, bandwidth, periodicity of component syllables within a call and finally acoustic activity in the sidebands of an event. - 1. Set the `periodicity` parameters for filtering based on syllable sequences. + + 1. Set the `periodicity` parameters for filtering events based on syllable sequences. + 2. Set the `duration` parameters for filtering events on their time duration. - 3. Set the `bandwidth` parameters for filtering events on their bandwidth. + + 3. Set the `bandwidth` parameters for filtering events on their bandwidth. + 4. Set the `SidebandAcousticActivity` parameters for filtering based on sideband _acoustic activity_. > [!NOTE] You are unlikely to want to use all filters. Some may be irrelevant to your target call. 
@@ -548,11 +596,10 @@ values so that the final FN/FP ratio reflects the relative costs of FN and FP er For example, lowering a decibel threshold may pick up more TPs but almost certainly at the cost of more FPs. > [!NOTE] -> A working DIY Call Recognizer can be built with just one example or training call. A machine learning algorithm -> typically requires 100 true and false examples. The price that you (the ecologist) pays for this simplicity is the -> need to exercise some of the "intelligence" that would otherwise be exercised by the machine learning algorithm. -> That is, you must select calls and set parameter values that reflect the variability of the target calls and the -> relative costs of FN and FP errors. +> A working DIY Call Recognizer can be built with just one example or training call. A machine learning algorithm typically requires 100 true and false examples. +> The price that you (the ecologist) pay for this simplicity is the +> need to exercise some of the "intelligence" that would otherwise be exercised by the machine learning algorithm. +> That is, you must select calls and set parameter values that reflect the variability of the target calls and the relative costs of FN and FP errors. ## 8. Eight steps to building a DIY Call Recognizer @@ -563,18 +610,25 @@ We described above the steps required to tune parameter values in a recognizer c environment. If this is difficult, one trick to try is to play examples of your target call through a loud speaker in a location that is similar to your intended operational environment. You can then record these calls using your intended Acoustic Recording Unit (ARU). + 2. Assign parameter values into your config.yml file for the target call(s). + 3. Run the recognizer, using the command line described in the next section. + 4. Review the detection accuracy and try to determine reasons for FP and FN detections. + 5. Tune or refine parameter values in order to increase the detection accuracy. + 6. 
Repeat steps 3, 4 and 5 until you appear to have achieved the best possible accuracy. In order to minimize the number of iterations of stages 3 to 5, it is best to tune the configuration parameters in the sequence described in the previous section. + 7. At this point you should have a recognizer that performs "as accurately as possible" on your training examples. The next step is to test your recognizer on one or a few examples that it has not seen before. - That is, repeat steps 3, 4, 5 and 6 adding in a new example each time as they become available. It is also useful - at this stage to accumulate a set of recordings that do *not* contain the target call. See Section 10 for more - suggestions on building datasets. + That is, repeat steps 3, 4, 5 and 6 adding in a new example each time as they become available. + It is also useful at this stage to accumulate a set of recordings that do *not* contain the target call. + See Section 10 for more suggestions on building datasets. + 8. At some point you are ready to use your recognizer on recordings obtained from the operational environment. ## 9. Running a generic recognizer @@ -584,6 +638,7 @@ _AP_ performs several functions. Each function is selected by altering the comma For running a generic recognizer we need to to use the [`audio2csv`](xref:command-analyze-long-recording) command. - For an introduction to running commands see + - For detailed help on the audio2csv command see The basic form of the command line is: @@ -604,8 +659,9 @@ AnalysisPrograms.exe audio2csv birds.wav NinoxBoobook.yml BoobookResults --analy > The analysis-identifier (`--analysis-identifier` followed by the `"Ecosounds.GenericRecognizer"`) is required for > generic recognizers. Using `--analysis-identifier` informs _AP_ that this is generic recognition task and enables it to perform the correct analysis. 
-If you want to run your generic recognizer more than once, you might want to
-[use powershell](xref:guides-scripting-pwsh) or [use R](xref:guides-scripting-r) to script _AP_.
+If you want to run your generic recognizer more than once, you might want to use
+[PowerShell](xref:guides-scripting-pwsh) or [R](xref:guides-scripting-r) to script _AP_.
+
 
 ## 10. Building a larger data set
 
@@ -619,6 +675,7 @@ effect the changes have on both data sets.
 Eventually, these two labelled data sets can be used for
 
 - validating the efficacy of your recognizer
+
 - or for machine learning purposes.
 
 _Egret_ is software designed to assess large datasets for recognizer performance, in an **automated** fashion.
diff --git a/docs/theory/spectrograms.md b/docs/theory/spectrograms.md
index daa2edc65..ce30f2406 100644
--- a/docs/theory/spectrograms.md
+++ b/docs/theory/spectrograms.md
@@ -7,29 +7,37 @@ uid: theory-spectrograms
 A spectrogram is processed as a matrix of real values but visualized as a grey-scale image. Each row of pixels is a frequency bin and each column of pixels is a time-frame. The value in each spectrogram/matrix cell (represented visually by one image pixel) is the acoustic intensity in decibels with respect to the background noise baseline. Note that the decibel values in a noise-reduced spectrogram are always positive.
 
-Throughout _AP_ you'll see references to spectrogram parameters, such as in the
+Throughout _AP_ you'll see references to four spectrogram parameters, such as in the
 [parameters for generic recognizer algorithms](xref:AnalysisPrograms.Recognizers.Base.CommonParameters).
+They are `FrameSize`, `FrameStep`, `WindowFunction` and `BgNoiseThreshold`.
 
-`FrameSize` and `FrameStep` determine the time/frequency
-resolution of the spectrogram. Typical values are 512 and 0 samples respectively. There is a trade-off between time
-resolution and frequency resolution; finding the best compromise is really a matter of trial and error.
-If your target syllable is of long duration with little temporal variation (e.g. a whistle) then `FrameSize` can be
-If your target syllable is of long duration with little temporal variation (e.g. a whistle) then `FrameSize` can be -increased to `1024` or even `2048`. +## FrameSize +Sets the size of the FFT window used to make the spectrogram. A good default value for detecting typical animal calls is `512`. If your target syllable is of long duration with little temporal variation (e.g. a one-second long bird whistle) then `FrameSize` can be increased to `1024` or even `2048`. > [!NOTE] > The value of `FrameSize` must be a power of 2. -To capture more temporal -variation in your target syllables, decrease `FrameSize` and/or decrease `FrameStep`. A typical `FrameStep` might be -half the `FrameSize` but does *not* need to be a power of 2. +> [!NOTE] +> `FrameSize` determines the time and frequency resolutions along the x-axis and y-axis (respectively) of the spectrogram. There is a trade-off between these; that is, increasing the resolution of one will decrease the resolution of the other. Finding the best compromise is really a matter of trial and error. + +## FrameStep +Sets the number of samples between the start of one frame and the next. Therefore it controls frame overlap. `FrameStep` must be less than `FrameSize` but need not be a power of 2. By default `FrameStep` equals `FrameSize` but it is frequently set to half the frame size. + +> [!NOTE] +> To capture more temporal +variation in your target syllables, decrease `FrameSize` and/or decrease `FrameStep`. + + -The default value for *WindowFunction* is `HANNING`. There should never be a need to change this but you might like to -try a `HAMMING` window if you are not satisfied with the appearance of your spectrograms. -## Noise reduction +## WindowFunction +Sets the FFT window function. It can be one of the values from . `Hanning` is the default because we find it the most versatile. 
+There should never be a need to change this but you might like to try a `HAMMING` window if you are not satisfied with the appearance of your spectrograms.
 
-The "Bg" in `BgNoiseThreshold` means *background*. This parameter determines the degree of severity of noise removal
-from the spectrogram. The units are decibels. Zero sets the least severe noise removal. It is the safest default value
-and probably does not need to be changed. Increasing the value to say 3-4 decibels increases the likelihood that you
-will lose some important components of your target calls. For more on the noise removal algorithm used by _AP_ see
-[Towsey, Michael W. (2013) Noise removal from wave-forms and spectrograms derived from natural recordings of the environment.](https://eprints.qut.edu.au/61399/).
\ No newline at end of file
+## BgNoiseThreshold
+ Sets the degree of severity of noise removal from the spectrogram.
+ The "Bg" in `BgNoiseThreshold` means *background*.
+ The units are decibels.
+ Zero sets the least severe noise removal. This is the safest default value and probably does not need to be changed.
+ Increasing the value to say 3-4 decibels increases the likelihood that you will lose some important components of your target calls. For more on the noise removal algorithm used by _AP_ see
+[Towsey, Michael W. (2013) Noise removal from wave-forms and spectrograms derived from natural recordings of the environment.](https://eprints.qut.edu.au/61399/).
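A minimal C# sketch of the `FrameSize`/`FrameStep` trade-off described in the spectrograms.md changes above. It is not AP code: the helper names are hypothetical, and the 22050 Hz sample rate is an assumption chosen only for illustration. For a few frame sizes, with `FrameStep` set to half of `FrameSize`, it prints the duration of one frame, the spacing between frames, the width of one frequency bin and the frame overlap.

```csharp
using System;

// Illustrative sketch only; these helpers are hypothetical and not part of AP.
// Assumption: recordings are sampled at 22050 Hz.
const int SampleRateHz = 22050;

// FrameSize must be a power of 2.
static bool IsPowerOfTwo(int n) => n > 0 && (n & (n - 1)) == 0;

// Seconds of audio covered by a given number of samples.
static double SamplesToSeconds(int samples, int sampleRateHz) => samples / (double)sampleRateHz;

// Width in hertz of one frequency bin (one row of the spectrogram).
static double BinWidthHertz(int frameSize, int sampleRateHz) => sampleRateHz / (double)frameSize;

// Fraction of each frame that is shared with the next frame.
static double OverlapFraction(int frameSize, int frameStep) => 1.0 - (frameStep / (double)frameSize);

foreach (int frameSize in new[] { 512, 1024, 2048 })
{
    if (!IsPowerOfTwo(frameSize))
    {
        throw new ArgumentException("FrameSize must be a power of 2.");
    }

    int frameStep = frameSize / 2; // half the frame size, i.e. 50% overlap
    Console.WriteLine(
        $"FrameSize={frameSize} FrameStep={frameStep}: " +
        $"frame={SamplesToSeconds(frameSize, SampleRateHz) * 1000:F1} ms, " +
        $"step={SamplesToSeconds(frameStep, SampleRateHz) * 1000:F1} ms, " +
        $"bin={BinWidthHertz(frameSize, SampleRateHz):F1} Hz, " +
        $"overlap={OverlapFraction(frameSize, frameStep):P0}");
}
```

Under these assumptions, doubling `FrameSize` halves the bin width (finer frequency resolution) and doubles the frame duration (coarser time resolution). For example, 512 samples at 22050 Hz is roughly 23 ms per frame and roughly 43 Hz per bin.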
diff --git a/src/Acoustics.Shared/Extensions/EnumerableExtensions.cs b/src/Acoustics.Shared/Extensions/EnumerableExtensions.cs
index 34e9be436..bffe94b9e 100644
--- a/src/Acoustics.Shared/Extensions/EnumerableExtensions.cs
+++ b/src/Acoustics.Shared/Extensions/EnumerableExtensions.cs
@@ -317,13 +317,12 @@ public static string Join(this IEnumerable items, string delimiter = " ")
             return result.ToString(0, result.Length - delimiter.Length);
         }
 
-        public static string JoinFormatted(this IEnumerable items, string delimiter = " ")
+        public static string JoinFormatted(this IEnumerable items, string delimiter = " ", string formatString = "{0:f2}")
         {
             var result = new StringBuilder();
             foreach (var item in items)
             {
-                string number = string.Format("{0:f2}", item);
-                result.Append(number);
+                result.Append(string.Format(formatString, item));
                 result.Append(delimiter);
             }
 
diff --git a/src/AudioAnalysisTools/Events/EventFilters.cs b/src/AudioAnalysisTools/Events/EventFilters.cs
index 86be9e2cc..f6a4967e2 100644
--- a/src/AudioAnalysisTools/Events/EventFilters.cs
+++ b/src/AudioAnalysisTools/Events/EventFilters.cs
@@ -90,12 +90,12 @@ public static List FilterOnBandwidth(List events, doub
                 var bandwidth = ((SpectralEvent)ev).BandWidthHertz;
                 if ((bandwidth > minBandwidth) && (bandwidth < maxBandwidth))
                 {
-                    Log.Debug($" Event{count} accepted: Actual bandwidth = {bandwidth}");
+                    Log.Debug($" Event[{count}] accepted: Actual bandwidth = {bandwidth}");
                     filteredEvents.Add(ev);
                 }
                 else
                 {
-                    Log.Debug($" Event{count} rejected: Actual bandwidth = {bandwidth}");
+                    Log.Debug($" Event[{count}] rejected: Actual bandwidth = {bandwidth}");
                     continue;
                 }
             }
@@ -178,12 +178,12 @@ public static List FilterOnDuration(List events, doubl
                 var duration = ((SpectralEvent)ev).EventDurationSeconds;
                 if ((duration > minimumDurationSeconds) && (duration < maximumDurationSeconds))
                 {
-                    Log.Debug($" Event{count} accepted: Actual duration = {duration:F3}s");
+                    Log.Debug($" Event[{count}] accepted: Actual duration = {duration:F3}s");
                     filteredEvents.Add(ev);
                 }
                 else
                 {
-                    Log.Debug($" Event{count} rejected: Actual duration = {duration:F3}s");
+                    Log.Debug($" Event[{count}] rejected: Actual duration = {duration:F3}s");
                     continue;
                 }
             }
@@ -256,7 +256,8 @@ public static List FilterEventsOnSyllableCountAndPeriodicity(List maxSyllableCount)
diff --git a/src/AudioAnalysisTools/Events/Types/EventPostProcessing.cs b/src/AudioAnalysisTools/Events/Types/EventPostProcessing.cs
index 89f80cb36..527b3cd00 100644
--- a/src/AudioAnalysisTools/Events/Types/EventPostProcessing.cs
+++ b/src/AudioAnalysisTools/Events/Types/EventPostProcessing.cs
@@ -177,7 +177,10 @@ public class PostProcessingConfig
         }
 
         ///
-        /// The next two properties determine filtering of events based on their duration.
+        /// The two properties in this class determine filtering of events based on their duration.
+        /// The filter removes events whose duration lies outside three standard deviations (SDs) of an expected value.
+        /// Assuming the duration is normally distributed, three SDs set hard upper and lower duration bounds that include 99.7% of instances.
+        /// The filtering algorithm calculates these hard (3 SD) bounds and removes acoustic events that fall outside the bounds.
         ///
         public class DurationConfig
         {
@@ -194,6 +197,9 @@ public class DurationConfig
 
         ///
         /// The next two properties determine filtering of events based on their bandwidth.
+        /// This filter removes events whose bandwidth lies outside three standard deviations (SDs) of an expected value.
+        /// Assuming the bandwidth is normally distributed, three SDs set hard upper and lower bandwidth bounds that include 99.7% of instances.
+        /// The filtering algorithm calculates these hard bounds and removes acoustic events that fall outside the bounds.
         ///
         public class BandwidthConfig
         {
@@ -280,7 +286,11 @@ public class SyllableSequenceConfig
         /// Gets or sets a value indicating the expected periodicity in seconds.
         /// This value is used only where FilterSyllableSequence = true.
         /// Important Note: This property interacts with SyllableStartDifference.
-        /// SyllableStartDifference - ExpectedPeriod = 3 x SD of the period.
+        /// When setting ExpectedPeriod, you are actually setting a permissible range of values for the Period.
+        /// The maximum permitted period will be the value assigned to SyllableStartDifference.
+        /// The minimum period will be the ExpectedPeriod minus (SyllableStartDifference - ExpectedPeriod).
+        /// For example: if SyllableStartDifference = 3 seconds and ExpectedPeriod = 2.5 seconds, then the minimum allowed period will be 2 seconds.
+        /// These bounds are hard bounds.
         ///
         public double ExpectedPeriod { get; set; }
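The bounds described in the doc comments above can be sketched in C#. This is an illustration, not the filtering code in AP: `ThreeSigmaBounds`, `PeriodBounds` and `IsWithin` are hypothetical helper names, and the duration figures in the usage lines are invented for the example. The first helper turns an expected value and a standard deviation into hard 3 SD bounds; the second reproduces the `ExpectedPeriod`/`SyllableStartDifference` arithmetic.

```csharp
using System;

// Illustrative sketch only; these helpers are hypothetical and not AP's implementation.

// Hard bounds at three standard deviations either side of the expected value.
static (double Min, double Max) ThreeSigmaBounds(double expected, double standardDeviation)
{
    double halfRange = 3.0 * standardDeviation;
    return (expected - halfRange, expected + halfRange);
}

// Period bounds as described in the ExpectedPeriod comment:
// max = SyllableStartDifference,
// min = ExpectedPeriod - (SyllableStartDifference - ExpectedPeriod).
static (double Min, double Max) PeriodBounds(double expectedPeriod, double syllableStartDifference)
{
    double halfRange = syllableStartDifference - expectedPeriod;
    return (expectedPeriod - halfRange, syllableStartDifference);
}

// An event is kept only if its value lies strictly inside the bounds,
// mirroring the strict comparisons in FilterOnDuration and FilterOnBandwidth.
static bool IsWithin(double value, (double Min, double Max) bounds) =>
    value > bounds.Min && value < bounds.Max;

// Invented example values: expected duration 0.5 s with an SD of 0.1 s.
var durationBounds = ThreeSigmaBounds(expected: 0.5, standardDeviation: 0.1);
Console.WriteLine($"Duration bounds: {durationBounds.Min:F2} s to {durationBounds.Max:F2} s");
Console.WriteLine($"0.9 s event accepted: {IsWithin(0.9, durationBounds)}");

// The example from the doc comment: SyllableStartDifference = 3 s, ExpectedPeriod = 2.5 s.
var periodBounds = PeriodBounds(expectedPeriod: 2.5, syllableStartDifference: 3.0);
Console.WriteLine($"Period bounds: {periodBounds.Min:F2} s to {periodBounds.Max:F2} s");
```

With the doc comment's values the sketch gives a permitted period of 2 to 3 seconds, matching the worked example in the comment.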