
Update KWS, MSNoise, Signalmixer Data Loaders & Evaluation Notebook, Add New Scripts for Mixed Signals #299

Merged (108 commits) on Jun 28, 2024

@EyubogluMerve (Contributor)

Updates

  • Updates on kws20.py:

    • Dataset functions for signalmixer are added to the KWS data loader.
    • White noise is now applied at a target SNR level drawn from the range [-5, 20] dB.
  • Updates on msnoise.py:

    • Unnecessary read operations are removed (filter_classes & filter_dtype).
    • A probability array holding the noise type probabilities is added as a parameter to the MSnoise class.
    • Naming differences in the raw Train/Test data are handled during download operation.
  • Updates on signalmixer.py:

    • Several parameters are implemented:
      • apply_prob: noise application probability (default: 0.8)
      • snr_range: SNR range applied to the dataset (default: [-5, 10] dB)
      • noise_type: array containing the noise type names (default: all noise types)
  • The KWS evaluation notebook is updated.

  • Scripts for the KWS NAS and v3 models with MSnoise mixing are added.
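The SNR-based noise mixing described above can be sketched as follows. This is a minimal NumPy sketch, not the code in kws20.py or signalmixer.py: the function names mix_at_snr and maybe_mix and the noise_bank dictionary are hypothetical, and SNR is assumed to be the ratio of mean signal power to mean noise power, in dB. The defaults mirror the signalmixer defaults listed above (apply_prob=0.8, snr_range=[-5, 10] dB), and noise_probs plays the role of the MSnoise probability array.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals `snr_db`, then add it.

    Assumes `noise` is not all zeros (its mean power must be positive).
    """
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required to reach the target SNR.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + np.sqrt(target_noise_power / noise_power) * noise


def maybe_mix(signal, noise_bank, noise_probs, rng, apply_prob=0.8, snr_range=(-5, 10)):
    """Hypothetical mixer: with probability `apply_prob`, add a random noise crop.

    noise_bank: dict mapping noise type name -> 1-D array at least as long as `signal`.
    noise_probs: selection probability for each entry of noise_bank (must sum to 1).
    Returns (mixed_signal, noise_type, snr_db); noise_type and snr_db are None
    when the signal is left clean.
    """
    if rng.random() >= apply_prob:
        return signal, None, None
    names = list(noise_bank)
    # Pick a noise type according to the per-type probabilities.
    noise_type = str(rng.choice(names, p=noise_probs))
    noise = noise_bank[noise_type]
    # Random crop of the noise recording, matching the signal length.
    start = rng.integers(0, len(noise) - len(signal) + 1)
    segment = noise[start:start + len(signal)]
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(signal, segment, snr_db), noise_type, snr_db
```

For the kws20.py white noise case, the noise would instead be Gaussian (for example, rng.standard_normal(len(signal))) with the SNR drawn from [-5, 20] dB.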

@EyubogluMerve (Contributor, Author)

A benchmark dataset option is added to kws20.py, together with a corresponding get_dataset function. The benchmark parameter is set to False to obtain the regular kws20 dataset. Scripts for benchmark training and evaluation are added to the PR.

@ermanok (Contributor) left a comment:

Minor suggestions for code refactoring.

datasets/kws20.py (outdated, resolved)
datasets/signalmixer.py (resolved)
datasets/msnoise.py (outdated, resolved)
datasets/signalmixer.py (outdated, resolved)
datasets/kws20.py (outdated, resolved)
datasets/kws20.py (outdated, resolved)

@ermanok (Contributor) left a comment:

Could you please add --print-freq 100 to the training scripts? The logs are currently too long to go through.

@alicangok (Contributor) commented Jun 17, 2024

@EyubogluMerve, could you please update your PR with the following patch I created (MervePR.patch)? Thank you.

List of changes:

  • class names now follow the original dataset convention (e.g. SILENCE -> _silence_)
  • training and validation examples for the _silence_ class now come from all possible _background_noise_ types, instead of exclusively using running_tap for validation (as we discussed in Teams)
  • the benchmark testing dataset now includes the _silence_ samples
  • for the MSNoise dataset, the download parameter is now set to True
  • modifications and additions to various comments and printed logs
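The second change above, drawing _silence_ examples from every _background_noise_ recording for both training and validation, could look roughly like this. It is a hypothetical NumPy sketch, not the code in the patch: random_crops, make_silence_splits, and the per-file train/val counts are assumptions. It assumes 1-second clips at 16 kHz, as in the Speech Commands dataset, and splits each recording in time so that training and validation crops never overlap while every noise type appears in both splits.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed: 1-second clips at 16 kHz


def random_crops(audio, n, rng):
    """Return `n` random 1-second crops from a 1-D array at least 1 second long."""
    starts = rng.integers(0, len(audio) - SAMPLE_RATE + 1, size=n)
    return [audio[s:s + SAMPLE_RATE] for s in starts]


def make_silence_splits(background, n_train, n_val, val_fraction, rng):
    """Build _silence_ train/val clips from every background noise recording.

    background: dict mapping file name -> 1-D float array (hypothetical input format).
    Each recording is split in time: the first (1 - val_fraction) of it feeds the
    training crops and the remainder feeds the validation crops.
    """
    train, val = [], []
    for audio in background.values():
        cut = int(len(audio) * (1 - val_fraction))
        train += random_crops(audio[:cut], n_train, rng)
        val += random_crops(audio[cut:], n_val, rng)
    return np.stack(train), np.stack(val)
```

Splitting by time rather than pooling and shuffling crops avoids validation clips that overlap training clips from the same recording.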

Note: This PR contains the evaluate_kws12_nas_benchmark.sh script, which requires a 12-class model checkpoint that does not yet exist in the ongoing PR to the synthesis repo. We will need to either remove this script from this PR or add the trained checkpoint file to the synthesis repo. (If we decide on the latter, I will open a separate PR to the synthesis repo.)

@rotx-eva (Contributor) left a comment:

Please turn the "data_type" values into constants and add a comment to the index field merge.

@alicangok (Contributor) commented Jun 24, 2024

@EyubogluMerve, could you apply MervePR2.patch and update the PR? Other than that, it LGTM.

As @rotx-maxim requested here, this patch introduces constants for the "data_type" values and replaces the | operator with the explicit torch.logical_or() function in the index merging operation, for better readability.
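A minimal sketch of the pattern described above: named constants for the data_type values, plus an explicit torch.logical_or() when merging index masks. The constant names and values (TRAIN, TEST, VALIDATION) and the helper merged_indices are hypothetical; the actual constants in kws20.py may differ. On boolean tensors, torch.logical_or(a, b) gives the same result as a | b.

```python
import torch

# Hypothetical data_type constants; the real names and values in kws20.py may differ.
TRAIN = 0
TEST = 1
VALIDATION = 2


def merged_indices(data_type, first, second):
    """Return the indices of samples whose data_type equals `first` or `second`."""
    # Explicit logical OR of the two boolean masks (equivalent to `|` here).
    mask = torch.logical_or(data_type == first, data_type == second)
    return torch.nonzero(mask).flatten()


data_type = torch.tensor([TRAIN, TEST, VALIDATION, TRAIN, TEST, VALIDATION])
print(merged_indices(data_type, TRAIN, VALIDATION).tolist())  # [0, 2, 3, 5]
```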

@rotx-eva rotx-eva changed the title Updates on KWS, MSnoise, Signalmixer Data Loaders & Evaluation Notebook, New Scripts for Mixed Signals Update KWS, MSNoise, Signalmixer Data Loaders & Evaluation Notebook, Add New Scripts for Mixed Signals Jun 27, 2024
@rotx-eva rotx-eva merged commit ba6c02b into analogdevicesinc:develop Jun 28, 2024
3 checks passed
rotx-eva added a commit that referenced this pull request Jul 5, 2024
* Update KWS, MSNoise, Signalmixer Data Loaders & Evaluation Notebook, Add New Scripts for Mixed Signals (#299)
* Remove librosa
* Update README
* Remove .keys() from kws20.py
* Remove noise_type argument from kws20 get_datasets()

---------

Co-authored-by: Merve Eyuboglu <[email protected]>