
ASR evaluator #5728

Merged: 38 commits merged from asr_evaluator_engine into main on Jan 11, 2023

Conversation

@fayejf (Collaborator) commented Jan 3, 2023

What does this PR do ?

A tool to thoroughly evaluate an ASR model.

Collection: ASR

Changelog

  • A simple way to evaluate a model in all three inference modes currently supported by NeMo: offline, chunked, and offline_by_chunked.
  • On-the-fly data augmentation (silence, noise, etc.) for ASR robustness evaluation.
  • Detailed insertion, deletion, and substitution error rates, both per sample and over all samples, to investigate a model's performance.
  • Evaluation of a model's reliability on different target groups, such as gender and audio length, if the metadata is present.

Usage

python asr_evaluator.py \
engine.pretrained_name="stt_en_conformer_transducer_large" \
engine.inference_mode.mode="offline" \
engine.test_ds.augmentor.noise.manifest_path=<manifest file for noise data>

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

Note: because this PR is getting large, on-the-fly augmentation for buffered inference mode will be included in a follow-up PR.

@github-actions bot added the ASR label Jan 3, 2023
@fayejf marked this pull request as ready for review January 6, 2023 07:02
@stevehuang52 (Collaborator) left a comment:

Good work, just a few small comments. We should make sure that scripts inside nemo.collections do not depend on scripts in examples/, and that functions in NeMo are language-agnostic.

@@ -182,6 +186,9 @@ def transcribe(
'channel_selector': channel_selector,
}

if augmentor_config:
config['augmentor_config'] = augmentor_config
Collaborator:

'augmentor_config' can be simplified to just 'augmentor'; since it's part of the config variable, we can drop the _config suffix.

Collaborator (Author):

Makes sense, updated.
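For readers following along, a minimal sketch of the change being discussed, after renaming 'augmentor_config' to 'augmentor' (only 'channel_selector' is taken from the quoted diff; the surrounding variables are assumed to be parameters of transcribe()):

# Sketch only: mirrors the diff above after the rename.
config = {
    'channel_selector': channel_selector,
    # ... other dataloader settings from the diff context ...
}

if augmentor:
    # Pass the augmentation settings through to the dataloader config.
    config['augmentor'] = augmentor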

if (cfg.model_path and cfg.pretrained_name) or (not cfg.model_path and not cfg.pretrained_name):
raise ValueError("Please specify either cfg.model_path or cfg.pretrained_name!")

if cfg.inference_mode.mode == "offline":
Collaborator:

cfg.inference_mode.mode seems a bit redundant, can we change to just cfg.inference.mode?

Collaborator (Author):

updated.
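For context, the surrounding logic roughly becomes the following after the rename (a sketch assuming the config layout shown in this thread; run_offline_inference is a hypothetical helper name, while run_chunked_inference appears in the quoted code further down):

if (cfg.model_path and cfg.pretrained_name) or (not cfg.model_path and not cfg.pretrained_name):
    raise ValueError("Please specify either cfg.model_path or cfg.pretrained_name!")

if cfg.inference.mode == "offline":
    cfg = run_offline_inference(cfg)   # hypothetical helper name
elif cfg.inference.mode in ("chunked", "offline_by_chunked"):
    cfg = run_chunked_inference(cfg)   # appears in the quoted code below
else:
    raise ValueError(f"Unsupported inference mode: {cfg.inference.mode}")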

f"model_stride={cfg.inference_mode.model_stride} ",
shell=True,
check=True,
)
Collaborator:

It's unsafe for things inside NeMo collections to depend on NeMo example scripts, since examples may not be installed or can be lost or modified easily. If this function has to use something from the examples folder, it's better to move the whole function to an example script rather than putting it in NeMo.

Collaborator (Author):

Good points! Moved eval_utils.py to utils.py in tools/asr_evaluator

f"eval_config_yaml={temp_eval_config_yaml_file} ",
shell=True,
check=True,
)
Collaborator:

Same as before: it's unsafe for things inside NeMo collections to depend on NeMo example scripts.

Collaborator (Author):

updated

cfg.output_filename = model_name + "-" + dataset_name + "-" + mode_name + ".json"

temp_eval_config_yaml_file = "temp_eval_config.yaml"
with open(temp_eval_config_yaml_file, "w") as f:
Collaborator:

Consider (but not mandatory) using tempfile.NamedTemporaryFile, which automatically manages temp files:

import subprocess
import tempfile

from omegaconf import OmegaConf

with tempfile.NamedTemporaryFile(mode='w', encoding='utf-8') as f:
    OmegaConf.save(cfg, f)
    f.seek(0)  # reset file pointer
    subprocess.run(
        xxxxxxx
        eval_config_yaml=f.name
    )

Collaborator (Author):

good idea! updated

return cfg, total_res


def target_metadata_wer(manifest: str, target: str, meta_cfg: DictConfig, eval_metric: str = "wer",) -> dict:
Collaborator:

Maybe cal_target_metadata_wer?

Collaborator (Author):

updated


words = sample["words"]
wer_each_class[target_class]["words"] += words
wer_each_class[target_class]["errors"] += words * sample[eval_metric]
Collaborator:

Better to check whether the input eval_metric is supported, and raise an exception if it's not valid.

Collaborator (Author):

updated
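A minimal sketch of the kind of guard being asked for; the supported metrics ('wer', 'cer') are assumed from the rest of this thread rather than copied from the merged code:

SUPPORTED_METRICS = ("wer", "cer")  # assumed set of supported metrics

def cal_target_metadata_wer(manifest: str, target: str, meta_cfg, eval_metric: str = "wer") -> dict:
    if eval_metric not in SUPPORTED_METRICS:
        raise ValueError(
            f"Unsupported eval_metric '{eval_metric}'; expected one of {SUPPORTED_METRICS}."
        )
    ...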

logging.info(f"metadata '{target}' does not present in manifest. Skipping! ")
return None

values = ['samples', 'words', 'errors', 'inss', 'dels', 'subs']
Collaborator:

Please add some docstrings for each value to explain what it means.

Collaborator (Author):

updated in function docstring
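For anyone reading the thread without the final code, the per-class accumulator fields roughly mean the following (illustrative comments in standard WER terminology, not the merged docstring):

# samples: number of utterances that fall into the class
# words:   total number of reference words across those utterances
# errors:  total word errors (insertions + deletions + substitutions)
# inss:    total insertion errors
# dels:    total deletion errors
# subs:    total substitution errors
values = ['samples', 'words', 'errors', 'inss', 'dels', 'subs']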

logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

# Set and save random seed and git hash for reproducibility
random.seed(cfg.env.random_seed)
Collaborator:

What about numpy and pytorch random seeds?

Collaborator (Author):

updated. Need to fix rng in perturb in another PR.
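The commit history mentions a seed_everything change; a common way to cover Python, NumPy, and PyTorch seeds in a Lightning-based project looks like this (a sketch, not necessarily the exact call used in the PR):

import random

import numpy as np
import torch
from pytorch_lightning import seed_everything

# One call covers Python's random, NumPy, and PyTorch (CPU and CUDA).
seed_everything(cfg.env.random_seed)

# Equivalent manual version, if Lightning is not used:
random.seed(cfg.env.random_seed)
np.random.seed(cfg.env.random_seed)
torch.manual_seed(cfg.env.random_seed)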

)
cfg = run_chunked_inference(cfg)

elif cfg.inference_mode.mode == "offline_by_chunked":
Collaborator:

What's the fundamental difference between "chunked" and "offline_by_chunked"? It seems that the only difference is that "offline_by_chunked" will set default parameters while "chunked" will not. Could you please add more docstrings to explain the different use cases?

Collaborator (Author) @fayejf commented Jan 10, 2023:

what you said is totally correct. Added more explanation in docstring.
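Based on that explanation, the distinction can be sketched as follows; the chunking parameter names and default values here are placeholders for illustration, not the merged config:

# "offline_by_chunked" behaves like "chunked" but fills in sensible defaults,
# while "chunked" expects the user to set the chunking parameters explicitly.
if cfg.inference.mode == "offline_by_chunked":
    if cfg.inference.get("chunk_len_in_secs") is None:
        cfg.inference.chunk_len_in_secs = 20.0      # placeholder default
    if cfg.inference.get("total_buffer_in_secs") is None:
        cfg.inference.total_buffer_in_secs = 30.0   # placeholder default

# Both modes then go through the same chunked-inference path.
cfg = run_chunked_inference(cfg)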

tools/asr_evaluator/asr_evaluator.py: Fixed
tools/asr_evaluator/utils.py: Fixed
- **ENGINE**. To conduct ASR inference.
- **ANALYST**. To evaluate model performance based on predictions.

In the Analyst, you can evaluate on any metadata that is present in the manifest. For example, you can evaluate the model's performance based on the duration of each sample, such as how the model performs on samples shorter than 5 s versus longer than 5 s by specifying [[0,5],[5,100000]], and get the WER/CER for each slot. Or you can check how the model performs on positive (happy, laugh), neural (neural), or negative (sad) moods, as below. And if you set save_wer_per_class=True, it will calculate the WER for all classes present in the data (i.e., the 5 classes above plus cry).
Collaborator:

I think you mean "neutral"? The meanings of the metadata need more explanation. For example, we can say that, with the following config, we calculate WERs for audios in different interval groups, where each group (in seconds) is defined by [[0,2],[2,5],[5,10],[10,20],[20,100000]]. Also, we calculate WERs for three groups of emotions, where each group is defined by [['happy','laugh'],['neural'],['sad']] (words in the same group are treated as the same).


emotion:
enable: True
slot: [['happy','laugh'],['neural'],['sad']] # we could have 'cry' in data but not in slot we focus on.
Collaborator:

I think you mean "neutral"?
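To make the slot semantics concrete, here is a small sketch of how manifest entries could be grouped into the emotion slots above before computing per-group WER (the helper name group_by_slot and the 'emotion' key are ours for illustration; the merged tool's internals may differ):

emotion_slots = [['happy', 'laugh'], ['neural'], ['sad']]  # mirrors the config above

def group_by_slot(samples, slots, key="emotion"):
    """Map each manifest entry to the slot its metadata value falls in.

    Entries whose value (e.g. 'cry') is not in any slot are skipped,
    matching the comment in the config above.
    """
    groups = {i: [] for i in range(len(slots))}
    for sample in samples:
        value = sample.get(key)
        for i, slot in enumerate(slots):
            if value in slot:
                groups[i].append(sample)
                break
    return groups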

@stevehuang52 (Collaborator) left a comment:

LGTM.

@fayejf merged commit c6b1ea5 into main Jan 11, 2023
@fayejf deleted the asr_evaluator_engine branch January 11, 2023 16:57
erastorgueva-nv pushed a commit that referenced this pull request Jan 12, 2023
* backbone
* engineer and analyzer
* offline_by_chunked
* test_ds wip
* temp remove inference
* mandarin yaml
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* augmentor and a few updates
* address alerts and revert unnecessary changes
* Add readme
* rename
* typo fix
* small fix
* add missing header
* rename augmentor_config to augmentor
* raname inference_mode to inference
* move utils.py
* update temp file
* make wer cer clear
* seed_everything
* fix missing rn augmentor_config in rnnt
* fix rnnt transcribe
* add more docstring and style fix
* address codeQL
* reflect comments
* update readme
* clearer

Signed-off-by: fayejf <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023