Attention encoder-decoder models for multiple speech-to-text tasks … #8324

Merged: 1 commit, Feb 3, 2024

Commits on Feb 3, 2024

  1. Attention encoder-decoder models for multiple speech-to-text tasks (#8242)
    
    * Rebasing canary changes at current main
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Move the changes from asr transformer to nlp transformer as originally intended
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * update eval to strip spaces before punctuation
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * update pc strip
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247)
    
    * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`.
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252)
    
    * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    ---------
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Move tokenization into `prompt_format_fn`, fix usage, add docs
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Backward-compatible utterance validation
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Improve type annotations
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * config and prompt_fn registration changes from review
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    ---------
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * fix transcribe config
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Refactor Canary to follow schema of remaining ASR models (#8260)
    
    * Initial draft of multi task beam decoding strategy
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Stabilize inference
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config
    
    Signed-off-by: smajumdar <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Add change decoding strategy
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Remove redundant imports
    
    Signed-off-by: smajumdar <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Cleanup
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Cleanup
    
    Signed-off-by: smajumdar <[email protected]>
    
    * remove asr transformer dependency on nlp
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * clean up
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * copy token_classifier from nlp to asr
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Address comments
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Add typing to beam decoding
    
    Signed-off-by: smajumdar <[email protected]>
    
    * Make prompt format configurable
    
    Signed-off-by: smajumdar <[email protected]>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * drop asr dependency on nlp
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    ---------
    
    Signed-off-by: smajumdar <[email protected]>
    Signed-off-by: stevehuang52 <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: stevehuang52 <[email protected]>
    
    * fix transcribe, update asr evaluator
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Extend the docs for the canary prompt_fn
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Incorporate changes from Nithin's code review
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * training bug fix and adding launch script for speech_multitask (#8270)
    
    * bug fix and adding launch script for speech_multitask
    
    Signed-off-by: Krishna Puvvada <[email protected]>
    
    * update launch script example in speech_to_text_aed.py
    
    Signed-off-by: Krishna Puvvada <[email protected]>
    
    ---------
    
    Signed-off-by: Krishna Puvvada <[email protected]>
    Co-authored-by: Krishna Puvvada <[email protected]>
    
    * Fix: `drop_last` must be true in validation/test; otherwise the training will hang
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * revert to current transcribe API
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * revert changes to NLP, update docs
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * update eval utils
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * update docs
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Remove DALI; rename compute_audio_loss to compute_loss
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * set default use_model_transcribe=False
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * change os.path.dirname to pathlib
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * [canary] Test for CanaryTokenizer + refactoring (#8285)
    
    * Test for CanaryTokenizer
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Attempt at refactor...
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    ---------
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Update config for AED models (#8294)
    
    Signed-off-by: smajumdar <[email protected]>
    
    * set default calculate_wer=False in transcribe_speech.py
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Attention encoder-decoder models for multiple speech-to-text tasks
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Apply suggestions from code review, part 1
    
    Co-authored-by: Nithin Rao <[email protected]>
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Apply suggestions from code review, part 2
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * Document compute_loss
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    * update transcribe_speech.py
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * add docstring
    
    Signed-off-by: stevehuang52 <[email protected]>
    
    * Attention encoder-decoder models for multiple speech-to-text tasks
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    
    ---------
    
    Signed-off-by: Piotr Żelasko <[email protected]>
    Signed-off-by: stevehuang52 <[email protected]>
    Signed-off-by: smajumdar <[email protected]>
    Signed-off-by: Krishna Puvvada <[email protected]>
    Signed-off-by: Piotr Żelasko <[email protected]>
    Co-authored-by: stevehuang52 <[email protected]>
    Co-authored-by: Somshubra Majumdar <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Krishna Puvvada <[email protected]>
    Co-authored-by: Krishna Puvvada <[email protected]>
    Co-authored-by: He Huang (Steve) <[email protected]>
    Co-authored-by: Nithin Rao <[email protected]>
    (cherry picked from commit d10726d)
    pzelasko authored and titu1994 committed Feb 3, 2024
    7d24821