minor fix for missing chat attr #6671

Merged
merged 2 commits into from
May 18, 2023

arendu
Collaborator

@arendu arendu commented May 17, 2023

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@arendu arendu requested a review from yidong72 May 17, 2023 23:08
@arendu arendu marked this pull request as ready for review May 17, 2023 23:08
@github-actions github-actions bot added the NLP label May 17, 2023
@arendu arendu merged commit 8aa80ee into main May 18, 2023
8 checks passed
@arendu arendu deleted the adithyare/chatfix branch May 18, 2023 02:49
KunalDhawan added a commit that referenced this pull request May 18, 2023
* Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY (#6549) (#6553)

* Added FastConformer hybrid ASR models for en, es, it, de, pl, hr, ua, by



* updated ASR docs with the fastconformer hybrid checkpoints



* added the fastconformer RNNT and CTC models



---------

Signed-off-by: KunalDhawan <[email protected]>
Co-authored-by: Kunal Dhawan <[email protected]>

* Add scores for FastConformer models (#6557) (#6558)

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Fix fp16 (#6543) (#6544)

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Patch transcribe and support offline transcribe for hybrid model (#6550) (#6559)

Signed-off-by: fayejf <[email protected]>
Co-authored-by: fayejf <[email protected]>

* Fix notebook bad json (#6561)

Signed-off-by: smajumdar <[email protected]>

* Change Megatron Enc Dec model to use persistent_workers (#6548) (#6552)

* persistent workers



* fix



---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Make KenLM with PC for AggregateTokenizer and merge it (#6081)

* do_lowercase, rm_punctuation

Signed-off-by: Nikolay Karpov <[email protected]>

* support beam_strategy = beam

Signed-off-by: Nikolay Karpov <[email protected]>

* black

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix config and punctuation capitalization

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm math

Signed-off-by: Nikolay Karpov <[email protected]>

* update kenlm

Signed-off-by: Nikolay Karpov <[email protected]>

* black

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add opengrm

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* mv install_beamsearch_decoders

Signed-off-by: Nikolay Karpov <[email protected]>

* punctuation_to_preserve

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Only tokenizer option

Signed-off-by: Nikolay Karpov <[email protected]>

* Black

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* DEFAULT_TOKEN_OFFSET

Signed-off-by: Nikolay Karpov <[email protected]>

* aggregate_tokenizer

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install kenlm with more than 5gram

Signed-off-by: Nikolay Karpov <[email protected]>

* install_beamsearch_decoders

Signed-off-by: Nikolay Karpov <[email protected]>

* ngram_bin_path kenlm_bin_path

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* black

Signed-off-by: Nikolay Karpov <[email protected]>

* fix greedy PC bug

Signed-off-by: Nikolay Karpov <[email protected]>

* move global params

Signed-off-by: Nikolay Karpov <[email protected]>

* fix description and perplexity

Signed-off-by: Nikolay Karpov <[email protected]>

* fix description

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* NEMO_PATH

Signed-off-by: Nikolay Karpov <[email protected]>

* nemo:23.01

Signed-off-by: Nikolay Karpov <[email protected]>

* License

Signed-off-by: Nikolay Karpov <[email protected]>

* description

Signed-off-by: Nikolay Karpov <[email protected]>

* isinstance

Signed-off-by: Nikolay Karpov <[email protected]>

* refactor kenlm stdin

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* black

Signed-off-by: Nikolay Karpov <[email protected]>

* add cmd arg

Signed-off-by: Nikolay Karpov <[email protected]>

* use new iter_files

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* EncDecHybridRNNTCTCModel

Signed-off-by: Nikolay Karpov <[email protected]>

* punctuation

Signed-off-by: Nikolay Karpov <[email protected]>

* train_kenlm args

Signed-off-by: Nikolay Karpov <[email protected]>

* add docstrings

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add ngram_merge docs

Signed-off-by: Nikolay Karpov <[email protected]>

* ngram_prune

Signed-off-by: Nikolay Karpov <[email protected]>

* rename to ngram_merge

Signed-off-by: Nikolay Karpov <[email protected]>

* rename to ngram

Signed-off-by: Nikolay Karpov <[email protected]>

* add comments

Signed-off-by: Nikolay Karpov <[email protected]>

* Ngram

Signed-off-by: Nikolay Karpov <[email protected]>

* nemo_model_file

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* install_opengrm_ngram

Signed-off-by: Nikolay Karpov <[email protected]>

* install opengrm

Signed-off-by: Nikolay Karpov <[email protected]>

* rename to install_opengrm.sh

Signed-off-by: Nikolay Karpov <[email protected]>

* rm extra import

Signed-off-by: Nikolay Karpov <[email protected]>

* train_paths

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* text_processing

Signed-off-by: Nikolay Karpov <[email protected]>

* fix ngram_bin_path

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* DECODERS_PATH

Signed-off-by: Nikolay Karpov <[email protected]>

* farcompile

Signed-off-by: Nikolay Karpov <[email protected]>

* rm text processing

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* text_processing

Signed-off-by: Nikolay Karpov <[email protected]>

* AggregateTokenizer.DummyTokenizer

Signed-off-by: Nikolay Karpov <[email protected]>

* comments

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* TextProcessingConfig

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo

Signed-off-by: Nikolay Karpov <[email protected]>

* doc

Signed-off-by: Nikolay Karpov <[email protected]>

* types

Signed-off-by: Nikolay Karpov <[email protected]>

* nemo_model_file

Signed-off-by: Nikolay Karpov <[email protected]>

* rm assert

Signed-off-by: Nikolay Karpov <[email protected]>

* import kenlm_utils

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* return None

Signed-off-by: Nikolay Karpov <[email protected]>

* Copyright

Signed-off-by: Nikolay Karpov <[email protected]>

* 2022

Signed-off-by: Nikolay Karpov <[email protected]>

* 2023

Signed-off-by: Nikolay Karpov <[email protected]>

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
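The text-processing options named in this commit (`do_lowercase`, `rm_punctuation`, `punctuation_to_preserve`) can be illustrated with a minimal sketch. The function `normalize_line` below is hypothetical and is not the NeMo implementation; it only assumes these flags mean what their names suggest and shows how one line of LM training text might be cleaned before being passed to KenLM.

```python
import string


def normalize_line(
    line: str,
    do_lowercase: bool = True,
    rm_punctuation: bool = True,
    punctuation_to_preserve: str = "",
) -> str:
    """Hypothetical normalizer for n-gram LM training text (not NeMo's code)."""
    if do_lowercase:
        line = line.lower()
    if rm_punctuation:
        # Remove every ASCII punctuation character except the preserved ones.
        to_remove = "".join(ch for ch in string.punctuation if ch not in punctuation_to_preserve)
        line = line.translate(str.maketrans("", "", to_remove))
    # Collapse whitespace runs left behind by removed characters.
    return " ".join(line.split())
```

For a punctuation-and-capitalization (PC) model, one would presumably disable both flags so the LM sees the same cased, punctuated text the acoustic model emits.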

* temp rtd fix (#6568) (#6569)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* [TTS] Add script for mapping speaker names to indices (#6509)

Signed-off-by: Ryan <[email protected]>

* whitespace (#6574)

Signed-off-by: Nikolay Karpov <[email protected]>

* Update manifest.py for speedup (#6565) (#6573)

* Update manifest.py

Re-order the checks to speed up processing of audio filepaths that are already absolute paths



* Update manifest.py



---------

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
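The speedup described here comes from ordering checks so that the cheapest one runs first. A minimal sketch of that idea follows; `resolve_audio_path` is a hypothetical helper, not the actual function in `manifest.py`, and it assumes relative paths are resolved against the manifest's directory.

```python
import os
from typing import Optional


def resolve_audio_path(audio_file: str, manifest_file: Optional[str] = None) -> str:
    """Hypothetical resolver for an audio path listed in a manifest entry."""
    # Cheapest check first: an absolute path needs no filesystem lookup at all.
    if os.path.isabs(audio_file):
        return audio_file
    # Otherwise try interpreting the path relative to the manifest's directory.
    if manifest_file is not None:
        manifest_dir = os.path.dirname(os.path.abspath(manifest_file))
        candidate = os.path.join(manifest_dir, audio_file)
        if os.path.isfile(candidate):
            return candidate
    # Fall back to the path exactly as written in the manifest.
    return audio_file
```

With large manifests of absolute paths, the early return avoids one `isfile` system call per entry, which is where most of the time would otherwise go.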

* More streaming conformer export fixes (#6567) (#6578)

Signed-off-by: Greg Clark <[email protected]>
Co-authored-by: Greg Clark <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>

* user selected max_seq_len should be less than model's max_seq_len (#6333) (#6386)

* user selection should not break model max limit



* eval max seq length



---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
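The rule in this commit title, that a user-selected `max_seq_len` must not exceed the model's own limit, amounts to a clamp. The helper `effective_max_seq_len` below is a hypothetical sketch of that rule, not the code from the PR.

```python
import logging
from typing import Optional


def effective_max_seq_len(user_max_seq_len: Optional[int], model_max_seq_len: int) -> int:
    """Hypothetical helper: never let a user-selected length exceed the model's limit."""
    if user_max_seq_len is None:
        return model_max_seq_len
    if user_max_seq_len > model_max_seq_len:
        logging.warning(
            "Requested max_seq_len=%d exceeds the model's max_seq_len=%d; using %d.",
            user_max_seq_len,
            model_max_seq_len,
            model_max_seq_len,
        )
        return model_max_seq_len
    return user_max_seq_len
```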

* Framework for PEFT via mixins  (#6391)

* init commit ptuning via mixin

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates

Signed-off-by: arendu <[email protected]>

* gpt ptuning places virtual tokens on the left only

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* encoder input modified when pre_process is true

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* optimizer group and state dict updates

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adapter ptuning working for pp>1

Signed-off-by: arendu <[email protected]>

* adapter defaults

Signed-off-by: arendu <[email protected]>

* adapter ptuning config defaults

Signed-off-by: arendu <[email protected]>

* training works

Signed-off-by: arendu <[email protected]>

* loading and saving adapter only params during training

Signed-off-by: arendu <[email protected]>

* added checks and comments

Signed-off-by: arendu <[email protected]>

* clean up

Signed-off-by: arendu <[email protected]>

* checks for grad is None before calling all_reduce

Signed-off-by: arendu <[email protected]>

* load adapter .nemo file working

Signed-off-by: arendu <[email protected]>

* resume training for adapters

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* peft tuning

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor

Signed-off-by: arendu <[email protected]>

* file not needed

Signed-off-by: arendu <[email protected]>

* undo prompt learning dataset changes

Signed-off-by: arendu <[email protected]>

* undo updates to gpt prompt learning model

Signed-off-by: arendu <[email protected]>

* naming updates

Signed-off-by: arendu <[email protected]>

* decoding

Signed-off-by: arendu <[email protected]>

* predict_step in gpt_sft_model

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed inference from tuning config

Signed-off-by: arendu <[email protected]>

* no test in peft training

Signed-off-by: arendu <[email protected]>

* answer only loss and correct defaults for val_loss

Signed-off-by: arendu <[email protected]>

* hybrid adapters and ptuning

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* eval working..

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* prepending tokens for ptuning

Signed-off-by: arendu <[email protected]>

* cleaned up eval config

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: arendu <[email protected]>

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* default prompt template

Signed-off-by: arendu <[email protected]>

* Lora added

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Support dynamic length with GPT SFT

Signed-off-by: Abhinav Khattar <[email protected]>

* make branch functional

Signed-off-by: Abhinav Khattar <[email protected]>

* defaults to max_pad_length=False in GPT SFT dataset

Signed-off-by: arendu <[email protected]>

* adapter parallel_adapters to support Lora

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added early stopping by default

Signed-off-by: arendu <[email protected]>

* eval script for peft and eval config. bug fixes in predict step and added out_features to t5 adapter config

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* docs

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* better defaults

Signed-off-by: arendu <[email protected]>

* updates

Signed-off-by: arendu <[email protected]>

* update

Signed-off-by: arendu <[email protected]>

* docs

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <[email protected]>
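One item in this PR is "answer only loss": during tuning, only the answer tokens contribute to the loss, and the prompt/context tokens are masked out. A simplified sketch follows; the function names are hypothetical, and the sketch ignores the one-position label shift and padding that a real causal-LM loss would handle.

```python
from typing import List


def answer_only_loss_mask(prompt_len: int, total_len: int) -> List[int]:
    """Hypothetical mask: 0 for prompt/context positions, 1 for answer positions."""
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must lie in [0, total_len]")
    return [0] * prompt_len + [1] * (total_len - prompt_len)


def masked_mean_loss(token_losses: List[float], mask: List[int]) -> float:
    """Average per-token losses over the unmasked (answer) positions only."""
    total = sum(loss * m for loss, m in zip(token_losses, mask))
    count = sum(mask)
    return total / count if count else 0.0
```

Averaging over answer tokens only keeps a long prompt from diluting the training signal on the short answer.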

* cache and reuse inputs (#6422) (#6452)

Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Add patches for Virtual Parallel conversion (#6589)

* Add patches for Virtual Parallel conversion

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pass `.scale` instead of scaler object to core (#6551)

* pass .scale instead of scaler object to core (#6545)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Update megatron_gpt_model.py

Signed-off-by: Abhinav Khattar <[email protected]>

* scale changes for main

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Documentation for ASR-TTS models (#6594) (#6595)

* Add docs about hybrid ASR-TTS models



* Add docs about text-only datasets



* Add docs about ASR-TTS checkpoints



* Add docs about ASR-TTS configs and training



* Clean up



* ASR-TTS docs: add to api, fix imports



* Clean up



* Wrap optional import



* Revert general ASR import



---------

Signed-off-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>

* [TTS] Fix aligner nan loss in fp32 (#6435)

* Fix nan loss in fp32

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: hsiehjackson <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update SDP docs (#6485) (#6596)

* add info about SDP e.g. processor classes in docs



* add link to SDP docs in README



* address code review comments and add SDP overview diagram



* Fix spelling typo



---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>

* Bug/typo fixes (#6599)

Signed-off-by: Igor Gitman <[email protected]>

* Manual garbage collection with an interval (#6469) (#6482)

* Manual garbage collection with an interval



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use trainer.global_step for tracking the interval of GC



---------

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
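The idea behind manual garbage collection with an interval can be sketched as follows: Python's automatic collector is (optionally) disabled, and `gc.collect()` runs only every N training steps, keyed on the global step as the commit describes. `IntervalGarbageCollector` and `maybe_collect` are hypothetical names, and disabling automatic GC is an assumption about the approach, not a confirmed detail of this PR.

```python
import gc


class IntervalGarbageCollector:
    """Hypothetical helper: run gc.collect() every `interval` training steps."""

    def __init__(self, interval: int, disable_automatic_gc: bool = True):
        if interval <= 0:
            raise ValueError("interval must be a positive integer")
        self.interval = interval
        self.collections = 0
        if disable_automatic_gc:
            # Assumption: turn off automatic collection so GC pauses
            # happen only at predictable steps.
            gc.disable()

    def maybe_collect(self, global_step: int) -> bool:
        # Keying on the global step (not a local counter) keeps the cadence
        # unchanged after resuming from a checkpoint.
        if global_step > 0 and global_step % self.interval == 0:
            gc.collect()
            self.collections += 1
            return True
        return False
```

In multi-GPU training, collecting on the same step on every rank plausibly avoids one rank stalling the others at a random point in a collective.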

* Make tensor split contiguous (#6580) (#6593)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* [ASR] Fix for old models in change_attention_model (#6608)

* fixes

Signed-off-by: sam1373 <[email protected]>

* done already

Signed-off-by: sam1373 <[email protected]>

---------

Signed-off-by: sam1373 <[email protected]>

* Update manifest.py to use os.path for get_full_path (#6598)

* Update manifest.py to use os.path for get_full_path

Signed-off-by: He Huang (Steve) <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update manifest.py to get rid of pathlib

Signed-off-by: He Huang (Steve) <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update manifest.py

Signed-off-by: He Huang (Steve) <[email protected]>

* Update manifest.py

Signed-off-by: He Huang (Steve) <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <[email protected]>

* Cherry pick commits in #6601 to main (#6611)

* fix write

Signed-off-by: fayejf <[email protected]>

* decoding ctc

Signed-off-by: fayejf <[email protected]>

* temp set rnnt decoding return_best_hypothesis to true

Signed-off-by: fayejf <[email protected]>

* add wer cal back to transcribe_speech as requested

Signed-off-by: fayejf <[email protected]>

* add wer cal back to speech_to_text_buffered_infer_rnnt  as requested

Signed-off-by: fayejf <[email protected]>

* add wer cal back to speech_to_text_buffered_infer_ctc as requested

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* reflect change in asr_evaluator

Signed-off-by: fayejf <[email protected]>

* reflect Som and Vahid's comments

Signed-off-by: fayejf <[email protected]>

* remove return_best_hy=true in transcribe_speech

Signed-off-by: fayejf <[email protected]>

* no text skip

Signed-off-by: fayejf <[email protected]>

* revert partial

Signed-off-by: fayejf <[email protected]>

---------

Signed-off-by: fayejf <[email protected]>

* Create dummy iters to satisfy len checks (#6600) (#6603)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* add GPT eval mode fix for interleaved to main (#6610)

Signed-off-by: Abhinav Khattar <[email protected]>

* Fix batch size reconf for T5 FT for multi-validation (#6582) (#6588)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Not doing CastToFloat by default (#6524) (#6563)

* Not doing CastToFloat by default



* Added docstring



* Dummy commit



---------

Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Turn autocast off when precision is fp32 (#6576)

* Turn autocast off when precision is fp32 (#6554)

* Turn autocast off when precision is fp32

Signed-off-by: Abhinav Khattar <[email protected]>

* address review

Signed-off-by: Abhinav Khattar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Abhinav Khattar <[email protected]>

* merge

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* correct auto-merge

Signed-off-by: Abhinav Khattar <[email protected]>

* correct auto-merge

Signed-off-by: Abhinav Khattar <[email protected]>

* add to GPT SFT

Signed-off-by: Abhinav Khattar <[email protected]>

---------

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
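"Turn autocast off when precision is fp32" can be illustrated by a small mapping from a trainer precision value to autocast settings. The helper `autocast_settings` and the accepted precision strings are assumptions for illustration. Dtypes are returned as names so the sketch does not depend on PyTorch.

```python
from typing import Optional, Tuple, Union


def autocast_settings(precision: Union[int, str]) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: map a precision value to (autocast_enabled, dtype_name)."""
    p = str(precision)
    if p in ("16", "16-mixed"):
        return True, "float16"
    if p in ("bf16", "bf16-mixed"):
        return True, "bfloat16"
    if p in ("32", "32-true"):
        # fp32 training: autocast is turned off entirely.
        return False, None
    raise ValueError(f"unsupported precision: {precision!r}")
```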

* update core commit hash in readme (#6622) (#6623)

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

* add hat image to docs (#6619) (#6621)

Signed-off-by: andrusenkoau <[email protected]>
Co-authored-by: Andrei Andrusenko <[email protected]>

* Allow indices exchange via distributed (#6618) (#6624)

Signed-off-by: Mikołaj Błaż <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>

* Offline and streaming inference support for hybrid model (#6570)

* streaming buffered for hybrid + ctc

Signed-off-by: fayejf <[email protected]>

* change default model_stride in eval.yaml

Signed-off-by: fayejf <[email protected]>

* add fc model_stride

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* check whether model and decoding match

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* streaming buffered for hybrid + rnnt

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix yaml

Signed-off-by: fayejf <[email protected]>

* reflect comment wip

Signed-off-by: fayejf <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: fayejf <[email protected]>

* refactor and verified

Signed-off-by: fayejf <[email protected]>

* add get_full_path to buffered

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* add RNNTDecodingConfig

Signed-off-by: fayejf <[email protected]>

* model name & instruction of changing decoding

Signed-off-by: fayejf <[email protected]>

---------

Signed-off-by: fayejf <[email protected]>
Signed-off-by: fayejf <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Patch decoding for PC models (#6630) (#6631)

* Patch decoding logic for PC models



* Patch decoding logic for PC models



---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>

* Fix wer.py where 'errors' variable was not set (#6633) (#6634)

Fix wer.py where 'errors' variable was not set when both reference and hypothesis are empty strings

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
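The bug fixed here (an error count left unset when both reference and hypothesis are empty) is avoided by initializing the counters before any branching. The sketch below is a hypothetical word error rate function, not the code in `wer.py`. Returning `0.0` for empty/empty and `inf` when there are no reference words but the hypothesis is non-empty is a choice made for this sketch.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words, using a single rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,  # deletion
                cur[j - 1] + 1,  # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    # Initialize both counters up front, so every input (including empty strings)
    # leaves them defined.
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        words += len(ref_tokens)
        errors += word_edit_distance(ref_tokens, hyp_tokens)
    if words == 0:
        # No reference words: 0 if nothing was hypothesized, otherwise unbounded.
        return 0.0 if errors == 0 else float("inf")
    return errors / words
```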

* Restore GPT support for interleaved pipeline parallelism (#6528) (#6613)

* Restore logic for data-parallel communication with pipeline parallelism in GPT



* Support dynamic attention masks in GPT



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Debug typos



* Debug data iterator caching with interleaved pipeline parallelism

Each model chunk accesses the data iterator multiple times, so we need to cache multiple samples.



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Megatron-LM commit



* Distinguish between list of data iterators and data iterator that is a list



* Create dummy iters to satisfy len checks



* Kludge while waiting for Megatron-LM update



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set transformers offline to avoid rate limiting



---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Eric Harper <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
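The note above ("each model chunk accesses the data iterator multiple times, so we need to cache multiple samples") describes fanning one data iterator out to several consumers. `CachingIterator` below is a hypothetical sketch of that pattern, not the Megatron/NeMo implementation. Each proxy sees every sample in order, and a sample is evicted from the cache once all proxies have read it.

```python
from collections import deque
from typing import Any, Deque, Iterator, List


class CachingIterator:
    """Hypothetical fan-out: several consumers each see every sample of one iterator."""

    def __init__(self, iterator: Iterator[Any], num_consumers: int):
        self._iterator = iterator
        self._cache: Deque[Any] = deque()  # samples not yet read by every consumer
        self._offset = 0  # absolute index of self._cache[0]
        self._positions = [0] * num_consumers  # next absolute index per consumer

    def _next_for(self, consumer: int) -> Any:
        pos = self._positions[consumer]
        # Pull from the underlying iterator only when this consumer is ahead of the cache.
        while pos >= self._offset + len(self._cache):
            self._cache.append(next(self._iterator))  # StopIteration propagates
        sample = self._cache[pos - self._offset]
        self._positions[consumer] = pos + 1
        # Evict samples that every consumer has already read.
        while self._cache and min(self._positions) > self._offset:
            self._cache.popleft()
            self._offset += 1
        return sample

    def _proxy(self, consumer: int) -> Iterator[Any]:
        while True:
            try:
                yield self._next_for(consumer)
            except StopIteration:
                return  # end this generator cleanly (PEP 479)

    def proxies(self) -> List[Iterator[Any]]:
        """Call once; each returned iterator is one model chunk's view of the data."""
        return [self._proxy(i) for i in range(len(self._positions))]
```

Memory grows only as far as the slowest consumer lags behind the fastest, which is bounded in pipeline schedules where chunks advance in lockstep.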

* bugfix (#6636)

Signed-off-by: fayejf <[email protected]>

* Disable interctc tests (#6638)

Signed-off-by: Igor Gitman <[email protected]>

* Add megatron_core to requirements (#6639) (#6640)

* add megatron_core to requirements



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ericharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove from jenkins (#6642)

* Remove from jenkins (#6641)

* add megatron_core to requirements

Signed-off-by: ericharper <[email protected]>

* remove from jenkins

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove dup

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* sft model can use this script for eval (#6637)

* sft model can use this script for eval

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* please fix me

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Fix TTS audio preprocessing bugs (#6628)

Signed-off-by: Ryan <[email protected]>

* Move black parameters to pyproject.toml (#6647)

Signed-off-by: Vladimir Bataev <[email protected]>

* ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. (#6620)

* ASR-TTS: support hybrid RNNT-CTC models
* Do not warn on optional import
* Explain adding options to config
* Fix import guard docs
* Add docs for ConcatDataset
* Add explanation for sampling parameters
* Initial docs for the enhancer model
* Fix use_start_end_token parameter usage

---------

Signed-off-by: Vladimir Bataev <[email protected]>

* fix conversion and eval (#6648)

* fix conversion and eval

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Confidence ensembles implementation (#6614)

* Working version to train conf model + save ensemble class

Signed-off-by: Igor Gitman <[email protected]>

* Working version

Signed-off-by: Igor Gitman <[email protected]>

* Remove copy of transcribe_speech.py

Signed-off-by: Igor Gitman <[email protected]>

* Move models parameter to config

Signed-off-by: Igor Gitman <[email protected]>

* Add explicit parameters to transcribe

Signed-off-by: Igor Gitman <[email protected]>

* Small cleanups

Signed-off-by: Igor Gitman <[email protected]>

* Add temperature and integration tests

Signed-off-by: Igor Gitman <[email protected]>

* Add more tests

Signed-off-by: Igor Gitman <[email protected]>

* Add pc removal config

Signed-off-by: Igor Gitman <[email protected]>

* Cleanup

Signed-off-by: Igor Gitman <[email protected]>

* Fix typo

Signed-off-by: Igor Gitman <[email protected]>

* Address review comments

Signed-off-by: Igor Gitman <[email protected]>

---------

Signed-off-by: Igor Gitman <[email protected]>

* Patch memory used for NeMo Megatron models (#6615)

* Patch memory used for NeMo Megatron models

Signed-off-by: smajumdar <[email protected]>

* Cleanup the dtype of embeddings

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor util function for parsing precision

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor util function for parsing precision

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Try patch for Megatron O2

Signed-off-by: smajumdar <[email protected]>

* Refactor to incorporate megatron amp O2 state

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor to incorporate megatron amp O2 state

Signed-off-by: smajumdar <[email protected]>

* Correct indent

Signed-off-by: smajumdar <[email protected]>

* Correct utils import

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* handle artifacts when path is dir (#6658)

Signed-off-by: arendu <[email protected]>

* remove upgrading setuptools in reinstall.sh (#6659)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: fayejf <[email protected]>

* merge lora weights into base model (#6597)

* merge lora weights into base model

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo fix

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor update

Signed-off-by: arendu <[email protected]>

* update copyright

Signed-off-by: arendu <[email protected]>

* eval needs to know the PEFT class

Signed-off-by: arendu <[email protected]>

* add target class in training script so that we can use it in eval

Signed-off-by: arendu <[email protected]>

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update to work for tp1

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set restore model path

Signed-off-by: arendu <[email protected]>

* peft can be none

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated merge script so that eval works easily

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* eval with peft or sft model

Signed-off-by: arendu <[email protected]>

* keep sentences in jsonl format

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* convert sft using correct classpath

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated to force sft yaml to have the correct target

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updated docs

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix conversion and eval

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* upgrade to 23.04 (#6660)

Signed-off-by: ericharper <[email protected]>

* Merge r1.18.0 bugfixes and doc updates to main (#6655)

* update branch

Signed-off-by: ericharper <[email protected]>

* Remove from jenkins (#6641)

* add megatron_core to requirements

Signed-off-by: ericharper <[email protected]>

* remove from jenkins

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>

* remove dup

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* [TTS] reformat NeMo versions in the TTS logging messages to avoid having to batch-process them when upgrading NeMo versions.

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>

* Confidence ensembles: fix issues and add tuning functionality (#6657)

* Implement compute confidence to properly handle blanks

Signed-off-by: Igor Gitman <[email protected]>

* Implement proper confidence for transducers

Signed-off-by: Igor Gitman <[email protected]>

* Implement tuning logic

Signed-off-by: Igor Gitman <[email protected]>

* Add tests for confidence tuning

Signed-off-by: Igor Gitman <[email protected]>

* Remove unused imports

Signed-off-by: Igor Gitman <[email protected]>

* Add types/docs

Signed-off-by: Igor Gitman <[email protected]>

* Add comment about the main conf compute loop

Signed-off-by: Igor Gitman <[email protected]>

---------

Signed-off-by: Igor Gitman <[email protected]>

* [TTS] Implement new TextToSpeech dataset (#6575)

* [TTS] Implement new TextToSpeech dataset

Signed-off-by: Ryan <[email protected]>

* [TTS] Add unit tests

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix defaulting of use_log_energy

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix TTS export test

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

* Dialogue dataset (#6654)

* chatbot interface

Signed-off-by: Yi Dong <[email protected]>

* latest gradio

Signed-off-by: Yi Dong <[email protected]>

* default greedy

Signed-off-by: Yi Dong <[email protected]>

* better chatbot

Signed-off-by: Yi Dong <[email protected]>

* handle preamble

Signed-off-by: Yi Dong <[email protected]>

* added chatbot training capability

Signed-off-by: Yi Dong <[email protected]>

* added chatbot ui

Signed-off-by: Yi Dong <[email protected]>

* remove debug code

Signed-off-by: Yi Dong <[email protected]>

* default human

Signed-off-by: Yi Dong <[email protected]>

* use special token for roles

Signed-off-by: Yi Dong <[email protected]>

* special tokens

Signed-off-by: Yi Dong <[email protected]>

* fix name

Signed-off-by: Yi Dong <[email protected]>

* new chat dataset

Signed-off-by: Yi Dong <[email protected]>

* fix the system token

Signed-off-by: Yi Dong <[email protected]>

* upgrade gradio

Signed-off-by: Yi Dong <[email protected]>

* save the chat history

Signed-off-by: Yi Dong <[email protected]>

* update ui

Signed-off-by: root <[email protected]>

* update chat interface

Signed-off-by: Yi Dong <[email protected]>

* handles canonical form

Signed-off-by: Yi Dong <[email protected]>

* new sft chatbot

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change format

Signed-off-by: Yi Dong <[email protected]>

* check extra_id in the tokenizer

Signed-off-by: Yi Dong <[email protected]>

* added vocab property check

Signed-off-by: Yi Dong <[email protected]>

* added missing file

Signed-off-by: Yi Dong <[email protected]>

---------

Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: root <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <[email protected]>

* Add support for RNNT/hybrid models to partial transcribe (#6609)

* Add support for RNNT/hybrid models to partial transcribe

Signed-off-by: He Huang (Steve) <[email protected]>

* Update transcribe_utils.py

Signed-off-by: He Huang (Steve) <[email protected]>

* Update transcribe_speech.py

Signed-off-by: He Huang (Steve) <[email protected]>

* Update transcribe_utils.py

Signed-off-by: He Huang (Steve) <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* eval_beamsearch_ngram.py with hybrid ctc (#6656)

* separate_punctuation = false

* ctc decoding strategy = model.decoding

* transcribe(files, logprobs=True) returns logprobs

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix bucketing bug issue for picking new bucket (#6663)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* minor fix for missing chat attr (#6671)

Signed-off-by: arendu <[email protected]>

* [TTS] Add callback for saving audio during FastPitch training (#6665)

* [TTS] Add callback for saving audio during FastPitch training

Signed-off-by: Ryan <[email protected]>

* [TTS] Allow NGC model name for vocoder

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: KunalDhawan <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Ryan <[email protected]>
Signed-off-by: He Huang (Steve) <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Sangkug Lym <[email protected]>
Signed-off-by: sam1373 <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: andrusenkoau <[email protected]>
Signed-off-by: Mikołaj Błaż <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Eric Harper <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: root <[email protected]>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ryan Langman <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: Greg Clark <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: Igor Gitman <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Andrei Andrusenko <[email protected]>
Co-authored-by: mikolajblaz <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>