Modular SpeechLLM implementation for Sept. 2023 submission (SALM) (#7634)

* add an initial implementation of ModularizedSpeechGPTModel and an integration test

* fix a typo in the test name (#1); approve the nit change

* clean up an initial version of the example config; verify it works by test (#2); approve as no review is needed

* add the test for training_step and fix the code accordingly (test passes now) (#3)

* add a test for validation_step (#4)

* move the audio and text embedding concatenation into prepare_llm_input so a test can guard the LLM input (see the first sketch below)

* Merge heh and zhehuai's initial version of frozen AM+LLM (#5)

  The previous differences are summarized here:
  https://docs.google.com/document/d/1zNI4hC6vJtUfcHbrUSPaMuYWRBQdN_36H0P2NiBiuPY/edit

  This PR includes:
  1. finishing the merge of the model, dataset, and config code;
  2. the previous tests (prepare_llm_input, training_step, validation_step) are still enabled and pass;
  3. the example training script has been run on LS960 to confirm the training pipeline works.

  The major remaining work is listed here:
  https://docs.google.com/document/d/1o0AM7v4gcTQkPZjE0Vl9TTX4vYnGTrbXEFGWh0UhGlk/edit#bookmark=id.pzvdadt5oxyw

  Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* fix a nit init bug that broke a test (#6)

* Clean up the implementation for the SALM paper and sync to NeMo v1.20.0 (#18)

  * wip
  * fix data
  * fix consumed_samples
  * fix the training-restart problem by storing the adapter and perception model and initializing them from the checkpoint
  * re-fix the state dict
  * support WER and inference (see the WER sketch below)
  * add a NaN guard (see the sketch below)
  * reimplement inference and fix a bug
  * support multiple data loaders
  * unfreeze the LM
  * add a flag for loading the AM
  * tokenizer
  * overwrite the vocab size
  * support BPE dropout (see the sketch below)
  * add tarred datasets
  * fix sample_alpha
  * fix BPE-dropout bugs caused by mismatched context in tokenization
  * add a BLEU metric
  * update metrics
  * support inference and fix a bug in the WER calculation
  * fix the bucketing dataset
  * fix the BLEU implementation
  * support a question-set file per dataset/data loader in preparation for multitask understanding; also fix the BLEU implementation
  * support a simple random context for word boosting
  * use sacrebleu.corpus_bleu to be consistent with the rest (see the sketch below)
  * make audio_file optional in the data loader
  * add a tool to materialize MT and text data
  * make it compatible with the tarred dataset
  * temporary fix for the metric; speed up materialization
  * make the number of contexts configurable
  * fix val_check_interval; make manifest dumping consistent with the speech models
  * make random_context_positive_ratio configurable to control precision
  * bug fix: the freeze_llm flag was not passed to the model cfg
  * overwrite tensor_model_parallel_size
  * support both STT and SSL models for loading the audio encoder
  * fix the inference config so that sampling is used; allow inference-config updates during training
  * refactor and clean up the code for the preprocessing collections, dataset interface, and model inference; rename some classes to be consistent with the SALM paper; make sure the tests pass
  * undo the changes in megatron_gpt_peft_models.py and move them to speechllm_models.py; verify correctness via test_speechllm_models.py::TestModularizedAudioGPTModel::test_predict_step
  * update the default inference config and the test golden values accordingly
  * integration test and minor fixes
  * nit bug fix on manifest_filepath introduced by the code cleanup
  * update the workspace/ files; consider moving them to examples later
  * further remove unnecessary code from the inference implementation
  * revert the update to the default end_string to stay compatible with legacy models

  ---------
  Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
  Signed-off-by: stevehuang52 <heh@nvidia.com>
  Co-authored-by: stevehuang52 <heh@nvidia.com>
  Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* rename 'ModularizedAudioGPTModel' to 'ModularAudioGPTLoRAModel'; move the speechllm code under nemo/collections/multimodal/speechllm

* update the copyright; remove the workspace/scripts and workspace/tools folders since the main branch has LLaMA support

---------
Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: Zhehuai Chen <chenzhehuai.sjtu@aispeech.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: stevehuang52 <heh@nvidia.com>
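The sketches referenced in the log follow. First, the idea behind moving the audio/text embedding concatenation into prepare_llm_input: a simplified, hypothetical sketch in which the shapes, names, and signature are illustrative only, not NeMo's actual API.

```python
import torch

def prepare_llm_input(audio_emb: torch.Tensor,  # [batch, T_audio, hidden]
                      text_emb: torch.Tensor    # [batch, T_text, hidden]
                      ) -> torch.Tensor:
    # Prepend the audio encoder's output embeddings to the text token
    # embeddings along the sequence dimension, so the LLM attends over
    # audio features followed by the text prompt. Centralizing this step
    # in one function is what lets a unit test guard the LLM input.
    return torch.cat([audio_emb, text_emb], dim=1)
```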
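The WER support can be illustrated with a generic word-error-rate computation. NeMo ships its own WER utilities, which the PR presumably uses; this standalone version built on the editdistance package is only a sketch.

```python
import editdistance  # pip install editdistance

def word_error_rate(hypotheses: list[str], references: list[str]) -> float:
    # WER = (substitutions + insertions + deletions) / reference word count,
    # computed here as Levenshtein distance over whitespace-split tokens.
    errors = sum(editdistance.eval(h.split(), r.split())
                 for h, r in zip(hypotheses, references))
    words = sum(len(r.split()) for r in references)
    return errors / max(words, 1)
```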
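The NaN guard is, in spirit, a check that keeps one bad batch from poisoning training. A minimal PyTorch Lightning-style sketch follows; compute_loss is a hypothetical placeholder, and the actual guard in the PR may differ.

```python
import torch
from pytorch_lightning import LightningModule

class GuardedModel(LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical loss computation
        if not torch.isfinite(loss):
            # Returning None makes Lightning skip the optimizer step for
            # this batch instead of propagating NaN/Inf gradients.
            self.print(f"non-finite loss at batch {batch_idx}, skipping")
            return None
        return loss
```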
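BPE dropout (controlled here by sample_alpha) regularizes training by sampling a different subword segmentation of the same text on each pass. With a SentencePiece tokenizer it can be enabled as below; the model path and alpha value are illustrative, not the PR's settings.

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # illustrative path

# enable_sampling draws a random segmentation per call; alpha controls the
# sampling smoothness, and nbest_size=-1 samples from the full lattice.
ids = sp.encode("a speech language model", out_type=int,
                enable_sampling=True, alpha=0.1, nbest_size=-1)
```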
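Finally, the switch to sacrebleu.corpus_bleu: it takes a list of hypothesis strings and a list of reference streams, where each inner list holds one reference per hypothesis, in order. A minimal usage sketch with made-up sentences:

```python
import sacrebleu

hyps = ["the cat sat on the mat", "hello world"]
refs = [["the cat sat on a mat", "hello world"]]  # one reference stream

# corpus_bleu aggregates n-gram statistics over the whole corpus before
# computing BLEU, rather than averaging per-sentence scores.
bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU = {bleu.score:.2f}")
```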