Commit
NeMo 2.0 SFT PEFT notebooks (#10874)
* nemo2-sft notebook initial draft Signed-off-by: HuiyingLi <[email protected]> * remove mixtral info Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * minor fixes Signed-off-by: HuiyingLi <[email protected]> * add import_ckpt script and minor changes Signed-off-by: HuiyingLi <[email protected]> * Random read for tarr files in lhotse dataloaders (#10536) * Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Solve failled tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Adding a testcase Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * Some changs in tests Signed-off-by: Nune <[email protected]> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <[email protected]> * removing import Signed-off-by: Nune <[email protected]> --------- Signed-off-by: Nune <[email protected]> Signed-off-by: nune-tadevosyan <[email protected]> Co-authored-by: nune-tadevosyan <[email protected]> * training code for hybrid-autoregressive inference model (#10841) * training code for hybrid-autoregressive inference model Signed-off-by: Hainan Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: hainan-xv <[email protected]> --------- Signed-off-by: Hainan Xu <[email protected]> Signed-off-by: hainan-xv <[email protected]> Co-authored-by: Hainan Xu <[email protected]> Co-authored-by: hainan-xv <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! 
(#10871) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Use trainer.local_rank/global_rank (#10860) * fix global_rank calculation Signed-off-by: Alexandros Koumparoulis <[email protected]> * use trainer's global/local rank Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove stacking operation from batched functions (#10524) * remove stacking operations Signed-off-by: lilithgrigoryan <[email protected]> * fixes im base class Signed-off-by: lilithgrigoryan <[email protected]> * clean up Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * remove potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * restore batch_intilize states funcname Signed-off-by: lilithgrigoryan <[email protected]> * fix typo Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable Signed-off-by: lilithgrigoryan <[email protected]> * fix potentially uninitialized local variable in stateless transduser Signed-off-by: lilithgrigoryan <[email protected]> * fix test Signed-off-by: lilithgrigoryan <[email protected]> * Apply isort and black reformatting Signed-off-by: lilithgrigoryan <[email protected]> * fix docstring, rm comment Signed-off-by: lilithgrigoryan <[email protected]> * fix dosctrings Signed-off-by: lilithgrigoryan <[email protected]> --------- Signed-off-by: lilithgrigoryan <[email protected]> Signed-off-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> Co-authored-by: lilithgrigoryan <[email protected]> * [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471) * Add llm.generate Signed-off-by: Hemil Desai <[email protected]> * Remove comment Signed-off-by: Hemil Desai <[email protected]> * Apply 
isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix launching with python Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add assert cp Signed-off-by: Hemil Desai <[email protected]> * Add example script Signed-off-by: Hemil Desai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Adding support for LightningDataModule inside Fabric-API (#10879) * Make FabricMegatronMixedPrecision match MegatronMixedPrecision Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Supporting DataModule in fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Adding support for LightningDataModule inside Fabric-API Signed-off-by: Marc Romeijn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> * Remove import in mock.py Signed-off-by: Marc Romeijn <[email protected]> --------- Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * initial draft Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Initial local run Signed-off-by: smajumdar <[email protected]> * Save yaml config for model in nemo.lightning.io (#10765) * Save yaml config for model in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * Fix bug Signed-off-by: Hemil Desai <[email protected]> * fix bug Signed-off-by: Hemil Desai <[email protected]> * Add explicit yaml comparison Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * relax test Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * Move collectiob.nlp imports inline for t5 (#10877) * Move collectiob.nlp imports inline for t5 Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * add world_size/pp_size runtime check (#10842) * add world_size/pp_size runtime check Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix msg precision Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix test_init_parallel_ranks ws=3 pp=3 Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix peft resume (#10887) Signed-off-by: Chen Cui <[email protected]> * Update engine build step for TRT-LLM 0.13.0 (#10880) * Setting use_fused_mlp for TRT-LLM >= 0.13.0 Signed-off-by: Jan Lasek <[email protected]> * Unused import removal Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * Akoumparouli/nemo ux moe loss logging (#10128) * Move across pipeline loss reduction to a separate function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add support for MoE loss logging Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email 
protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused function Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * enable vboost and set LM SM margin (#10853) * enable vboost Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * env vars Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * add perf plugin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * revert default executor Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * fix typo Signed-off-by: Jimmy Zhang <[email protected]> * fix more typo Signed-off-by: Jimmy Zhang <[email protected]> * ln margin knob Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> * specify lm margin Signed-off-by: Jimmy Zhang <[email protected]> * Apply isort and black reformatting Signed-off-by: JimmyZhang12 <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: JimmyZhang12 <[email protected]> Co-authored-by: malay-nagda <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> * use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608) * use _get_extra_te_kwargs_meta in fabric (call mcore's 
_get_extra_te_kwargs & overwrite device) Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Use torch sdpa implementation in ASR mha (#9590) * use pytorch sdpa Signed-off-by: WoodieDudy <[email protected]> * sdpa work Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: titu1994 <[email protected]> * sdpa flag to false & sdpa_backend arg Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * change arg name Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * fix config args Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * add condition on version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * update condition on version Signed-off-by: WoodieDudy <[email protected]> * remove condition on torch version Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * move code to init Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> * Apply isort and black reformatting Signed-off-by: WoodieDudy <[email protected]> * refactor Signed-off-by: WoodieDudy <[email protected]> --------- Signed-off-by: WoodieDudy <[email protected]> Signed-off-by: titu1994 <[email protected]> Signed-off-by: WoodieDudy <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: titu1994 <[email protected]> Co-authored-by: WoodieDudy <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861) * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Remove cyclic import Signed-off-by: Hemil Desai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: artbataev <[email protected]> * call __post_init__ after altering config values (#10885) * call __post_init__ after altering config values Signed-off-by: Alexandros Koumparoulis <[email protected]> * test fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * turn off SP Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> * Nemo 2.0 ckpt support in TRT-LLM export (#10891) * fix minor import bug Signed-off-by: Onur Yilmaz <[email protected]> * Add registry to register all needed classes with artifacts in nemo.lightning.io Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Fixes Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai 
<[email protected]> * Fix Signed-off-by: Hemil Desai <[email protected]> * nemo 2.0 support in export to trt-llm Signed-off-by: Onur Yilmaz <[email protected]> * get mixing from main Signed-off-by: Onur Yilmaz <[email protected]> * Apply isort and black reformatting Signed-off-by: oyilmaz-nvidia <[email protected]> * fix style Signed-off-by: Onur Yilmaz <[email protected]> --------- Signed-off-by: Onur Yilmaz <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: oyilmaz-nvidia <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: oyilmaz-nvidia <[email protected]> * [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171) * various simple docs source fixes Signed-off-by: Elena Rastorgueva <[email protected]> * fix docstrings and typing with forward reference Signed-off-by: Elena Rastorgueva <[email protected]> * Apply isort and black reformatting Signed-off-by: erastorgueva-nv <[email protected]> * fix typing forward reference for PromptedAudioToTextLhotseDataset Signed-off-by: Elena Rastorgueva <[email protected]> * fix feature warnings Signed-off-by: yaoyu-33 <[email protected]> * Try fix some model part errors Signed-off-by: yaoyu-33 <[email protected]> * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * try add requirements Signed-off-by: yaoyu-33 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix indent in docstring Signed-off-by: yaoyu-33 <[email protected]> * Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]> * update Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: yaoyu-33 <[email protected]> * handle duplicate issue Signed-off-by: 
yaoyu-33 <[email protected]> * fix imagen cite * fix ratio issues Signed-off-by: yaoyu-33 <[email protected]> * fix Dreambooth Signed-off-by: yaoyu-33 <[email protected]> * Fix activation recomputation Signed-off-by: yaoyu-33 <[email protected]> * fix sequence packing Signed-off-by: yaoyu-33 <[email protected]> * fix asr_language_modeling_and_customization Signed-off-by: yaoyu-33 <[email protected]> * fixes wip Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: erastorgueva-nv <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: erastorgueva-nv <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Ao Tang <[email protected]> Co-authored-by: Huiying Li <[email protected]> * calculate step time batch end-batch end (#10202) * log step time at end Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * use nemo logging Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * cleanup Signed-off-by: Malay Nagda <[email protected]> * check remove Signed-off-by: Malay Nagda <[email protected]> * delta timing callback Signed-off-by: Malay Nagda <[email protected]> * comment and name change Signed-off-by: Malay Nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * late import prettytable (#10912) Signed-off-by: Maanu Grover <[email protected]> * [🤠]: 
Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Warning for missing FP8 checkpoint support for vLLM deployment (#10906) Signed-off-by: Jan Lasek <[email protected]> * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821) * Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787) * Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching Signed-off-by: Nithin Rao Koluguri <nithinraok> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: nithinraok <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: nithinraok <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: nithinraok <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix ASR tests (#10794) * Make tests required Signed-off-by: Vladimir Bataev <[email protected]> * Debug torch.load issue Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Run only necessary tests Signed-off-by: Vladimir Bataev <[email protected]> * Try fix loading Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid caching fixture Signed-off-by: Vladimir Bataev <[email protected]> * Try restore model several times Signed-off-by: Vladimir Bataev <[email protected]> * Try customize temporary directory Signed-off-by: Vladimir Bataev <[email protected]> * Apply 
isort and black reformatting Signed-off-by: artbataev <[email protected]> * Reorder tests Signed-off-by: Vladimir Bataev <[email protected]> * Disable one test Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Avoid xxlarge model Signed-off-by: Vladimir Bataev <[email protected]> * Disable test Signed-off-by: Vladimir Bataev <[email protected]> * Revert changes Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Magic fix Signed-off-by: Vladimir Bataev <[email protected]> * Revert unnecessary changes Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Disable all jobs except L0 Signed-off-by: Vladimir Bataev <[email protected]> * RNNT alignments - merge with unit tests Signed-off-by: Vladimir Bataev <[email protected]> * Fix CUDA graph frame-looping decoder to handle non-CUDA inputs Signed-off-by: Vladimir Bataev <[email protected]> * Fix config Signed-off-by: Vladimir Bataev <[email protected]> * Log test results Signed-off-by: Vladimir Bataev <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * Use less audio files for tests Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: artbataev <[email protected]> * Integrating mcore export (#10238) * Integrating mcore export * Integrating mcore export * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Move trt imports in nemo.collections.llm inside respective functions (#10234) Signed-off-by: Hemil Desai <[email protected]> * Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198) * Add tests for LazyNeMoIterator and fix case with 
manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <[email protected]> * Address code review Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> * fix tests Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939) * perfor serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <[email protected]> * Apply isort and black reformatting Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> * fix artifact load Signed-off-by: ashors1 <[email protected]> * fix path artifact Signed-off-by: ashors1 <[email protected]> * remove unused import Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Co-authored-by: ashors1 <[email protected]> * Add MemoryProfileCallback (#10166) * Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <[email protected]> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <[email protected]> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <[email protected]> * Update docstring Signed-off-by: Shriya Palsamudram <[email protected]> --------- Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> * Lower bound transformers to support nemotron (#10240) Signed-off-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> 
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <[email protected]> Co-authored-by: Kuray107 <[email protected]> * Revert torchrun fix for model import (#10251) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [NeMo-UX[ Move nemotron imports inline (#10255) * Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <[email protected]> * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]> --------- Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> * Wrap CPU model init with megatron_lazy_init_context (#10219) * Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <[email protected]> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Bump `Dockerfile.ci` (2024-08-22) (#10227) * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff ! Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix bert flags Signed-off-by: Oliver Koenig <[email protected]> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * salm export trtllm (#10245) Signed-off-by: slyne deng <[email protected]> Co-authored-by: slyne deng <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! 
(#10250) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> * Load model in the target export precision by default in PTQ (#10267) * Load model in the target export precision by default Signed-off-by: Jan Lasek <[email protected]> * Enable megatron_amp_O2=true to actually use half-precision Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223) * Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Remove duplicate Signed-off-by: Hemil Desai <[email protected]> * Add entity to wandb logger Signed-off-by: Hemil Desai <[email protected]> * Add documentation Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add warning Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * PR feedback Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Add comments Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Co-authored-by: hemildesai <[email protected]> * [NeMo-UX] Handle absolute logger 
directories in nemo_logger (#10259) * handle absolute and relative logger directories Signed-off-by: Anna Shors <[email protected]> * merge lines Signed-off-by: ashors1 <[email protected]> --------- Signed-off-by: Anna Shors <[email protected]> Signed-off-by: ashors1 <[email protected]> * Add sdxl notebook (#10139) * Add sdxl notebook Signed-off-by: mingyuanm <[email protected]> * Rename Signed-off-by: mingyuanm <[email protected]> * final Update SDXL notebook Signed-off-by: mingyuanm <[email protected]> --------- Signed-off-by: mingyuanm <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Updating some coments * Small change * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * ADD support for layernorm1p * Apply isort and black reformatting Signed-off-by: shanmugamr1992 <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> * Update Dockerfile.ci Signed-off-by: Shanmugam Ramasamy <[email protected]> --------- Signed-off-by: shanmugamr1992 <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]> Signed-off-by: ShriyaPalsamudram <[email protected]> Signed-off-by: Shriya Rishab <[email protected]> Signed-off-by: Dong Hyuk Chang <[email protected]> Signed-off-by: Pin-Jui Ku <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Marc Romeyn <[email protected]> Signed-off-by: marcromeyn <[email protected]> 
Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: slyne deng <[email protected]> Signed-off-by: oliver könig <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Anna Shors <[email protected]> Signed-off-by: mingyuanm <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: shanmugamr1992 <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Anna Shors <[email protected]> Co-authored-by: ashors1 <[email protected]> Co-authored-by: Shriya Rishab <[email protected]> Co-authored-by: ShriyaPalsamudram <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Kuray107 <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: marcromeyn <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Slyne Deng <[email protected]> Co-authored-by: slyne deng <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> * Fix artifact saving (#10914) Signed-off-by: Hemil Desai <[email protected]> * Lora improvement (#10918) * pull out freeze model Signed-off-by: Chen Cui <[email protected]> * add wildcard match to lora target modules Signed-off-by: Chen Cui <[email protected]> 
--------- Signed-off-by: Chen Cui <[email protected]> * Huvu/t5 nemo2.0 peft (#10916) * adding peft test and cicd * add setting mcore model to train in peft.py * adding test for T5 lora * fix follow Chen's fix * restore cicd-main.yml --------- Co-authored-by: Huy Vu2 <[email protected]> * Add tie_word_embeddings=True (#10710) Signed-off-by: Yoshi Suhara <[email protected]> * Use a context-manager when opening files (#10895) * Use a context-manager when opening files Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: artbataev <[email protected]> * long context performance numbers in doc (#10784) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email 
protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm from __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * change the figure file name Signed-off-by: Youngeun Kwon <[email protected]> * Accommodating the reviewer's comment Signed-off-by: Youngeun Kwon <[email protected]> * update the y-axis title Signed-off-by: Youngeun Kwon <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294) * Add ModelOpt transformer model pruning example for Llama3 model Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * examples code is at wrong dir, move them Signed-off-by: Shengliang Xu <[email protected]> * changes as suggested in comment remove some logging and unused config code, update example model to llama3.1 Signed-off-by: Shengliang Xu <[email protected]> * Add pruning of hidden_size into example Signed-off-by: Shengliang Xu <[email protected]> * Apply isort and black reformatting Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Shengliang Xu <[email protected]> * Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml Signed-off-by: Keval Morabia <[email protected]> * 
Add pruning test to cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> * Update cicd-main.yml Signed-off-by: Keval Morabia <[email protected]> --------- Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Update mamba.rst after dist ckpt addition (#10800) Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix chunked infer (#10581) Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * fix state transform (#10728) Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * use ckpt_to_weights_subdir in restore (#10786) * use ckpt_to_weights_subdir in restore Signed-off-by: Alexandros Koumparoulis <[email protected]> * make ckpt_to_{weight,context}_subdir idempotent Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Mixtral set seq_length=4k (#10704) * enable SP & set seq_lenght=4k Signed-off-by: Alexandros Koumparoulis <[email protected]> * update test expected values Signed-off-by: Alexandros Koumparoulis <[email protected]> * 8x22b 4k Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- 
Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792) * Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA Signed-off-by: Valerie Sarge <[email protected]> * Apply isort and black reformatting Signed-off-by: vysarge <[email protected]> --------- Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Co-authored-by: vysarge <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * Disable checkpoint conversion inside AutoResume (#10645) * Disable checkpoint conversion inside AutoResume Signed-off-by: Hemil Desai <[email protected]> * Apply isort and black reformatting Signed-off-by: hemildesai <[email protected]> * Update resume docstrings Signed-off-by: Hemil Desai <[email protected]> * fix Signed-off-by: Hemil Desai <[email protected]> * add default finetuning recipe and refactor llama3 8b recipe Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * address comment Signed-off-by: Chen Cui <[email protected]> * refactor other recipes Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * remove 8x3b finetuning recipe for now because HF version not available Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * adjust unit tests based on recipe fixes Signed-off-by: Chen Cui <[email protected]> * fix failed unit test Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: cuichenx 
<[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * replace png file to github assets Signed-off-by: Youngeun Kwon <[email protected]> * change image url to github release Signed-off-by: Youngeun Kwon <[email protected]> --------- Signed-off-by: Youngeun Kwon <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Shengliang Xu <[email protected]> Signed-off-by: shengliangxu <[email protected]> Signed-off-by: Keval Morabia <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: vysarge <[email protected]> Signed-off-by: Hemil Desai <[email protected]> Signed-off-by: hemildesai <[email protected]> Signed-off-by: cuichenx <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: oliver könig <[email protected]> Co-authored-by: pablo-garay <[email protected]> Co-authored-by: Shengliang Xu <[email protected]> Co-authored-by: shengliangxu <[email protected]> Co-authored-by: Keval Morabia <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: vysarge <[email protected]> Co-authored-by: Hemil Desai <[email protected]> Co-authored-by: hemildesai <[email protected]> Co-authored-by: cuichenx <[email protected]> * perf recipes and Mcore DistOpt params (#10883) * 175b gpt3 recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * dist opt params Signed-off-by: Malay Nagda <[email 
protected]> * 405b dist opt params Signed-off-by: Malay Nagda <[email protected]> * perf recipes and dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE dist opt params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * gpt bias fusion params Signed-off-by: Malay Nagda <[email protected]> * 175b recipe Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * MoE perf params comments Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> * perf recipes suffix Signed-off-by: Malay Nagda <[email protected]> * specific models fusion params Signed-off-by: Malay Nagda <[email protected]> * Apply isort and black reformatting Signed-off-by: malay-nagda <[email protected]> --------- Signed-off-by: Malay Nagda <[email protected]> Signed-off-by: malay-nagda <[email protected]> Co-authored-by: malay-nagda <[email protected]> * ci: Fix cherry pick team (#10945) Signed-off-by: Oliver Koenig <[email protected]> * Packed sequence bug fixes (#10898) * save prepared dataset to different folders according to tokenizer name Signed-off-by: Chen Cui <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * fix hang Signed-off-by: Chen Cui <[email protected]> * raise mbs>1 error and provide suggestion to user instead of automatically changing config Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting 
Signed-off-by: cuichenx <[email protected]> * add ci for packed seq Signed-off-by: Chen Cui <[email protected]> * Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]> * fix bug Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: cuichenx <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: cuichenx <[email protected]> Co-authored-by: artbataev <[email protected]> * Fix requirements for MacOS (#10930) Signed-off-by: Vladimir Bataev <[email protected]> * Fix nemo 2.0 recipes (#10915) * Fix recipe num_nodes and long context docstring * Fix typo * Fix PP issue * Fix unit test * Change recipes * fix test * Fix unit tests * Fix recipes * Add general legal test on parallelization settings * Rename test * Apply isort and black reformatting Signed-off-by: BoxiangW <[email protected]> --------- Signed-off-by: BoxiangW <[email protected]> Co-authored-by: BoxiangW <[email protected]> * Akoumparouli/nemo ux fix dir or string artifact (#10936) * Add __repr__ to Artifact Signed-off-by: Alexandros Koumparoulis <[email protected]> * nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration Signed-off-by: Alexandros Koumparoulis <[email protected]> * t5 test minification Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * ckpt convert bug fixes (#10878) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa 
<[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * bug fix Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * remove extra file Signed-off-by: dimapihtar <[email protected]> * remove extra changes Signed-off-by: dimapihtar <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * add ckpt_format configurable Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]> * revert changes Signed-off-by: dimapihtar <[email protected]> * Apply isort and black reformatting Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> 
Signed-off-by: dimapihtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: artbataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: artbataev <[email protected]> * fix typo in docstring (#10955) Signed-off-by: ashors1 <[email protected]> * remove deprecated ci tests (#10922) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: dimapihtar <[email protected]> * [Nemo CICD] Remove deprecated tests (#10960) * remove deprecated tutorial Signed-off-by: dimapihtar <[email protected]> * remove deprecated ci tests Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * add deprecation note Signed-off-by: dimapihtar <[email protected]> * remove bart tests Signed-off-by: dimapihtar <[email protected]> * Remove deleted CI tests --------- Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Pablo Garay <[email protected]> Co-authored-by: dimapihtar <[email protected]> * Adithyare/oai chat completion (#10785) * updates Signed-off-by: adithyare <[email protected]> * open ai chat completion wip Signed-off-by: adithyare <[email protected]> * responding with model responses Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> * also support general completion Signed-off-by: adithyare <[email protected]> * Apply isort and black reformatting Signed-off-by: arendu <[email protected]> --------- Signed-off-by: adithyare <[email protected]> Signed-off-by: arendu 
<[email protected]> Co-authored-by: arendu <[email protected]> * Update megatron_t5_pretraining.py (#10952) Signed-off-by: Huy Vu <[email protected]> * Convert perf plugin env vars to strings (#10947) Signed-off-by: Hemil Desai <[email protected]> * disable dynamo for ddp checker (#10961) Signed-off-by: Alexandros Koumparoulis <[email protected]> * [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pablo-garay <[email protected]> * Mistral-NeMo-12B recipe (#10607) * Mistral-NeMo-12B recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * rename mistral to mistral_7b Signed-off-by: Alexandros Koumparoulis <[email protected]> * include mistral_nemo_12b in __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * add to __init__ Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * Remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2 Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove finetune_reci[e Signed-off-by: Alexandros Koumparoulis <[email protected]> * Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion Signed-off-by: Alexandros Koumparoulis <[email protected]> * update config names in tests Signed-off-by: Alexandros Koumparoulis <[email protected]> * mistral-nemo-12b from llama_8b Signed-off-by: Alexandros Koumparoulis <[email protected]> * TP=2; SP=True Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix overlap value Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> * update mistral-nemo-base-12b finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]> * 
Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Make nemo text processing optional in TTS (#10584) * move TN guard to better location; make guard print error message rather than throwing error Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> * Forgot to add the actual normalizer Signed-off-by: Jason <[email protected]> * Apply isort and black reformatting Signed-off-by: blisc <[email protected]> --------- Signed-off-by: Jason <[email protected]> Signed-off-by: blisc <[email protected]> Co-authored-by: blisc <[email protected]> * respect warnings' filters (#10953) * respect warnings' filters Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> * Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972) * initial commit * restore t5_pretraining * Apply isort and black reformatting Signed-off-by: huvunvidia <[email protected]> --------- Signed-off-by: huvunvidia <[email protected]> Co-authored-by: Huy Vu2 <[email protected]> Co-authored-by: huvunvidia <[email protected]> * Alit/mamba recipe (#10935) * add some mamba recipe * add 130m * add the rest of the recipes * add tokenizer * add tokenizer * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * minor fix * add fixes to ssm for nemorun recipes * add hybrid tokenizer * updating some recipes * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove comments * update gbs * fix ckpt resume * fix ckpt resume * fix 
ckpt resume * update recipes final * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> * remove redundant imports * ckpt convertor dtype fix * Apply isort and black reformatting Signed-off-by: JRD971000 <[email protected]> --------- Signed-off-by: JRD971000 <[email protected]> Signed-off-by: Ali Taghibakhshi <[email protected]> Co-authored-by: JRD971000 <[email protected]> * Long context performance doc hot fix (#10946) * long context perf Signed-off-by: Youngeun Kwon <[email protected]> * update the long context perf Signed-off-by: Youngeun Kwon <[email protected]> * Akoumparouli/mcore microbatch calculator fix (#10780) * move tests/lightning/{,_}io Signed-off-by: Alexandros Koumparoulis <[email protected]> * add microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * use microbatch calculator context manager Signed-off-by: Alexandros Koumparoulis <[email protected]> * add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove unused var Signed-off-by: Alexandros Koumparoulis <[email protected]> * fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: akoumpa <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Youngeun Kwon <[email protected]> * remove 8x3b recipes (#10764) * remove 8x3b recipes Signed-off-by: Alexandros Koumparoulis <[email protected]> * remove 8x3b from test_nemo_run Signed-off-by: Alexandros Koumparoulis <[email protected]> * rm fr…