
Commit 5dd8afa

yaoyu-33, suiyoubi, cuichenx, meatybobby, and yashaswikarnati committed
Add LLama32 Vision Model Support in Nemo 2.0 (NVIDIA#10763)
* add initial code for llama vlm Signed-off-by: yaoyu-33 <[email protected]>
* some restructure Signed-off-by: yaoyu-33 <[email protected]>
* add mock data placeholder Signed-off-by: yaoyu-33 <[email protected]>
* Fix some importing Signed-off-by: yaoyu-33 <[email protected]>
* add language component for vlm llama
* update code Signed-off-by: yaoyu-33 <[email protected]>
* now match num of params
* update language part and fix vision part Signed-off-by: yaoyu-33 <[email protected]>
* minor fix Signed-off-by: yaoyu-33 <[email protected]>
* model can now init Signed-off-by: yaoyu-33 <[email protected]>
* minor update for llama32 text config Signed-off-by: yaoyu-33 <[email protected]>
* make checkpoint loading work
* missing import
* match vision part tensor shapes with configs Signed-off-by: yaoyu-33 <[email protected]>
* solve some fwd issues and mismatch issues Signed-off-by: yaoyu-33 <[email protected]>
* add vision import
* fixes Signed-off-by: yaoyu-33 <[email protected]>
* update importer to convert both text and image weights
* importer typos and reduce clutter
* fix import qkv
* some fixes for LLM Signed-off-by: yaoyu-33 <[email protected]>
* Add embedding
* some updates Signed-off-by: yaoyu-33 <[email protected]>
* enable loading only text or only vision
* add example script
* TP fix Signed-off-by: yaoyu-33 <[email protected]>
* update
* upload examples Signed-off-by: yaoyu-33 <[email protected]>
* update generate Signed-off-by: yaoyu-33 <[email protected]>
* update to newer version Signed-off-by: yaoyu-33 <[email protected]>
* upload for sharing
* update to new pyt ckpt
* xattn_caches matches (except small differences due to TE RMSNorm)
* cleanup
* embeddings match
* match precision of weights
* update sharded state dict Signed-off-by: yaoyu-33 <[email protected]>
* change xattn layer num to 3 7 11 etc
* upload llama generation
* minor fix
* fix dummy layer input format
* fix vision qkv order
* fix shareded state dict Signed-off-by: yaoyu-33 <[email protected]>
* fix vision precision
* fix rope
* match cross attn layer
* remove nrep
* Remove cross attention in ImageTransformerLayer and fix _gate_ffn
* PP draft Signed-off-by: yaoyu-33 <[email protected]>
* Fix intermediate tensor
* temp save for pp2 is working Signed-off-by: yaoyu-33 <[email protected]>
* fix pp issues Signed-off-by: yaoyu-33 <[email protected]>
* merge
* update mcore parallelism initialization Signed-off-by: yaoyu-33 <[email protected]>
* small update to pretrain script Signed-off-by: yaoyu-33 <[email protected]>
* update mcore parallelism initialization Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* added energon dataloader for neva training (NVIDIA#10451)
  * added energon dataloader for neva training
  * Apply isort and black reformatting Signed-off-by: yashaswikarnati <[email protected]>
  * specify global batch size to support grad accumulation
  * adding neva pretrain example
  * Apply isort and black reformatting Signed-off-by: yashaswikarnati <[email protected]>
  * change pretraine example to handle new ckpt reloading
  * fixed code quality warnings and unused imports Signed-off-by: ykarnati <[email protected]>
  * minor changes for PR comments
  * Apply isort and black reformatting Signed-off-by: yashaswikarnati <[email protected]>
  * refactor conversation template config
  * Apply isort and black reformatting Signed-off-by: yashaswikarnati <[email protected]>
  * remove optional import
  ---------
  Signed-off-by: yashaswikarnati <[email protected]>
  Signed-off-by: ykarnati <[email protected]>
  Co-authored-by: yashaswikarnati <[email protected]>
  (cherry picked from commit 7354740)
* llama energon dataloader
* have tokenizer for base task encoder class
* Update megatron_init.py Signed-off-by: Yu Yao <[email protected]>
* Add simple inference
* evian3 update Signed-off-by: yaoyu-33 <[email protected]>
* add encoder parallel default config Signed-off-by: yaoyu-33 <[email protected]>
* add encoder parallel default config Signed-off-by: yaoyu-33 <[email protected]>
* clean up Signed-off-by: yaoyu-33 <[email protected]>
* add aspect ratio in model
* support energon dataloader
* some pp update Signed-off-by: yaoyu-33 <[email protected]>
* fixes Signed-off-by: yaoyu-33 <[email protected]>
* fix kv merging Signed-off-by: yaoyu-33 <[email protected]>
* fix get_key_value_tensors Signed-off-by: yaoyu-33 <[email protected]>
* rename files Signed-off-by: yaoyu-33 <[email protected]>
* update to HF style position embedding Signed-off-by: yaoyu-33 <[email protected]>
* fix energon dataloader and support batching
* update forward args Signed-off-by: yaoyu-33 <[email protected]>
* clean up and move to aspect_ratio_ids Signed-off-by: yaoyu-33 <[email protected]>
* rename back to language.py Signed-off-by: yaoyu-33 <[email protected]>
* fix loss function Signed-off-by: yaoyu-33 <[email protected]>
* update and fix energon Signed-off-by: yaoyu-33 <[email protected]>
* Add hf import
* Fix type
* Change config
* update energon pretrain Signed-off-by: yaoyu-33 <[email protected]>
* clean up
* clean up
* reformat Signed-off-by: yaoyu-33 <[email protected]>
* update inference files for new code
* update to instruct
* update to instruct
* update few names Signed-off-by: yaoyu-33 <[email protected]>
* update generation Signed-off-by: yaoyu-33 <[email protected]>
* fix importer embedding.weight
* few fixes Signed-off-by: yaoyu-33 <[email protected]>
* add hf script Signed-off-by: yaoyu-33 <[email protected]>
* fix kv import
* remove interleaved
* fixes and updates Signed-off-by: yaoyu-33 <[email protected]>
* lora fixes Signed-off-by: yaoyu-33 <[email protected]>
* some code clean ups Signed-off-by: yaoyu-33 <[email protected]>
* update training scripts Signed-off-by: yaoyu-33 <[email protected]>
* refactors Signed-off-by: yaoyu-33 <[email protected]>
* add LoRA finetuning
* fixes and nemo update Signed-off-by: yaoyu-33 <[email protected]>
* fix importer registering issue by adding 11B and 90B configs
* update `decoder_seq_len` Signed-off-by: yaoyu-33 <[email protected]>
* science vqa script Signed-off-by: yaoyu-33 <[email protected]>
* clean up script name Signed-off-by: yaoyu-33 <[email protected]>
* fix ckpt save serialization issue
* fix predefined config classes
* add num_chunks in input Signed-off-by: yaoyu-33 <[email protected]>
* fix format Signed-off-by: yaoyu-33 <[email protected]>
* update finetuning scripts for PEFT
* add 11b recipe (need NVIDIA#10645 to test)
* fix mask generation Signed-off-by: yaoyu-33 <[email protected]>
* minor fix code style Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Support no image inference
* add llama svqa eval
* fix masking Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* fix generation Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* add 90b recipe and revise 11b recipe
* Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]>
* clean up typing
* add option to disable vision padding
* Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]>
* base model finetuning (does not work yet)
* Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]>
* fixed default conversation template config for MLLama
* Update svqa
* add multinode
* bot happy
* Apply isort and black reformatting Signed-off-by: cuichenx <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: artbataev <[email protected]>
* Perf improvements. Mainly from XAttn mask calculation (NVIDIA#10901)
  * Perf improvements. Mainly from XAttn mask calculation
  * Apply isort and black reformatting Signed-off-by: parthmannan <[email protected]>
  ---------
  Signed-off-by: parthmannan <[email protected]>
  Co-authored-by: parthmannan <[email protected]>
* fix existing issues Signed-off-by: yaoyu-33 <[email protected]>
* fix scripts Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* fix lora
* few fixes for non image support Signed-off-by: yaoyu-33 <[email protected]>
* update masking gen Signed-off-by: yaoyu-33 <[email protected]>
* update lazy dataset Signed-off-by: yaoyu-33 <[email protected]>
* fix data sampler and loading issue Signed-off-by: yaoyu-33 <[email protected]>
* Add vlm generation
* Apply isort and black reformatting Signed-off-by: meatybobby <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* generation update Signed-off-by: yaoyu-33 <[email protected]>
* update lazy dataset Signed-off-by: yaoyu-33 <[email protected]>
* Fix _strategy_lib.py Signed-off-by: Yu Yao <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* fix warning Signed-off-by: yaoyu-33 <[email protected]>
* hide vlm examples Signed-off-by: yaoyu-33 <[email protected]>
* Revert "Add vlm generation" This reverts commit 4711c75 Signed-off-by: yaoyu-33 <[email protected]>
* Fix VisionEncoder multi-batch bug
* update mcore parallelism initialization Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Update megatron_init.py Signed-off-by: Yu Yao <[email protected]>
* add encoder parallel default config Signed-off-by: yaoyu-33 <[email protected]>
* Fix _strategy_lib.py Signed-off-by: Yu Yao <[email protected]>
* llm.generate fixes (NVIDIA#10983)
  * fix context path, disable optimizer init, add tp Signed-off-by: HuiyingLi <[email protected]>
  * format Signed-off-by: HuiyingLi <[email protected]>
  * address comments, require user to provide trainer Signed-off-by: HuiyingLi <[email protected]>
  * minor fix Signed-off-by: HuiyingLi <[email protected]>
  * minor fixes Signed-off-by: HuiyingLi <[email protected]>
  ---------
  Signed-off-by: HuiyingLi <[email protected]>
* use __dict__ in check (NVIDIA#11012)
  * check is_hf_model in leaf module Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]>
  * disable getattr alternative path Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * fix Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * undo; Signed-off-by: Alexandros Koumparoulis <[email protected]>
  ---------
  Signed-off-by: Alexandros Koumparoulis <[email protected]>
  Signed-off-by: akoumpa <[email protected]>
  Co-authored-by: akoumpa <[email protected]>
* LoRA support for HF::AutoModelForCausalLM (NVIDIA#10982)
  * add LinearAdapter Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * add hf lora example Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * remove unused imports Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * fix Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * fix Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * subclass mixin Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * remove stale imports Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * undo Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * fix scale Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * regex selector for peft Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * move lora Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * fmt Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * hf_auto_model_for_causal_lm finetune recipe Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]>
  ---------
  Signed-off-by: Alexandros Koumparoulis <[email protected]>
  Signed-off-by: akoumpa <[email protected]>
  Co-authored-by: akoumpa <[email protected]>
* Change default for always_save_context to True (NVIDIA#11014) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Pablo Garay <[email protected]>
* Add a build option to load_context (NVIDIA#10713)
  * Add a build option to load_context Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * Adding test Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * Trying to fix failing CPU test Signed-off-by: Marc Romeijn <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * cherry-pick fix Signed-off-by: Alexandros Koumparoulis <[email protected]>
  ---------
  Signed-off-by: Marc Romeijn <[email protected]>
  Signed-off-by: Alexandros Koumparoulis <[email protected]>
  Co-authored-by: Alexandros Koumparoulis <[email protected]>
* Fix pip install (NVIDIA#11026)
  * Move AutoTokenizer inline Signed-off-by: Marc Romeyn <[email protected]>
  * Move einops to common requirements Signed-off-by: Marc Romeyn <[email protected]>
  * Move AutoTokenizer import to top-level again in fine_tuning Signed-off-by: Marc Romeyn <[email protected]>
  * Move megatron init inside nemo.lightning Signed-off-by: Marc Romeyn <[email protected]>
  * Make megatron_lazy_init_context work when transformer-engine is not installed Signed-off-by: Marc Romeyn <[email protected]>
  * Only import get_nmt_tokenizer when needed Signed-off-by: Marc Romeyn <[email protected]>
  * Apply isort and black reformatting Signed-off-by: marcromeyn <[email protected]>
  ---------
  Signed-off-by: Marc Romeyn <[email protected]>
  Signed-off-by: marcromeyn <[email protected]>
  Co-authored-by: marcromeyn <[email protected]>
* [WIP] Add docs for NEST SSL (NVIDIA#10804)
  * add docs Signed-off-by: stevehuang52 <[email protected]>
  * update doc and fix missing param Signed-off-by: stevehuang52 <[email protected]>
  ---------
  Signed-off-by: stevehuang52 <[email protected]>
* Change dist ckpt defaults (NVIDIA#10913)
  * Enable ckpt features by default (async ckpt), ckpt every 15mins and reduce preemption time to 1min Signed-off-by: Shriya Palsamudram <[email protected]>
  * fix ssm tests Signed-off-by: Shriya Palsamudram <[email protected]>
  * Make note that ckpt_async_save is disabled for SSMs Signed-off-by: Shriya Palsamudram <[email protected]>
  * Enable async ckpt for SSMs with fix Signed-off-by: Shriya Palsamudram <[email protected]>
  * Disable async ckpt in the peft test as it is a known bug, add note. Signed-off-by: Shriya Palsamudram <[email protected]>
  * Fix failing unit tests Signed-off-by: Shriya Palsamudram <[email protected]>
  * Ashors/peft async ckpt (NVIDIA#11010)
    * [WIP] prototype for supporting async checkpointing with peft Signed-off-by: ashors1 <[email protected]> Signed-off-by: Shriya Palsamudram <[email protected]>
  * Enable async ckpt for the peft test Signed-off-by: Shriya Palsamudram <[email protected]>
  * Fix peft setup test Signed-off-by: Shriya Palsamudram <[email protected]>
  ---------
  Signed-off-by: Shriya Palsamudram <[email protected]>
  Signed-off-by: ashors1 <[email protected]>
  Co-authored-by: ataghibakhsh <[email protected]>
* Akoumparouli/mixtral recipe fix r2.0.0 (NVIDIA#10994)
  * Mixtral TP8 EP1 Signed-off-by: Alexandros Koumparoulis <[email protected]>
  * Apply isort and black reformatting Signed-off-by: akoumpa <[email protected]>
  ---------
  Signed-off-by: Alexandros Koumparoulis <[email protected]>
  Signed-off-by: akoumpa <[email protected]>
  Co-authored-by: akoumpa <[email protected]>
* Fix _strategy_lib tests (NVIDIA#11033)
  * fix world size and don't mock Signed-off-by: Maanu Grover <[email protected]>
  * cleanup global state Signed-off-by: Maanu Grover <[email protected]>
  * check app state instead Signed-off-by: Maanu Grover <[email protected]>
  * fix syntax nemo logger test Signed-off-by: Maanu Grover <[email protected]>
  ---------
  Signed-off-by: Maanu Grover <[email protected]>
* Update `BaseMegatronSampler` for compatibility with PTL's `_BatchProgress` (NVIDIA#11016)
  * Revert "[NeMo-UX] Use custom `BatchProgress` class which does not restore states (NVIDIA#10383)" This reverts commit b5798de.
  * make megatron sampler return the total number of batches in the dataset Signed-off-by: ashors1 <[email protected]>
  ---------
  Signed-off-by: ashors1 <[email protected]>
* PTQ example for NeMo 2.0 (NVIDIA#10642)
  * initial commit Signed-off-by: Piotr Kaminski <[email protected]>
  * create Quantizer for NeMo 2.0 Signed-off-by: Piotr Kaminski <[email protected]>
  * refactor Signed-off-by: Piotr Kaminski <[email protected]>
  * Call quantize on an unwrapped mcore model Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * Add tests, adjust unwrapping Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * fix export Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * Apply isort and black reformatting Signed-off-by: artbataev <[email protected]>
  * Fix output_path argument for HF import Signed-off-by: Piotr Kamiński <[email protected]>
  * fix fabric ckpt loading Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * code review suggestions Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * remove unused import Signed-off-by: Piotr Kaminski <[email protected]>
  * use cnn dataset in github ci Signed-off-by: Piotr Kaminski <[email protected]>
  * applied code review Signed-off-by: Piotr Kaminski <[email protected]>
  * code review changes Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * simplify interface for data iterator Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  * (partial) PP fix Signed-off-by: Piotr Kaminski <[email protected]>
  * Apply isort and black reformatting Signed-off-by: Laplasjan107 <[email protected]>
  ---------
  Signed-off-by: Piotr Kaminski <[email protected]>
  Signed-off-by: Laplasjan107 <[email protected]>
  Signed-off-by: Piotr Kamiński <[email protected]>
  Signed-off-by: artbataev <[email protected]>
  Co-authored-by: Piotr Kaminski <[email protected]>
  Co-authored-by: Laplasjan107 <[email protected]>
  Co-authored-by: artbataev <[email protected]>
* TDT compute timestamps option and Extra Whitespace handling for SPE (NVIDIA#10875)
  * add token duration Signed-off-by: monica-sekoyan <[email protected]>
  * revert rnnt change Signed-off-by: monica-sekoyan <[email protected]>
  * add remove_extra_whitespaces arg to spe tokenizer Signed-off-by: monica-sekoyan <[email protected]>
  * add token duration retrieval Signed-off-by: monica-sekoyan <[email protected]>
  * add ignore_extra_whitespace to spe Signed-off-by: monica-sekoyan <[email protected]>
  * add compute_timestamp support for tdt Signed-off-by: monica-sekoyan <[email protected]>
  * fix config field name Signed-off-by: monica-sekoyan <[email protected]>
  * add refinement for tdt timestamps Signed-off-by: monica-sekoyan <[email protected]>
  * add segments timestamp support and refinement for ctc Signed-off-by: monica-sekoyan <[email protected]>
  * modify tests for ctc decoding timestamps Signed-off-by: monica-sekoyan <[email protected]>
  * add rnnt timestamp tests Signed-off-by: monica-sekoyan <[email protected]>
  * updated doc Signed-off-by: monica-sekoyan <[email protected]>
  * fix in test Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  * fix of unicode char Signed-off-by: monica-sekoyan <[email protected]>
  * fix rnnt_decoding test Signed-off-by: monica-sekoyan <[email protected]>
  * workaround for tesst tokenizer Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  * modify segments formation Signed-off-by: monica-sekoyan <[email protected]>
  * modify segments for ctc Signed-off-by: monica-sekoyan <[email protected]>
  * fix in ctc refinement Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  * minor changes Signed-off-by: monica-sekoyan <[email protected]>
  * reverse offset change Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  * warning mode=once Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  * make ignore_extrawhitespaces false Signed-off-by: monica-sekoyan <[email protected]>
  * minor changes Signed-off-by: monica-sekoyan <[email protected]>
  * adjust changes to the tests Signed-off-by: monica-sekoyan <[email protected]>
  * modify prompt_formatter tests Signed-off-by: monica-sekoyan <[email protected]>
  * Apply isort and black reformatting Signed-off-by: monica-sekoyan <[email protected]>
  ---------
  Signed-off-by: monica-sekoyan <[email protected]>
  Signed-off-by: monica-sekoyan <[email protected]>
  Co-authored-by: monica-sekoyan <[email protected]>
* Basic online dynamic FP8 quantization with vLLM (NVIDIA#10904)
  * Basic online dynamic quantization with vLLM Signed-off-by: Jan Lasek <[email protected]>
  * Apply isort and black reformatting Signed-off-by: janekl <[email protected]>
  * vllm 0.6.3 updates Signed-off-by: Jan Lasek <[email protected]>
  * Pass quantization param in deploy_vllm_triton.py script Signed-off-by: Jan Lasek <[email protected]>
  ---------
  Signed-off-by: Jan Lasek <[email protected]>
  Signed-off-by: janekl <[email protected]>
  Co-authored-by: janekl <[email protected]>
* ci: Improve VM maintenance (NVIDIA#10758)
  * ci: Improve VM maintenance Signed-off-by: Oliver Koenig <[email protected]>
  * rename stuff Signed-off-by: Oliver Koenig <[email protected]>
  * title Signed-off-by: Oliver Koenig <[email protected]>
  * use team Signed-off-by: Oliver Koenig <[email protected]>
  * run on failure too Signed-off-by: Oliver Koenig <[email protected]>
  * fix Signed-off-by: Oliver Koenig <[email protected]>
  * yrdy Signed-off-by: Oliver Koenig <[email protected]>
  * f Signed-off-by: Oliver Koenig <[email protected]>
  * test Signed-off-by: Oliver Koenig <[email protected]>
  * fix Signed-off-by: Oliver Koenig <[email protected]>
  * f Signed-off-by: Oliver Koenig <[email protected]>
  * f Signed-off-by: Oliver Koenig <[email protected]>
  * f Signed-off-by: Oliver Koenig <[email protected]>
  ---------
  Signed-off-by: Oliver Koenig <[email protected]>
* Add comment for vision transpose
* update megatron_init.py inside lightning Signed-off-by: yaoyu-33 <[email protected]>
* rename llama to mllama folder name Signed-off-by: yaoyu-33 <[email protected]>
* update to attention bias Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* update dropout to 0 Signed-off-by: yaoyu-33 <[email protected]>
* fix attention bias Signed-off-by: yaoyu-33 <[email protected]>
* remove disable_vision_padding since we now have a fix Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Update init for mllama Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* Address comments Signed-off-by: yaoyu-33 <[email protected]>
* Apply isort and black reformatting Signed-off-by: yaoyu-33 <[email protected]>
* fix copyright title Signed-off-by: yaoyu-33 <[email protected]>
* fix code scan Signed-off-by: yaoyu-33 <[email protected]>
* update vision code Signed-off-by: yaoyu-33 <[email protected]>
* revert attention bias changes until latest MLM code got merged Signed-off-by: yaoyu-33 <[email protected]>
* fix warning Signed-off-by: yaoyu-33 <[email protected]>
* Turn off system message check, as it's "" now Signed-off-by: yaoyu-33 <[email protected]>
* Rolllback megatron_parallel.py Signed-off-by: Yu Yao <[email protected]>
---------
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: Yu Yao <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: artbataev <[email protected]>
Signed-off-by: parthmannan <[email protected]>
Signed-off-by: meatybobby <[email protected]>
Signed-off-by: HuiyingLi <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
Signed-off-by: akoumpa <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Signed-off-by: Marc Romeijn <[email protected]>
Signed-off-by: Marc Romeyn <[email protected]>
Signed-off-by: marcromeyn <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Shriya Palsamudram <[email protected]>
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Maanu Grover <[email protected]>
Signed-off-by: Piotr Kaminski <[email protected]>
Signed-off-by: Laplasjan107 <[email protected]>
Signed-off-by: Piotr Kamiński <[email protected]>
Signed-off-by: monica-sekoyan <[email protected]>
Signed-off-by: monica-sekoyan <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>
Signed-off-by: janekl <[email protected]>
Signed-off-by: Oliver Koenig <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: Bobby Chen <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Yashaswi Karnati <[email protected]>
Co-authored-by: ykarnati <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Co-authored-by: Yashaswi Karnati <[email protected]>
Co-authored-by: artbataev <[email protected]>
Co-authored-by: Parth Mannan <[email protected]>
Co-authored-by: parthmannan <[email protected]>
Co-authored-by: meatybobby <[email protected]>
Co-authored-by: Huiying <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: akoumpa <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Co-authored-by: Pablo Garay <[email protected]>
Co-authored-by: Marc Romeyn <[email protected]>
Co-authored-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: marcromeyn <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Shriya Rishab <[email protected]>
Co-authored-by: ataghibakhsh <[email protected]>
Co-authored-by: Maanu Grover <[email protected]>
Co-authored-by: Anna Shors <[email protected]>
Co-authored-by: Piotr Kamiński <[email protected]>
Co-authored-by: Piotr Kaminski <[email protected]>
Co-authored-by: Laplasjan107 <[email protected]>
Co-authored-by: monica-sekoyan <[email protected]>
Co-authored-by: monica-sekoyan <[email protected]>
Co-authored-by: Jan Lasek <[email protected]>
Co-authored-by: janekl <[email protected]>
Co-authored-by: oliver könig <[email protected]>
1 parent 46db571 · commit 5dd8afa

31 files changed: +3998 −57 lines

nemo/collections/multimodal/data/energon/base.py

+24 −6

@@ -11,23 +11,24 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from typing import TYPE_CHECKING, Any, Dict, Literal, Optional
 
+from copy import deepcopy
+from typing import Any, Dict, Literal, Optional
+
+import fiddle as fdl
 import pytorch_lightning as pl
 from megatron.core import parallel_state
 from megatron.energon import WorkerConfig, get_savable_loader, get_train_dataset
 from pytorch_lightning.utilities.types import EVAL_DATALOADERS, TRAIN_DATALOADERS
 from torch.utils.data import DataLoader
+from typing_extensions import Self
 
 from nemo.collections.multimodal.data.energon.config import MultiModalSampleConfig
 from nemo.collections.multimodal.data.energon.task_encoder import MultiModalTaskEncoder
-from nemo.lightning.io.mixin import IOMixin
+from nemo.lightning.io.mixin import IOMixin, serialization, track_io
 from nemo.lightning.pytorch.plugins import MegatronDataSampler
 from nemo.utils import logging
 
-if TYPE_CHECKING:
-    from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec
-
 
 class SimpleMultiModalDataModule(pl.LightningDataModule, IOMixin):
     """
@@ -66,6 +67,7 @@ def __init__(
         pin_memory: bool = True,
         multimodal_sample_config: Optional[MultiModalSampleConfig] = MultiModalSampleConfig(),
         task_encoder: Optional[MultiModalTaskEncoder] = None,
+        decoder_seq_length: Optional[int] = None,
     ) -> None:
         """
         Initialize the SimpleMultiModalDataModule.
@@ -87,6 +89,7 @@ def __init__(
         self.tokenizer = tokenizer
         self.image_processor = image_processor
         self.seq_length = seq_length
+        self.decoder_seq_length = decoder_seq_length
         self.micro_batch_size = micro_batch_size
         self.global_batch_size = global_batch_size
         self.num_workers = num_workers
@@ -99,11 +102,24 @@ def __init__(
         )
         self.init_global_step = 0
         self.data_sampler = SequentialMegatronSampler(
-            seq_len=self.seq_length, micro_batch_size=self.micro_batch_size, global_batch_size=self.global_batch_size
+            seq_len=self.seq_length,
+            decoder_seq_len=self.decoder_seq_length,
+            micro_batch_size=self.micro_batch_size,
+            global_batch_size=self.global_batch_size,
         )
         self.train_dataloader_object = None
         self.val_dataloader_object = None
 
+    def io_init(self, **kwargs) -> fdl.Config[Self]:
+        # (pleasefixme) image_processor and task_encoder are problematic with Fiddle so we skip serializing them for now
+        cfg_kwargs = {k: deepcopy(v) for k, v in kwargs.items() if k not in ['image_processor', 'task_encoder']}
+
+        for val in cfg_kwargs.values():
+            if not serialization.find_node_traverser(type(val)):
+                track_io(type(val))
+        cfg = fdl.Config(type(self), **cfg_kwargs)
+        return cfg
+
     def datasets_provider(self, worker_config, split: Literal['train', 'val'] = 'val'):
         """
         Provide the dataset for training or validation.
@@ -315,6 +331,7 @@ def __init__(
         micro_batch_size: int = 4,
         global_batch_size: int = 8,
         init_consumed_samples: int = 0,
+        decoder_seq_len: Optional[int] = None,
         init_global_step=0,
     ):
         """
@@ -328,6 +345,7 @@
         """
        super().__init__(
            seq_len=seq_len,
+            decoder_seq_len=decoder_seq_len,
            micro_batch_size=micro_batch_size,
            global_batch_size=global_batch_size,
            init_consumed_samples=init_consumed_samples,

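The io_init override above exists so the data module can still be serialized with Fiddle even though some constructor arguments (the image processor and task encoder) are not Fiddle-friendly. Below is a minimal, self-contained sketch of the same pattern; Rebuildable and its fields are illustrative stand-ins rather than NeMo classes, and only fdl.Config/fdl.build are assumed from the fiddle library.

# Sketch of the io_init pattern: drop hard-to-serialize kwargs, then capture
# the remaining constructor call as a fdl.Config that can rebuild the object.
from copy import deepcopy

import fiddle as fdl


class Rebuildable:
    def __init__(self, seq_length: int, tokenizer=None):
        self.seq_length = seq_length
        self.tokenizer = tokenizer
        # Capture the constructor call, skipping the unserializable 'tokenizer'.
        self.__io__ = self.io_init(seq_length=seq_length, tokenizer=tokenizer)

    def io_init(self, **kwargs) -> fdl.Config:
        cfg_kwargs = {k: deepcopy(v) for k, v in kwargs.items() if k != 'tokenizer'}
        return fdl.Config(type(self), **cfg_kwargs)


module = Rebuildable(seq_length=2048)
rebuilt = fdl.build(module.__io__)  # re-invokes Rebuildable(seq_length=2048)
assert rebuilt.seq_length == 2048
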
nemo/collections/multimodal/data/energon/config.py

+1 −7

@@ -15,7 +15,7 @@
 from dataclasses import dataclass, field
 from typing import List
 import torch
-from nemo.collections.multimodal.data.energon.conversation import BaseConversationTemplateConfig
+from nemo.collections.multimodal.data.energon.conversation import LLaVATemplateConfig
 
 
 @dataclass
@@ -56,12 +56,6 @@ class ImageTextRawBatch:
     loss_mask: torch.Tensor = field(default_factory=lambda: torch.empty(0, dtype=torch.float))
 
 
-class LLaVATemplateConfig(BaseConversationTemplateConfig):
-    """LLava specific template configuration which extends the base config"""
-
-    pass
-
-
 @dataclass
 class MultiModalSampleConfig:
     image_token: ImageToken = field(default_factory=ImageToken)

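The removed class now lives in conversation.py (next file), so a single definition is shared instead of being duplicated here. The pattern these configs rely on is plain dataclass inheritance: the base holds defaults and each model-specific template overrides only what differs. A small self-contained sketch with simplified, assumed field names:

# Illustrative only: base template config with defaults, subclass overriding 'system'.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseTemplateConfig:
    system: Optional[str] = None
    roles: List[str] = field(default_factory=lambda: ['user', 'assistant'])
    stop_string: Optional[str] = None


@dataclass
class LLaVATemplate(BaseTemplateConfig):
    system: Optional[str] = "A chat between a curious user and an artificial assistant agent."


print(LLaVATemplate().roles)  # ['user', 'assistant'], inherited from the base
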
nemo/collections/multimodal/data/energon/conversation.py

+20

@@ -19,6 +19,15 @@
 class BaseConversationTemplateConfig:
     """Conversation template config related parameters"""
 
+    system: Optional[str] = "".format()  # fmt: off
+    roles: List[str] = field(default_factory=lambda: ['user', 'assistant'])
+    stop_string: Optional[str] = None
+    chat_template = None
+
+
+class LLaVATemplateConfig(BaseConversationTemplateConfig):
+    """LLava specific template configuration which extends the base config"""
+
     system: Optional[str] = (
         "A chat between a curious user and artificial assistant agent. The assistant gives helpful, detailed and polite answers to user's questions.".format()
     )  # fmt: off
@@ -36,3 +45,14 @@ class BaseConversationTemplateConfig:
     {%- endif %}
     {%- endfor -%}
     """
+
+
+class MLlamaTemplateConfig(BaseConversationTemplateConfig):
+    """MLlama specific template configuration which extends the base config"""
+
+    system: Optional[str] = None
+    roles: List[str] = field(default_factory=lambda: ['user', 'assistant'])
+    stop_string: str = None
+    chat_template = """
+    '{{- bos_token }}\n{%- if custom_tools is defined %}\n {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n {%- if strftime_now is defined %}\n {%- set date_string = strftime_now("%d %b %Y") %}\n {%- else %}\n {%- set date_string = "26 Jul 2024" %}\n {%- endif %}\n{%- endif %}\n{%- if not tools is defined %}\n {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0][\'role\'] == \'system\' %}\n {%- set system_message = messages[0][\'content\']|trim %}\n {%- set messages = messages[1:] %}\n{%- else %}\n {%- set system_message = "" %}\n{%- endif %}\n\n{#- Find out if there are any images #}\n{% set image_ns = namespace(has_images=false) %} \n{%- for message in messages %}\n {%- for content in message[\'content\'] %}\n {%- if content[\'type\'] == \'image\' %}\n {%- set image_ns.has_images = true %}\n {%- endif %}\n {%- endfor %}\n{%- endfor %}\n\n{#- Error out if there are images and system message #}\n{%- if image_ns.has_images and not system_message == "" %}\n {{- raise_exception("Prompting with images is incompatible with system messages.") }}\n{%- endif %}\n\n{#- System message if there are no images #}\n{%- if not image_ns.has_images %}\n {{- "<|start_header_id|>system<|end_header_id|>\\n\\n" }}\n {%- if tools is not none %}\n {{- "Environment: ipython\\n" }}\n {%- endif %}\n {{- "Cutting Knowledge Date: December 2023\\n" }}\n {{- "Today Date: " + date_string + "\\n\\n" }}\n {%- if tools is not none and not tools_in_user_message %}\n {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}\n {{- \'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.\' }}\n {{- "Do not use variables.\\n\\n" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- "\\n\\n" }}\n {%- endfor %}\n {%- endif %}\n {{- system_message }}\n {{- "<|eot_id|>" }}\n{%- endif %}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n {#- Extract the first user message so we can plug it in here #}\n {%- if messages | length != 0 %}\n {%- set first_user_message = messages[0][\'content\']|trim %}\n {%- set messages = messages[1:] %}\n {%- else %}\n {{- raise_exception("Cannot put tools in the first user message when there\'s no first user message!") }}\n{%- endif %}\n {{- \'<|start_header_id|>user<|end_header_id|>\\n\\n\' -}}\n {{- "Given the following functions, please respond with a JSON for a function call " }}\n {{- "with its proper arguments that best answers the given prompt.\\n\\n" }}\n {{- \'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.\' }}\n {{- "Do not use variables.\\n\\n" }}\n {%- for t in tools %}\n {{- t | tojson(indent=4) }}\n {{- "\\n\\n" }}\n {%- endfor %}\n {{- first_user_message + "<|eot_id|>"}}\n{%- endif %}\n\n{%- for message in messages %}\n {%- if not (message.role == \'ipython\' or message.role == \'tool\' or \'tool_calls\' in message) %}\n {{- \'<|start_header_id|>\' + message[\'role\'] + \'<|end_header_id|>\\n\\n\' }}\n {%- if message[\'content\'] is string %}\n {{- message[\'content\'] }}\n {%- else %}\n {%- for content in message[\'content\'] %}\n {%- if content[\'type\'] == \'image\' %}\n {{- \'<|image|>\' }}\n {%- elif content[\'type\'] == \'text\' %}\n {{- content[\'text\'] }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- \'<|eot_id|>\' }}\n {%- elif \'tool_calls\' in message %}\n {%- if not message.tool_calls|length == 1 %}\n {{- raise_exception("This model only supports single tool-calls at once!") }}\n {%- endif %}\n {%- set tool_call = message.tool_calls[0].function %}\n {{- \'<|start_header_id|>assistant<|end_header_id|>\\n\\n\' -}}\n {{- \'{"name": "\' + tool_call.name + \'", \' }}\n {{- \'"parameters": \' }}\n {{- tool_call.arguments | tojson }}\n {{- "}" }}\n {{- "<|eot_id|>" }}\n {%- elif message.role == "tool" or message.role == "ipython" %}\n {{- "<|start_header_id|>ipython<|end_header_id|>\\n\\n" }}\n {%- if message.content is mapping or message.content is iterable %}\n {{- message.content | tojson }}\n {%- else %}\n {{- message.content }}\n {%- endif %}\n {{- "<|eot_id|>" }}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- \'<|start_header_id|>assistant<|end_header_id|>\\n\\n\' }}\n{%- endif %}\n'
+    """

nemo/collections/multimodal/data/energon/task_encoder.py

+1 −1

@@ -62,7 +62,7 @@ def __init__(self, tokenizer, image_processor, multimodal_sample_config):
             image_processor (ImageProcessor): The image processor used for preprocessing images across different sample types.
             multimodal_sample_config (MultiModalSampleConfig): Configuration object for multimodal samples, including tokens and placeholders.
         """
-
+        self.tokenizer = tokenizer
         self.encoders: Dict[str, SampleEncoder] = {
             VQASample.__name__: VQASampleEncoder(
                 tokenizer=tokenizer,

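The constructor above (now also keeping self.tokenizer) supports a simple dispatch scheme: the task encoder holds a dict from sample-class name to a per-type encoder and routes each sample by type(sample).__name__. A stripped-down sketch of that scheme, with toy stand-ins for the NeMo and megatron.energon classes:

# Illustrative dispatch-by-sample-type pattern; VQASample/VQASampleEncoder here
# are simplified stand-ins, not the real NeMo classes.
from dataclasses import dataclass


@dataclass
class VQASample:
    question: str
    answer: str


class VQASampleEncoder:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def encode(self, sample: VQASample):
        return self.tokenizer(f"{sample.question} {sample.answer}")


class TaskEncoder:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.encoders = {VQASample.__name__: VQASampleEncoder(tokenizer)}

    def encode_sample(self, sample):
        encoder = self.encoders.get(type(sample).__name__)
        if encoder is None:
            raise NotImplementedError(f"no encoder for {type(sample).__name__}")
        return encoder.encode(sample)


enc = TaskEncoder(tokenizer=str.split)  # toy "tokenizer" for demonstration
print(enc.encode_sample(VQASample("What is shown?", "A cat")))
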
nemo/collections/vlm/__init__.py

+45 −7

@@ -1,28 +1,56 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from nemo.collections.vlm.mllama.data import MLlamaLazyDataModule, MLlamaMockDataModule
+from nemo.collections.vlm.mllama.model.base import (
+    CrossAttentionTextConfig,
+    CrossAttentionVisionConfig,
+    MLlamaModel,
+    MLlamaModelConfig,
+)
+from nemo.collections.vlm.mllama.model.mllama import (
+    MLlamaConfig11B,
+    MLlamaConfig11BInstruct,
+    MLlamaConfig90B,
+    MLlamaConfig90BInstruct,
+)
 from nemo.collections.vlm.neva.data import (
     DataConfig,
     ImageDataConfig,
     ImageToken,
-    MockDataModule,
     MultiModalToken,
     NevaLazyDataModule,
+    NevaMockDataModule,
     VideoDataConfig,
     VideoToken,
 )
-from nemo.collections.vlm.neva.model import (
+from nemo.collections.vlm.neva.model.base import (
     CLIPViTConfig,
     HFCLIPVisionConfig,
-    Llava1_5Config7B,
-    Llava1_5Config13B,
-    LlavaConfig,
-    LlavaModel,
     MultimodalProjectorConfig,
     NevaConfig,
     NevaModel,
 )
+from nemo.collections.vlm.neva.model.llava import Llava1_5Config7B, Llava1_5Config13B, LlavaConfig, LlavaModel
+from nemo.collections.vlm.peft import LoRA
+from nemo.collections.vlm.recipes import *
 
 __all__ = [
-    "MockDataModule",
+    "NevaMockDataModule",
     "NevaLazyDataModule",
+    "MLlamaMockDataModule",
+    "MLlamaLazyDataModule",
     "DataConfig",
     "ImageDataConfig",
     "VideoDataConfig",
@@ -38,4 +66,14 @@
     "Llava1_5Config7B",
     "Llava1_5Config13B",
     "LlavaModel",
+    "MLlamaModel",
+    "MLlamaModelConfig",
+    "CrossAttentionTextConfig",
+    "CrossAttentionVisionConfig",
+    "MLlamaConfig11B",
+    "MLlamaConfig11BInstruct",
+    "MLlamaConfig90B",
+    "MLlamaConfig90BInstruct",
+    "mllama_11b",
+    "mllama_90b",
 ]
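
With these exports in place, the new MLlama classes are importable from the top-level vlm collection. A hedged usage sketch; the exact constructor signatures of MLlamaModel and the predefined configs are not shown in this diff, so the wiring below is an assumption:

# Assumed usage: predefined 11B-Instruct config fed to the model class.
from nemo.collections import vlm

model = vlm.MLlamaModel(vlm.MLlamaConfig11BInstruct())
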
nemo/collections/vlm/mllama/__init__.py

+17

@@ -0,0 +1,17 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from transformers import PreTrainedTokenizerFast
+from nemo.lightning.io import track_io
+
+track_io(PreTrainedTokenizerFast)
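
Registering a class with track_io, as the new module above does for PreTrainedTokenizerFast, tells NeMo's IO layer how to traverse and serialize instances of that class when they appear in captured constructor arguments. The same one-liner should extend to other externally defined types; CLIPImageProcessor below is an assumed example, not part of this commit:

# Assumed extension of the pattern above to an image-processor class.
from transformers import CLIPImageProcessor

from nemo.lightning.io import track_io

track_io(CLIPImageProcessor)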
nemo/collections/vlm/mllama/data/__init__.py

+21

@@ -0,0 +1,21 @@
+# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from nemo.collections.vlm.mllama.data.lazy import MLlamaLazyDataModule
+from nemo.collections.vlm.mllama.data.mock import MockDataModule as MLlamaMockDataModule
+
+__all__ = [
+    "MLlamaMockDataModule",
+    "MLlamaLazyDataModule",
+]
