Best practice for training LLaMA models in Megatron-LM
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Annotations of interesting ML papers I read
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
Odysseus: Playground of LLM Sequence Parallelism
A LLaMA1/LLaMA2 Megatron implementation.
Training NVIDIA NeMo Megatron Large Language Model (LLM) using NeMo Framework on Google Kubernetes Engine
Minimal yet high-performance code for pretraining LLMs. Attempts to implement some SOTA features. Supports training via DeepSpeed, Megatron-LM, and FSDP. WIP
Megatron-LM/GPT-NeoX compatible Text Encoder with 🤗Transformers AutoTokenizer (see the tokenizer sketch after this listing).
Run Large Language Models easily.
Wrapped Megatron: As User-Friendly as HuggingFace, As Powerful as Megatron-LM
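The AutoTokenizer-based text encoder entry above is the one item in this listing that names a concrete API. Below is a minimal sketch of that pattern, assuming the `transformers` and `torch` packages are installed and using an illustrative GPT-NeoX tokenizer ID (`EleutherAI/gpt-neox-20b`) that is not taken from the listed repository:

```python
# Minimal sketch: encode text with a 🤗 Transformers AutoTokenizer so the
# resulting token IDs can be consumed by a Megatron-LM / GPT-NeoX style model.
# The model ID below is an assumption for illustration only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# GPT-style tokenizers often ship without a pad token; reuse EOS so batching works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

encoded = tokenizer(
    ["Megatron-LM compatible text encoding.", "Hello world"],
    padding=True,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # (batch_size, padded_sequence_length)
```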