
Releases: linkedin/Liger-Kernel

v0.4.2: Fix 'RMSNorm' object has no attribute 'in_place'

17 Nov 19:22
cbebed6

Highlights

Fix #390 #383

What's Changed

Full Changelog: v0.4.1...v0.4.2

v0.4.1: Gemma 2 Support, CrossEntropy Patching Fix, and GroupNorm

12 Nov 23:42
d784664

Highlights

  1. Gemma 2 Support: The long-pending Gemma 2 support has finally landed, thanks to @Tcc0403! He implemented the tricky softcapping in fused linear cross entropy (#320) and discovered a convergence issue, which @ByronHsu and @Tcc0403 later fixed together (#376).

  2. CrossEntropy Patching Fix: If you monkey-patch CrossEntropy (not FLCE), the patch no longer takes effect after transformers 4.46.1, because the model code replaced CrossEntropy with F.cross_entropy. We fixed the issue in #375.

  3. GroupNorm Kernel: Our new contributor @pramodith implemented a GroupNorm kernel (#375) with a 2x speedup.
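
To make the softcapping in item 1 concrete, here is a minimal pure-Python sketch of tanh-based logit softcapping followed by cross entropy. It is a scalar reference for the math only, not the fused Triton kernel from #320; the function names and the cap value used in the usage note are illustrative assumptions.

```python
import math


def softcap(logit: float, cap: float) -> float:
    """Smoothly squash a logit so its magnitude never exceeds `cap`."""
    return cap * math.tanh(logit / cap)


def softcapped_cross_entropy(logits, target, cap):
    """Cross entropy of one sample after softcapping every logit.

    Hypothetical helper for illustration; a fused kernel would compute
    the same quantity without materializing the capped logits.
    """
    capped = [softcap(z, cap) for z in logits]
    m = max(capped)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in capped))
    return log_sum_exp - capped[target]
```

For example, `softcap(1000.0, 30.0)` stays within 30 in magnitude, while small logits such as `0.1` pass through almost unchanged.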

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision!

05 Nov 22:15
e985195

Highlights

  1. AMD GPU: We have partnered with Embedded LLM to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% lower memory usage on AMD. See the full blog post at https://embeddedllm.com/blog/cuda-to-rocm-portability-case-study-liger-kernel. @Edenzzzz @DocShotgun @tjtanaa

  2. Technical Report: We have published a technical report on arXiv (https://arxiv.org/pdf/2410.10989) with abundant details.

  3. Modal CI: We have moved our entire GPU CI stack to Modal! Thanks to intelligent Docker layer caching and blazingly fast container startup time and scheduling, we have reduced the CI overhead by over 10x (from minutes to seconds).

  4. LLaMA 3.2-Vision Model: We have added kernel support for the LLaMA 3.2-Vision model. You can easily use liger_kernel.transformers.apply_liger_kernel_to_mllama to patch the model. @tyler-romero @shivam15s

  5. JSD Kernel: We have added the JSD kernel for distillation, which also comes with a chunking version! @Tcc0403 @yundai424 @qingquansong

  6. HuggingFace Gradient Accumulation Fixes: We have fixed the notorious HuggingFace gradient accumulation issue (huggingface/transformers#34191) by carefully adjusting the cross entropy scalar. You can now safely use v0.4.0 with the latest HuggingFace gradient accumulation fixes (transformers>=4.46.2)!
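
As background for item 6, the toy sketch below shows one way loss normalization can go wrong under gradient accumulation: averaging per-micro-batch mean losses differs from the mean over all tokens when micro-batches hold different token counts. This assumes the linked issue is of that kind; the per-token losses are made-up numbers, the function names are hypothetical, and nothing here is the actual Liger or transformers code.

```python
def mean_of_means(micro_batches):
    """Average each micro-batch's mean loss (sensitive to uneven sizes)."""
    per_batch = [sum(b) / len(b) for b in micro_batches]
    return sum(per_batch) / len(per_batch)


def global_token_mean(micro_batches):
    """Sum every token loss, then divide by the total token count."""
    total = sum(sum(b) for b in micro_batches)
    count = sum(len(b) for b in micro_batches)
    return total / count


# Made-up per-token losses: 4 tokens in one micro-batch, 1 in the other.
micro_batches = [[2.0, 2.0, 2.0, 2.0], [4.0]]
print(mean_of_means(micro_batches))      # 3.0
print(global_token_mean(micro_batches))  # 2.4
```

The single-token micro-batch is overweighted in the first calculation, which is why the scaling factor applied to the cross entropy matters.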

What's Changed

New Contributors

Full Changelog: v0.3.1...v0.4.0

v0.3.1: Patch Release

01 Oct 20:55
1520999

Summary

This patch release brings important updates and fixes to Liger-Kernel. Notable changes include:

  • KLDiv calculation fix: KLDiv now functions correctly with larger vocab sizes.
  • SwiGLU/GeGLU casting fix: Program IDs are now cast to int64 in SwiGLU/GeGLU kernels to prevent memory errors with larger dimensions.
  • AutoLigerKernelForCausalLM fix: The model now properly passes through all original keyword arguments.
  • Post-init model patching fix: Post-init model patching now works correctly with the HF Trainer integration.
  • Relaxed transformers dependency: Compatibility now extends to a broader range of transformers versions.
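
The int64 cast in the SwiGLU/GeGLU fix guards against offset overflow. The sketch below simulates the arithmetic in plain Python with `ctypes`, assuming a kernel computes a row offset as `program_id * row_stride`; the function names and the sizes are illustrative, not taken from the kernels themselves.

```python
import ctypes


def offset_int32(program_id: int, row_stride: int) -> int:
    """Row offset as it would wrap around in 32-bit signed arithmetic."""
    return ctypes.c_int32(program_id * row_stride).value


def offset_int64(program_id: int, row_stride: int) -> int:
    """Row offset computed in 64-bit signed arithmetic."""
    return ctypes.c_int64(program_id * row_stride).value


# Illustrative sizes: 50,000 rows of 50,000 elements each.
print(offset_int32(50_000, 50_000))  # -1794967296 (wrapped, invalid address)
print(offset_int64(50_000, 50_000))  # 2500000000
```

Once the product exceeds 2**31 - 1, the 32-bit result goes negative, which would point a load or store outside the tensor.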

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.3.1

v0.3.0 Release Note

13 Sep 21:45
793785f

Opening Thoughts

Thank you, everyone! Your overwhelming support continues to fuel our passion for innovation. With your engagement, we've pushed the boundaries further in this release!

We are hosting our 1st IRL event, 'Scaling AI Infra - GPUs, Kernels, LLMs and More'. We will discuss Liger-Kernel and invite speakers to talk about DeepSpeed, SGLang, and the TensorCore team. Please RSVP at our event page.

What's New

🌐 Large Vision Language Model Support

Welcome Qwen-VL, our first venture into the large vision language models! This expansion allows more versatility in applying our solutions across different AI domains.

✨ Patch Kernels on Model Instances

Our latest API update accepts either a model name string or a model instance as input, streamlining integration with Hugging Face's SFT trainer. You can now easily patch Liger kernels into your models, whether you're starting from scratch or adapting an existing model setup.

🚀 SWIFT Trainer Integration

We're excited to be integrated into the SWIFT Trainer Framework. This integration signifies our commitment to delivering cutting-edge tools that empower the community toward enhancing training efficiency across all supported models.

🔧 New Kernels and Features

  • KL Divergence Kernel: Dive deeper into model behaviors with our new KL divergence kernel, perfect for those needing model distillation, alignment, and beyond.
  • Experimental Kernel for Embedding: Explore acceleration possibilities with our experimental kernel that optimizes embedding operations.
  • Extended Cross Entropy Functionality: Now we support label smoothing and sum reduction, enabling more robust training and flexible loss calculations for neural networks.
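
For readers unfamiliar with the cross entropy options above, here is a small pure-Python reference of label smoothing with a selectable reduction. It follows the common definition (the target distribution mixes the one-hot label with a uniform distribution) and is a sketch for checking the math, not the Liger Triton kernel; the function names are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(batch_logits, targets, label_smoothing=0.0, reduction="mean"):
    """Cross entropy with optional label smoothing and mean/sum reduction."""
    losses = []
    for logits, target in zip(batch_logits, targets):
        logp = log_softmax(logits)
        nll = -logp[target]                # loss against the true label
        uniform = -sum(logp) / len(logp)   # loss against a uniform target
        losses.append((1.0 - label_smoothing) * nll + label_smoothing * uniform)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported reduction: {reduction}")
```

With `label_smoothing=0.0` this reduces to the standard negative log-likelihood, and `reduction="sum"` returns the batch total instead of the average.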

Get Involved and Stay Tuned

Join us on our journey! Connect with us on our CUDA MODE server's Discord channel, and don't forget to follow our official account on X for the latest updates: https://x.com/liger_kernel.

A Look Ahead

We're not stopping here! Looking forward, we plan to expand our support to include even more model families and to explore further optimizations and innovative features. Your feedback is invaluable, so please keep it coming as we shape the future of Liger together!

🌟 Acknowledgments

Your contributions make a difference! Thanks to everyone who has starred, contributed, and provided feedback. Each contribution enriches our community and helps us grow stronger together.

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.0

v0.2.1

29 Aug 22:36
e5d6ad7

Patch Release

Fixed a bug in the Gemma patch function where FLCE and CE were both true by default. Ruh roh!

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0 Release Note

29 Aug 18:51
c6fb35e

Opening Thoughts 🫶

Thank You!

We'd love to take this chance to express our sincere gratitude to the community! 2500+ ⭐, 10+ new contributors, 50+ PRs, plus integration into Hugging Face 🤗, axolotl, and LLaMA-Factory in less than one week since open-sourcing is far beyond our expectations. Being able to work together with all the cool people in the community is bliss, and we can't wait for further collaborations down the road!

Looking Ahead

We look forward to deepening our collaboration with the community and working together on a lot of cool stuff: supporting more model families, squeezing out every optimization opportunity for kernels, and, why not, llama.triton? 😉

Get Involved and Stay Tuned

Please feel free to join our Discord channel hosted on the CUDA MODE server, and follow our repo's official account on X: https://x.com/liger_kernel!

Welcome Phi3 and Qwen2 🚀

This release ships with support for other popular models, including Phi3 and Qwen2. All existing kernels in the Liger repo can now be used to boost your training with models from these families. Please refer to our API guide for usage details.

Even Easier API ❤️

Experimenting with different model families and tired of having if-else everywhere just to switch between kernel patching functions? You can now try out our new model-agnostic API to apply Liger kernels. Still a one-liner, but more elegant :) For example:

from liger_kernel.transformers import AutoLigerKernelForCausalLM

# This AutoModel wrapper class automatically monkey-patches the
# model with the optimized Liger kernels if the model is supported.
model = AutoLigerKernelForCausalLM.from_pretrained(...)
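
To illustrate the idea behind replacing per-model if-else chains, here is a toy dispatch table keyed by model type. Every name in it (`PATCHERS`, `apply_patch`, the `patch_*` stand-ins) is hypothetical and exists only for this sketch; it is not how liger_kernel is implemented.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for per-model patch functions. The real patching
# functions live in liger_kernel.transformers and are not imported here.
def patch_llama() -> str:
    return "patched llama"


def patch_qwen2() -> str:
    return "patched qwen2"


def patch_phi3() -> str:
    return "patched phi3"


# One table lookup replaces an if-else chain over model families.
PATCHERS: Dict[str, Callable[[], str]] = {
    "llama": patch_llama,
    "qwen2": patch_qwen2,
    "phi3": patch_phi3,
}


def apply_patch(model_type: str) -> bool:
    """Apply the matching patch; return False for unsupported models."""
    patcher = PATCHERS.get(model_type)
    if patcher is None:
        return False  # unsupported model type: leave it unpatched
    patcher()
    return True
```

Supporting a new model family then means adding one table entry rather than touching every call site.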

More Features

  • Support optional bias term in FusedLinearCrossEntropy (#144)
  • Mistral now benefits from the humongous memory reduction of FusedLinearCrossEntropy (#93)
  • Gemma now benefits from the humongous memory reduction of FusedLinearCrossEntropy (#111)

Bug Fixes

  • Fixed import error when using triton>=3.0.0 on NGC containers (#79)
  • Fixed the missing offset in Gemma RMSNorm (#85) oops
  • Added back missing dataclass entries in efficiency callback (#116)
  • There was some confusion about which Gemma variants we support; we now support all of them! (#125)
  • Fall back to torch-native linear + CrossEntropy when no labels are provided (#128)
  • Matched the exact dtype upcasting and downcasting in Llama and Gemma for RMSNorm (#92)
  • Fixed the bug where RoPE became very slow with dynamic sequence lengths (#149)
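
For context on the Gemma RMSNorm offset fix (#85), the sketch below shows RMSNorm with a weight offset in pure Python. It assumes the Gemma convention of scaling by (1 + weight) rather than by weight alone; the function name, `eps` default, and inputs are illustrative and this is not the Liger kernel.

```python
import math


def rms_norm(x, weight, eps=1e-6, offset=0.0):
    """RMSNorm over one vector, scaling each element by (offset + weight)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [(v / rms) * (offset + w) for v, w in zip(x, weight)]


x = [3.0, 4.0]
zero_weight = [0.0, 0.0]

# Without the offset, a zero weight wipes out the activations entirely.
print(rms_norm(x, zero_weight, offset=0.0))
# With offset=1.0 (the assumed Gemma convention), a zero weight is an
# identity scale, so the normalized values pass through.
print(rms_norm(x, zero_weight, offset=1.0))
```

Dropping the offset therefore changes the output silently rather than raising an error, which makes this kind of bug easy to miss.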

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.2.0

v0.1.1: Add readme on pypi

23 Aug 05:25
b418557

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.1.1

v0.1.0: First Public Release

20 Aug 18:50
27d2d51

What's Changed

New Contributors

Full Changelog: v0.0.1...v0.1.0

v0.0.1 pre release

15 Aug 20:41

What's Changed

New Contributors

Full Changelog: 0.0.2...v0.0.1