Walkthrough
Two new files have been added under the LFM2 examples: a README providing installation guidance for the latest transformers library, and a YAML configuration file detailing dataset, training, and logging parameters for the LFM2-350M model.
📖 Documentation Preview: https://6871e07b86558bdf0f609fce--resonant-treacle-0fd729.netlify.app
Deployed on Netlify from commit 7c6c10e
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/lfm2/lfm2-350m-fft.yaml (2)
21-26: Blank WandB fields may cause silent auto-init.
When `wandb_project` is empty, WandB will fall back to your global default or open an anonymous run, which can be confusing. Either:
-wandb_project:
+wandb_project: lfm2-experiments
or explicitly disable WandB with `wandb: disabled` in the launcher.
17-20: `pad_to_sequence_len: false` + long context may hurt throughput.
Without sample packing and without padding, each example is individually 0-padded to its own length, which leads to poor GPU utilisation on chat datasets with large length variance.
Consider enabling padding to fixed `sequence_len` once sample-packing support lands, or set `sequence_len` closer to your dataset's 90th-percentile length.
examples/lfm2/README.md (1)
3-7: Pin the transformers commit for reproducibility.
`pip install git+https://github.com/huggingface/transformers.git` pulls moving `main`, which breaks reproducibility and may introduce incompatible API changes.
pip install git+https://github.com/huggingface/transformers.git@<commit-sha>
Replace `<commit-sha>` with the commit you validated this example against and mention it in the README.
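The `sequence_len` nitpick above suggests sizing the context to the dataset's 90th-percentile length. A minimal sketch of how that number could be estimated, assuming you already have a list of per-example token counts (the helper names `percentile_length` and `round_up_to_multiple` are hypothetical and not part of any library in this PR):

```python
import math


def percentile_length(token_lengths, pct=90):
    """Nearest-rank percentile of per-example token counts."""
    if not token_lengths:
        raise ValueError("token_lengths must not be empty")
    ordered = sorted(token_lengths)
    # Nearest-rank method: smallest value with at least pct% of the data at or below it.
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def round_up_to_multiple(value, multiple=64):
    """Round up so the chosen length is a multiple of `multiple` (an assumed convenience)."""
    return ((value + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # Made-up token counts purely for illustration.
    lengths = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    p90 = percentile_length(lengths, pct=90)
    print(p90, round_up_to_multiple(p90))
```

The rounded value is only a starting point for `sequence_len`; examples longer than it would still need truncating or dropping.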
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/lfm2/README.md (1 hunks)
examples/lfm2/lfm2-350m-fft.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: preview
🔇 Additional comments (1)
examples/lfm2/lfm2-350m-fft.yaml (1)
30-33: Learning rate looks aggressive for 350 M model.
`5e-5` with AdamW-fused and cosine schedule tends to diverge when fine-tuning full-precision 350 M models on long-context data. Common practice is `1–2e-5`.
Please sanity-check with a short ~200-step pilot run and inspect loss curves.
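One rough way to act on the pilot-run suggestion is to compare the early and late parts of the logged loss. The sketch below is a heuristic under stated assumptions, not a definitive test: the function name `looks_divergent`, the window size, and the tolerance are all hypothetical, and the losses are assumed to have been exported from the logger as a plain list of floats.

```python
import math


def looks_divergent(losses, window=20, tolerance=1.05):
    """Flag a run whose loss is non-finite or ends higher than it started.

    Compares the mean of the first `window` logged losses against the mean
    of the last `window`; `tolerance` allows for a little noise.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window logged losses")
    if any(not math.isfinite(v) for v in losses):
        return True
    early = sum(losses[:window]) / window
    late = sum(losses[-window:]) / window
    return late > early * tolerance


if __name__ == "__main__":
    # Synthetic curves for illustration only.
    healthy = [2.0 - 0.005 * i for i in range(200)]
    blowing_up = list(reversed(healthy))
    print(looks_divergent(healthy), looks_divergent(blowing_up))
```

A check like this does not replace looking at the curve; it only catches the obvious failure modes (NaN/inf or a loss that climbs).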
djsaunde
left a comment
I need to read about this 👀
| @@ -0,0 +1,7 @@ | |||
| # Liquid Foundation Models 2 | |||
TIL about liquid foundation models
| LFM2 support in transformers exists in the main branch, but is not yet included in the transformers release. | ||
| ```bash | ||
| pip install git+https://github.com/huggingface/transformers.git |
--upgrade --no-deps maybe?
Description
LFM2 blog post: https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models
Added an example with packing. It seems to work correctly even though causal_conv1d support isn't quite correct for packed sequences yet (see #state-spaces/mamba/244).
Proper packing support, implemented in the modeling code, might look like this:
https://github.com/zigzagcai/varlen_mamba/blob/feat/add-cu_seqlens/mamba_ssm/modules/mamba_simple.py#L188-L198
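To illustrate why packed sequences need special handling in a causal convolution, here is a minimal pure-Python sketch. It is not the `causal_conv1d` kernel or the linked `varlen_mamba` code; the function names are hypothetical, and it only assumes the common convention that `cu_seqlens` holds cumulative sequence boundaries `[0, l0, l0 + l1, ...]`.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths):
    """Cumulative boundaries for sequences packed back to back."""
    return [0, *accumulate(lengths)]


def packed_causal_conv1d(x, weights, cu_seqlens):
    """Causal 1D convolution over a packed signal, reset at each boundary.

    weights[-1] multiplies the current position and weights[0] the oldest
    one. Positions before a sequence's start are treated as zeros, so no
    sequence ever reads tokens from the sequence packed before it.
    """
    k = len(weights)
    out = [0.0] * len(x)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for t in range(start, end):
            acc = 0.0
            for j in range(k):
                src = t - (k - 1) + j
                if src >= start:  # skip anything belonging to an earlier sequence
                    acc += weights[j] * x[src]
            out[t] = acc
    return out


if __name__ == "__main__":
    cu = cu_seqlens_from_lengths([3, 2])  # two sequences packed into 5 positions
    print(cu)
    print(packed_causal_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0], cu))
```

Without the `src >= start` check, the first position of the second sequence would mix in the last token of the first one, which is the kind of leakage the description refers to.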
Summary by CodeRabbit
Documentation
New Features