[Model] Add LFM2-VL model support#31758
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. You can ask your reviewers to trigger select CI tests on top of those. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds support for the LFM2-VL model, including its SigLIP2 vision encoder. The changes are comprehensive, covering the model implementation, multimodal processing pipeline, and necessary integrations. I've identified a couple of issues. First, there's a line of dead code in the smart_resize logic within lfm2_vl.py that should be removed. More critically, the data parallelism feature for the vision encoder (mm_encoder_tp_mode="data") appears to be broken due to a hardcoded use_data_parallel=False flag in siglip2.py. This prevents the vision encoder from being replicated across tensor parallel ranks as intended. My review includes detailed comments on these points.
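To make the review point concrete: the fix is to derive the data-parallel flag from the multimodal config rather than hardcoding it. The snippet below is a minimal sketch of that pattern; `MultiModalConfig`, `VisionEncoder`, and `build_vision_encoder` are simplified stand-ins for the real vLLM objects, not their actual signatures.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for vLLM's config/encoder objects, for illustration only.
@dataclass
class MultiModalConfig:
    mm_encoder_tp_mode: str = "weights"  # "weights" (shard) or "data" (replicate)


@dataclass
class VisionEncoder:
    use_data_parallel: bool


def build_vision_encoder(mm_config: MultiModalConfig) -> VisionEncoder:
    # Derive the flag from config instead of hardcoding use_data_parallel=False,
    # so mm_encoder_tp_mode="data" actually replicates the encoder across TP ranks.
    return VisionEncoder(use_data_parallel=(mm_config.mm_encoder_tp_mode == "data"))
```

This keeps a single source of truth for the TP mode, so the encoder's behavior cannot silently diverge from what the user configured.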
Hi @tianshu-Michael-yu, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. With `pre-commit install` in place, the hooks will also run automatically on future commits.
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
We need to add an entry to https://github.com/vllm-project/vllm/blob/main/tests/models/registry.py cc @DarkLight1337 for final review
- Move `Siglip2Model` import to top of file using relative import
- Remove redundant CUDA check for `spatial_shapes` (already kept on CPU)
- Remove `use_data_parallel` parameter from Siglip2 classes since it is derived from `multimodal_config` internally
DarkLight1337
left a comment
Please also add this model to the Supported Models page, example scripts (e.g. examples/offline_inference/vision_language.py), as well as the test registry tests/models/registry.py.
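For the registry request above, the entry maps the model class name to an example checkpoint record. The sketch below only illustrates the shape of such an entry; `ExampleModelInfo`, its fields, and the choice of checkpoint are assumptions, since the real `tests/models/registry.py` uses its own helper class.

```python
from dataclasses import dataclass


# Minimal stand-in for the registry's example-info record (hypothetical shape).
@dataclass(frozen=True)
class ExampleModelInfo:
    default_checkpoint: str
    trust_remote_code: bool = False


MULTIMODAL_EXAMPLE_MODELS = {
    # New entry for this PR's model class, keyed by the architecture name:
    "Lfm2VLForConditionalGeneration": ExampleModelInfo(
        "LiquidAI/LFM2-VL-3B", trust_remote_code=True
    ),
}
```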
Documentation preview: https://vllm--31758.org.readthedocs.build/en/31758/
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Here is the problem I am hitting: even if I add `"projector_use_layernorm": false` in config.json, the error still occurs.
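One way to check whether the override itself is taking effect is to apply it to the loaded config in code rather than editing config.json on disk. The snippet below is only a sketch of that override step with an assumed config shape; whether flipping this particular field resolves the error depends on how the model implementation reads it.

```python
import json

# Assumed minimal config shape, for illustration only.
config = {"model_type": "lfm2-vl", "projector_use_layernorm": True}

# Apply the same override the comment above describes, parsed from JSON
# so the lowercase `false` literal behaves exactly as it would in config.json.
overrides = json.loads('{"projector_use_layernorm": false}')
config.update(overrides)
```

vLLM also exposes an `--hf-overrides` flag that accepts a JSON dict like the one above, which avoids mutating the checkpoint's config.json at all.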
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Add support for the LFM2-VL (Liquid Foundation Model 2 Vision-Language) model family from LiquidAI.
Changes:
- `Lfm2VLForConditionalGeneration` model implementation with full multimodal processing pipeline
- SigLIP2 vision encoder implementation (`siglip2.py`) used by LFM2-VL
- Widened `max_seqlen` type hint in `MMEncoderAttention` to accept `int | torch.Tensor`
- `is_mm_embed` fix in GPU model runner to avoid race conditions with async copies

The model supports:
Test Plan
```shell
python -m vllm.entrypoints.openai.api_server \
  --model LiquidAI/LFM2-VL-3B \
  --trust-remote-code \
  --served-model-name lfm2-vl-3b \
  --mm-processor-cache-type shm \
  --async-scheduling \
  --compilation-config '{"max_cudagraph_capture_size": 8192, "compile_mm_encoder": true}'
```

Test Result
Tested with LiquidAI/LFM2-VL-450M, LFM2-VL-3B, and LFM2-VL-8B-A1B variants.
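For anyone reproducing the test plan, a chat-completions request against the launched server could look like the following. This only constructs the request body; the model name matches `--served-model-name` from the serve command, and the image URL is a placeholder, not a real test asset.

```python
import json

# Sketch of an OpenAI-compatible chat request body with one image + text turn.
payload = {
    "model": "lfm2-vl-3b",  # matches --served-model-name above
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},  # placeholder
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    "max_tokens": 64,
}
body = json.dumps(payload)
```

This body would be POSTed to the server's `/v1/chat/completions` endpoint.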
LFM2-VL-3B benchmark results (MMStar):
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.