
[Liquid AI] Support LFM VL #17954

Open
vincentzed wants to merge 4 commits into sgl-project:main from bzhng-development:vz/lfm-VL-supports

Conversation


@vincentzed vincentzed commented Jan 29, 2026

Motivation

(screenshot: CleanShot 2026-01-29 at 15 33 14@2x)
python3 -m sglang.launch_server --model LiquidAI/LFM2.5-VL-1.6B --trust-remote-code

Tested on B300.
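Once the server is up, it exposes an OpenAI-compatible endpoint. Below is a minimal sketch of the multimodal request payload; the image URL and prompt are illustrative assumptions, not taken from this PR, and the actual send requires the server launched above to be running.

```python
# Sketch: build an OpenAI-compatible chat request for the LFM2.5-VL server.
# The image URL and prompt are illustrative assumptions.
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1"  # sglang.launch_server default port

payload = {
    "model": "LiquidAI/LFM2.5-VL-1.6B",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/taxi.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

def send(payload: dict) -> dict:
    """POST the request to the running server (needs the server above)."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# send(payload)  # uncomment once the server is running
```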

Modifications

Accuracy Tests

pip install transformers==5.0.0

The branch already includes patches to align with #17784, which are necessary for the model to work.
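The review summary notes that `_verify_transformers_version` now requires transformers 5.0.0dev0 or newer for LFM2-VL. A generic sketch of such a guard (the exact check in the PR may differ; `packaging` is assumed to be available, as it ships alongside pip):

```python
# Sketch: guard against incompatible transformers versions before loading
# LFM2-VL. Illustrative only; the PR's actual check may be implemented
# differently.
from packaging import version

MIN_TRANSFORMERS = version.parse("5.0.0.dev0")

def verify_transformers_version(installed: str) -> None:
    # PEP 440 ordering: 5.0.0.dev0 < 5.0.0, so the released 5.0.0 passes.
    if version.parse(installed) < MIN_TRANSFORMERS:
        raise RuntimeError(
            f"LFM2-VL requires transformers>={MIN_TRANSFORMERS}, "
            f"found {installed}"
        )

verify_transformers_version("5.0.0")  # ok: release >= dev prerelease
```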

Sample image-description output: "The image shows a man standing on the back of a yellow taxi, ironing clothes. He is wearing a yellow sweatshirt and sandals. The taxi is parked on a city street with buildings and trees in the background. There are also two other yellow taxis visible in the scene."

For the MMMU results, see below.

Benchmarking and Profiling

The text model works as expected.

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:04<00:00, 291.24it/s]
Accuracy: 0.597
Invalid: 0.001
Latency: 4.560 s
Output throughput: 31997.563 token/s
OPENAI_API_KEY=sk-123456 OPENAI_API_BASE=http://localhost:30000/v1 python3 -m lmms_eval --model openai_compatible --model_args 'model_version="LiquidAI/LFM2.5-VL-1.6B",tp=1' --tasks mmmu_val --batch_size 16 --log_samples --output_path ./logs
2026-01-29 20:28:09 | INFO     | __main__:cli_evaluate:311 - Verbosity set to INFO
2026-01-29 20:28:11 | INFO     | __main__:cli_evaluate_single:400 - Evaluation tracker args: {'output_path': './logs'}
2026-01-29 20:28:11 | INFO     | __main__:cli_evaluate_single:480 - Selected Tasks: ['mmmu_val']
2026-01-29 20:28:11 | INFO     | lmms_eval.evaluator:simple_evaluate:161 - Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2026-01-29 20:28:14 | INFO     | lmms_eval.evaluator:evaluate:402 - Running on rank 0 (local rank 0)
2026-01-29 20:28:14 | INFO     | lmms_eval.api.task:build_all_requests:427 - Building contexts for mmmu_val on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [00:00<00:00, 15856.88it/s]
2026-01-29 20:28:14 | INFO     | lmms_eval.evaluator:evaluate:495 - Running generate_until requests
Model Responding: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:30<00:00,  1.21s/it]
2026-01-29 20:29:44 | INFO     | lmms_eval.models.model_utils.gen_metrics:log_metrics:48 - Metric summary - Total time: 358.372s, Total tokens: 2757, Avg speed: 7.7 tokens/s
Model Responding: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:30<00:00,  1.58s/it]
Postprocessing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [00:00<00:00, 17206.30it/s]
{'Overall-Art and Design': {'num': 120, 'acc': 0.45833}, 'Art': {'num': 30, 'acc': 0.56667}, 'Art_Theory': {'num': 30, 'acc': 0.4}, 'Design': {'num': 30, 'acc': 0.56667}, 'Music': {'num': 30, 'acc': 0.3}, 'Overall-Business': {'num': 150, 'acc': 0.31333}, 'Accounting': {'num': 30, 'acc': 0.26667}, 'Economics': {'num': 30, 'acc': 0.5}, 'Finance': {'num': 30, 'acc': 0.16667}, 'Manage': {'num': 30, 'acc': 0.23333}, 'Marketing': {'num': 30, 'acc': 0.4}, 'Overall-Science': {'num': 150, 'acc': 0.36667}, 'Biology': {'num': 30, 'acc': 0.4}, 'Chemistry': {'num': 30, 'acc': 0.26667}, 'Geography': {'num': 30, 'acc': 0.5}, 'Math': {'num': 30, 'acc': 0.5}, 'Physics': {'num': 30, 'acc': 0.16667}, 'Overall-Health and Medicine': {'num': 150, 'acc': 0.39333}, 'Basic_Medical_Science': {'num': 30, 'acc': 0.56667}, 'Clinical_Medicine': {'num': 30, 'acc': 0.43333}, 'Diagnostics_and_Laboratory_Medicine': {'num': 30, 'acc': 0.36667}, 'Pharmacy': {'num': 30, 'acc': 0.3}, 'Public_Health': {'num': 30, 'acc': 0.3}, 'Overall-Humanities and Social Science': {'num': 120, 'acc': 0.51667}, 'History': {'num': 30, 'acc': 0.6}, 'Literature': {'num': 30, 'acc': 0.73333}, 'Sociology': {'num': 30, 'acc': 0.43333}, 'Psychology': {'num': 30, 'acc': 0.3}, 'Overall-Tech and Engineering': {'num': 210, 'acc': 0.32381}, 'Agriculture': {'num': 30, 'acc': 0.33333}, 'Architecture_and_Engineering': {'num': 30, 'acc': 0.33333}, 'Computer_Science': {'num': 30, 'acc': 0.3}, 'Electronics': {'num': 30, 'acc': 0.33333}, 'Energy_and_Power': {'num': 30, 'acc': 0.43333}, 'Materials': {'num': 30, 'acc': 0.23333}, 'Mechanical_Engineering': {'num': 30, 'acc': 0.3}, 'Overall': {'num': 900, 'acc': 0.38444}}
2026-01-29 20:29:44 | INFO     | lmms_eval.loggers.evaluation_tracker:save_results_aggregated:188 - Saving results aggregated
2026-01-29 20:29:44 | INFO     | lmms_eval.loggers.evaluation_tracker:save_results_samples:287 - Saving samples to logs/__LiquidAI__LFM2.5-VL-1.6B__/20260130_042811_samples_mmmu_val.jsonl
openai_compatible (model_version="LiquidAI/LFM2.5-VL-1.6B",tp=1), gen_kwargs: (), limit: None, num_fewshot: None, batch_size: 16
| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|------|
|mmmu_val|      0|none  |     0|mmmu_acc|↑  |0.3844|±  |   N/A|
(screenshot: CleanShot 2026-01-29 at 15 32 30@2x) The differences come from the non-standard VLEval kit that LiquidAI uses; accuracy is at parity.

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>

Add debug instrumentation for LFM2-VL multimodal processing

Adds comprehensive logging to diagnose why vision features may not
be properly integrated in the LFM2-VL model.

Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Cherry-picked from JustinTong0323:update-transformers-v5 PR.
Updates rope_theta and rope_scaling access to use config.rope_parameters
dict instead of direct config attributes for transformers v5 compatibility.

Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
@gemini-code-assist

Summary of Changes

Hello @vincentzed, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces support for the LiquidAI LFM2.5-VL-1.6B multimodal model, enabling it to be run within the SGLang framework. It includes the necessary configuration, model architecture, and image processing logic. A significant refactoring effort was undertaken to standardize how Rotary Positional Embedding parameters are accessed across various language models, enhancing code maintainability and future compatibility. Additionally, the changes ensure alignment with newer versions of the HuggingFace Transformers library and incorporate detailed debugging capabilities for multimodal data handling.

Highlights

  • New Model Support: Added comprehensive support for the LiquidAI LFM2.5-VL-1.6B (LFM VL) model, including its configuration (Lfm2VlConfig), core model implementation (Lfm2VlForConditionalGeneration), and a dedicated image processor (Lfm2VlImageProcessor).
  • RoPE Parameter Access Refactoring: Refactored how Rotary Positional Embedding (RoPE) parameters (rope_theta, rope_scaling) are accessed across numerous model implementations. Instead of direct attribute access (getattr(config, "rope_theta")), models now consistently use config.rope_parameters.get("rope_theta"), improving robustness and consistency.
  • HuggingFace Transformers v5 Compatibility: Updated the _verify_transformers_version logic to explicitly require transformers version 5.0.0dev0 or newer for LFM2-VL models, ensuring compatibility with the latest HuggingFace features and APIs.
  • Multimodal Processing Enhancements: Improved multimodal input handling by adding pixel_attention_mask and spatial_shapes to the _MM_ITEM_ATTR_TO_MODALITY mapping in the base processor, and updated test runners to use AutoModelForImageTextToText and extract features with return_dict=True and pooler_output.
  • Debug Logging for Multimodal Flow: Introduced extensive debug logging within the multimodal processing pipeline, specifically in mm_utils.py, multimodal_processor.py, and the new lfm2_vl.py files, to aid in tracing data flow and identifying issues.
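The RoPE access refactor in the highlights above can be illustrated with a minimal before/after sketch. The config objects here are stand-ins built with `SimpleNamespace`, not SGLang's actual config classes:

```python
# Sketch of the RoPE parameter access change described above.
# The two "config" objects are illustrative stand-ins, not SGLang classes.
from types import SimpleNamespace

# transformers<5 style: RoPE settings are direct config attributes.
legacy_config = SimpleNamespace(rope_theta=1000000.0, rope_scaling=None)
theta_old = getattr(legacy_config, "rope_theta", 10000.0)

# transformers v5 style: RoPE settings live in a `rope_parameters` dict.
v5_config = SimpleNamespace(
    rope_parameters={"rope_theta": 1000000.0, "rope_scaling": None}
)
theta_new = v5_config.rope_parameters.get("rope_theta", 10000.0)

# Both access paths recover the same value.
assert theta_old == theta_new == 1000000.0
```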




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the LFM-VL model and includes a significant refactoring across many model files to ensure compatibility with transformers>=5.0.0. The changes correctly adapt the way RoPE parameters are accessed from the model configuration.

However, there is a critical issue with leftover debugging code in several files. This code writes to hardcoded paths and includes session-specific information, which should not be part of the final codebase. I've added specific comments pointing out these instances. Please remove all debugging-related code before merging.

I've also noted a minor point about using a consistent logger in the new config file.

Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
# Conflicts:
#	python/sglang/srt/configs/__init__.py
#	python/sglang/srt/model_executor/model_runner.py
#	python/sglang/srt/models/deepseek_v2.py
#	python/sglang/srt/models/gemma2.py
#	python/sglang/srt/models/gpt_oss.py
#	python/sglang/srt/models/qwen3_next.py
#	python/sglang/test/runners.py

Labels

deepseek · diffusion (SGLang Diffusion) · Multi-modal (multi-modal language model) · quant (LLM Quantization)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant