Qwen2.5-VL-7B eagle3 train #102
Conversation
Summary of Changes
Hello @Lzhang-hub, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've implemented initial support for training the Qwen2.5-VL-7B model using the Eagle3 method. This involves significant changes across data preparation, model definition, and training scripts to accommodate the unique requirements of vision-language models, such as handling image inputs and specialized rotary embeddings. The goal is to enable efficient training of this multimodal model within our existing framework.
Highlights
- Qwen2.5-VL-7B Model Support: I've added comprehensive support for training the Qwen2.5-VL-7B model, enabling it to leverage its multimodal capabilities within our framework.
- Eagle3 Draft Model Integration for VLMs: I've integrated the Eagle3 draft model specifically for Qwen2.5-VL, including a new `Qwen2_5_VLForCausalLMEagle3` model and `QwenVLOnlineEagle3Model` for online training, which handles the unique multimodal rotary embedding of Qwen2.5-VL.
- Enhanced VLM Data Preparation: I've updated the data preparation scripts to support `sharegpt4v` and `allava4v` datasets, ensuring that `pixel_values` and `image_grid_thw` are correctly processed for VLM training.
- VLM Training Script Adaptations: I've modified the training script to incorporate VLM-specific logic, such as loading `AutoProcessor` and passing image-related inputs (`pixel_values`, `image_grid_thw`) through the training loop.
- New Configuration and Training Script: I've added a new configuration file and a dedicated shell script to streamline the setup and execution of Qwen2.5-VL-7B Eagle3 training runs.
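The "input embeds that integrate image embeds" idea behind `QwenVLOnlineEagle3Model` can be sketched roughly as follows; the helper name, the placeholder token id, and the tensor shapes are illustrative assumptions, not the PR's actual code:

```python
import torch

def merge_image_embeds(inputs_embeds, input_ids, image_embeds, image_token_id):
    # Replace embeddings at image-placeholder positions with the vision
    # tower's outputs, leaving text-token embeddings untouched.
    mask = (input_ids == image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
    return inputs_embeds.masked_scatter(mask, image_embeds)

B, S, H = 1, 6, 4
inputs_embeds = torch.zeros(B, S, H)            # stand-in text embeddings
input_ids = torch.tensor([[5, 9, 9, 9, 7, 8]])  # 9 = assumed image placeholder id
image_embeds = torch.ones(3, H)                 # one vector per placeholder token
merged = merge_image_embeds(inputs_embeds, input_ids, image_embeds, image_token_id=9)
```

The draft model then consumes `merged` instead of raw `input_ids`, so image information flows through the same path as text embeddings.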
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces support for training the Qwen2.5-VL-7B model, which is a significant and complex addition. The changes are extensive, touching data preparation, model definition, training scripts, and core components. While the overall approach seems sound, I've identified several critical issues that need to be addressed. These include a missing dependency, incorrect handling of position_ids in the multimodal rotary position embedding logic, which will likely cause runtime errors and incorrect model behavior, and bugs in the training and testing scripts. I've also included some suggestions for refactoring to improve code maintainability. Addressing these points will be crucial for the stability and correctness of the new VLM training capabilities.
```python
input_ids = padding(input_ids, left=False)
target = padding(target, left=False)
loss_mask = padding(loss_mask, left=False)
```
The position_ids are computed once before the TTT loop but are not updated within the loop. As input_ids, target, and loss_mask are shifted in each iteration using padding(..., left=False), position_ids should also be updated similarly to maintain correct positional information for subsequent TTT steps. Without this, the rotary embeddings will be computed with stale position information.
Suggested change:

```python
input_ids = padding(input_ids, left=False)
target = padding(target, left=False)
loss_mask = padding(loss_mask, left=False)
position_ids = padding(position_ids, left=False)
```
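To make the point concrete, here is a minimal sketch of shifting `position_ids` together with the other tensors inside the TTT loop; the `padding` helper below is a stand-in assumption, and the repo's real helper may behave differently:

```python
import torch

def padding(x, left=False):
    # Stand-in for the repo's `padding` helper (assumption): with
    # left=False it drops the first position and pads a zero on the
    # right, shifting the sequence by one for the next TTT step.
    pad = torch.zeros_like(x[..., :1])
    if left:
        return torch.cat([pad, x[..., :-1]], dim=-1)
    return torch.cat([x[..., 1:], pad], dim=-1)

# Shifting position_ids together with input_ids keeps the rotary
# embeddings aligned with the shifted tokens in each iteration:
input_ids = torch.arange(8).unsqueeze(0)
position_ids = torch.arange(8).unsqueeze(0)
input_ids = padding(input_ids, left=False)
position_ids = padding(position_ids, left=False)
```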
```python
from tqdm import tqdm
from transformers import PreTrainedTokenizer, ImageProcessingMixin
from qwen_vl_utils import process_vision_info
```
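As a side note on the data-preparation path these imports serve: `image_grid_thw` must stay consistent with the number of image placeholder tokens in `input_ids`. A quick sanity-check sketch, assuming Qwen2.5-VL's default spatial merge size of 2 (treat the constant as an assumption):

```python
# image_grid_thw stores the (t, h, w) patch grid per image; after the
# 2x2 spatial merge the vision tower emits t*h*w / merge_size**2
# embeddings, which must equal the number of image placeholder tokens.
def num_image_tokens(grid_thw, merge_size=2):
    t, h, w = grid_thw
    return (t * h * w) // (merge_size ** 2)

grid = [1, 16, 16]  # one image patched into a 16x16 grid
print(num_image_tokens(grid))  # 64 placeholder tokens expected
```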
```python
else:
    lck = len(cache_hidden[0])
    cos, sin = self.rotary_emb(query_states, position_ids + lck)
```
The logic for updating position_ids within the TTT loop by adding lck is incorrect for multimodal inputs. position_ids has a shape of (3, batch_size, seq_len) for Qwen-VL, where each of the 3 components corresponds to different modalities (text, image height, image width). Adding a scalar lck will broadcast incorrectly. Only the text-related position IDs (at index 0) should be offset.
A correct update would be to modify only the text-related part of the position IDs. However, a better approach would be to handle the position updates in the QwenVLOnlineEagle3Model.forward loop, which would simplify the logic here.
Suggested change:

```python
cos, sin = self.rotary_emb(query_states, position_ids)
```
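A small sketch of why the scalar offset misbehaves, following the (3, batch_size, seq_len) layout described above (illustrative values, not the repo's code):

```python
import torch

# Qwen2.5-VL's multimodal RoPE position_ids: index 0 is the text/temporal
# plane, indices 1 and 2 are the image height and width planes. A scalar
# offset broadcasts over all three planes; per the review, only the text
# plane should advance.
batch, seq_len, lck = 2, 5, 3
position_ids = torch.zeros(3, batch, seq_len, dtype=torch.long)
position_ids[0] = torch.arange(seq_len)  # text/temporal positions

wrong = position_ids + lck   # offsets the spatial planes too
right = position_ids.clone()
right[0] += lck              # offsets only the text plane
```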
scripts/train_eagle3_online.py
Outdated
```python
eval_logdict[f"train/ploss_{i}"] = plosses[i].item()
for i in range(len(acces)):
    eval_logdict[f"train/acc_{i}"] = acces[i]
```
The wandb logging for evaluation metrics seems to be using the wrong keys. The metrics are logged under train/ploss_{i} and train/acc_{i} which is misleading during the evaluation phase. This should be corrected to eval/ploss_{i} and eval/acc_{i} to accurately reflect that these are evaluation metrics.
Suggested change:

```python
eval_logdict[f"eval/ploss_{i}"] = plosses[i].item()
for i in range(len(acces)):
    eval_logdict[f"eval/acc_{i}"] = acces[i]
```
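One way to avoid this class of bug is to build the log dict through a helper parameterized by phase; this is a sketch, and the helper name and call sites are assumptions, not the PR's code:

```python
def metric_logdict(plosses, acces, prefix):
    # Build a wandb-style log dict keyed by phase so "train/" and
    # "eval/" metrics never collide; plosses/acces are plain floats here.
    d = {f"{prefix}/ploss_{i}": p for i, p in enumerate(plosses)}
    d.update({f"{prefix}/acc_{i}": a for i, a in enumerate(acces)})
    return d

print(metric_logdict([0.5, 0.4], [0.8, 0.9], "eval"))
```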
```python
class Qwen2_5_VLForCausalLMEagle3(Eagle3DraftModel):

    config_class = LlamaConfig
```
```python
print(f"Loss mask sum: {processed['loss_mask'][0].sum()}")
loss_mask = processed["loss_mask"][0].squeeze(0).tolist()
input_ids = input_ids.squeeze(0)
current_mask = input_ids[0]
```
```python
if args.is_vlm:
    plosses, _, acces = eagle3_model(
        input_ids=data["input_ids"].cuda(),
        attention_mask=data["attention_mask"].cuda(),
        loss_mask=data["loss_mask"].cuda(),
        pixel_values=data["pixel_values"].cuda(),
        image_grid_thw=data["image_grid_thw"].cuda(),
    )
else:
    plosses, _, acces = eagle3_model(
        input_ids=data["input_ids"].cuda(),
        attention_mask=data["attention_mask"].cuda(),
        loss_mask=data["loss_mask"].cuda(),
    )
```
The model call is duplicated in the training loop for the VLM and non-VLM cases. This can be refactored to reduce code duplication and improve readability by constructing a dictionary of model inputs and then unpacking it for the model call.
```python
model_inputs = {
    "input_ids": data["input_ids"].cuda(),
    "attention_mask": data["attention_mask"].cuda(),
    "loss_mask": data["loss_mask"].cuda(),
}
if args.is_vlm:
    model_inputs["pixel_values"] = data["pixel_values"].cuda()
    model_inputs["image_grid_thw"] = data["image_grid_thw"].cuda()
plosses, _, acces = eagle3_model(**model_inputs)
```
Great job!!!!

The dataset.map step is very slow, and it hangs when num_proc is greater than 1.
Can you provide your training script? I processed 30,000 samples with a max image size of 2K, which takes about 15 minutes.

@Lzhang-hub I use the default command with my own data
no, but the overall acc seems correct
* support qwen2_5_vl online
* delete nohup
* add qwen2.5-vl eagle model
* add todo
* clean dev code
* support batch and fix position_ids bug
* add eval wandb metrics
* fix eval bug
* fix eval dataloader bug
* add comment
* merge main
* rename vlm online eagle3 model name
* clean code
* fix ttt input embeds bug (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>)
* fix eval metrics bug
* merge qwen-vl draft model to llama3 (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>)
* fix qwen vl train shell
* add timeout config (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>)
* qwenvl draft input without image embedding (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>)
* qwenvl draft input without image embedding (Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>)
* Revert "qwenvl draft input without image embedding" (reverts commit 1e8eab8)
* fix gitignore
* fix wandb error
* fix lint

Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Does it work for qwen2.5-vl-3B?
@mmdbhs I haven't tried it, but theoretically it should work. You can give it a try, and if you encounter any problems, feel free to provide feedback at any time.

Great job, does it work for qwen2.5-vl-32B?
It may need TP, which is not supported yet.

@oswen You need to install sglang from source; v0.5.1 does not support qwen-vl eagle3 inference.
Thanks for replying, so which branch of sglang should I choose? The master?

```shell
pip uninstall sglang
pip install sglang==0.5.3
```
After following your procedure above, the ACC of my draft model reached around 0.6. Similarly, I deployed it using SGLang, but when I tested the Accept Length, I found that with the model you provided, the Accept Length could exceed 3.0, whereas mine only reached about 2.2. Do you have any idea what might be causing this difference? @Lzhang-hub

Train Script

prepare data:
```shell
python scripts/prepare_data.py --dataset allava4v --sample-size 100000 --split-eval
```
train:
```shell
bash examples/run_qwen2_5_vl_eagle3_online.sh
```
Acc result

Test Script

Test your model:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --speculative-draft-model Rayzl/qwen2.5-vl-7b-eagle3-sgl --trust-remote-code --chat-template qwen2-vl --chunked-prefill-size -1 --cuda-graph-max-bs 1 --speculative-algo EAGLE3 --speculative-num-steps 4 --speculative-eagle-topk 6 --speculative-num-draft-tokens 24 --tp 1 --mem-fraction-static 0.7 --host localhost --port 9001
python run_mmstar.py --host http://localhost --port 9001 --parallel 1 --num-questions 100
```
Result:

Test my trained model:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct --speculative-draft-model ${my_local_dir}/epoch_9 --trust-remote-code --chat-template qwen2-vl --chunked-prefill-size -1 --cuda-graph-max-bs 1 --speculative-algo EAGLE3 --speculative-num-steps 4 --speculative-eagle-topk 6 --speculative-num-draft-tokens 24 --tp 1 --mem-fraction-static 0.7 --host localhost --port 9001
python run_mmstar.py --host http://localhost --port 9001 --parallel 1 --num-questions 100
```
Latency: 60.094 s
Resolved. It was caused by wrong data preprocessing.
I have the same issue, have you solved it? Thanks @LugerW-A
No idea. Just accept it, use num_proc=0.
Okay, thanks
May I ask if this model can be trained with a 48GB A6000 GPU? I encountered a resource limit exceeded issue during kernel compilation using this GPU.
@LugerW-A @icicle4 |



Motivation
This is a draft PR to support training the qwen2.5-vl-7b model.
Modifications
prepare data
- `pixel_values` and `image_grid_thw` were added, in addition to `input_ids`, `loss_mask`, and `attention_mask`.
train
- `QwenVLOnlineEagle3Model` in core/eagle3.py; the main difference is that the input to the draft model is not `input_ids` but input embeds that integrate the image embeds.

acc
benchmark: [results screenshot]

loss metrics: [loss curves screenshot]

acc metrics: [accuracy curves screenshot]
speedup
server: sglang for qwen-2.5-vl eagle3 infer
benchmark scripts: use mmstar benchmark
Note: the draft model `Rayzl/qwen2.5-vl-7b-eagle3-sgl` is trained on only 30k VQA samples; training with more data is still in progress.

server cmd:
benchmark:
```shell
python run_mmstar.py --host http://0.0.0.0 --port 8080 --parallel 1 --num-questions 100
```
result:
server cmd:
benchmark:
```shell
python run_mmstar.py --host http://0.0.0.0 --port 8080 --parallel 1 --num-questions 100
```
result:
e2e speedup 1.5x
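For intuition on why the end-to-end speedup (1.5x) is lower than the raw accept length (above 3), here is a back-of-envelope model that charges each draft step a fraction of a target forward pass. The cost constants are assumptions for illustration, not measurements from this PR:

```python
def est_speedup(accept_len, num_steps, c_draft=0.1, c_verify=1.0):
    # Per verify cycle the target emits `accept_len` tokens on average,
    # while paying for `num_steps` draft forwards plus one verification
    # pass (costs are relative to one target forward pass).
    return accept_len / (num_steps * c_draft + c_verify)

print(round(est_speedup(3.0, 4), 2))  # ~2.14 under these assumed costs
```

Scheduling overhead and non-generation time in the server push the realized end-to-end number below this upper-bound style estimate.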
Train scripts
Note:
TODO
Checklist