
Conversation

@wpybtw wpybtw commented Jul 28, 2025

I am trying to add support for Qwen QwQ-32B.
QwQ-32B has already been trained for a few epochs.
Since QwQ-32B has the same architecture as Qwen2, I believe this also works for Qwen2.

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @wpybtw, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces full support for the Qwen2 and Qwen qwq-32B language models. It integrates these models into the system's architecture by providing both a specialized 'Eagle3 draft model' implementation for efficient speculative decoding and a 'distributed target model' for scalable training and inference. This expansion allows users to leverage the Qwen family of models within the framework, complete with necessary configurations and an example training script.

Highlights

  • New Model Support: Added comprehensive support for Qwen2 and Qwen qwq-32B models, enabling their integration into the existing framework.
  • Eagle3 Draft Model Integration: Introduced Qwen2ForCausalLMEagle3 to allow Qwen2 models to function as draft models within the Eagle3 architecture, complete with custom attention, MLP, and normalization layers.
  • Distributed Target Model Integration: Implemented Qwen2ForCausalLM as a distributed target model, leveraging tensor parallelism for efficient training and inference of Qwen2-based models.
  • Configuration and Example Script: Provided a new configuration file (qwq-32B-eagle3.json) and an example training script (run_qwq_eagle3_online.sh) to facilitate the setup and training of the new Qwen models.
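
As an aside on the configuration file listed above, here is a minimal loading sketch. The configs/ path is an assumption about where this PR places qwq-32B-eagle3.json, and the printed attributes are standard Qwen2Config fields; it is only an illustration, not the project's actual setup code.

    # Minimal sketch; the path below is assumed, not confirmed by the PR diff.
    from transformers import Qwen2Config

    draft_config = Qwen2Config.from_json_file("configs/qwq-32B-eagle3.json")
    print(draft_config.hidden_size, draft_config.num_hidden_layers)
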
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for the Qwen qwq-32B and Qwen2 models. It includes a new configuration file, an example training script, and the core model implementations for both the Eagle draft model and the distributed target model. The changes are well-structured and comprehensive. My feedback focuses on improving code clarity, correctness of type hints and docstrings, and adherence to best practices in both Python and shell scripting.

import torch.nn.functional as F
from transformers import GenerationMixin, Qwen2Config, PreTrainedModel
from transformers.activations import ACT2FN
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config

This comment was marked as resolved.

Comment on lines +397 to +400
# if last:
self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
# else:
# self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size * 2, bias=False)

medium

This block of code is commented out. To improve code clarity and remove dead code, please remove these lines.

Comment on lines +451 to +453
) -> Tuple[
torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]
]:

medium

The return type hint Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]] does not match the actual return value, which is a single torch.FloatTensor (hidden_states). Please correct the type hint to match the implementation.

    ) -> torch.FloatTensor:

"""
Arguments:
hidden_states (`torch.FloatTensor`): input to the layer, cat low, mid high hidden_states of shape `(batch, seq_len, hidden_states * 3)`
input_ids (`torch.LongTensor`): input ids of shape `(batch, seq_len)`

medium

The docstring for the forward method mentions input_ids, but this is not an argument to the function. The function takes inputs_embeds. Please update the docstring to accurately reflect the function's parameters.

            inputs_embeds (`torch.Tensor`): embedded input tokens of shape `(batch, seq_len, hidden_size)`

past_key_value: Optional[Cache] = None,
cache_position: Optional[torch.LongTensor] = None,
**kwargs: Unpack[FlashAttentionKwargs],
) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:

medium

The return type hint tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]] appears to be incorrect. The function returns attn_output, attn_weights, which corresponds to tuple[torch.Tensor, Optional[torch.Tensor]]. The third element in the hint seems to be for past_key_value, which is not returned here.

Suggested change
) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:

Comment on lines +211 to +216
) -> tuple[
torch.FloatTensor, Optional[tuple[torch.FloatTensor, torch.FloatTensor]]
]:

medium

The return type hint tuple[torch.FloatTensor, Optional[tuple[torch.FloatTensor, torch.FloatTensor]]] does not match the returned value. The function returns a tuple containing hidden_states and optionally self_attn_weights. A more accurate type hint would be tuple[torch.FloatTensor, Optional[torch.FloatTensor]].

    ) -> tuple[torch.FloatTensor, Optional[torch.FloatTensor]]:

all_hidden_states = () if output_hidden_states else None
all_self_attns = () if output_attentions else None

for decoder_layer in self.layers[: self.config.num_hidden_layers]:

medium

The slice [: self.config.num_hidden_layers] is redundant because self.layers is already initialized with num_hidden_layers elements. You can simplify this by iterating directly over self.layers.

        for decoder_layer in self.layers:

Comment on lines +497 to +501
>>> model = Qwen2ForCausalLM.from_pretrained("meta-qwen2/Qwen2-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-qwen2/Qwen2-2-7b-hf")

medium

The example in the docstring uses meta-qwen2/Qwen2-2-7b-hf. Given that this PR adds support for Qwen/QwQ-32B (as seen in the example script), it would be more consistent to use that model in the example.

        >>> model = Qwen2ForCausalLM.from_pretrained("Qwen/QwQ-32B")
        >>> tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

"Qwen2ForSequenceClassification",
"Qwen2ForTokenClassification",
"Qwen2ForQuestionAnswering",
]
(No newline at end of file)

medium

This file is missing a final newline character. It's a common convention and good practice to end files with a newline.

@sleepcoo
Collaborator

Can you add a new test case like the one in tests/test_llama4_tp.py?
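
A rough sketch of what such a test could look like, assuming it mirrors the structure of tests/test_llama4_tp.py; the parametrization and the final steps described in comments are illustrative, not the project's actual harness.

    # Illustrative sketch only: the real test should reuse the launch/compare
    # helpers from tests/test_llama4_tp.py; only standard pytest/transformers
    # APIs are used here.
    import pytest
    from transformers import AutoConfig

    TARGET_MODEL = "Qwen/QwQ-32B"  # target model referenced elsewhere in this PR

    @pytest.mark.parametrize("tp_size", [2])
    def test_qwen2_target_model_tp(tp_size):
        # Sanity-check that the target really is a Qwen2-architecture model
        # and that its heads split evenly across the tensor-parallel ranks.
        config = AutoConfig.from_pretrained(TARGET_MODEL)
        assert config.model_type == "qwen2"
        assert config.num_attention_heads % tp_size == 0
        # The real test should build the distributed Qwen2ForCausalLM with
        # tensor parallel size tp_size, run a forward pass, and compare logits
        # against the single-device Hugging Face reference, mirroring
        # tests/test_llama4_tp.py.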

@wpybtw
Contributor Author

wpybtw commented Jul 28, 2025

Can you add a new test case like the one in tests/test_llama4_tp.py?

Sure. Done

@sleepcoo sleepcoo requested a review from ZhengHSI July 28, 2025 07:22
@sleepcoo
Collaborator

The code seems fine. I'll try training it later. Have you finished the training? Do you have any performance data, like the acceptance length?

Collaborator

Could you provide an example in the README.md?

Collaborator

https://huggingface.co/w497273/qwq-32b-eagle3/tree/main
Was this model weight obtained by training with this script?

Contributor Author

Yes. I ran two epochs of training on UltraChat. The weights are at https://huggingface.co/w497273/qwq-32b-eagle3/tree/main

I tested with:

python3 -m sglang.launch_server --model  /home/jovyan/computational_math/w00802858/QwQ-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path  /home/jovyan/pvc-shared/computational_math/w00802858/SpecForge/outputs/QwQ-32B-eagle3/epoch_1 \
    --speculative-num-steps 4 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --mem-fraction 0.7 \
    --disable-radix-cache --tp 2

python3 -m sglang.bench_serving --backend sglang  --dataset-name sharegpt  --warmup-requests 0 --num-prompt 1 --max-concurrency 1

The global accept length (self.cum_spec_accept_length / self.cum_spec_accept_count) is 1.67.
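
For context, a minimal sketch of how that ratio comes out; the counter values below are illustrative, not the actual run's numbers.

    # Illustrative numbers only; the real counters are accumulated by the server.
    cum_spec_accept_length = 835   # total accepted draft tokens
    cum_spec_accept_count = 500    # number of speculation rounds
    print(cum_spec_accept_length / cum_spec_accept_count)  # -> 1.67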

@FrankLeeeee
Collaborator

Can you run pre-commit?

@sleepcoo
Collaborator

sglang does not support Qwen2ForCausalLMEagle3 at present. Do you use Qwen2 as the draft model because the draft acceptance length of Qwen2 is higher?
@wpybtw

@wpybtw
Contributor Author

wpybtw commented Aug 1, 2025

sglang does not support Qwen2ForCausalLMEagle3 at present. Do you use Qwen2 as the draft model because the draft acceptance length of Qwen2 is higher? @wpybtw

I quickly added Qwen2ForCausalLMEagle3 to sglang by modifying LlamaForCausalLMEagle3 (wpybtw/sglang@910fca1).

What do you mean by "Do you use Qwen2 as the draft model because the draft acceptance length of Qwen2 is higher?"

wpybtw and others added 7 commits August 1, 2025 02:10
@wpybtw
Contributor Author

wpybtw commented Aug 1, 2025

Done.

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from transformers.models.qwen2.modeling_qwen2 import (
KwargsForCausalLM,

KwargsForCausalLM is not supported after transformers 4.53.
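
A possible way to keep the import working across transformers versions, sketched under the assumption that KwargsForCausalLM is only used as a typing alias here; the fallback type is illustrative and should match how the kwargs are actually consumed.

    # Sketch of a version-compatibility shim; the fallback is an assumption and
    # only makes sense if KwargsForCausalLM is used purely as a typing hint.
    try:
        from transformers.models.qwen2.modeling_qwen2 import KwargsForCausalLM
    except ImportError:  # removed/renamed in newer transformers releases
        from typing import Any, Dict
        KwargsForCausalLM = Dict[str, Any]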

@sleepcoo
Collaborator

Hello, can you fix the conflict and the transformers 4.53 bug? I have reviewed the code and can merge it.
@wpybtw

@ZhengHSI
Collaborator

Hello, can you fix the conflict and the transformers 4.53 bug? I have reviewed the code and can merge it. @wpybtw

@wpybtw please try to fix it

@ZhengHSI
Collaborator

This PR will be closed, and a new PR is available at #163.

@ZhengHSI ZhengHSI closed this Aug 21, 2025