
Qwen3 inference fixes#2436

Merged
shimmyshimmer merged 1 commit into unslothai:main from Datta0:qwen3_support
Apr 30, 2025

Conversation

@Datta0 (Collaborator) commented Apr 30, 2025

  • Refactored LlamaModel_fast_forward_inference to make components customisable.
  • Created a function with the same arguments as before to ensure backwards compatibility.
  • Tested with meta-llama/Llama-3.1-8B-Instruct, mistralai/Mistral-7B-Instruct-v0.3, and Qwen/Qwen3-4B.
```python
inputs = tokenizer(
    ["Explain Neural Networks in simple terms."],
    return_tensors = "pt",
).to("cuda")

output = model.generate(
    **inputs,
    max_new_tokens = 128,
    output_hidden_states = True,
    temperature = 1e-5,
)
```

Note: Left is unsloth and right is HF Transformers.

Qwen/Qwen3-4B
[screenshot: Unsloth (left) vs. HF Transformers (right) output comparison]

mistralai/Mistral-7B-Instruct-v0.3
[screenshot: Unsloth (left) vs. HF Transformers (right) output comparison]

meta-llama/Llama-3.1-8B-Instruct
[screenshot: Unsloth (left) vs. HF Transformers (right) output comparison]

@shimmyshimmer merged commit 2a65066 into unslothai:main Apr 30, 2025
@shimmyshimmer (Collaborator) commented:

Amazing, thanks @Datta0!

@Datta0 deleted the qwen3_support branch July 26, 2025 04:51