
fix is_causal for qwen3 #3213

Merged
danielhanchen merged 1 commit into unslothai:main from leizhenyuan:zhenyuan_is_casual_qwen3 on Aug 26, 2025

Conversation

@leizhenyuan
Contributor

This is the same issue as #2868.
We found that when training Qwen3, is_causal is not set properly.
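To illustrate why the flag matters, here is a minimal sketch, assuming PyTorch 2.0 or later and its `torch.nn.functional.scaled_dot_product_attention` (SDPA). It is not unsloth's code, and the tensor sizes are arbitrary. When no `attn_mask` is passed and `is_causal=False`, every position can attend to later tokens, so perturbing only the last key/value changes the output at the first position. With `is_causal=True` it does not.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim); arbitrary illustrative sizes
q = torch.randn(1, 2, 5, 8)
k = torch.randn(1, 2, 5, 8)
v = torch.randn(1, 2, 5, 8)

def first_token_output(k, v, is_causal):
    # No attn_mask is passed, mirroring the mask-less path discussed in this PR
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=is_causal)
    return out[..., 0, :]

# Perturb only the last ("future") key/value position
k2, v2 = k.clone(), v.clone()
k2[..., -1, :] += 1.0
v2[..., -1, :] += 1.0

# Without causal masking, token 0's output depends on the future token
leaks = not torch.allclose(first_token_output(k, v, False), first_token_output(k2, v2, False))
# With causal masking, token 0 attends only to itself
causal_ok = torch.allclose(first_token_output(k, v, True), first_token_output(k2, v2, True))
print("is_causal=False lets token 0 see the future:", leaks)
print("is_causal=True keeps token 0 independent of the future:", causal_ok)
```

During causal-LM training this kind of leak lets the model peek at the tokens it is supposed to predict.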

@Datta0
Collaborator

Datta0 commented Aug 26, 2025

That's a nice observation. It probably went unnoticed until now because FlashAttention (FA) and xformers take precedence. This only happens in the niche case where neither of them is available and we don't pass a mask.
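The niche case described above can be sketched as a last-resort SDPA path. The function name `sdpa_fallback` is hypothetical and this is not the code merged in e45ddb5. It assumes PyTorch 2.0 or later, where SDPA does not accept an explicit `attn_mask` together with `is_causal=True`. If no mask is supplied, causality has to come from `is_causal=True`. If a mask is supplied, it is assumed to already encode causality. The check at the end confirms that the mask-less path matches an explicit lower-triangular mask.

```python
import torch
import torch.nn.functional as F

def sdpa_fallback(q, k, v, attention_mask=None):
    """Hypothetical fallback used only when neither FlashAttention nor xformers is available."""
    if attention_mask is None:
        # No mask supplied: SDPA must apply the causal mask itself
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # A mask was supplied and is assumed to already encode causality (and any padding),
    # so is_causal stays False
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attention_mask, is_causal=False)

torch.manual_seed(0)
q, k, v = (torch.randn(1, 2, 6, 8) for _ in range(3))
seq_len = q.size(-2)
# Boolean mask: True means "may attend"; the lower triangle is the causal pattern
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).to(torch.bool)

no_mask = sdpa_fallback(q, k, v)
explicit = sdpa_fallback(q, k, v, attention_mask=causal_mask)
print(torch.allclose(no_mask, explicit, atol=1e-6))
```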

danielhanchen merged commit e45ddb5 into unslothai:main on Aug 26, 2025
@Datta0
Collaborator

Datta0 commented Aug 26, 2025

But this might need to be changed across the other models as well, like Mistral, Gemma, etc.
