Revert logits to float change for accuracy #1839

Merged
regisss merged 1 commit into transformers_4_49 from mixtral_text_gen on Mar 11, 2025

@yeonsily (Collaborator) commented Mar 10, 2025

What does this PR do?

We noticed that Mixtral generates different output compared to the previous release. The cause is the change in the logits dtype.

v1.15.0:
DeepSpeed is a machine learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a

v1.16-release:
DeepSpeed is a machine learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a

transformers_4_49:
DeepSpeed is a machine learning framework that enables training of large models on a single machine with a single GPU. It is designed to be easy to use and to provide high performance.\n\nDeepSpeed is a machine learning framework that enables training of large models on a single machine with a single GPU. It is designed to be easy to use and to provide high performance.\n\nDeepSpeed is a machine learning framework that enables training of large models on a single machine with a single GPU. It is designed to be easy to use and

reference:
DeepSpeed is a machine learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a single machine with multiple GPUs. It is designed to be easy to use and efficient, and it supports a wide range of models and tasks.\n\nDeepSpeed is a deep learning framework that enables training of large models on a

The logits are cast to float on v1.16-release:

https://github.com/huggingface/optimum-habana/blob/v1.16-release/optimum/habana/transformers/models/mixtral/modeling_mixtral.py#L801

but the float() call is currently commented out on transformers_4_49:

https://github.com/huggingface/optimum-habana/blob/transformers_4_49/optimum/habana/transformers/models/mixtral/modeling_mixtral.py#L793
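
As a rough illustration of why the logits dtype can change generated text, the sketch below emulates bfloat16 rounding in plain Python and shows two nearly equal logits collapsing to the same value, which flips a greedy argmax. This is not the Mixtral code path: the helper names (`to_bfloat16`, `argmax`) and the logit values are made up for the example, and the actual divergence in this PR may come from a different step of generation.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Hypothetical helper for illustration: bfloat16 is treated as the top
    16 bits of an IEEE float32, rounded to nearest-even. NaN and infinity
    are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 bits that bfloat16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def argmax(values):
    """Index of the largest value; ties resolve to the first occurrence."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up logits for three candidate tokens (not taken from Mixtral).
logits_fp32 = [10.01, 10.03, 9.5]
logits_bf16 = [to_bfloat16(v) for v in logits_fp32]

print("fp32 logits:", logits_fp32, "-> greedy token", argmax(logits_fp32))
print("bf16 logits:", logits_bf16, "-> greedy token", argmax(logits_bf16))
```

Between 8 and 16, adjacent bfloat16 values are 0.0625 apart, so 10.01 and 10.03 both round to 10.0 and the tie goes to the first token instead of the second. A single differing token is enough for the rest of a greedy generation to drift.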

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@yeonsily requested a review from libinta March 10, 2025 23:30
@yeonsily requested a review from regisss as a code owner March 10, 2025 23:30

@regisss (Collaborator) left a comment

LGTM, I thought I added it back to Mixtral too, good catch

@regisss merged commit 9581480 into transformers_4_49 Mar 11, 2025
@regisss deleted the mixtral_text_gen branch March 11, 2025 08:39
yafshar added a commit to yafshar/optimum-habana that referenced this pull request May 19, 2025
Aligned test outputs with latest expected results
Reference: huggingface#1839

Co-authored-by: Iman Gohari <s.m.iman.gohari@intel.com>
2 participants