
fix gemma-2-27b text generation pytest #1828

Closed
skaulintel wants to merge 1 commit into transformers_4_49 from skaulintel/gemma2_pytest_fix

Conversation

@skaulintel
Contributor

Fixes the following pytest:

python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s -k "gemma-2-27b and test_text_generation_bf16_1x" --token=****

Without it, the test fails with the following AssertionError:

E           AssertionError: assert False
E            +  where False = <built-in function eq>('DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep', 'DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient')

conftest.py:74: AssertionError
========================================================================================== short test summary info ===========================================================================================
FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True] - AssertionError: assert False

@regisss
Collaborator

regisss commented Mar 7, 2025

I don't think there is an issue with Gemma2. The reason why I added the code block

if self.config.final_logit_softcapping is not None:
    ...

is because it has been in Transformers since Gemma2 was added. I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?).

final_logit_softcapping is actually specified in the configuration of the model, so this piece of code is indeed used.
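For context, a simplified scalar sketch of what that softcapping block computes in upstream Transformers: divide the logit by the cap, squash with tanh, multiply back. The real code operates on torch tensors and reads the cap from the model config; the cap value of 30.0 below is illustrative, not pulled from this PR.

```python
import math

def soft_cap(logit: float, cap: float = 30.0) -> float:
    """Scalar sketch of Gemma2 final logit softcapping.

    Squashes an unbounded logit into the interval [-cap, cap] while
    keeping it monotonic, so the relative ordering of logits is
    preserved but extreme values are compressed.
    """
    return math.tanh(logit / cap) * cap
```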

Moreover, the output of the model with this change still makes sense:

DeepSpeed is a machine learning framework that allows you to train deep learning models at any scale, from a single GPU to thousands of GPUs. It is a system that allows you to train models in a distributed environment.\n\nDeepSpeed is a deep learning training system that allows you to train models in a distributed environment. It is a system that allows you to train models in a distributed environment.\n\nThe DeepSpeed system is a deep learning training system that is designed to help you train deep learning models in a distributed environment.\n\nThe Deep

I think what we should do here is rather to update the baseline here:

"output": "DeepSpeed is a machine learning framework that enables you to train models with trillions of parameters and beyond, using model parallelism to partition large models over multiple GPUs.\n\nThe following is a brief introduction to the DeepSpeed model parallel training.\n\n<h2>1. Introduction</h2>\n\nThe DeepSpeed model parallel training is a simple and effective way to train large models. It is a framework that enables you to train models with trillions of parameters and beyond.\n\nDeepSpeed is a distributed deep learning optimization toolkit that makes it easy and efficient",

@uartie
Contributor

uartie commented Mar 7, 2025

> I think what we should do here is rather to update the baseline here: [...]

You can use rebase to update the baseline:

python -m pytest --rebase tests/test_text_generation_example.py::test_text_generation_bf16_1x[google/gemma-2-27b-1-False-True]

@skaulintel
Contributor Author

> Moreover, the output of the model with this change still makes sense: [...]
>
> I think what we should do here is rather to update the baseline here: [...]

It makes sense, but there is a lot of repetition. The output before this change seemed a little better.

@regisss
Collaborator

regisss commented Mar 7, 2025

This happens with greedy search, especially with models that have not been instruction fine-tuned. I'll take a look to see how to get more realistic results by tweaking a few generation parameters.
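For readers wondering what "tweaking a few generation parameters" could look like: one common knob is a repetition penalty. Below is a minimal standalone sketch of the rule that Transformers' RepetitionPenaltyLogitsProcessor applies; it is illustrative only, not the code under discussion in this PR.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Tokens that already appear in the generated sequence get their
    # logit divided by `penalty` when positive, or multiplied when
    # negative, making them less likely to be picked again.
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out
```

With greedy search this shifts the argmax away from recently emitted tokens, which is exactly the repetition pattern visible in the outputs above.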

@Luca-Calabria
Contributor

> I don't think there is an issue with Gemma2. [...] I'm not sure why it was not included here in #1280 and #1504 (any idea @billishyahao @Luca-Calabria ?). [...]

I don't have a clear answer as to why it was not part of the Gemma2 enabling PRs, but if this block was part of Transformers and was not integrated into Gemma2 for Gaudi, then it is something to add.
The baseline should be updated to match the new output.

@regisss
Collaborator

regisss commented Mar 10, 2025

@skaulintel It seems casting the logits to float when they are extracted from the forward pass of the model solves it: 02c4aa0#diff-c7b7c0b91ade41a0c87f1ad1f6784e4d51fb88c6a65f350042aca052b7ca1558R960

This used to be done in previous versions of Transformers. They have since removed it, but it seems to slightly affect a few models on Gaudi, so I reverted this change in the commit posted above. Closing this PR.
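Why a float cast changes anything: bf16 keeps only 7 mantissa bits, so two logits that are distinct in float32 can collapse to the same bf16 value and flip a greedy argmax. A toy illustration, simulating bf16 by truncating float32 bits (real bf16 casts may round-to-nearest rather than truncate, but the collapse effect is the same):

```python
import struct

def to_bf16(x: float) -> float:
    # Simulate bfloat16 by keeping only the top 16 bits of a float32
    # (1 sign + 8 exponent + 7 mantissa bits).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Two nearly tied logits: distinct in float32, identical in bf16, so
# greedy decoding can pick a different token depending on the dtype
# in which the logits are compared.
logits = [10.1234, 10.1239]
argmax_fp32 = max(range(len(logits)), key=lambda i: logits[i])
argmax_bf16 = max(range(len(logits)), key=lambda i: to_bf16(logits[i]))
```

Casting the extracted logits to float before sampling/argmax avoids this collapse, which is consistent with the small output differences observed across commits.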

@regisss regisss closed this Mar 10, 2025
@skaulintel
Contributor Author

> It seems casting the logits to float when they are extracted from the forward pass of the model solves it [...] Closing this PR.

So do we need to update the corresponding unit test?

@regisss
Collaborator

regisss commented Mar 10, 2025

> So do we need to update the corresponding unit test?

Nope, since it generates the exact same output as before when using the cast to float

@skaulintel
Contributor Author

skaulintel commented Mar 10, 2025

> > So do we need to update the corresponding unit test?
>
> Nope, since it generates the exact same output as before when using the cast to float

That doesn't seem to be the case for me. I collected some data on Gaudi3:

transformers_4_49 commit 6edca72:

'DeepSpeed is a machine learning framework that enables you to train large models on a single GPU. It is a framework that is used to train large models on a single GPU.\n\nThe main idea is to use a large amount of memory to fit the model on a single GPU.\n\nThe main idea of \u200b\u200bthe algorithm is to use the gradient of the loss function to update the model parameters.\n\nThe main idea of \u200b\u200bthe algorithm is to use the gradient of the loss function to update the model parameters.\n\nThe main idea of'

transformers_4_49 commit 11140b2:

'DeepSpeed is a machine learning framework that is designed to help you train your models faster and more efficiently. It is a collection of multi-GPU training techniques that can be used together or separately to improve the performance of your model.\n\nDeepSpeed is a system that allows you to train your models faster and more efficiently.\n\n<h2>What is DeepSpeed?</h2>\n\nDeepSpeed is a deep learning optimization toolkit that makes it easier to enable and customize deep learning optimization. It offers 1-2.5x speed increase compared to other'

Reference, which I think we should update:

"DeepSpeed is a machine learning framework that enables you to train large models on a single GPU. It is a framework that is used to train large models on a single GPU.\n\nThe main idea is to use a large amount of memory to fit the model on a single GPU.\n\nThe main idea is to use a large amount of memory to fit the model on a single GPU.\n\nThe main idea is to use a large amount of memory to fit the model on a single GPU.\n\nDeepSpeed is a framework that allows you"

@regisss
Collaborator

regisss commented Mar 11, 2025

> That doesn't seem to be the case for me. I collected some data on Gaudi3: [...]

I thought I had added the change for Mixtral too, but that was not the case; #1839 should solve it.

edit: ah wait, this is Gemma2, let me see

edit2: okay, I only tested on Gaudi2, which is why I didn't hit the same issue. I just pushed 96c8a32 to correct the Gaudi3 baseline. Let me know if that works for you.

@skaulintel
Contributor Author

python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s -k "gemma-2-27b and test_text_generation_bf16_1x" --token=

Yes, it works for me now. Thanks!

@regisss regisss deleted the skaulintel/gemma2_pytest_fix branch March 11, 2025 18:09