
fix mistral and mistral3 tests #38978

Merged: ydshieh merged 4 commits into main from fix_mistral, Jun 23, 2025

Conversation


@ydshieh ydshieh commented Jun 23, 2025

What does this PR do?

Mostly updates expected values and uses cleanup to avoid OOM.

See the inline comments below for details.



@require_torch_accelerator
@require_read_token
Collaborator Author

we can use this at the class level now

Comment on lines 347 to 352

    @classmethod
    def tearDownClass(cls):
        del cls._model
        cleanup(torch_device, gc_collect=True)
Collaborator Author

we should delete class attributes that contain models at the end

Member

Oh wow, for sure! IMO it's super bad practice to add it as a class attribute; can we switch it to an instance attribute in setUp maybe? That should avoid several potential pitfalls, and we should not need tearDownClass anymore, I think (the object should be cleaned up automatically)
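The suggestion above can be sketched without a real checkpoint. Here DummyModel is a hypothetical stand-in (the real tests load a Mistral model), but it shows why an instance attribute set in setUp needs no tearDownClass: unittest discards each TestCase instance once its test finishes, so the model is garbage-collected automatically.

```python
import gc
import unittest


class DummyModel:
    """Hypothetical stand-in for a large model; counts live instances."""
    alive = 0

    def __init__(self):
        DummyModel.alive += 1

    def __del__(self):
        DummyModel.alive -= 1


class ExampleIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Instance attribute: each test gets its own model, and unittest
        # drops the TestCase instance (and the model with it) after the
        # test finishes, so no tearDownClass bookkeeping is needed.
        self.model = DummyModel()

    def test_model_is_loaded(self):
        self.assertIsInstance(self.model, DummyModel)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleIntegrationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
gc.collect()  # make the collection deterministic for the check below
print(result.wasSuccessful(), DummyModel.alive)  # expect: True 0
```

The tradeoff is that the model is reloaded for every test; for expensive checkpoints the PR's alternative, keeping the class attribute but pairing it with a tearDownClass that deletes it and calls cleanup, avoids the reload at the cost of the manual bookkeeping.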

    def setUp(self):
        self.model_dtype = torch.float16
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
        cleanup(torch_device, gc_collect=True)
@ydshieh ydshieh Jun 23, 2025

After #33657, we get GPU OOM on T4 for the 2 tests in this test class.

It could be fixed by adding cleanup here, but maybe it's better to see if this is somehow a regression (I think it's probably not a serious issue, however). cc @Cyrilvallez

You can reproduce (on T4) by running

HF_HUB_READ_TOKEN=xxx RUN_SLOW=1 python3 -m pytest -v tests/models/mistral/test_modeling_mistral.py::MistralIntegrationTest tests/models/mistral/test_modeling_mistral.py::Mask4DTestHard

Member

Is that affected by assisted decoding, or is the link wrong?

Collaborator Author

damn, sorry. It is:

Loading optimizations (#36742)

Member

It's weird that we need this in setUp when we already have it in tearDown anyway 🤔 Is it still the case when moving the model to an instance attr instead of a class attr?

Collaborator Author

ok, for future reference:

it's not very clear, but the extra cleanup added in setUp is necessary to avoid OOM if the test test_speculative_generation is failing (due to mismatched outputs). If it were passing, we wouldn't need this extra cleanup.

I just want to avoid this kind of OOM even when some other tests are failing, so it's better to always have a pair of cleanup calls

    @slow
    def test_speculative_generation(self):
        EXPECTED_TEXT_COMPLETION = "My favourite condiment is 100% ketchup. I love it on everything. I’m not a big"
        EXPECTED_TEXT_COMPLETION = "My favourite condiment is 100% Sriracha. I love it on everything. I have it on my"
Collaborator Author

this test was always failing up to now

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 364 to 365

    ("cuda", 7): 'The image features two tabby cats lying on a pink surface, which appears to be a couch or',
    ("cuda", 8): 'The image features two cats lying on a pink surface, which appears to be a couch or a bed',
Collaborator Author

In #37882 these values were not updated, but I remember I checked them on CI runners.
From the CI reports, it looks like we get different outputs every day, but I can't reproduce that: I ran them several times and even triggered them via workflow runs; all give the same results now (and pass with the updated values).

TODO: monitor in the next few days to see if we get them passing.
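The snippet under discussion keys expected strings by device type and CUDA compute-capability major version (7 for a T4 / sm_75 runner, 8 for an A100 / sm_80 runner), since the same model can decode slightly different text on different hardware. A minimal sketch of that lookup follows; expected_for is a hypothetical helper for illustration, not the transformers API.

```python
# Expected generations keyed by (device_type, compute_capability_major).
# The strings are the ones from the diff above.
EXPECTED_TEXTS = {
    ("cuda", 7): "The image features two tabby cats lying on a pink surface, which appears to be a couch or",
    ("cuda", 8): "The image features two cats lying on a pink surface, which appears to be a couch or a bed",
}


def expected_for(device_type, major):
    """Return the expectation for this device, falling back to a
    device-wide default keyed with major=None if one exists."""
    key = (device_type, major)
    if key in EXPECTED_TEXTS:
        return EXPECTED_TEXTS[key]
    return EXPECTED_TEXTS.get((device_type, None))


print(expected_for("cuda", 7))  # prints the T4-class expectation
```

A test then compares the generated text against expected_for(device, major) for the runner it happens to be on, instead of a single hard-coded string.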

Collaborator Author

well, I again get different values today vs. the ones I got yesterday, wtf ... 😭

@ydshieh ydshieh Jun 23, 2025

I will revert the changes on mistral3 and dive into them in a separate PR.
Let's keep this PR only for mistral

Member

hm, maybe we can set_seed? Or does the order of test cases matter because we use the same cls.model?

Collaborator Author

the order of tests here is the same (unless we add, remove, or skip some tests).

It seems to me that the outputs only change when I rebuild the docker image. If I use the same built docker image, no matter how I run them (manually in SSH runners or triggered by GitHub Actions), they all give the same outputs.

I am still checking

@ydshieh ydshieh Jun 25, 2025

@zucchini-nlp if you ever want to hear this sad story (at least for me):

https://huggingface.slack.com/archives/C01NE71C4F7/p1750885359361689

    {"type": "image", "url": "https://huggingface.co/ydshieh/kosmos-2.5/resolve/main/view.jpg"},
    {
        "type": "image",
        "url": "https://huggingface.co/datasets/hf-internal-testing/testing-data-mistral3/resolve/main/view.jpg",
Collaborator Author

just moved the files to a better Hub repository

@ydshieh ydshieh requested a review from zucchini-nlp June 23, 2025 09:03

ydshieh commented Jun 23, 2025

run-slow: mistral, mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral', 'models/mistral3']
quantizations: [] ...

@ydshieh ydshieh requested a review from Cyrilvallez June 23, 2025 09:33

@zucchini-nlp zucchini-nlp left a comment


Thanks, LGTM



ydshieh commented Jun 23, 2025

run-slow: mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral3']
quantizations: [] ...


@Cyrilvallez Cyrilvallez left a comment


Hey! In general, I think that we should absolutely avoid models as class attributes, as it will lead to many potential issues! Not sure why it was added as a class attribute without proper cleanup before, but I'd like to revert this if possible 🤗



ydshieh commented Jun 23, 2025

run-slow: mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral3']
quantizations: [] ...

@ydshieh ydshieh merged commit 2ce02b9 into main Jun 23, 2025
15 checks passed
@ydshieh ydshieh deleted the fix_mistral branch June 23, 2025 15:07