
fix mistral and mistral3 tests #38978

Merged: ydshieh merged 4 commits into main from fix_mistral, Jun 23, 2025

Conversation


@ydshieh ydshieh commented Jun 23, 2025

What does this PR do?

Mostly updates expected values and uses cleanup to avoid OOM.

See the inline comments below for details.



@require_torch_accelerator
@require_read_token
Collaborator Author

we can use this at the class level now

Comment on lines 347 to 352

    @classmethod
    def tearDownClass(cls):
        del cls._model
        cleanup(torch_device, gc_collect=True)
Collaborator Author

we should delete class attributes that contain models at the end

Member

Oh wow, for sure! IMO it's super bad practice to add it as a class attribute; can we switch it to an instance attribute in setUp maybe? That should avoid several potential pitfalls, and we should not need tearDownClass anymore, I think (the object should be cleaned up automatically)
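The suggestion above can be sketched without a real checkpoint. Here DummyModel is a hypothetical stand-in (the real tests load a Mistral model), but it shows why an instance attribute set in setUp needs no tearDownClass: unittest discards each TestCase instance once its test finishes, so the model is garbage-collected automatically.

```python
import gc
import unittest


class DummyModel:
    """Hypothetical stand-in for a large model; counts live instances."""
    alive = 0

    def __init__(self):
        DummyModel.alive += 1

    def __del__(self):
        DummyModel.alive -= 1


class ExampleIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Instance attribute: each test gets its own model, and unittest
        # drops the TestCase instance (and the model with it) after the
        # test finishes, so no tearDownClass bookkeeping is needed.
        self.model = DummyModel()

    def test_model_is_loaded(self):
        self.assertIsInstance(self.model, DummyModel)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleIntegrationTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
gc.collect()  # make the collection deterministic for the check below
print(result.wasSuccessful(), DummyModel.alive)  # expect: True 0
```

The tradeoff is that the model is reloaded for every test; for expensive checkpoints the PR's alternative, keeping the class attribute but pairing it with a tearDownClass that deletes it and calls cleanup, avoids the reload at the cost of the manual bookkeeping.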

    def setUp(self):
        self.model_dtype = torch.float16
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
        cleanup(torch_device, gc_collect=True)
@ydshieh ydshieh Jun 23, 2025

After #33657, we get GPU OOM on T4 for the 2 tests in this test class.

It could be fixed by adding cleanup here, but maybe it's better to see if this is somehow a regression (I think it's probably not a serious issue, however). cc @Cyrilvallez

You can reproduce (on T4) by running

HF_HUB_READ_TOKEN=xxx RUN_SLOW=1 python3 -m pytest -v tests/models/mistral/test_modeling_mistral.py::MistralIntegrationTest tests/models/mistral/test_modeling_mistral.py::Mask4DTestHard

Member

Is that affected by assisted decoding, or is the link wrong?

Collaborator Author

damn, sorry. It is:

Loading optimizations (#36742)

Member

It's weird that we need this in setUp when we already have it in tearDown anyway 🤔 Is it still the case when moving the model to an instance attr instead of a class attr?

Collaborator Author

ok, for future reference:

it's not very clear, but the extra cleanup added in setUp is necessary to avoid OOM if the test test_speculative_generation is failing (due to mismatched outputs). If it were passing, we wouldn't need this extra cleanup.

I just want to avoid this kind of OOM even when some other tests are failing, so it's better to always have a pair of cleanup calls

    @slow
    def test_speculative_generation(self):
        EXPECTED_TEXT_COMPLETION = "My favourite condiment is 100% ketchup. I love it on everything. I’m not a big"
        EXPECTED_TEXT_COMPLETION = "My favourite condiment is 100% Sriracha. I love it on everything. I have it on my"
Collaborator Author

this test was always failing up to now

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 364 to 365

    ("cuda", 7): 'The image features two tabby cats lying on a pink surface, which appears to be a couch or',
    ("cuda", 8): 'The image features two cats lying on a pink surface, which appears to be a couch or a bed',
Collaborator Author

In #37882 these values were not updated, but I remember I checked them on CI runners.
From the CI reports, it looks like we get different outputs every day, but I can't reproduce that: I ran them several times and even triggered them via workflow runs; all give the same results now (and pass with the updated values).

TODO: monitor in the next few days to see if we get them passing.
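The snippet under discussion keys expected strings by device type and CUDA compute-capability major version (7 for a T4 / sm_75 runner, 8 for an A100 / sm_80 runner), since the same model can decode slightly different text on different hardware. A minimal sketch of that lookup follows; expected_for is a hypothetical helper for illustration, not the transformers API.

```python
# Expected generations keyed by (device_type, compute_capability_major).
# The strings are the ones from the diff above.
EXPECTED_TEXTS = {
    ("cuda", 7): "The image features two tabby cats lying on a pink surface, which appears to be a couch or",
    ("cuda", 8): "The image features two cats lying on a pink surface, which appears to be a couch or a bed",
}


def expected_for(device_type, major):
    """Return the expectation for this device, falling back to a
    device-wide default keyed with major=None if one exists."""
    key = (device_type, major)
    if key in EXPECTED_TEXTS:
        return EXPECTED_TEXTS[key]
    return EXPECTED_TEXTS.get((device_type, None))


print(expected_for("cuda", 7))  # prints the T4-class expectation
```

A test then compares the generated text against expected_for(device, major) for the runner it happens to be on, instead of a single hard-coded string.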

Collaborator Author

well, I again get different values today vs. the ones I got yesterday, wtf ... 😭

@ydshieh ydshieh Jun 23, 2025

I will revert the changes on mistral3 and dive into them in a separate PR.
Let's keep this PR only for mistral

Member

hm, maybe we can set_seed? Or does the order of test cases matter because we use the same cls.model?

Collaborator Author

the order of tests here is the same (unless we add, remove, or skip some tests).

It seems to me that the outputs only change when I rebuild the docker image. If I use the same built docker image, no matter how I run them (manually in SSH runners or triggered by GitHub Actions), they all give the same outputs.

I am still checking

@ydshieh ydshieh Jun 25, 2025

@zucchini-nlp if you ever want to hear this sad story (at least for me):

https://huggingface.slack.com/archives/C01NE71C4F7/p1750885359361689

    {"type": "image", "url": "https://huggingface.co/ydshieh/kosmos-2.5/resolve/main/view.jpg"},
    {
        "type": "image",
        "url": "https://huggingface.co/datasets/hf-internal-testing/testing-data-mistral3/resolve/main/view.jpg",
Collaborator Author

just moved the files to a better Hub repository

@ydshieh ydshieh requested a review from zucchini-nlp June 23, 2025 09:03

ydshieh commented Jun 23, 2025

run-slow: mistral, mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral', 'models/mistral3']
quantizations: [] ...

@ydshieh ydshieh requested a review from Cyrilvallez June 23, 2025 09:33

@zucchini-nlp zucchini-nlp left a comment


Thanks, LGTM



ydshieh commented Jun 23, 2025

run-slow: mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral3']
quantizations: [] ...


@Cyrilvallez Cyrilvallez left a comment


Hey! In general, I think that we should absolutely avoid models as class attributes, as it will lead to many potential issues! Not sure why it was added as a class attribute without proper cleanup before, but I'd like to revert this if possible 🤗



ydshieh commented Jun 23, 2025

run-slow: mistral3

@github-actions

This comment contains run-slow, running the specified jobs:

models: ['models/mistral3']
quantizations: [] ...

@ydshieh ydshieh merged commit 2ce02b9 into main Jun 23, 2025
15 checks passed
@ydshieh ydshieh deleted the fix_mistral branch June 23, 2025 15:07