
Conversation

@ydshieh
Collaborator

@ydshieh ydshieh commented Mar 27, 2024

What does this PR do?

This PR is the same as #29805 with some final changes. #29805 is closed, and we will merge this one without waiting for @younesbelkada to be back in the office.

I rebased the PR on a more recent main.

No need to go over it again, I think. Just check this comment (if you think it's necessary).

@ydshieh ydshieh force-pushed the fix-slow-tests-shieh branch from d8348e6 to 54c6918 Compare March 27, 2024 14:16
@ydshieh ydshieh changed the title Fix slow tests shieh Fix slow tests for important models to be compatible with A10 runners Mar 27, 2024
@require_flash_attn
@require_torch_gpu
@pytest.mark.flash_attn_test
@is_flaky
Collaborator Author


For the comment here, my response is:

I will open an issue but leave it to @younesbelkada to fill in more details on the issue page.

# 8 is for A100 / A10 and 7 for T4
cls.cuda_compute_capability_major_version = torch.cuda.get_device_capability()[0]
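
(For context: a minimal sketch, with illustrative class and variable names not taken from this PR, of how a test class can record the major compute capability once in setUpClass and fall back cleanly on CPU-only machines.)

import unittest

import torch

class DeviceAwareIntegrationTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # 8 covers A100 / A10, 7 covers T4; None means no CUDA device is visible.
        cls.cuda_compute_capability_major_version = None
        if torch.cuda.is_available():
            cls.cuda_compute_capability_major_version = torch.cuda.get_device_capability()[0]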

@require_read_token
Collaborator Author


This should be a better name, cc @younesbelkada

See #29805 (comment)

@ydshieh
Collaborator Author

ydshieh commented Mar 27, 2024

Although this PR is newly opened (with only tiny changes), let me formally get a ✅ approval before merging.

Here are the changes: mostly, they are just some name changes.


@ydshieh ydshieh requested a review from amyeroberts March 27, 2024 14:36
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh requested review from ArthurZucker and removed request for amyeroberts March 28, 2024 09:23
Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks both 🚀

"mistralai/Mistral-7B-v0.1",
device_map="auto",
attn_implementation="sdpa",
"mistralai/Mistral-7B-v0.1", device_map="auto", attn_implementation="sdpa", torch_dtype=torch.float16
Collaborator


no torch_device here?

Collaborator Author


I think it is because we set device_map="auto", and in this case there is no need to set torch_device.

Collaborator


torch_device is there for the recent updates done by a contributor to support multiple devices when testing!
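
(A minimal sketch contrasting the two placement styles discussed in this thread; the model name and dtype mirror the diff above, but this is not the exact test code.)

import torch

from transformers import AutoModelForCausalLM
from transformers.testing_utils import torch_device

# Style 1: let accelerate place (and possibly shard) the model; no explicit
# torch_device call is needed because device_map="auto" already decides placement.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
)

# Style 2: load normally and move to torch_device, which follows whichever device
# the test suite is configured for (CUDA, but also other accelerators).
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
).to(torch_device)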


@ydshieh ydshieh requested a review from ArthurZucker March 29, 2024 09:46
Collaborator

@ArthurZucker ArthurZucker left a comment


Looks alright but let's make sure CIs are green please

@ydshieh ydshieh force-pushed the fix-slow-tests-shieh branch from c684692 to effc252 Compare April 3, 2024 13:20
@ydshieh
Collaborator Author

ydshieh commented Apr 3, 2024

The only failing test is

FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_generate_with_prompt_ids_max_length - IndexError: index -1 is out of bounds for dimension 1 with size 0

which is unrelated to this PR; #30018 is addressing it.

@ydshieh ydshieh requested a review from ArthurZucker April 3, 2024 14:25

Collaborator

@ArthurZucker ArthurZucker left a comment


Good for me as long as we fix the remaining tests

@younesbelkada
Contributor

tests are all green: https://github.com/huggingface/transformers/actions/runs/8613471616/job/23604832895

@ydshieh ydshieh force-pushed the fix-slow-tests-shieh branch from 19a6a07 to 71debdb Compare April 9, 2024 10:30
@ydshieh
Collaborator Author

ydshieh commented Apr 9, 2024

Thank you @younesbelkada again!

@ydshieh ydshieh merged commit 08a194f into main Apr 9, 2024
@ydshieh ydshieh deleted the fix-slow-tests-shieh branch April 9, 2024 11:28
Comment on lines 686 to +690
     def test_compile_static_cache(self):
         NUM_TOKENS_TO_GENERATE = 40
-        EXPECTED_TEXT_COMPLETION = [
-            "Simply put, the theory of relativity states that 1) the speed of light is constant, 2) the speed of light is the same for all observers, and 3) the laws of physics are the same for all observers.",
-            "My favorite all time favorite condiment is ketchup. I love it on everything. I love it on my eggs, my fries, my chicken, my burgers, my hot dogs, my sandwiches, my salads, my p",
-        ]
+        EXPECTED_TEXT_COMPLETION = {
+            7: [
+                "Simply put, the theory of relativity states that 1) the speed of light is constant, 2) the speed of light is the same for all observers, and 3) the laws of physics are the same for all observers.",
Contributor


This fails on A100

Contributor


On the A10 runners with the transformers-all-latest-gpu docker image it passed; might it be an env issue?

Contributor


Yes, the reference output is probably good for A10G, but not for A100.
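
(Illustrative sketch only, with assumed variable names: the dict above is keyed by the major compute capability recorded in setUpClass, so the reference text is selected roughly like this. Note that A10 and A100 both report major version 8, so this key alone cannot distinguish them.)

# Assumed names for illustration; not the PR's exact test body.
major = self.cuda_compute_capability_major_version  # 7 on T4, 8 on A10 / A100
expected = EXPECTED_TEXT_COMPLETION[major]
self.assertEqual(generated_texts, expected)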
