
Conversation

@ydshieh ydshieh (Collaborator) commented Jun 4, 2024

What does this PR do?

Also,

        del model
        backend_empty_cache(torch_device)
        gc.collect()

this is not helping and, worse, with it we actually get a GPU OOM (here, in test_model_7b_long_prompt_sdpa). After removing these lines, the tests are all passing now:

111 passed, 48 skipped, 276 warnings in 205.46s (0:03:25)
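
For reference, a minimal sketch (not part of this PR's diff) of how one could log what PyTorch itself holds at each test boundary; log_gpu_memory is a hypothetical helper and assumes a CUDA device:

    import torch

    def log_gpu_memory(tag: str) -> None:
        # memory_allocated: bytes occupied by live tensors
        # memory_reserved: bytes held by the caching allocator
        # (roughly what nvidia-smi reports, minus context overhead)
        allocated = torch.cuda.memory_allocated() / 2**20
        reserved = torch.cuda.memory_reserved() / 2**20
        print(f"[{tag}] allocated={allocated:.0f}MiB reserved={reserved:.0f}MiB")

Calling this at the start and end of each slow test would show whether a later OOM comes from live tensors or from cached allocator blocks.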

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

  # considering differences in hardware processing and potential deviations in output.
  EXPECTED_SLICE = {
-     7: torch.tensor([-5.8781, -5.8616, -0.1052, -4.7200, -5.8781, -5.8774, -5.8773, -5.8777, -5.8781, -5.8780, -5.8781, -5.8779, -1.0787, 1.7583, -5.8779, -5.8780, -5.8783, -5.8778, -5.8776, -5.8781, -5.8784, -5.8778, -5.8778, -5.8777, -5.8779, -5.8778, -5.8776, -5.8780, -5.8779, -5.8781]),
+     7: torch.tensor([-5.8828, -5.8633, -0.1042, -4.7266, -5.8828, -5.8789, -5.8789, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -1.0801, 1.7598, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828, -5.8828]),
@ydshieh (Collaborator, Author) commented:

this should have been updated in #29905, but I forgot

  def test_model_7b_generation(self):
      EXPECTED_TEXT_COMPLETION = {
-         7: "My favourite condiment is 100% ketchup. I love it on everything. I'm not a big",
+         7: "My favourite condiment is 100% ketchup. Im not a fan of mustard, mayo,",
@ydshieh (Collaborator, Author) commented:

this should have been updated in #29905, but I forgot

Comment on lines -562 to -565
-         del model
-         backend_empty_cache(torch_device)
-         gc.collect()

@ydshieh (Collaborator, Author) commented:

This does not help and, worse, causes GPU OOM in subsequent tests.

@amyeroberts (Contributor) commented:

Happy to have this deleted but very confused why this would cause OOM 😭

@ydshieh ydshieh (Collaborator, Author) commented Jun 4, 2024:

Got to say I'm confused too. torch.cuda.empty_cache is not really magic:

empty_cache() doesn’t increase the amount of GPU memory available for PyTorch.

but I was not expecting it to have an undesired side effect like this (even if it is not helpful).

I didn't check whether del model and gc.collect() play a role here, though.
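
To make the quoted point concrete, here is a minimal sketch (my own illustration, not code from this PR) of what empty_cache() does and does not change, assuming a fresh process on a CUDA device:

    import torch

    x = torch.empty(1024, 1024, device="cuda")  # ~4MiB of live tensor data
    del x                                       # tensor freed, but its block stays cached
    print(torch.cuda.memory_allocated())        # 0: no live tensors remain
    print(torch.cuda.memory_reserved())         # > 0: the allocator still holds the block
    torch.cuda.empty_cache()                    # hand cached blocks back to the driver
    print(torch.cuda.memory_reserved())         # 0: nvidia-smi usage drops, but PyTorch
                                                # could have reused that block anyway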

@ydshieh (Collaborator, Author) commented:

Out of curiosity, and to keep the info here for the record:

  • it is test_model_7b_long_prompt that gets the OOM.
    • previously, with those empty-cache calls, nvidia-smi showed 150MiB / 15360MiB at the beginning of test_model_7b_long_prompt, which looks nice, but we got OOM afterward inside this test
    • without the empty-cache calls, nvidia-smi shows 9066MiB / 15360MiB, which looks less great, but we DON'T get OOM afterward inside this test

It's very mysterious to me.
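
For the record, a hedged sketch of the torch-side counters one could check next time this happens; nvidia-smi only shows the total the caching allocator has reserved, not what is actually live:

    import torch

    # peak usage inside a single test, to see how close it came to the 15360MiB limit
    torch.cuda.reset_peak_memory_stats()
    # ... run the test body ...
    print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.0f}MiB")

    # the allocator's own breakdown (active vs. cached blocks), which nvidia-smi cannot show
    print(torch.cuda.memory_summary(abbreviated=True))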

  # considering differences in hardware processing and potential deviations in generated text.
  EXPECTED_TEXT_COMPLETION = {
-     7: "My favourite condiment is 100% Sriracha. I love the heat, the tang and the fact costs",
+     7: "My favourite condiment is 100% ketchup. I love it on everything. I’m not a big",
@ydshieh (Collaborator, Author) commented:

see PR description

@ydshieh ydshieh requested a review from amyeroberts June 4, 2024 14:13
@amyeroberts amyeroberts (Contributor) left a comment:

Thanks for fixing!

@ydshieh ydshieh merged commit fd3238b into main Jun 4, 2024
@ydshieh ydshieh deleted the fix_mistral_int branch June 4, 2024 16:04