
Conversation

@faaany (Contributor) commented Jun 13, 2024

What does this PR do?

This PR is to enable the TestDeepSpeedModelZoo tests to work on a non-CUDA accelerator.

@amyeroberts and @ydshieh

@amyeroberts (Contributor) left a comment


Thanks!

@amyeroberts (Contributor)

Let's get a second 👍 from @ydshieh, the test master, to confirm this is OK on our CI runs.

@faaany (Contributor, Author) commented Jun 13, 2024

> Let's get a second 👍 from @ydshieh, the test master, to confirm this is OK on our CI runs.

Sure!

@faaany (Contributor, Author) commented Jun 14, 2024

I further updated my code to get the real device count, but the CI doesn't seem to use the latest transformers version. @ydshieh, is this intended?

@ydshieh self-assigned this Jun 14, 2024
@ydshieh (Collaborator) commented Jun 14, 2024

@faaany It uses the latest dev transformers version (i.e. this PR).

The problem is that some variables like BACKEND_DEVICE_COUNT are only defined when torch is available, but we have other CI environments where only TensorFlow or only Flax is installed, without torch.

Let me check what to do with it.

```python
if is_torch_available():
    # Mappings from device names to callable functions to support device agnostic
    # testing.
    BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}
```
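For context, test helpers dispatch through mappings like these by looking up the current device name and falling back to the `"default"` entry. A minimal sketch of that dispatch pattern, with illustrative names and torch omitted so it runs anywhere (this is not the exact transformers API):

```python
# Hypothetical dispatch sketch: map device names to zero-argument callables,
# with a "default" fallback for accelerators without a dedicated entry.
BACKEND_DEVICE_COUNT = {"cpu": lambda: 0, "default": lambda: 1}

def backend_device_count(device: str) -> int:
    # Unknown device names (e.g. a custom accelerator like "xpu") fall back
    # to the "default" callable instead of raising a KeyError.
    fn = BACKEND_DEVICE_COUNT.get(device, BACKEND_DEVICE_COUNT["default"])
    return fn()

print(backend_device_count("cpu"))  # -> 0
print(backend_device_count("xpu"))  # -> 1 (falls back to "default")
```

The `"default"` fallback is what makes the tests device agnostic: a non-CUDA backend works without the mapping having to enumerate every possible device name.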

@ydshieh (Collaborator) commented Jun 14, 2024

Could you add

```python
if is_torch_available():
    # Mappings from device names to callable functions to support device agnostic
    # testing.
    BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}
```

to

```python
if is_torch_available():
    # Mappings from device names to callable functions to support device agnostic
    # testing.
    BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "cpu": torch.manual_seed, "default": torch.manual_seed}
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}
else:
    BACKEND_MANUAL_SEED = {"default": None}
    BACKEND_EMPTY_CACHE = {"default": None}
    BACKEND_DEVICE_COUNT = {"default": lambda: 0}
```

and see how it goes? 🙏 Thank you in advance!
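The point of the `else` branch is import safety: the module must be importable in CI jobs where torch is absent. A self-contained sketch of the same guard pattern, using a `try`/`except` import check in place of transformers' `is_torch_available()` so it runs anywhere:

```python
# Illustrative sketch of the suggested guard: define the real torch hooks only
# when torch can be imported, and harmless stubs otherwise, so TensorFlow-only
# or Flax-only CI jobs can still import the module.
try:
    import torch
    _torch_available = True
except ImportError:
    _torch_available = False

if _torch_available:
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "cpu": lambda: 0, "default": lambda: 1}
else:
    # Without torch, every backend reports zero devices, but importing succeeds.
    BACKEND_DEVICE_COUNT = {"default": lambda: 0}

count = BACKEND_DEVICE_COUNT["default"]()  # works with or without torch installed
```

Because both branches define the same names, code downstream can look up `"default"` unconditionally instead of re-checking torch availability at every call site.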

@faaany (Contributor, Author) commented Jun 14, 2024

@ydshieh Yes, it works, thanks for the suggestion! The CI still doesn't pass, but I think that is not caused by this PR.

@ydshieh (Collaborator) commented Jun 14, 2024

Yeah, rebasing on main to include #31407 should make the CI green.

@ydshieh (Collaborator) left a comment

Thank you!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh merged commit 9454f43 into huggingface:main Jun 17, 2024