Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run only CUDA tests on Azure GPU CI #13651

Merged
merged 15 commits into from
Jul 15, 2022
Merged

Conversation

carmocca
Copy link
Contributor

@carmocca carmocca commented Jul 14, 2022

What does this PR do?

Run only the CUDA tests on Azure GPU CI. Just as we do for TPUs, IPUs, and HPUs CI jobs.
This should slash the runtime considerably.

This logically reduces the coverage as CPU-only code could fail if run on a GPU environment. But this tradeoff is currently worth it given our limitations in GPU testing.

Reduces the "Standard" tests step from GPU ci from ~24' to ~10'

Discussed internally.

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • [n/a] Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @carmocca @akihironitta @Borda

@carmocca carmocca added the ci Continuous Integration label Jul 14, 2022
@carmocca carmocca added this to the pl:1.7 milestone Jul 14, 2022
@carmocca carmocca self-assigned this Jul 14, 2022
@Borda
Copy link
Member

Borda commented Jul 14, 2022

how about test that is written such uses max(CPU, GPU) to reduce code duplication...
in fact, thinking that all tests that are running any form or training shall be exploring both

how about #13664?

@carmocca
Copy link
Contributor Author

carmocca commented Jul 15, 2022

@Borda Those tests will no longer run if they don't have the @RunIf(min_cuda_gpus=N) decorator. I don't think the changes in #13664 are a bad idea but there must be a mechanism to mark which tests run under each configuration. So your PR would become "useless" CI-wise after this is merged.

@mergify mergify bot removed the has conflicts label Jul 15, 2022
Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it ! This is a great step.

@mergify mergify bot added the ready PRs ready to be merged label Jul 15, 2022
@carmocca carmocca merged commit 8355ba1 into master Jul 15, 2022
@carmocca carmocca deleted the ci/azure-runs-only-cuda-tests branch July 15, 2022 11:51
@Borda
Copy link
Member

Borda commented Jul 15, 2022

Those tests will no longer run if they don't have the @RunIf(min_cuda_gpus=N) decorator. I don't think the changes in #13664 are a bad idea but there must be a mechanism to mark which tests run under each configuration. So your PR would become "useless" CI-wise after this is merged.

I see the motivation for this PR and agree no need to run tests using only CPU on GPU machine but I think that for robustness we shall all push to use instance maximal potential... that is also why @kaushikb11 started #12482 and similar for TPU and IPU...
Said so I do not think this merge is moving anyway towards robust testing... :(

@carmocca
Copy link
Contributor Author

#12482 does exactly what this PR does, but for HPUs. Collect automatically all HPU tests and run them exclusively.

no need to run tests using only CPU on GPU machine

This is what this PR does

push to use instance maximal potential

This PR does not go against this idea. It just filters out the jobs that don't require CUDA

@Borda
Copy link
Member

Borda commented Jul 15, 2022

This PR does not go against this idea. It just filters out the jobs that don't require CUDA

true, so the way out would be to introduce a flag for tests to run everywhere?

@carmocca
Copy link
Contributor Author

Depends on what we want. For example, we could have a PL_RUN_ALL_TESTS=1 that nullifies all the other environment variables. But we don't have any job currently where we would need it.

@Borda
Copy link
Member

Borda commented Jul 15, 2022

Depends on what we want. For example, we could have a PL_RUN_ALL_TESTS=1 that nullifies all the other environment variables. But we don't have any job currently where we would need it.

I would do it vise-versa, by default, run all and if some short-run set env variable

@carmocca
Copy link
Contributor Author

Good point: #13676

@carmocca carmocca mentioned this pull request Jul 19, 2022
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous Integration ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants