
Use conda to build python packages during GPU tests #897

Merged (1 commit) on May 13, 2022

Conversation

@jjacobelli (Contributor) commented Apr 21, 2022

This PR converts the from-source build we are doing in the GPU test job to a conda build. This is done for the following reasons:

  • It is a required step toward improving the Ops CI/CD setup into a more convenient pipeline
  • It is required to start using conda compilers and mamba to build RAPIDS packages
  • It removes the need to manually manage and install the dependencies in the GPU job
  • It ensures the packages can be installed
  • It ensures the tests run against the package contents rather than the build results. Currently the Python packages are not tested this way.

This may increase the overall pipeline time, but the use of mamba should offset this, since mamba is faster than conda at building packages.
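For illustration, a minimal sketch of what the conda-based GPU test step could look like; the recipe path, the mambabuild invocation (via boa), and the test command are assumptions for illustration, not the exact contents of ci/gpu/build.sh in this PR:

  # Build the package from its conda recipe instead of building from source
  # (assumes boa is installed so that the "conda mambabuild" subcommand exists):
  conda mambabuild conda/recipes/dask-cuda

  # Install the freshly built package so the tests exercise the package contents:
  gpuci_mamba_retry install dask-cuda

  # Run the test suite against the installed package:
  pytest -v dask_cuda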

@jjacobelli added the labels 5 - DO NOT MERGE (Hold off on merging; see PR for details), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) on Apr 21, 2022
@jjacobelli self-assigned this on Apr 21, 2022
@github-actions bot added the gpuCI (gpuCI issue) label on Apr 21, 2022
@pentschev (Member)

Is there some advantage to doing this, or is it just to test conda packaging as well?

@jjacobelli (Contributor, Author) commented Apr 21, 2022

> Is there some advantage to doing this, or is it just to test conda packaging as well?

That's one of the advantages. We are also doing this to remove the usage of the rapids-build-env package and to start using conda compilers with mamba to build packages. I will update the PR body with all of this before merging it

@jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from b6360f6 to a7e30b3 on April 25, 2022 09:33
@jjacobelli removed the 5 - DO NOT MERGE (Hold off on merging; see PR for details) label on Apr 25, 2022
@jjacobelli marked this pull request as ready for review on April 25, 2022 09:38
@jjacobelli requested a review from a team as a code owner on April 25, 2022 09:38
@codecov-commenter commented Apr 25, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.06@c4d542b).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-22.06    #897   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?      22           
  Lines                   ?    3075           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?    3075           
  Partials                ?       0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c4d542b...0d5b7b6.

@pentschev (Member)

rerun tests

@pentschev (Member)

The timeout from dask_cuda.tests.test_spill.test_cudf_cluster_device_spill[params0] is being dealt with in #901.

@pentschev (Member)

rerun tests

1 similar comment
@pentschev (Member)

rerun tests

@pentschev (Member)

This many timeouts doesn't look normal, even the timeout that was increased in #901 was hit here. @Ethyling could you add conda list --show-channels after gpuci_mamba_retry install dask-cuda so I can compare the packages being picked here vs what was being picked before?

@jjacobelli (Contributor, Author)

> This many timeouts doesn't look normal, even the timeout that was increased in #901 was hit here. @Ethyling could you add conda list --show-channels after gpuci_mamba_retry install dask-cuda so I can compare the packages being picked here vs what was being picked before?

@pentschev PR updated with this
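For clarity, a minimal sketch of what the added step could look like in ci/gpu/build.sh; the commands are taken verbatim from the review comment above, and their exact placement in the script is an assumption:

  # Install the conda-built dask-cuda package, then log the resolved packages
  # and the channels they came from, so they can be compared against what the
  # previous from-source setup picked up:
  gpuci_mamba_retry install dask-cuda
  conda list --show-channels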

@jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from d65e32f to 0d5b7b6 on April 27, 2022 11:08
@pentschev (Member)

I have been trying for days to reproduce the timeouts on CUDA 11.0, driver 450.80.02, just like in CI, but have been unsuccessful. At this point I can only imagine that something about the CI machines makes them just a tiny bit slower, in which case we may try to increase timeouts. I will run the tests a few more times in CI here to see how it behaves and which tests may need increased timeouts.

@pentschev (Member)

rerun tests

@pentschev (Member)

The tests that failed in the latest run are now resolved by #905; rerunning.

@pentschev (Member)

rerun tests

1 similar comment
@pentschev (Member)

rerun tests

@jjacobelli (Contributor, Author)

> Three successful runs in a row. At this point I would assume we had a hiccup in CI when this was running before and would suggest we move forward as is.

Nice, thank you @pentschev!

> @Ethyling is there anything more that needs to be done here or should we consider reviewing/merging this PR?

I prefer to wait until all the PRs pass for all the RAPIDS projects before merging it.

@ajschmidt8 (Member) left a comment

Approved with a question.

Review comments on ci/gpu/build.sh (outdated, resolved)
@jjacobelli force-pushed the remove-from-sources branch 2 times, most recently from 772733a to b0024db on May 10, 2022 11:20
@github-actions bot added the conda (conda issue) label on May 10, 2022
@@ -10,7 +10,7 @@ package:
   version: {{ version }}
 
 source:
-  path: ../../..
+  git_url: ../../..
@jjacobelli (Contributor, Author) commented:

This change is required to prevent this issue:

AssertionError: Can't merge/copy source into subdirectory of itself.  Please create separate spaces for these things.
  src: /workspace
  dst: /workspace/.conda-bld/work

@jjacobelli (Contributor, Author)

rerun tests

1 similar comment
@jjacobelli (Contributor, Author)

rerun tests

@pentschev (Member)

The failing test is due to dask/distributed#6320, which was caused by the dask/distributed#5910 merge.

@jjacobelli (Contributor, Author)

> The failing test is due to dask/distributed#6320, which was caused by the dask/distributed#5910 merge.

Do you know if this will be fixed soon?

@pentschev (Member)

Sorry @Ethyling, I don't know if that is going to be fixed soon. I marked the test as xfail in #908 to unblock CI for now.

@pentschev (Member)

rerun tests

@pentschev (Member) left a comment

LGTM, thanks for the work here @Ethyling.

@pentschev (Member)

Feel free to merge if this is ready to go.

@jjacobelli (Contributor, Author)

Thank you for your help with this PR @pentschev!

@jjacobelli (Contributor, Author)

@gpucibot merge

Labels: conda (conda issue), gpuCI (gpuCI issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
5 participants