-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
port to rattler-build #1796
base: branch-25.04
Are you sure you want to change the base?
port to rattler-build #1796
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
9a224dd
to
c79968f
Compare
conda/recipes/librmm/meta.yaml
Outdated
{% if cuda_major != "11" %} | ||
- cuda-cudart-dev | ||
{% endif %} | ||
- {{ "cuda-cudart-dev" if cuda_major == "12" else "cuda-version" }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conda-recipe-manager
doesn't like !=
as a comparison operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could do not
and ==
s?
- {{ "cuda-cudart-dev" if cuda_major == "12" else "cuda-version" }} | |
- {{ "cudatoolkit" if not (cuda_major == "12") else "cuda-version" }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to write this in a way that only mentions CUDA 11, so that the condition can be trivially deleted when adding future major version support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was the idea. I just goofed on the syntax. Took another go below
version: ${{ env.get("RAPIDS_PACKAGE_VERSION") }} | ||
cuda_version: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[:2] | join(".") }} | ||
cuda_major: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[0] }} | ||
date_string: ${{ env.get("RAPIDS_DATE_STRING") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I grabbed these from rapidsai/cugraph#4551 as currently conda-recipe-manager
doesn't have support for handling converting from the extended jinja2 syntax to the subset that rattler supports
Co-authored-by: jakirkham <[email protected]>
Co-authored-by: jakirkham <[email protected]>
5af3309
to
be8c03f
Compare
looks like adding `$CPP_CHANNEL` to the mix overrides the existing channels
0687f82
to
33e7351
Compare
Ok, now
|
a9067b2
to
c8a4920
Compare
c8a4920
to
a9067b2
Compare
Thanks Gil! 🙏 How is channel priority handled in those cases? |
It's disabled for both |
python/rmm/rmm/librmm/*.cpp | ||
!python/rmm/rmm/librmm/_torch_allocator.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that rattler-build
does respect gitignore
negations, but the unignored locationw as incorrect, which is why the file wasn't included in the builds.
ci/build_cpp.sh
Outdated
# These are probably set via `rapids-configure-conda-channels` | ||
# -c rapidsai \ | ||
# -c conda-forge \ | ||
# -c nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are set via rapids-configure-conda-channels
BUT, for the python builds where we also specify ${CPP_CHANNEL}
that includes an implicit --override-channels
, so we need to list them explicitly
-c "${CPP_CHANNEL}" \ | ||
-c rapidsai \ | ||
-c rapidsai-nightly \ | ||
-c conda-forge \ | ||
-c nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, adding ${CPP_CHANNEL}
adds an implicit --override-channels
so we need to list them inline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels unexpected. Should we report this upstream as a bug, or at least as something that should be documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll report it upstream -- it feels like a bug, but it might be a conscious break from some implicit conda
behavior, but in that case, documentation is certainly merited.
Co-authored-by: jakirkham <[email protected]>
GIT_DESCRIBE_HASH: ${{ env.get("GIT_DESCRIBE_HASH") }} | ||
GIT_DESCRIBE_NUMBER: ${{ env.get("GIT_DESCRIBE_NUMBER") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are set by rapids-configure-rattler
conda/recipes/rmm/recipe.yaml
Outdated
- if: cuda_major == "11" | ||
then: | ||
- ${{ compiler('cuda') }} =${{ cuda_version }} | ||
else: | ||
- ${{ compiler('cuda') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakirkham -- do you think we can collapse this down into just
- if: cuda_major == "11" | |
then: | |
- ${{ compiler('cuda') }} =${{ cuda_version }} | |
else: | |
- ${{ compiler('cuda') }} | |
- ${{ compiler('cuda') }} |
given the constraint on the line below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can. Let's do it 👍
Thanks for continuing to whittle this down Gil! 🙏
constrained by `cuda_version`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall! While we are rewriting our recipes already, I think we should take this opportunity to consider some additional changes and simplifications. I've left some notes inline.
For most of our recipes, we'll want to use the cache key that I have set
up in librmm, otherwise each output is built as a separate recipe so you
end up compiling everything N times
By this, I assume you're not referring to the sccache ccache key but rather the rattler-build multi-output cache? If so, yes that is a must. It is the only way that I found to ensure a single compilation. Fortunately that feature has been released as fully-featured at this point.
One debate that we might as well have now on the first conversion: do we like sticking with conda_build_config.yaml, or should we move that file to variants.yaml? Personally I find the latter name much clearer, whereas the relationship of the former to variants was not clear until I read conda-build documentation. Also it's a clear sign of the changes needed for rattler-build. That being said, it's not strictly necessary since rattler-build ]does respect conda_build_config.yaml by default](https://rattler.build/latest/variants/#automatic-discovery) (unlike meta.yaml, which you would have to specify explicitly). If we've already discussed this elsewhere and decided not to make the change, feel free to ignore this.
source rapids-configure-rattler | ||
|
||
rattler-build build --recipe conda/recipes/librmm \ | ||
--experimental \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this? I don't think we are using any experimental features any more, but please correct me if I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the multi-output cache
is still gated behind the --experimental
flag (on 0.35.9 which I believe is the latest release)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK got it. I was basing my statement on the fact that their docs no longer list the cache on their experimental features page and instead on its own. Either an oversight in their docs or maybe it'll be moved out in the next release.
|
||
rattler-build build --recipe conda/recipes/librmm \ | ||
--experimental \ | ||
--no-build-id \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave a comment pointing to https://rattler.build/latest/tips_and_tricks/#using-sccache-or-ccache-with-rattler-build for why we need this?
-c "${CPP_CHANNEL}" \ | ||
-c rapidsai \ | ||
-c rapidsai-nightly \ | ||
-c conda-forge \ | ||
-c nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels unexpected. Should we report this upstream as a bug, or at least as something that should be documented?
version: ${{ version }} | ||
build: | ||
number: ${{ GIT_DESCRIBE_NUMBER }} | ||
string: cuda${{ cuda_major }}_${{ date_string }}_${{ GIT_DESCRIBE_HASH }}_${{ GIT_DESCRIBE_NUMBER }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good time to reconsider the build string and build number.
- Do we really need to include the hash or the number? The number is now part of the package version itself via
rapids-generate-version
, while the commit hash is written by the build backend. Admittedly, the latter does not apply to C++ libraries, so I suppose some information is lost, but in theory that information is largely redundant to the git describe number (which is in the version). I created both the versioning strategy and the build backend well after these build strings were put in place and didn't question them then, but perhaps now is a good time to do so? - We don't need to specify a build number above anymore because the versioning scheme guarantees a unique version for each new commit, so we could drop that as well.
If either of these changes caused problems in any repos, that would almost certainly be an indication of bugs like rapidsai/kvikio#616 where the versioning scheme was not being correctly applied, so that would be a win too.
Dropping these would also obviate the need for rapids-configure-rattler
altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove them and test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with removing the GIT_DESCRIBE_NUMBER
from the build string as long as it stays in the alpha version number. That is redundant.
I do not want to remove the GIT_DESCRIBE_HASH
. The GIT_DESCRIBE_HASH
is very useful for debugging other developers' conda environments because we can see what commit their packages were built from, and we can determine if their build is expected to have a certain feature or not. Otherwise we are stuck trying to use GIT_DESCRIBE_NUMBER
to count how many commits their build was from the last tag...
I think the hash also provides a nice confirmation that releases were built correctly from the tagged commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdice I think that's reasonable. Are you good with removing the build number? Given that we basically never rebuild a particular build, the current behavior feels very wrong to me: for a release YY.MM.00aN
, it will always be released with build number N. We'll never see builds 0
-(N-1)
.
As long as the only piece of information that we need is the git commit hash, and given that we are keeping the experimental flag on anyway based on the above thread, I wonder if we could make use of rattler's new git functions to get the hash. At surface level they seem to have a problem, which (if I'm understanding the docs right) is that they will always pull the latest commit hash for the head of the repo, so when you're building during a PR they hash will be wrong. More concerning is that the hash could be wrong during burndown when we're simultaneously building two different release branches; the commit hash rattler chooses might be the wrong one half the time. However, most git operations will accept local filesystem paths, so if we add
head_rev: ${{ git.head_rev( "." ) }}
to our context section we may be able to get the git hash from the local checkout and be all set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tested this in the rapids-logger recipe and it works.
|
||
build: | ||
script: | ||
file: build.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we double-check the contents of each of the different output packages and make sure that we're putting what we expect in each package? IIRC when I experimented with rattler-build the way the cache worked was that when you restored files from the cache it dropped everything into the PREFIX and you had to do more careful filtering of each output to ensure that the files were present. I don't remember why I couldn't just leverage a cmake --install
command for each output, but I think the issue was that the build directory would no longer be available between outputs and only what was restored to the PREFIX from the cache was available. Is that no longer the case?
Looking at some of the job build outputs, it does seems like the right things are getting packaged into each output, which is encouraging, but it would be good to download the packages and inspect them to be doubly sure.
While looking at the outputs, I also noticed that the entire build cache is being uploaded to s3 as well by the rapids-upload-to-s3 command. If possible it would be good to filter that out somehow (or simply delete that directory entirely before the upload) to reduce network traffic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered the contents bit below in a separate comment -- in re:
While looking at the outputs, I also noticed that the entire build cache is being uploaded to s3 as well by the rapids-upload-to-s3 command. If possible it would be good to filter that out somehow (or simply delete that directory entirely before the upload) to reduce network traffic.
I will do that
- python: | ||
imports: | ||
- rmm | ||
pip_check: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this just fail outright for us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, yes -- the listed dependency bound in rmm/pyproject.toml
is
dependencies = [
"cuda-python>=11.8.5,<12.0a0",
"numpy>=1.23,<3.0a0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
which fails when run against the cuda12 build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably invoke rapids-build-backend
or rapids-dependency-file-generator
to write out the appropriate dependencies before we build the package. We should try to pass pip check
, because it gives us a (strong? weak?) cross-check that dependencies.yaml
is compatible with recipe.yaml
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not sure that I really understand what pip check is doing here. Is it actually looking at the contents of pyproject.toml? I would have expected that it is only looking at installed packages, i.e. the Requires-Dist metadata keys in the installed package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much all of the comments from the other recipe also apply here so let's resolve those discussions then we can apply the same changes (or not) to this one.
Thanks for the detailed review @vyasr ! Still working through some of it, but to this point:
I believe that the packages contain what's expected, but I'm still pretty new to these packages, so I'll post the contents for others to inspect at their leisure.
|
@gforsyth The package contents look right to me. I'm not aware of anything missing but it could be helpful to do a direct comparison against the existing nightlies and see if any paths are different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another round of comments -- great work so far!
cuda_version: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[:2] | join(".") }} | ||
cuda_major: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[0] }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: let's normalize everything to use double-quotes.
cuda_version: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[:2] | join(".") }} | |
cuda_major: ${{ (env.get('RAPIDS_CUDA_VERSION') | split('.'))[0] }} | |
cuda_version: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[:2] | join(".") }} | |
cuda_major: ${{ (env.get("RAPIDS_CUDA_VERSION") | split("."))[0] }} |
SCCACHE_REGION: ${{ env.get("SCCACHE_REGION", default="") }} | ||
SCCACHE_S3_USE_SSL: ${{ env.get("SCCACHE_S3_USE_SSL", default="") }} | ||
SCCACHE_S3_NO_CREDENTIALS: ${{ env.get("SCCACHE_S3_NO_CREDENTIALS", default="") }} | ||
SCCACHE_S3_KEY_PREFIX: librmm-${{ env.get("RUNNER_ARCH", default="X64") | replace("X64", "linux64") | replace("ARM64", "aarch64") | lower }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a RAPIDS_ARCH
(or RAPIDS_CONDA_ARCH
) variable into our CI containers so we aren't tying this to GitHub Actions implementation details (GHA specifies RUNNER_ARCH
but local reproductions of CI do not have it defined!). I ran into problems when reproducing CI locally for cugraph-gnn because RUNNER_ARCH wasn't defined.
If we added RAPIDS_ARCH
(or RAPIDS_CONDA_ARCH
) then we can cleanly solve this problem and minimize the logic here to just librmm-${{ env.get("RAPIDS_ARCH") }}
.
I think there should not be a default -- we require RAPIDS_CUDA_VERSION
so we can make this a hard requirement, too.
version: ${{ version }} | ||
build: | ||
number: ${{ GIT_DESCRIBE_NUMBER }} | ||
string: cuda${{ cuda_major }}_${{ date_string }}_${{ GIT_DESCRIBE_HASH }}_${{ GIT_DESCRIBE_NUMBER }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with removing the GIT_DESCRIBE_NUMBER
from the build string as long as it stays in the alpha version number. That is redundant.
I do not want to remove the GIT_DESCRIBE_HASH
. The GIT_DESCRIBE_HASH
is very useful for debugging other developers' conda environments because we can see what commit their packages were built from, and we can determine if their build is expected to have a certain feature or not. Otherwise we are stuck trying to use GIT_DESCRIBE_NUMBER
to count how many commits their build was from the last tag...
I think the hash also provides a nice confirmation that releases were built correctly from the tagged commit.
- fmt ${{ fmt_version }} | ||
- spdlog ${{ spdlog_version }} | ||
run: | ||
- ${{ pin_compatible('cuda-version', upper_bound='x', lower_bound='x') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're inconsistent with double/single quotes in lots of places. Let's use double quotes in the general case -- unless we are wrapping single quotes (or if there are any other behavioral oddities).
- if: cuda_major == "11" | ||
then: cudatoolkit | ||
else: cuda-cudart-dev | ||
- fmt ${{ fmt_version }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be aware that #1808 will require changes here.
- cython >=3.0.0 | ||
- rapids-build-backend >=0.3.0,<0.4.0.dev0 | ||
- librmm =${{ version }} | ||
- python >=3.7,<3.12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa! RAPIDS requires >=3.10,<3.13
right now.
But shouldn't this be constrained by something else? I think python
is special and knows to use the current Python version in host
, which should also generate appropriate run-exports iirc. Can we just write python
as we did in the previous recipe?
- python >=3.7,<3.12 | |
- python |
- ${{ pin_compatible('cuda-version', upper_bound='x', lower_bound='x') }} | ||
- numba >=0.59.1,<0.61.0a0 | ||
- numpy >=1.23,<3.0a0 | ||
- python >=3.7,<3.12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this bound. See above.
- python >=3.7,<3.12 | |
- python |
- cuda-python | ||
- if: not (cuda_major == "11") | ||
then: "cuda-cudart-dev" | ||
else: "cuda-version" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No "else" needed here.
else: "cuda-version" |
- python: | ||
imports: | ||
- rmm | ||
pip_check: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably invoke rapids-build-backend
or rapids-dependency-file-generator
to write out the appropriate dependencies before we build the package. We should try to pass pip check
, because it gives us a (strong? weak?) cross-check that dependencies.yaml
is compatible with recipe.yaml
.
license: Apache-2.0 | ||
license_family: Apache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has license_family
, which Rattler's docs say has been dropped in favor of license
being an SPDX identifier.
Similarly, license_file
might be okay to drop?
Let's pick a standard set of fields and make them match in all recipes/packages.
Also let's make sure the order of these fields is consistent (alphabetical? following the order in the docs?) -- homepage
is first and summary
is last here, but it's different in librmm.
I was going to suggest exactly the same thing. |
tests: | ||
- script: | ||
- "test -d \"${PREFIX}/include/rmm\"" | ||
about: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's somewhat out of scope for this PR, but also not really since one reason I never considered this sooner was because I wasn't sure what rattler would support by the time that we got here: should we try using load_from_file
to get this information from the pyproject.toml file now? rattler-build has this functionality now. It's pretty low-priority but given all of the other cleanup that we're doing in this PR it might be nice to really single-source .
Some notes on progress and changes needed:
GIT_DESCRIBE_HASH
andGIT_DESCRIBE_NUMBER
aren't supported, so thoseneed to be set in the environment.
rapids-configure-rattler
ingha-tools
For most of our recipes, we'll want to use the cache key that I have set
up in librmm, otherwise each output is built as a separate recipe so you
end up compiling everything N times
I suspect it's faster to port over recipes by hand, rather than with
conda-recipe-manager convert
. There's a fair bit of preparation required to make the meta.yaml files compatible in the first place, and failures can be hard to diagnoseminijinja
syntax!=
)conda-recipe-manager
doesn't support generating multi-output recipes at this point (although it can parse them)xref: rapidsai/build-planning#47