[RELEASE] rmm v25.02 #1807
Open: AyodeAwe wants to merge 42 commits into `main` from `branch-25.02`
Forward-merge branch-24.12 into branch-25.02
Forward-merge branch-24.12 into branch-25.02
…r libraries (#1722) This PR defines a new way to produce a logger wrapping spdlog. The logger's interface is declared in a template header file that can be processed by CMake to produce an interface that may be customized for placement into any project. The new implementation uses the PImpl idiom to isolate the spdlog (and transitively, fmt) dependency from the public API of the logger. The implementation is defined in an impl header. A corresponding source template file is provided that simply includes this header. All of these files are wrapped in some CMake logic for producing a custom target for a given project. rmm leverages this new logger by requesting the creation of a logger target and a corresponding implementation. This is a breaking change because consumers of rmm will need to link the new `rmm_logger_impl` target into their own libraries to get logging support. Once this gets merged, the plan is to move this implementation out of rmm into its own repository. At that point, the logger may also be used to completely replace logger implementations in cudf, raft, and cuml (as well as any other RAPIDS libraries that are aiming to provide their own logging implementation). Once everything in RAPIDS is migrated to using the new logger, we will update the way that it uses spdlog to completely hide all spdlog symbols, which solves a half dozen different problems for us when it comes to packaging (symbol collision issues, ABI compatibility, conda environment conflicts, bundling of headers into conda packages, etc). Resolves #1709 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Mark Harris (https://github.com/harrism) - Jake Awe (https://github.com/AyodeAwe) URL: #1722
* add breaking change notifier [skip ci]
* test commit
* use target
By default, CI runs on draft PRs. This leads to many CI runs that may be unnecessary. With this PR's change to `.github/copy-pr-bot.yaml`, an `/ok to test` comment from a trusted user is required to trigger CI on draft PRs. Non-draft PRs will run CI by default, assuming that all commits are signed by trusted users. Otherwise an `/ok to test` is required (as before) -- see the `copy-pr-bot` docs at https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/ for more information. Part of rapidsai/build-planning#123. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1737
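The triggering rules described in this commit message can be summarized as a small decision function. This is only an illustrative sketch of the stated policy, not the `copy-pr-bot` implementation; the function name `ci_should_run` and its parameters are hypothetical.

```python
def ci_should_run(is_draft: bool, all_commits_trusted: bool, ok_to_test: bool) -> bool:
    """Return True if CI would start under the policy described above.

    Hypothetical helper: the names and signature are assumptions made for
    illustration and do not come from copy-pr-bot.
    """
    # An explicit `/ok to test` comment from a trusted user always triggers CI.
    if ok_to_test:
        return True
    # Draft PRs do not run CI without that comment.
    if is_draft:
        return False
    # Non-draft PRs run by default only if every commit is signed by a trusted user.
    return all_commits_trusted
```

For example, `ci_should_run(is_draft=True, all_commits_trusted=True, ok_to_test=False)` is `False`, while the same PR marked ready for review would run CI without a comment.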
It looks like #1722 introduced usage of the modern `target_link_libraries` syntax but did not adjust all other calls, because coverage wasn't being set up locally or anywhere else in CI. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Robert Maynard (https://github.com/robertmaynard) URL: #1738
This fixes a handful of issues uncovered in downstream CI after #1722. The following are bugs introduced in #1722: - When using rmm from the build directory rather than an installation, the namespaced targets are not present so we must generate aliases. - The `RMM_LOGGING_ASSERT` macro is never used in rmm itself, so we didn't catch that it was still using the old version of the logger. While fixing the above, I also uncovered that building fmt in this environment unearths a gcc bug. The following are underlying issues uncovered by #1722: - spdlog's fmt CMake linkage is determined at build time. As a result, the conda package for spdlog is hardcoded to use fmt as a library (static or shared depends on what the `fmt::fmt` target winds up being when a consumer using spdlog finds fmt in CMake), which means that is propagated to all consumers of the librmm package via its CMake. This means that we often wind up with both fmt_header_only and fmt as link targets for many RAPIDS libraries. For now, this PR makes it so that if `rapids_cpm_find(spdlog)` does not find a copy of spdlog locally, the cloned version will use an external header-only fmt via rapids-cmake's logic, which ensures that packages like wheels do not export a libfmt or libspdlog dependency. However, in environments where `rapids_cpm_find(spdlog)` does find a preexisting package, we allow that package's fmt linkage to propagate. In conda environments, we know that this fmt linkage is to the library, so we keep fmt as part of rmm's runtime dependencies (by placing it in host and relying on the run export) so that libfmt is always available in environments using rmm. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1739
Now that some upstream bugs have been fixed, we can allow cuda-python 12.6.2 and 11.8.5. See NVIDIA/cuda-python#226 (comment) for more information. Authors: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1729
This PR applies some minor README revisions, like fixing outdated links. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Mark Harris (https://github.com/harrism) URL: #1747
Resolves #1742 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #1749
We require a newer cuda-python lower bound for new features and to use the new layout. This will fix a number of errors observed when the runtime version of cuda-python is older than the version used to build packages using Cython features from cuda-python. See rapidsai/build-planning#117 (comment) for details. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1751
…esource (#1743) This PR adds a new `fabric` handle type in `allocation_handle_type`. It also adds an optional `access_flags` to set the memory access desired when exporting (`prot_none`, or `prot_read_write`). Pools that are not meant to be shareable should omit these flags. Please note that I can't add a unit test that exports or imports these fabric handles, because it would require system setup that doesn't look to be portable. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Rong Ou (https://github.com/rongou) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) URL: #1743
This PR adds configuration for pre-commit.ci, reformats the pre-commit config file, and updates pre-commit hooks. See rapidsai/build-planning#124 for the motivation. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) - Matthew Murray (https://github.com/Matt711) - Jake Awe (https://github.com/AyodeAwe) URL: #1746
We're now pre-installing `wheel` in the CI images: rapidsai/ci-imgs#215 This proposes removing a `pip install wheel` in CI here... fewer network requests = fewer random CI failures 😁 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1748
Closes #1753. This is a follow-up to #1743. I would like rapidsai/cudf#17553 to merge first so that I don't break the build. I've learned that I was using `cudaMemPoolSetAccess` incorrectly. This API should only be used from a `peer` device, not from the device that created the pool. This is why calling `cudaMemPoolSetAccess` with none throws an error, as documented in #1753. I have tested that I can still export the fabric handles and import them using UCX in a peer device with the default access that the pool owner device gets (read+write is the default). Note that this read+write default access cannot be revoked from the owner, as it wouldn't make sense to have memory that nobody has access to, but peers can call `cudaMemPoolSetAccess` to gain read+write access or to stop accessing (none) a peer's pool memory. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1754
Because of the switch away from certificates/mTLS, we are having to rework a few things. In the meantime, telemetry jobs are failing. This PR adds a switch to turn all of the telemetry stuff off - to skip it instead. It is meant to be controlled by an org-wide environment variable, which can be applied to individual repos by ops. At the time of submitting this PR, the environment variable is 'false' and no telemetry is being reported. Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1752
Closes #1755. Replaces #1675. Authors: - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1756
Closes #1758. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Matthew Murray (https://github.com/Matt711) - James Lamb (https://github.com/jameslamb) URL: #1759
Closes #1762 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1765
Forward-merge branch-24.12 to branch-25.02
Following #1756, we no longer need to guard against this deprecation warning. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Matthew Murray (https://github.com/Matt711) - James Lamb (https://github.com/jameslamb) URL: #1768
Recently, I added support for `codespell` in CCCL (NVIDIA/cccl#3168). @shwina noticed some issues in my PR that were fixed in NVIDIA/cccl#3182. This PR ports similar fixes to RMM, to make `codespell` work better when run both inside and outside of `pre-commit`. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Mike Sarahan (https://github.com/msarahan) URL: #1769
Closes #1616. This removes factory functions for resource adaptors that were previously deprecated in RMM 24.10, in PR #1626. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Mark Harris (https://github.com/harrism) URL: #1767
The goal here is to remove the need for certificates. Any worker that is not in our VPC can talk directly to fluentbit, and fluentbit will be configured with certificates to talk to Tempo. The implementation implication is that we need to run telemetry stuff ONLY on nodes in our VPC. To avoid needing to move all jobs to these nodes, we instead temporarily store telemetry data as artifacts, and in one final job, we process and send telemetry info for all jobs from one job. Part of rapidsai/shared-workflows#269 and rapidsai/shared-actions#28 Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1750
When stack trace is enabled, we would run into compile failures since `array.h` wasn't explicitly included. The build currently works only because other headers bring this include in. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1771
This PR makes `numba` an optional dependency of RMM. We are keeping `numba` as a hard dependency in tests, though I explored what it would look like as a soft dependency in e2ff7f1. It turns out that the current RMM test suite relies on `numba` for about 90% of the tests, as a way to copy data from host to device and back (to verify that the allocations are valid and usable). Closes #1760. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Matthew Murray (https://github.com/Matt711) - Mark Harris (https://github.com/harrism) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1761
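One common way to treat a package as an optional dependency is to attempt the import lazily and fail only when a feature that needs it is used. The sketch below illustrates that general pattern under stated assumptions; it is not RMM's actual code, and the helper names `optional_import` and `require` are hypothetical.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(module: Optional[ModuleType], name: str, feature: str) -> ModuleType:
    """Raise a descriptive error if an optional module is missing.

    Hypothetical helper: call it at the point where the feature is used,
    so that importing the package itself never fails.
    """
    if module is None:
        raise ImportError(f"{feature} requires the optional dependency '{name}'")
    return module
```

A module could then call `numba = optional_import("numba")` once at import time and `require(numba, "numba", "some feature")` inside the functions that actually need it, so users without `numba` can still import the package.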
This PR removes rmm's logger code in favor of using the [rapids-logger repo](https://github.com/rapidsai/rapids-logger) to which that code was moved. The main material change is that with the latest commit on that repo rmm will dump output to stderr instead of to a file by default, which was the generally requested behavior and also aligns with the rest of RAPIDS's loggers pre-rapids-logger. Nonetheless, I've marked that as a breaking change (also because the rapids-logger code is no longer available from this repository). Contributes to rapidsai/build-planning#104. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1774
Contributes to rapidsai/build-planning#127. This PR cannot be merged unless nightly CI has passed within the past 7 days, so if it remains unmerged, that will itself be an indication that nightly CI needs fixing. Authors: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1772
This PR switches rmm to use rapids-cmake to fetch rapids-logger so that it uses a consistent version with the rest of RAPIDS to avoid any cases where transitive CPM loads result in multiple packages being built from source that require a different version of rapids-logger. Depends on rapidsai/rapids-cmake#737 Contributes to rapidsai/build-planning#104. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1776
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.8.2 → v0.8.6](astral-sh/ruff-pre-commit@v0.8.2...v0.8.6)
- [github.com/pre-commit/mirrors-clang-format: v16.0.6 → v19.1.6](pre-commit/mirrors-clang-format@v16.0.6...v19.1.6)
- [github.com/rapidsai/dependency-file-generator: v1.16.0 → v1.17.0](rapidsai/dependency-file-generator@v1.16.0...v1.17.0)
Authors: - https://github.com/apps/pre-commit-ci - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1778
See discussion in rapidsai/rapids-logger#10 (comment) and #1779. spdlog will remain a requirement for 25.02, but we will remove it in favor of a precompiled rapids-logger library in 25.04 (and that library will completely hide everything related to spdlog: APIs, package requirements, symbols, etc). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1780
Closes #1770. Authors: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1775
conda-forge is using GCC 13 for CUDA 12 builds. This PR updates CUDA 12 conda builds to use GCC 13, for alignment. These PRs should be merged in a specific order, see rapidsai/build-planning#129 for details. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1773
Numba 0.61.0 was just released with a couple of breaking changes; this PR is required to unblock CI. xref: rapidsai/cudf#17777 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1789
`shellcheck` is a fast, static analysis tool for shell scripts. It's good at flagging up unused variables, unintentional glob expansions, and other potential execution and security headaches that arise from the wonders of `bash` (and other shlangs). This PR adds a `pre-commit` hook to run `shellcheck` on all of the `sh-lang` files in the `ci/` directory, and the changes requested by `shellcheck` to make the existing files pass the check. xref: rapidsai/build-planning#135 Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1788
This PR uses CUDA 12.8.0 to build and test. xref: rapidsai/build-planning#139 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1797
We already suppress those for the constructors Authors: - Michael Schellenberger Costa (https://github.com/miscco) - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Rong Ou (https://github.com/rongou) - Bradley Dice (https://github.com/bdice) URL: #1790
This PR points the shared workflow branches back to the default 25.02 branches. xref: rapidsai/build-planning#139 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1805
❄️ Code freeze for `branch-25.02` and v25.02 release
What does this mean?
Only critical/hotfix level issues should be merged into `branch-25.02` until release (merging of this PR).
What is the purpose of this PR?
To merge `branch-25.02` into `main` for the release.