This reverts commit 29b27d1.
This PR updates the repository to version 26.06. This is part of the 26.04 release burndown process.
Fixes these `pre-commit` errors blocking CI:

```text
verify-hardcoded-version.................................................Failed
- hook id: verify-hardcoded-version
- exit code: 1

In file RAPIDS_BRANCH:1:9: release/26.04
warning: do not hard-code version, read from VERSION file instead
In file RAPIDS_BRANCH:1:9: release/26.04

In file cpp/examples/versions.cmake:8:21: set(RMM_TAG release/26.04)
warning: do not hard-code version, read from VERSION file instead
In file cpp/examples/versions.cmake:8:21: set(RMM_TAG release/26.04)
```

The fix updates the `verify-hardcoded-version` configuration and changes the C++ examples to read `RMM_TAG` from the `RAPIDS_BRANCH` file. See rapidsai/pre-commit-hooks#121 for details.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#2293
Contributes to rapidsai/build-planning#256. Broken out from rapidsai#2270.

Proposes a stricter pattern for installing `torch` wheels, to prevent bugs of the form "accidentally used a CPU-only `torch` from pypi.org". This should help us catch compatibility issues, improving release confidence.

Other small changes:

* splits torch wheel testing into "oldest" (PyTorch 2.9) and "latest" (PyTorch 2.10)
* introduces a `require_gpu_pytorch` matrix filter so conda jobs can explicitly request `pytorch-gpu` (to similarly ensure solvers don't fall back to the CPU-only variant)
* appends `rapids-generate-pip-constraint` output to the file `PIP_CONSTRAINT` points to *(to reduce duplication and the risk of failing to apply constraints)*

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#2279
…adaptor (rapidsai#2304)

So that the tracking resource adaptor is thread safe, the modification of the tracked allocations should be sandwiched by the "acquire-release" pair upstream.allocate/upstream.deallocate. Previously this was not the case: the upstream allocation occurred before updating the tracked allocations, but the deallocation did not occur after. In multi-threaded use this could lead to a logged error claiming that a deallocated pointer was not tracked. To solve this, actually use the correct pattern. Moreover, ensure that we don't observe ABA issues by using `try_emplace` when tracking an allocation.

- Closes rapidsai#2303

Authors:
- Lawrence Mitchell (https://github.com/wence-)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#2304
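The locking discipline described above can be sketched with a minimal host-side analogue (hypothetical code, not the actual RMM adaptor — `std::malloc`/`std::free` stand in for the upstream resource): the tracking map is updated under the same lock that brackets the upstream call, so a concurrent deallocation can never observe an allocation that is not yet tracked, and `try_emplace` declines to overwrite an existing entry if a recycled pointer value reappears.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <new>

// Sketch only: the map update and the upstream call happen under one lock,
// matching the allocate -> track / untrack -> deallocate ordering.
class tracking_allocator {
 public:
  void* allocate(std::size_t bytes) {
    std::lock_guard<std::mutex> lock{mtx_};
    void* p = std::malloc(bytes);  // stand-in for upstream.allocate
    if (p == nullptr) { throw std::bad_alloc{}; }
    allocations_.try_emplace(p, bytes);  // no-op if this pointer is somehow live
    return p;
  }

  void deallocate(void* p) {
    std::lock_guard<std::mutex> lock{mtx_};
    allocations_.erase(p);  // RMM logs an error here if p was not tracked
    std::free(p);           // stand-in for upstream.deallocate
  }

  std::size_t tracked() const {
    std::lock_guard<std::mutex> lock{mtx_};
    return allocations_.size();
  }

 private:
  mutable std::mutex mtx_;
  std::map<void*, std::size_t> allocations_;
};
```

Holding the lock across the upstream call trades a little contention for the guarantee that tracking state and upstream state never diverge mid-operation.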
…E 754 -0.0 (rapidsai#2302) ## Description `device_uvector::set_element_async` had a zero-value optimization that used `cudaMemsetAsync` when `value == value_type{0}`. For IEEE 754 floating-point types, `-0.0 == 0.0` is `true` per the standard, so `-0.0` was incorrectly routed through `cudaMemsetAsync(..., 0, ...)` which clears all bits — including the sign bit — normalizing `-0.0` to `+0.0`. This corrupts the in-memory representation of `-0.0` for any downstream library that creates scalars through RMM (`cudf::fixed_width_scalar::set_value` → `rmm::device_scalar::set_value_async` → `device_uvector::set_element_async`), causing observable behavioral divergence in spark-rapids (e.g., `cast(-0.0 as string)` returns `"0.0"` on GPU instead of `"-0.0"`). ### Fix Per the discussion in rapidsai#2298, remove all `constexpr` special casing in `set_element_async` — both the `bool` `cudaMemsetAsync` path and the `is_fundamental_v` zero-detection path — and always use `cudaMemcpyAsync`. This preserves exact bit-level representations for all types, which is the correct contract for a memory management library that sits below cuDF, cuML, and cuGraph. `set_element_to_zero_async` is unchanged — its explicit "set to zero" semantics make `cudaMemsetAsync` the correct implementation. ### Testing Added `NegativeZeroTest.PreservesFloatNegativeZero` and `NegativeZeroTest.PreservesDoubleNegativeZero` regression tests that verify the sign bit of `-0.0f` / `-0.0` survives a round-trip through `set_element_async` → `element`. All 122 tests pass locally (CUDA 13.0, RTX 5880). Closes rapidsai#2298 ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes. Made with [Cursor](https://cursor.com) --------- Signed-off-by: Allen Xu <allxu@nvidia.com>
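The sign-bit behavior is easy to reproduce on the host with plain `memset`/`memcpy` (a sketch of the failure mode, not RMM code): zero-filling a `double` produces the bit pattern of `+0.0`, while a byte copy preserves the sign bit of `-0.0`, even though `-0.0 == 0.0` compares equal.

```cpp
#include <cmath>
#include <cstring>

// memcpy copies all bytes, including the sign bit of -0.0.
bool sign_preserved_by_memcpy() {
  double src = -0.0;
  double dst = 0.0;
  std::memcpy(&dst, &src, sizeof(double));
  return std::signbit(dst);  // sign bit survives the copy
}

// memset(..., 0, ...) writes the all-zero pattern, i.e. +0.0:
// the sign bit is cleared, which is exactly the reported bug.
bool sign_preserved_by_memset() {
  double dst = -0.0;
  std::memset(&dst, 0, sizeof(double));
  return std::signbit(dst);  // sign bit is gone
}
```

This is the same distinction as `cudaMemsetAsync` versus `cudaMemcpyAsync` on device memory, which is why the fix routes everything through the copy path.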
## Description I found that the `ulimit` settings for CUDA 13.1 devcontainers were missing. This fixes it. ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md). - [x] New or existing tests cover these changes. - [x] The documentation is up to date with these changes.
This PR sets an upper bound on the `numba-cuda` dependency to `<0.29.0` Authors: - https://github.com/brandon-b-miller Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#2306
…tests (rapidsai#2291) Scope the `FixedSizeMemoryResource` fixture to the test class so the resource is created once per upstream rather than once per `(dtype, nelem, alloc)` combination, matching the pattern from rapidsai#2284. Rework `test_binning_memory_resource` so allocations span multiple distinct bins. The previous bin range `(2^18–2^22)` pre-allocated ~992 MiB of managed memory while test data never exceeded 1 KiB — every allocation landed in the same bin. The new range `(2^10–2^17)` creates bins from 1 KiB to 128 KiB, and each `_BINNING_NELEMS` value routes to a different fixed-size bin with `float64`. An explicit 128 MiB `CudaMemoryResource` bin and a dedicated `test_binning_large_allocation` exercise the large-allocation path. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Tom Augspurger (https://github.com/TomAugspurger) URL: rapidsai#2291
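As an illustration of the routing the new bin range exercises, here is a hedged sketch (a hypothetical helper, not the rmm API) of smallest-fitting power-of-two bin selection over the 2^10–2^17 range, with oversized requests falling through to the upstream large-allocation path:

```cpp
#include <cstddef>

// Assumed bin bounds mirroring the test range described above.
constexpr std::size_t kMinBinExp = 10;  // 1 KiB
constexpr std::size_t kMaxBinExp = 17;  // 128 KiB

// Returns the exponent of the smallest bin that fits `bytes`,
// or 0 when the request exceeds the largest bin and must be
// served by the upstream (large-allocation) resource.
std::size_t bin_for(std::size_t bytes) {
  for (std::size_t exp = kMinBinExp; exp <= kMaxBinExp; ++exp) {
    if (bytes <= (std::size_t{1} << exp)) { return exp; }
  }
  return 0;  // upstream / large-allocation path
}
```

Under this scheme each distinct `_BINNING_NELEMS` size lands in a different bin, which is what the reworked test relies on.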
Forward-merge release/26.04 into main
Compiles rmm with `-Wsign-conversion -Werror` and fixes the failures - Closes rapidsai#2307 Authors: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: rapidsai#2308
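For context, this is the kind of change `-Wsign-conversion -Werror` typically forces (an invented example, not taken from the rmm diff): indexing a standard container with a signed `int` implicitly converts to the unsigned `size_type`, which the flag rejects until the conversion is made explicit.

```cpp
#include <cstddef>
#include <vector>

// With -Wsign-conversion -Werror, `v[i]` with a signed `i` fails to build:
//   warning: conversion from 'int' to 'size_type' may change sign
// The fix is an explicit static_cast at the call site.
int element_at(int i, const std::vector<int>& v) {
  return v.at(static_cast<std::size_t>(i));
}
```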
bdice (Collaborator, Author) commented Mar 17, 2026:

> Self-reviewed, all seems fine.
Description
This merges the following changes into the `staging` branch:

- `-Wsign-conversion` (#2308)

Checklist