
Merge main into staging#2311

Merged
bdice merged 14 commits into rapidsai:staging from bdice:staging-merge-main
Mar 17, 2026

Conversation


@bdice bdice commented Mar 17, 2026

Description

This merges the following changes into the staging branch:

  • Compile RMM with -Wsign-conversion (#2308)
  • Scope fixed-size fixture to class; exercise distinct bins in binning tests (#2291)
  • Cap numba-cuda upper bound at <0.29.0 (#2306)
  • Fix missing ulimit in CUDA 13.1 devcontainers (#2309)
  • Remove zero-value special casing in set_element_async to preserve IEEE 754 -0.0 (#2302)
  • Fix ABA problem in tracking resource adaptor and statistics resource adaptor (#2304)
  • ensure 'torch' CUDA wheels are installed in CI (#2279)
  • examples: read tag from RAPIDS_BRANCH file (#2293)
  • Update to 26.06 (#2290)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

AyodeAwe and others added 14 commits March 12, 2026 09:35
This PR updates the repository to version 26.06.

This is part of the 26.04 release burndown process.
Fixes these `pre-commit` errors blocking CI:

```text
verify-hardcoded-version.................................................Failed
- hook id: verify-hardcoded-version
- exit code: 1

In file RAPIDS_BRANCH:1:9:
 release/26.04
warning: do not hard-code version, read from VERSION file instead

In file RAPIDS_BRANCH:1:9:
 release/26.04

In file cpp/examples/versions.cmake:8:21:
 set(RMM_TAG release/26.04)
warning: do not hard-code version, read from VERSION file instead

In file cpp/examples/versions.cmake:8:21:
 set(RMM_TAG release/26.04)
```

This is fixed by updating the `verify-hardcoded-version` configuration and by updating the C++ examples to read `RMM_TAG` from the `RAPIDS_BRANCH` file.

See rapidsai/pre-commit-hooks#121 for details

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2293
Contributes to rapidsai/build-planning#256

Broken out from rapidsai#2270 

Proposes a stricter pattern for installing `torch` wheels, to prevent bugs of the form "accidentally used a CPU-only `torch` from pypi.org". This should help us to catch compatibility issues, improving release confidence.

Other small changes:

* splits torch wheel testing into "oldest" (PyTorch 2.9) and "latest" (PyTorch 2.10)
* introduces a `require_gpu_pytorch` matrix filter so conda jobs can explicitly request `pytorch-gpu` (to similarly ensure solvers don't fall back to the CPU-only variant)
* appends `rapids-generate-pip-constraint` output to the file that `PIP_CONSTRAINT` points to
  - *(to reduce duplication and the risk of failing to apply constraints)*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2279
…adaptor (rapidsai#2304)

For the tracking resource adaptor to be thread safe, modifications of the tracked allocations must be sandwiched inside the "acquire-release" pair `upstream.allocate`/`upstream.deallocate`. Previously this was only half true: the upstream allocation occurred before the tracked allocations were updated, but the upstream deallocation did not occur after the corresponding removal. In multi-threaded use this could produce a logged error claiming that a deallocated pointer was never tracked.

To solve this, use the correct ordering. Moreover, avoid ABA issues by using `try_emplace` when tracking an allocation, so an entry for a recycled pointer is never overwritten.
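A minimal sketch of the corrected ordering, written as Python pseudocode rather than RMM's actual C++ (the class name and upstream callbacks here are hypothetical): the map update sits strictly inside the allocate/deallocate pair, and a `try_emplace`-style insert (`dict.setdefault`) never overwrites an existing entry for a recycled pointer.

```python
import threading


class TrackingAdaptorSketch:
    """Hypothetical illustration of the ordering fix; not RMM's API."""

    def __init__(self, upstream_allocate, upstream_deallocate):
        self._allocate = upstream_allocate
        self._deallocate = upstream_deallocate
        self._lock = threading.Lock()
        self._tracked = {}  # ptr -> size

    def allocate(self, size):
        ptr = self._allocate(size)  # upstream allocation happens first
        with self._lock:
            # try_emplace analogue: never overwrite an existing entry,
            # so a recycled (ABA) pointer cannot clobber live bookkeeping.
            self._tracked.setdefault(ptr, size)
        return ptr

    def deallocate(self, ptr):
        with self._lock:
            if ptr not in self._tracked:
                raise RuntimeError(f"pointer {ptr:#x} was not tracked")
            del self._tracked[ptr]
        self._deallocate(ptr)  # upstream deallocation happens last
```

Because the pointer is removed from the map before the upstream free, another thread can never observe a freed-and-reallocated pointer that is still marked live.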

- Closes rapidsai#2303

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2304
…E 754 -0.0 (rapidsai#2302)

## Description

`device_uvector::set_element_async` had a zero-value optimization that
used `cudaMemsetAsync` when `value == value_type{0}`. For IEEE 754
floating-point types, `-0.0 == 0.0` is `true` per the standard, so
`-0.0` was incorrectly routed through `cudaMemsetAsync(..., 0, ...)`
which clears all bits — including the sign bit — normalizing `-0.0` to
`+0.0`.

This corrupts the in-memory representation of `-0.0` for any downstream
library that creates scalars through RMM
(`cudf::fixed_width_scalar::set_value` →
`rmm::device_scalar::set_value_async` →
`device_uvector::set_element_async`), causing observable behavioral
divergence in spark-rapids (e.g., `cast(-0.0 as string)` returns `"0.0"`
on GPU instead of `"-0.0"`).
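The comparison pitfall can be reproduced on the host in a few lines of Python (a standalone illustration, not RMM code): `-0.0 == 0.0` holds under IEEE 754, yet the bit patterns differ, and an all-zero memset destroys the sign bit while a byte-for-byte copy preserves it.

```python
import math
import struct

# IEEE 754 comparison treats the two zeros as equal...
assert -0.0 == 0.0

# ...but their byte-level representations differ in the sign bit.
neg_bits = struct.pack("<d", -0.0)  # sign bit set in the top byte
pos_bits = struct.pack("<d", 0.0)   # all eight bytes zero
assert neg_bits != pos_bits

# A memset-to-zero (what cudaMemsetAsync(..., 0, ...) does) clears every
# byte, so the recovered value is +0.0 and the sign is lost.
memset_value = struct.unpack("<d", b"\x00" * 8)[0]
assert math.copysign(1.0, memset_value) == 1.0

# A byte-for-byte copy (the cudaMemcpyAsync path) preserves the sign.
copied_value = struct.unpack("<d", neg_bits)[0]
assert math.copysign(1.0, copied_value) == -1.0
```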

### Fix

Per the discussion in rapidsai#2298, remove all `constexpr` special casing in
`set_element_async` — both the `bool` `cudaMemsetAsync` path and the
`is_fundamental_v` zero-detection path — and always use
`cudaMemcpyAsync`. This preserves exact bit-level representations for
all types, which is the correct contract for a memory management library
that sits below cuDF, cuML, and cuGraph.

`set_element_to_zero_async` is unchanged — its explicit "set to zero"
semantics make `cudaMemsetAsync` the correct implementation.

### Testing

Added `NegativeZeroTest.PreservesFloatNegativeZero` and
`NegativeZeroTest.PreservesDoubleNegativeZero` regression tests that
verify the sign bit of `-0.0f` / `-0.0` survives a round-trip through
`set_element_async` → `element`. All 122 tests pass locally (CUDA 13.0,
RTX 5880).

Closes rapidsai#2298

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Made with [Cursor](https://cursor.com)

---------

Signed-off-by: Allen Xu <allxu@nvidia.com>
## Description
I found that the `ulimit` settings for CUDA 13.1 devcontainers were
missing. This fixes it.

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
This PR sets an upper bound of `<0.29.0` on the `numba-cuda` dependency.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2306
…tests (rapidsai#2291)

Scope the `FixedSizeMemoryResource` fixture to the test class so the resource is created once per upstream rather than once per `(dtype, nelem, alloc)` combination, matching the pattern from rapidsai#2284.

Rework `test_binning_memory_resource` so allocations span multiple distinct bins. The previous bin range `(2^18–2^22)` pre-allocated ~992 MiB of managed memory while test data never exceeded 1 KiB — every allocation landed in the same bin. The new range `(2^10–2^17)` creates bins from 1 KiB to 128 KiB, and each `_BINNING_NELEMS` value routes to a different fixed-size bin with `float64`. An explicit 128 MiB `CudaMemoryResource` bin and a dedicated `test_binning_large_allocation` exercise the large-allocation path.
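To make the bin routing concrete, here is a small hypothetical helper (not part of RMM; the element counts in the comments are illustrative, not the actual `_BINNING_NELEMS` values) that maps an allocation size to the power-of-two bin it would land in under the new `(2^10–2^17)` layout, with `None` standing in for the fall-through large-allocation resource:

```python
def fixed_size_bin(nbytes, min_exp=10, max_exp=17):
    """Return the capacity of the smallest power-of-two bin that fits
    `nbytes`, or None if it exceeds every fixed-size bin (hypothetical
    helper mirroring the bin layout described above)."""
    for exp in range(min_exp, max_exp + 1):
        capacity = 1 << exp
        if nbytes <= capacity:
            return capacity
    return None  # routed to the large-allocation resource


# Distinct float64 element counts now route to distinct bins (8 B each):
assert fixed_size_bin(100 * 8) == 1024        # 800 B   -> 1 KiB bin
assert fixed_size_bin(1_000 * 8) == 8192      # 8 KB    -> 8 KiB bin
assert fixed_size_bin(10_000 * 8) == 131072   # 80 KB   -> 128 KiB bin
assert fixed_size_bin(1_000_000 * 8) is None  # 8 MB    -> large path
```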

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)

URL: rapidsai#2291
Compiles RMM with `-Wsign-conversion -Werror` and fixes the resulting failures.

- Closes rapidsai#2307

Authors:
  - Matthew Murray (https://github.com/Matt711)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2308
@bdice bdice added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Mar 17, 2026
@bdice bdice requested review from a team as code owners March 17, 2026 07:16
@bdice bdice requested review from jameslamb and removed request for a team March 17, 2026 07:16
@bdice bdice requested review from miscco, robertmaynard and wence- and removed request for a team March 17, 2026 07:16
@bdice bdice self-assigned this Mar 17, 2026
@bdice bdice moved this to In Progress in RMM Project Board Mar 17, 2026

@bdice bdice left a comment


Self-reviewed, all seems fine.

@bdice bdice merged commit e4f4106 into rapidsai:staging Mar 17, 2026
26 of 29 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in RMM Project Board Mar 17, 2026
7 participants