
Merge main into staging#2311

Merged
bdice merged 14 commits into rapidsai:staging from bdice:staging-merge-main
Mar 17, 2026

Conversation


@bdice bdice commented Mar 17, 2026

Description

This merges the following changes into the staging branch:

  • Compile RMM with -Wsign-conversion (#2308)
  • Scope fixed-size fixture to class; exercise distinct bins in binning tests (#2291)
  • Cap numba-cuda upper bound at <0.29.0 (#2306)
  • Fix missing ulimit in CUDA 13.1 devcontainers (#2309)
  • Remove zero-value special casing in set_element_async to preserve IEEE 754 -0.0 (#2302)
  • Fix ABA problem in tracking resource adaptor and statistics resource adaptor (#2304)
  • ensure 'torch' CUDA wheels are installed in CI (#2279)
  • examples: read tag from RAPIDS_BRANCH file (#2293)
  • Update to 26.06 (#2290)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

AyodeAwe and others added 14 commits March 12, 2026 09:35
This PR updates the repository to version 26.06.

This is part of the 26.04 release burndown process.
Fixes these `pre-commit` errors blocking CI:

```text
verify-hardcoded-version.................................................Failed
- hook id: verify-hardcoded-version
- exit code: 1

In file RAPIDS_BRANCH:1:9:
 release/26.04
warning: do not hard-code version, read from VERSION file instead

In file RAPIDS_BRANCH:1:9:
 release/26.04

In file cpp/examples/versions.cmake:8:21:
 set(RMM_TAG release/26.04)
warning: do not hard-code version, read from VERSION file instead

In file cpp/examples/versions.cmake:8:21:
 set(RMM_TAG release/26.04)
```

This is fixed by updating the `verify-hardcoded-version` configuration and by updating the C++ examples to read `RMM_TAG` from the `RAPIDS_BRANCH` file.

See rapidsai/pre-commit-hooks#121 for details

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2293
Contributes to rapidsai/build-planning#256

Broken out from rapidsai#2270 

Proposes a stricter pattern for installing `torch` wheels, to prevent bugs of the form "accidentally used a CPU-only `torch` from pypi.org". This should help us to catch compatibility issues, improving release confidence.

Other small changes:

* splits torch wheel testing into "oldest" (PyTorch 2.9) and "latest" (PyTorch 2.10)
* introduces a `require_gpu_pytorch` matrix filter so conda jobs can explicitly request `pytorch-gpu` (to similarly ensure solvers don't fall back to the CPU-only variant)
* appends `rapids-generate-pip-constraint` output to the file that `PIP_CONSTRAINT` points to
  - *(to reduce duplication and the risk of failing to apply constraints)*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2279
…adaptor (rapidsai#2304)

For the tracking resource adaptor to be thread safe, modifications of the tracked allocations must be sandwiched inside the "acquire-release" pair `upstream.allocate`/`upstream.deallocate`. Previously this was only half true: the upstream allocation occurred before the tracked allocations were updated, but the upstream deallocation did not occur after the corresponding removal. In multi-threaded use this could produce a logged error claiming that a deallocated pointer was never tracked.

To solve this, use the correct ordering. Moreover, avoid ABA issues by using `try_emplace` when tracking an allocation, so an entry for a recycled pointer is never overwritten.
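A minimal sketch of the corrected ordering, written as Python pseudocode rather than RMM's actual C++ (the class name and upstream callbacks here are hypothetical): the map update sits strictly inside the allocate/deallocate pair, and a `try_emplace`-style insert (`dict.setdefault`) never overwrites an existing entry for a recycled pointer.

```python
import threading


class TrackingAdaptorSketch:
    """Hypothetical illustration of the ordering fix; not RMM's API."""

    def __init__(self, upstream_allocate, upstream_deallocate):
        self._allocate = upstream_allocate
        self._deallocate = upstream_deallocate
        self._lock = threading.Lock()
        self._tracked = {}  # ptr -> size

    def allocate(self, size):
        ptr = self._allocate(size)  # upstream allocation happens first
        with self._lock:
            # try_emplace analogue: never overwrite an existing entry,
            # so a recycled (ABA) pointer cannot clobber live bookkeeping.
            self._tracked.setdefault(ptr, size)
        return ptr

    def deallocate(self, ptr):
        with self._lock:
            if ptr not in self._tracked:
                raise RuntimeError(f"pointer {ptr:#x} was not tracked")
            del self._tracked[ptr]
        self._deallocate(ptr)  # upstream deallocation happens last
```

Because the pointer is removed from the map before the upstream free, another thread can never observe a freed-and-reallocated pointer that is still marked live.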

- Closes rapidsai#2303

Authors:
  - Lawrence Mitchell (https://github.com/wence-)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2304
…E 754 -0.0 (rapidsai#2302)

## Description

`device_uvector::set_element_async` had a zero-value optimization that
used `cudaMemsetAsync` when `value == value_type{0}`. For IEEE 754
floating-point types, `-0.0 == 0.0` is `true` per the standard, so
`-0.0` was incorrectly routed through `cudaMemsetAsync(..., 0, ...)`
which clears all bits — including the sign bit — normalizing `-0.0` to
`+0.0`.

This corrupts the in-memory representation of `-0.0` for any downstream
library that creates scalars through RMM
(`cudf::fixed_width_scalar::set_value` →
`rmm::device_scalar::set_value_async` →
`device_uvector::set_element_async`), causing observable behavioral
divergence in spark-rapids (e.g., `cast(-0.0 as string)` returns `"0.0"`
on GPU instead of `"-0.0"`).
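The comparison pitfall can be reproduced on the host in a few lines of Python (a standalone illustration, not RMM code): `-0.0 == 0.0` holds under IEEE 754, yet the bit patterns differ, and an all-zero memset destroys the sign bit while a byte-for-byte copy preserves it.

```python
import math
import struct

# IEEE 754 comparison treats the two zeros as equal...
assert -0.0 == 0.0

# ...but their byte-level representations differ in the sign bit.
neg_bits = struct.pack("<d", -0.0)  # sign bit set in the top byte
pos_bits = struct.pack("<d", 0.0)   # all eight bytes zero
assert neg_bits != pos_bits

# A memset-to-zero (what cudaMemsetAsync(..., 0, ...) does) clears every
# byte, so the recovered value is +0.0 and the sign is lost.
memset_value = struct.unpack("<d", b"\x00" * 8)[0]
assert math.copysign(1.0, memset_value) == 1.0

# A byte-for-byte copy (the cudaMemcpyAsync path) preserves the sign.
copied_value = struct.unpack("<d", neg_bits)[0]
assert math.copysign(1.0, copied_value) == -1.0
```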

### Fix

Per the discussion in rapidsai#2298, remove all `constexpr` special casing in
`set_element_async` — both the `bool` `cudaMemsetAsync` path and the
`is_fundamental_v` zero-detection path — and always use
`cudaMemcpyAsync`. This preserves exact bit-level representations for
all types, which is the correct contract for a memory management library
that sits below cuDF, cuML, and cuGraph.

`set_element_to_zero_async` is unchanged — its explicit "set to zero"
semantics make `cudaMemsetAsync` the correct implementation.

### Testing

Added `NegativeZeroTest.PreservesFloatNegativeZero` and
`NegativeZeroTest.PreservesDoubleNegativeZero` regression tests that
verify the sign bit of `-0.0f` / `-0.0` survives a round-trip through
`set_element_async` → `element`. All 122 tests pass locally (CUDA 13.0,
RTX 5880).

Closes rapidsai#2298

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Made with [Cursor](https://cursor.com)

---------

Signed-off-by: Allen Xu <allxu@nvidia.com>
## Description
I found that the `ulimit` settings for CUDA 13.1 devcontainers were
missing. This fixes it.

## Checklist
- [x] I am familiar with the [Contributing
Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
This PR sets an upper bound of `<0.29.0` on the `numba-cuda` dependency.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2306
…tests (rapidsai#2291)

Scope the `FixedSizeMemoryResource` fixture to the test class so the resource is created once per upstream rather than once per `(dtype, nelem, alloc)` combination, matching the pattern from rapidsai#2284.

Rework `test_binning_memory_resource` so allocations span multiple distinct bins. The previous bin range `(2^18–2^22)` pre-allocated ~992 MiB of managed memory while test data never exceeded 1 KiB — every allocation landed in the same bin. The new range `(2^10–2^17)` creates bins from 1 KiB to 128 KiB, and each `_BINNING_NELEMS` value routes to a different fixed-size bin with `float64`. An explicit 128 MiB `CudaMemoryResource` bin and a dedicated `test_binning_large_allocation` exercise the large-allocation path.
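To make the bin routing concrete, here is a small hypothetical helper (not part of RMM; the element counts in the comments are illustrative, not the actual `_BINNING_NELEMS` values) that maps an allocation size to the power-of-two bin it would land in under the new `(2^10–2^17)` layout, with `None` standing in for the fall-through large-allocation resource:

```python
def fixed_size_bin(nbytes, min_exp=10, max_exp=17):
    """Return the capacity of the smallest power-of-two bin that fits
    `nbytes`, or None if it exceeds every fixed-size bin (hypothetical
    helper mirroring the bin layout described above)."""
    for exp in range(min_exp, max_exp + 1):
        capacity = 1 << exp
        if nbytes <= capacity:
            return capacity
    return None  # routed to the large-allocation resource


# Distinct float64 element counts now route to distinct bins (8 B each):
assert fixed_size_bin(100 * 8) == 1024        # 800 B   -> 1 KiB bin
assert fixed_size_bin(1_000 * 8) == 8192      # 8 KB    -> 8 KiB bin
assert fixed_size_bin(10_000 * 8) == 131072   # 80 KB   -> 128 KiB bin
assert fixed_size_bin(1_000_000 * 8) is None  # 8 MB    -> large path
```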

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Tom Augspurger (https://github.com/TomAugspurger)

URL: rapidsai#2291
Compiles RMM with `-Wsign-conversion -Werror` and fixes the resulting failures.

- Closes rapidsai#2307

Authors:
  - Matthew Murray (https://github.com/Matt711)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#2308
@bdice bdice added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Mar 17, 2026
@bdice bdice requested review from a team as code owners March 17, 2026 07:16
@bdice bdice requested review from jameslamb and removed request for a team March 17, 2026 07:16
@bdice bdice requested review from miscco, robertmaynard and wence- and removed request for a team March 17, 2026 07:16
@bdice bdice self-assigned this Mar 17, 2026
@bdice bdice moved this to In Progress in RMM Project Board Mar 17, 2026

@bdice bdice left a comment


Self-reviewed, all seems fine.

@bdice bdice merged commit e4f4106 into rapidsai:staging Mar 17, 2026
26 of 29 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in RMM Project Board Mar 17, 2026
7 participants