
[FEA] Make device_vector safer to use in multi-device setting #1527

Closed
wence- opened this issue Apr 11, 2024 · 2 comments · Fixed by #1533
Assignees: wence-
Labels: 1 - On Deck (To be worked on next), cpp (Pertains to C++ code), feature request (New feature or request)

Comments

@wence-
Contributor

wence- commented Apr 11, 2024

Is your feature request related to a problem? Please describe.

Since #1370, device_buffer is safe to use in a multi-device setting with respect to the active device when the destructor runs. While it was always possible (and relatively straightforward) to arrange for the correct device to be active in code paths where no exceptions occur, doing so in the presence of exceptions was much more complicated.

We therefore added the cuda_set_device_raii helper object and stored the active device id in the device_buffer to ensure that the correct device is always active when calling allocate/deallocate functions.

In contrast, since device_vector is just an alias for thrust::device_vector, it still suffers from the old issue: the user must manually arrange for the correct device to be active when the destructor runs.
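For concreteness, here is a minimal sketch of the hazard (the device ids and the do_work helper are illustrative, not part of RMM):

```cpp
#include <rmm/device_vector.hpp>

#include <cuda_runtime_api.h>

void do_work();  // illustrative helper; may throw

void hazard()
{
  cudaSetDevice(1);
  rmm::device_vector<int> vec(1000);  // allocated via device 1's memory resource
  cudaSetDevice(0);
  do_work();         // if this throws...
  cudaSetDevice(1);  // ...this restore is never reached
}  // ~device_vector() can run with device 0 active, deallocating on the wrong device
```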

Describe the solution you'd like

#1523 documents this restriction, but it would be good if we could lift it. One way would be to store the active device in the thrust allocator wrapper that we use to interface RMM's memory resources with the thrust allocator model.

We would then use cuda_set_device_raii in all the allocate/deallocate functions.
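A rough sketch of what that could look like is below. This is not RMM's actual thrust_allocator (which lives in rmm/mr/device/thrust_allocator_adaptor.hpp); the class and member names are illustrative, and the stream/memory-resource handling is simplified:

```cpp
#include <rmm/cuda_device.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

#include <thrust/device_ptr.h>

#include <cstddef>

template <typename T>
class device_aware_allocator {
 public:
  using value_type = T;
  using pointer    = thrust::device_ptr<T>;

  pointer allocate(std::size_t n)
  {
    rmm::cuda_set_device_raii guard{device_};  // make the saved device active
    return thrust::device_pointer_cast(
      static_cast<T*>(mr_->allocate(n * sizeof(T), stream_)));
  }

  void deallocate(pointer ptr, std::size_t n)
  {
    rmm::cuda_set_device_raii guard{device_};  // same guard on the way out
    mr_->deallocate(thrust::raw_pointer_cast(ptr), n * sizeof(T), stream_);
  }

 private:
  rmm::cuda_device_id device_{rmm::get_current_cuda_device()};  // captured at construction
  rmm::cuda_stream_view stream_{rmm::cuda_stream_default};
  rmm::mr::device_memory_resource* mr_{rmm::mr::get_current_device_resource()};
};
```

Capturing the device at construction mirrors what device_buffer already does since #1370; the guard then restores whatever device was previously active once allocate/deallocate returns.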

This approach was discounted in #1370 because it produces more device switches than strictly necessary in some circumstances (pushing the device switching as far out as possible was preferred), so there would be some overhead compared to using device_buffer, though hopefully a small one. Note also that since device_vector is not stream-ordered there are other disadvantages to using it, so the small performance cost is probably not a deal-breaker.

Describe alternatives you've considered

Maintain the status quo, and eventually deprecate and then remove device_vector, since it is not stream-ordered anyway and we are trying to move away from that model.

@wence- wence- added the feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) labels Apr 11, 2024
@wence- wence- self-assigned this Apr 11, 2024
@wence- wence- added the 1 - On Deck (To be worked on next) and cpp (Pertains to C++ code) labels and removed the ? - Needs Triage (Need team to review and classify) label Apr 11, 2024
@harrism
Member

harrism commented Apr 11, 2024

Perhaps we should benchmark it. In the single-device case, rmm::cuda_set_device_raii should be fast.

@harrism
Member

harrism commented Apr 11, 2024

This was discounted as an approach in #1370 since it produces more device switches than necessary in some circumstances (pushing the device switching as far out as possible was preferred),

I think you are referring to this: #1370 (comment). I think the case in RMM's thrust allocator isn't as bad -- putting the set-device call in allocate/deallocate wouldn't be the same as putting it way down in device_buffer::allocate_async()/deallocate_async(). In the latter case, we would need it both there and in the higher-level functions that call those, resulting in duplicate device checks. Even if device_vector calls the allocator's functions multiple times (e.g. in a resize call -- allocate and deallocate), the RAII class would basically just call cudaGetDevice() multiple times, which isn't that expensive AFAIK.
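For reference, a minimal illustrative sketch of what such an RAII device-switch guard does (this is not RMM's actual cuda_set_device_raii, whose implementation may differ):

```cpp
#include <cuda_runtime_api.h>

class set_device_guard {
 public:
  explicit set_device_guard(int target)
  {
    cudaGetDevice(&old_device_);                  // cheap query of the active device
    needs_reset_ = (old_device_ != target);
    if (needs_reset_) { cudaSetDevice(target); }  // switch only when necessary
  }
  ~set_device_guard()
  {
    if (needs_reset_) { cudaSetDevice(old_device_); }  // restore on scope exit
  }
  set_device_guard(set_device_guard const&)            = delete;
  set_device_guard& operator=(set_device_guard const&) = delete;

 private:
  int old_device_{};
  bool needs_reset_{};
};
```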

wence- added a commit to wence-/rmm that referenced this issue Apr 15, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 16, 2024
Previously, the user had to arrange that the device that was active when
a thrust_allocator object was created was also active when allocate and
deallocate were called. This is hard to manage if exceptions are thrown.
Instead, save the active device on construction and ensure that it is
active when calling allocate and deallocate. This means that
device_vector is safe to destruct with RAII semantics in a multi-device
setting.

Add tests of this facility, and correct the parameterization usage in
the other thrust allocator tests so that we actually check the MRs we're
parameterizing over.

- Closes #1527

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Mark Harris (https://github.com/harrism)

URL: #1533
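With this change, a pattern like the following sketch should be safe (device ids are illustrative): the allocator remembers the device that was active at construction and makes it active again around the deallocation, even though a different device is current when the vector is destroyed.

```cpp
#include <rmm/device_vector.hpp>

#include <cuda_runtime_api.h>

void now_safe()
{
  cudaSetDevice(1);
  rmm::device_vector<int> vec(1000);  // the allocator records device 1
  cudaSetDevice(0);                   // e.g. an error path switched devices
}  // the deallocation runs with device 1 active, then the previous device is restored
```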
@bdice bdice moved this from To-do to Done in RMM Project Board Jan 15, 2025