Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion cuda_bindings/cuda/bindings/utils/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

from typing import Any, Callable

from ._ptx_utils import get_minimal_required_cuda_ver_from_ptx_ver, get_ptx_ver
Expand Down
11 changes: 9 additions & 2 deletions cuda_core/cuda/core/experimental/_event.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ from cuda.core.experimental._utils.cuda_utils import (
driver,
handle_return,
)

import sys
if TYPE_CHECKING:
import cuda.bindings
from cuda.core.experimental._device import Device
Expand Down Expand Up @@ -108,6 +108,13 @@ cdef class Event:
self._ctx_handle = ctx_handle
return self

cpdef safe_close(self, is_shutting_down=sys.is_finalizing):
"""Destroy the event."""
if self._handle is not None:
if not is_shutting_down():
err, = driver.cuEventDestroy(self._handle)
Copy link
Contributor

@cpcloud cpcloud Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue with this?

Suggested change
err, = driver.cuEventDestroy(self._handle)
if (destroy := getattr(driver, "cuEventDestroy", None)) is not None:
err, = destroy(self._handle)

Then you don't need the global hack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use a field-tested solution (see the internal thread) instead of coming up with a new one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about sys.is_finalizing()? There's probably not a more field tested solution than what's in the standard library.

In this case it also seems particularly suited to the problem being solved here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely better, but this is still a hack where if we run things under tools like cuda-memcheck that check for resource leaks it will still pop.

Additionally, we'd need to carry these checks everywhere that code could be called in __del__ functions, I believe even transitively. I.E. the raise_if_driver_error function.

What if we moved to a __dealloc__ function?

Copy link
Member

@leofang leofang Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed internally,sys.is_finalizing() - while official - only returns True at the very late stage of interpreter shutdown, later than all of the exit handlers (which this PR is based on). It is unclear to me if this solves the problem. I feel nervous about this. Do we have any known, big projects using this solution?

What if we moved to a __dealloc__ function?

No, we can't do this (yet), because we currently call Python bindings. Once #866 lands we can switch to this, but I need some time to work it out and I prefer this to be fixed independently (and asap).

Copy link
Contributor

@cpcloud cpcloud Sep 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the relevant steps, in order, during interpreter shutdown:

  1. wait for threads to shutdown
  2. wait for any pending calls
  3. call atexit handlers (where the flag would be set)
  4. set the interpreter to be officially in finalizing mode (this information is what sys.is_finalizing() uses)
  5. collect garbage (__del__ would be called here)

So, it doesn't really matter which approach we take, and it's overall less code and less hacky code to use a standard library builtin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity: Turns out is_finalizing becomes True too late, and this PR does not fix all shutdown errors: #1063. We'll fix this in #1070.

Copy link
Contributor

@cpcloud cpcloud Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it sound like the previously proposed solution using atexit would work, but it wouldn't, because __del__ is still called at the same point in the program regardless of where the state of the interpreter is checked.

It would be more helpful to provide thorough reasoning (as I did above). Right now, it just looks like everything I said was incorrect without any description as to why that is.

raise_if_driver_error(err)

cpdef close(self):
"""Destroy the event."""
if self._handle is not None:
Expand All @@ -116,7 +123,7 @@ cdef class Event:
raise_if_driver_error(err)

def __del__(self):
self.close()
self.safe_close()

def __isub__(self, other):
return NotImplemented
Expand Down
8 changes: 7 additions & 1 deletion cuda_core/cuda/core/experimental/_memory.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ from cuda.core.experimental._utils.cuda_utils cimport (
_check_driver_error as raise_if_driver_error,
check_or_create_options,
)
import sys

from dataclasses import dataclass
from typing import TypeVar, Union, TYPE_CHECKING
Expand Down Expand Up @@ -69,7 +70,12 @@ cdef class Buffer:
return self

def __del__(self):
self.close()
self.safe_close()

cpdef safe_close(self, stream: Stream = None, is_shutting_down=sys.is_finalizing):
if self._ptr and self._mr is not None:
if not is_shutting_down():
self._mr.deallocate(self._ptr, self._size, stream)

cpdef close(self, stream: Stream = None):
"""Deallocate this buffer asynchronously on the given stream.
Expand Down
9 changes: 8 additions & 1 deletion cuda_core/cuda/core/experimental/_stream.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ from cuda.core.experimental._utils.cuda_utils cimport (
_check_driver_error as raise_if_driver_error,
check_or_create_options,
)
import sys

import cython
import os
Expand Down Expand Up @@ -186,7 +187,13 @@ cdef class Stream:
return self

def __del__(self):
self.close()
self.safe_close()

cpdef safe_close(self, is_shutting_down=sys.is_finalizing):
if self._owner is None:
if self._handle and not self._builtin:
if not is_shutting_down():
handle_return(driver.cuStreamDestroy(self._handle))

cpdef close(self):
"""Destroy the stream.
Expand Down