Merged
2 changes: 1 addition & 1 deletion cuda_bindings/cuda/bindings/utils/__init__.py
@@ -1,6 +1,6 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NVIDIA-SOFTWARE-LICENSE

import atexit
from typing import Any, Callable

from ._ptx_utils import get_minimal_required_cuda_ver_from_ptx_ver, get_ptx_ver
6 changes: 4 additions & 2 deletions cuda_core/cuda/core/experimental/_event.pyx
@@ -17,6 +17,7 @@ from cuda.core.experimental._utils.cuda_utils import (
     CUDAError,
     driver,
     handle_return,
+    is_shutting_down
 )

 if TYPE_CHECKING:
@@ -108,10 +109,11 @@ cdef class Event:
         self._ctx_handle = ctx_handle
         return self

-    cpdef close(self):
+    cpdef close(self, is_shutting_down=is_shutting_down):
         """Destroy the event."""
         if self._handle is not None:
-            err, = driver.cuEventDestroy(self._handle)
+            if not is_shutting_down():
+                err, = driver.cuEventDestroy(self._handle)

@cpcloud (Contributor), Sep 16, 2025

What's the issue with this?

Suggested change
-            err, = driver.cuEventDestroy(self._handle)
+            if (destroy := getattr(driver, "cuEventDestroy", None)) is not None:
+                err, = destroy(self._handle)

Then you don't need the global hack.

Member

We want to use a field-tested solution (see the internal thread) instead of coming up with a new one.

Contributor

How about sys.is_finalizing()? There's probably not a more field-tested solution than what's in the standard library.

In this case it also seems particularly suited to the problem being solved here.
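A sketch of the `sys.is_finalizing()` variant, assuming a plain-Python class (`Resource` and the `events` log are hypothetical stand-ins for a CUDA object and its driver call). `sys.is_finalizing` is bound as a default argument for the same reason the PR's `is_shutting_down` docstring gives: module globals such as `sys` may already be cleared when a finalizer runs.

```python
import sys

events = []  # hypothetical log standing in for real driver release calls


class Resource:
    """Hypothetical wrapper whose close() is also reached from __del__."""

    def __init__(self, name):
        self.name = name
        self._open = True

    # Early binding: the default captures sys.is_finalizing when the class
    # body runs, so close() does not need the global `sys` at teardown.
    def close(self, _is_finalizing=sys.is_finalizing):
        if not self._open:
            return
        self._open = False
        if _is_finalizing():
            # Interpreter teardown: skip the release call entirely.
            return
        events.append(f"released {self.name}")

    def __del__(self):
        self.close()
```

During normal execution `sys.is_finalizing()` returns `False`, so an explicit `close()` or an ordinary `del` still releases the resource exactly once.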

Collaborator

This is definitely better, but it is still a hack: if we run things under tools like cuda-memcheck that check for resource leaks, the skipped destroy calls will still be reported.

Additionally, we'd need to carry these checks into everything that could be called from a __del__ method, I believe even transitively, e.g. the raise_if_driver_error function.

What if we moved to a __dealloc__ function?

@leofang (Member), Sep 17, 2025

As discussed internally, sys.is_finalizing(), while official, only returns True at a very late stage of interpreter shutdown, later than all of the exit handlers (which this PR is based on). It is unclear to me whether this solves the problem, and I feel nervous about it. Do we know of any big projects using this solution?

> What if we moved to a __dealloc__ function?

No, we can't do this (yet), because we currently call the Python bindings. Once #866 lands we can switch to this, but I need some time to work it out, and I prefer to fix this independently (and ASAP).

@cpcloud (Contributor), Sep 17, 2025

Here are the relevant steps, in order, during interpreter shutdown:

  1. wait for threads to shut down
  2. wait for any pending calls
  3. call atexit handlers (where the flag would be set)
  4. set the interpreter to be officially in finalizing mode (this information is what sys.is_finalizing() uses)
  5. collect garbage (__del__ would be called here)

So it doesn't really matter which approach we take, and using a standard library builtin means less code, and less hacky code, overall.
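One way to check this ordering empirically, as a sketch: run a small child script in a fresh CPython interpreter and report, from an atexit handler and from the `__del__` of a module-level object, what `sys.is_finalizing()` returns and what an llvmlite-style atexit flag holds. The script and its names (`Probe`, `_flag`, `shutdown_trace`) are illustrative; the result reflects CPython's current finalization order rather than a documented guarantee.

```python
import subprocess
import sys
import textwrap

# Child script: an atexit handler and a module-level object's finalizer
# each report what they observe during interpreter shutdown.
CHILD = textwrap.dedent(
    """
    import atexit
    import sys

    _flag = [False]


    @atexit.register
    def _set_flag():
        _flag[0] = True


    @atexit.register
    def _report():
        print("atexit:", sys.is_finalizing(), flush=True)


    class Probe:
        # Early-bind what the finalizer needs: globals may be cleared by then.
        def __del__(self, _print=print, _fin=sys.is_finalizing, _flag=_flag):
            _print("__del__:", _fin(), _flag[0], flush=True)


    probe = Probe()  # kept alive by module globals until teardown
    """
)


def shutdown_trace():
    """Run CHILD in a fresh interpreter and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(shutdown_trace())
```

The atexit line is expected to report `False`, while the finalizer of the module-level object is expected to see both signals set to `True`, because it only runs once module teardown begins.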

Member

For posterity: it turns out that is_finalizing() becomes True too late, and this PR does not fix all shutdown errors (#1063). We'll fix this in #1070.

@cpcloud (Contributor), Oct 3, 2025

This makes it sound like the previously proposed solution using atexit would work, but it wouldn't, because __del__ is still called at the same point in the program regardless of where the state of the interpreter is checked.

It would be more helpful to provide thorough reasoning (as I did above). Right now, it just looks like everything I said was incorrect, with no explanation of why.

             self._handle = None
             raise_if_driver_error(err)

7 changes: 4 additions & 3 deletions cuda_core/cuda/core/experimental/_memory.pyx
@@ -15,7 +15,7 @@ from typing import TypeVar, Union

 from cuda.core.experimental._dlpack import DLDeviceType, make_py_capsule
 from cuda.core.experimental._stream import Stream, default_stream
-from cuda.core.experimental._utils.cuda_utils import driver
+from cuda.core.experimental._utils.cuda_utils import driver, is_shutting_down

 # TODO: define a memory property mixin class and make Buffer and
 # MemoryResource both inherit from it
@@ -59,7 +59,7 @@ cdef class Buffer:
     def __del__(self):
         self.close()

-    cpdef close(self, stream: Stream = None):
+    cpdef close(self, stream: Stream = None, is_shutting_down=is_shutting_down):
         """Deallocate this buffer asynchronously on the given stream.

         This buffer is released back to their memory resource
@@ -72,7 +72,8 @@ cdef class Buffer:
         the behavior depends on the underlying memory resource.
         """
         if self._ptr and self._mr is not None:
-            self._mr.deallocate(self._ptr, self._size, stream)
+            if not is_shutting_down():
+                self._mr.deallocate(self._ptr, self._size, stream)
             self._ptr = 0
             self._mr = None
             self._ptr_obj = None
6 changes: 4 additions & 2 deletions cuda_core/cuda/core/experimental/_stream.pyx
@@ -25,6 +25,7 @@ from cuda.core.experimental._utils.cuda_utils import (
     driver,
     get_device_from_ctx,
     handle_return,
+    is_shutting_down
 )


@@ -187,7 +188,7 @@ cdef class Stream:
     def __del__(self):
         self.close()

-    cpdef close(self):
+    cpdef close(self, is_shutting_down=is_shutting_down):
         """Destroy the stream.

         Destroy the stream if we own it. Borrowed foreign stream
@@ -196,7 +197,8 @@ cdef class Stream:
         """
         if self._owner is None:
             if self._handle and not self._builtin:
-                handle_return(driver.cuStreamDestroy(self._handle))
+                if not is_shutting_down():
+                    handle_return(driver.cuStreamDestroy(self._handle))
         else:
             self._owner = None
         self._handle = None
23 changes: 23 additions & 0 deletions cuda_core/cuda/core/experimental/_utils/cuda_utils.pyx
@@ -2,6 +2,7 @@
 #
 # SPDX-License-Identifier: Apache-2.0

+import atexit
 import functools
 import importlib.metadata
 from collections import namedtuple
@@ -222,3 +223,25 @@ def get_binding_version():
     except importlib.metadata.PackageNotFoundError:
         major_minor = importlib.metadata.version("cuda-python").split(".")[:2]
     return tuple(int(v) for v in major_minor)
+
+# This code is to signal when the interpreter is in shutdown mode
+# to prevent using globals that could be already deleted in
+# objects `__del__` method
+#
+# This solution is taken from the Numba/llvmlite code
+_shutting_down = [False]
+
+
+@atexit.register
+def _at_shutdown():
+    _shutting_down[0] = True
+
+
+def is_shutting_down(_shutting_down=_shutting_down):
+    """
+    Whether the interpreter is currently shutting down.
+    For use in finalizers, __del__ methods, and similar; it is advised
+    to early bind this function rather than look it up when calling it,
+    since at shutdown module globals may be cleared.
+    """
+    return _shutting_down[0]