Feature/occupancy #648
Merged
Commits (18; changes shown from 14 commits)
- f79a7f0 Fix typo: stream._handle -> stream.handle (oleksandr-pavlyk)
- de7b3c9 Move definition of LaunchConfig class to separate file (oleksandr-pavlyk)
- 5bd64a7 Introduce _module.KernelOccupancy class (oleksandr-pavlyk)
- b89c95f Add occupancy tests, except for cluster-related queries (oleksandr-pavlyk)
- 9679e0e Fix type in querying handle from Stream argument (oleksandr-pavlyk)
- ff322ec Add tests for cluster-related occupancy descriptors (oleksandr-pavlyk)
- fd8302f Introduce MaxPotentialBlockSizeOccupancyResult named tuple (oleksandr-pavlyk)
- 40d799a KernelOccupancy.max_potential_block_size support for CUoccupancyB2DSize (oleksandr-pavlyk)
- 5968ff0 Add test for B2DSize usage in max_potential_block_size (oleksandr-pavlyk)
- fdbad93 Merge branch 'main' into feature/occupancy (oleksandr-pavlyk)
- 436f111 Merge branch 'main' into feature/occupancy (oleksandr-pavlyk)
- 428f4fa Improved max_potential_block_size.__doc__ (oleksandr-pavlyk)
- f1ff0f5 Add test for dynamic_shared_memory_needed arg of invalid type (oleksandr-pavlyk)
- 39a08f6 Mention feature/occupancy in 0.3.0 release notes (oleksandr-pavlyk)
- f74dcf1 Add symbols to api_private.rst (oleksandr-pavlyk)
- e2adc57 Reduce test name verbosity (oleksandr-pavlyk)
- 496eb5b Add doc-strings to KernelOccupancy methods (oleksandr-pavlyk)
- f74db2c fix rendering (leofang)
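Commit fd8302f introduces a MaxPotentialBlockSizeOccupancyResult named tuple, which pairs the two outputs of the driver's cuOccupancyMaxPotentialBlockSize query: a suggested minimum grid size and the block size that maximizes occupancy. A minimal sketch of such a result type (the field names here are assumptions for illustration, not necessarily the PR's actual definition):

```python
from collections import namedtuple

# Hypothetical mirror of the result type introduced in fd8302f; the actual
# field names in the PR may differ. cuOccupancyMaxPotentialBlockSize reports
# a suggested minimum grid size and the occupancy-maximizing block size.
MaxPotentialBlockSizeOccupancyResult = namedtuple(
    "MaxPotentialBlockSizeOccupancyResult", ("min_grid_size", "max_block_size")
)

# Example values only; real numbers come from the driver occupancy query.
result = MaxPotentialBlockSizeOccupancyResult(min_grid_size=68, max_block_size=1024)
print(result.max_block_size)  # fields are accessible by name as well as index
```

A named tuple keeps the call site readable (`result.max_block_size` instead of `result[1]`) while remaining a plain tuple for unpacking.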
New file (+97 lines):

```python
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. ALL RIGHTS RESERVED.
#
# SPDX-License-Identifier: Apache-2.0

from dataclasses import dataclass
from typing import Optional, Union

from cuda.core.experimental._device import Device
from cuda.core.experimental._utils.cuda_utils import (
    CUDAError,
    cast_to_3_tuple,
    driver,
    get_binding_version,
    handle_return,
)

# TODO: revisit this treatment for py313t builds
_inited = False


def _lazy_init():
    global _inited
    if _inited:
        return

    global _use_ex
    # binding availability depends on cuda-python version
    _py_major_minor = get_binding_version()
    _driver_ver = handle_return(driver.cuDriverGetVersion())
    _use_ex = (_driver_ver >= 11080) and (_py_major_minor >= (11, 8))
    _inited = True


@dataclass
class LaunchConfig:
    """Customizable launch options.

    Attributes
    ----------
    grid : Union[tuple, int]
        Collection of threads that will execute a kernel function.
    cluster : Union[tuple, int]
        Group of blocks (Thread Block Cluster) that will execute on the same
        GPU Processing Cluster (GPC). Blocks within a cluster have access to
        distributed shared memory and can be explicitly synchronized.
    block : Union[tuple, int]
        Group of threads (Thread Block) that will execute on the same
        streaming multiprocessor (SM). Threads within a thread block have
        access to shared memory and can be explicitly synchronized.
    shmem_size : int, optional
        Dynamic shared-memory size per thread block in bytes.
        (Defaults to size 0.)

    """

    # TODO: expand LaunchConfig to include other attributes
    grid: Union[tuple, int] = None
    cluster: Union[tuple, int] = None
    block: Union[tuple, int] = None
    shmem_size: Optional[int] = None

    def __post_init__(self):
        _lazy_init()
        self.grid = cast_to_3_tuple("LaunchConfig.grid", self.grid)
        self.block = cast_to_3_tuple("LaunchConfig.block", self.block)
        # thread block clusters are supported starting H100
        if self.cluster is not None:
            if not _use_ex:
                err, drvers = driver.cuDriverGetVersion()
                drvers_fmt = f" (got driver version {drvers})" if err == driver.CUresult.CUDA_SUCCESS else ""
                raise CUDAError(f"thread block clusters require cuda.bindings & driver 11.8+{drvers_fmt}")
            cc = Device().compute_capability
            if cc < (9, 0):
                raise CUDAError(
                    f"thread block clusters are not supported on devices with compute capability < 9.0 (got {cc})"
                )
            self.cluster = cast_to_3_tuple("LaunchConfig.cluster", self.cluster)
        if self.shmem_size is None:
            self.shmem_size = 0


def _to_native_launch_config(config: LaunchConfig) -> driver.CUlaunchConfig:
    _lazy_init()
    drv_cfg = driver.CUlaunchConfig()
    drv_cfg.gridDimX, drv_cfg.gridDimY, drv_cfg.gridDimZ = config.grid
    drv_cfg.blockDimX, drv_cfg.blockDimY, drv_cfg.blockDimZ = config.block
    drv_cfg.sharedMemBytes = config.shmem_size
    attrs = []  # TODO: support more attributes
    if config.cluster:
        attr = driver.CUlaunchAttribute()
        attr.id = driver.CUlaunchAttributeID.CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION
        dim = attr.value.clusterDim
        dim.x, dim.y, dim.z = config.cluster
        attrs.append(attr)
    drv_cfg.numAttrs = len(attrs)
    drv_cfg.attrs = attrs
    return drv_cfg
```
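The `grid`, `block`, and `cluster` fields accept either an int or a tuple and are normalized to 3-tuples via `cast_to_3_tuple` before being copied into the native `CUlaunchConfig`. A rough stand-in for that normalization, under the assumption that a scalar `n` maps to `(n, 1, 1)` and shorter tuples are padded with 1s (the real helper may differ in its exact validation):

```python
from typing import Union


def cast_to_3_tuple_sketch(label: str, value: Union[tuple, int]) -> tuple:
    # Hypothetical stand-in for cuda.core's cast_to_3_tuple helper:
    # scalars become (n, 1, 1); tuples of length 1-3 are padded with 1s.
    if isinstance(value, int):
        value = (value,)
    if not isinstance(value, tuple) or not 1 <= len(value) <= 3:
        raise ValueError(f"{label} must be an int or a tuple of 1-3 ints")
    if not all(isinstance(v, int) and v > 0 for v in value):
        raise ValueError(f"{label} components must be positive ints")
    return value + (1,) * (3 - len(value))


print(cast_to_3_tuple_sketch("LaunchConfig.grid", 32))       # (32, 1, 1)
print(cast_to_3_tuple_sketch("LaunchConfig.block", (8, 8)))  # (8, 8, 1)
```

Normalizing eagerly in `__post_init__` means `_to_native_launch_config` can unpack each field directly into the `gridDim*`/`blockDim*` members without re-checking shapes at launch time.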