
[Refactor][NFC] Vendor-in cpython components for future CUDA-specific changes #468

Closed
atmnp wants to merge 14 commits into NVIDIA:main from atmnp:atmn/vendor-in-cpython

Conversation


@atmnp atmnp commented Sep 10, 2025

This change vendors in parts of the cpython components in preparation for future CUDA-specific changes. In particular, it contains the implementations that do not require additional tests from Numba to verify behavior. It also includes a few small cleanups, such as removing the defunct CPUTarget, which is unnecessary in our implementation of the CUDA target.

This PR has been temporarily abandoned in favor of #493 which is a bulk change.

@atmnp atmnp self-assigned this Sep 10, 2025

copy-pr-bot bot commented Sep 10, 2025

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.


@atmnp atmnp added the 3 - Ready for Review label Sep 10, 2025

atmnp commented Sep 10, 2025

/ok to test


atmnp commented Sep 10, 2025

/ok to test

rparolin previously approved these changes Sep 15, 2025

@rparolin rparolin left a comment


lgtm


atmnp commented Sep 24, 2025

/ok to test


atmnp commented Sep 26, 2025

/ok to test


atmnp commented Sep 26, 2025

/ok to test


atmnp commented Sep 26, 2025

/ok to test

@VijayKandiah

/ok to test

"""This module provides the unsafe things for targets/numbers.py"""

from numba.core import types, errors
from numba.core.extending import intrinsic

Does this work with numba.cuda.extending.intrinsic now?
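A minimal sketch of what the suggested import switch might look like. Whether numba.cuda.extending re-exports intrinsic is exactly the open question here, so the snippet tries that path first and falls back to the known numba.core.extending location; everything is guarded so it stays inert when numba is not installed, and the intrinsic itself is purely illustrative.

```python
# Guard all numba imports so this sketch is inert without numba installed.
try:
    try:
        from numba.cuda.extending import intrinsic  # assumed re-export (the question above)
    except ImportError:
        from numba.core.extending import intrinsic  # known-good fallback path
    from numba.core import types
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

if HAVE_NUMBA:
    @intrinsic
    def cast_to_uintp(typingctx, x):
        # Illustrative intrinsic: cast an integer argument to uintp.
        if isinstance(x, types.Integer):
            sig = types.uintp(x)

            def codegen(context, builder, signature, args):
                [val] = args
                # Let the target context emit the appropriate cast.
                return context.cast(builder, val, signature.args[0],
                                    signature.return_type)

            return sig, codegen
```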


from numba.core import types, typing, errors
from numba.cuda.cgutils import alloca_once
from numba.core.extending import intrinsic

Same question about intrinsic

# SPDX-License-Identifier: BSD-2-Clause

"""
Implementation of enums.
@brandon-b-miller brandon-b-miller Sep 29, 2025

Could we vendor a few of the enum-related tests? Things appear to work with the CUDA target:

from enum import IntEnum
from numba import cuda
import numpy as np

class Color(IntEnum):
    RED = 1
    GREEN = 2
    BLUE = 3

@cuda.jit(device=True)
def is_primary(color):
    return int(color) in (Color.RED, Color.BLUE)

@cuda.jit
def kernel(input, output):
    idx = cuda.grid(1)
    if idx < input.size:
        output[idx] = is_primary(input[idx])

input = cuda.to_device([1, 2, 3])
output = cuda.to_device([0, 0, 0])

kernel[1, 3](input, output)
print(output.copy_to_host())
[1 0 1]

We don't need everything, but things might map reasonably well to simple kernels that test the CPU implementations of Enum properties on perhaps a single CUDA thread.

# SPDX-License-Identifier: BSD-2-Clause

"""
Implementation of the range object for fixed-size integers.

These also appear to be supported, so it would be nice to include some basic tests:

@cuda.jit
def kernel(output):
    idx = cuda.grid(1)
    if idx < output.size:
        r = range(10)
        sum = 0
        for i in r:
            sum += 1
        output[idx] = sum

output = cuda.to_device([0, 0, 0])

kernel[1, 3](output)
print(output.copy_to_host())
[10 10 10]

# SPDX-License-Identifier: BSD-2-Clause

"""
Implementation of tuple objects

Could use a few test cases, since this ports to CUDA: things like iteration, possibly __getitem__, etc.

@cuda.jit
def kernel(output):
    idx = cuda.grid(1)
    if idx < output.size:
        t = (1, 2, 3)
        sum = 0
        for i in t:
            sum += 1
        output[idx] = sum

output = cuda.to_device([0, 0, 0])

kernel[1, 3](output)
print(output.copy_to_host())
[3 3 3]

@brandon-b-miller brandon-b-miller left a comment

This generally looks good. I think we should at least create a few basic test cases for each of the builtin objects whose lowering is being vendored here. It need not be comprehensive, but let's put a few files in place to get things started, and we can move more things over if it feels like there are gaps later on.
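A possible starting point for those basic test files, sketched with unittest. The file layout, class names, and the CUDA-availability guard are all assumptions; the kernel bodies mirror the enum and range snippets from the review comments above, and the tests skip cleanly on machines without a CUDA-capable GPU.

```python
import unittest
from enum import IntEnum

import numpy as np

# Skip the whole suite when no CUDA device is usable (assumption: that is
# the desired CI behavior for these vendored-lowering tests).
try:
    from numba import cuda
    HAVE_CUDA = cuda.is_available()
except ImportError:
    HAVE_CUDA = False


class Color(IntEnum):
    RED = 1
    GREEN = 2
    BLUE = 3


@unittest.skipUnless(HAVE_CUDA, "requires a CUDA-capable GPU")
class TestVendoredEnumLowering(unittest.TestCase):
    def test_enum_compare(self):
        @cuda.jit
        def kernel(inp, out):
            i = cuda.grid(1)
            if i < inp.size:
                out[i] = inp[i] == Color.GREEN

        inp = np.array([1, 2, 3], dtype=np.int64)
        out = np.zeros(3, dtype=np.int64)
        kernel[1, 3](inp, out)
        np.testing.assert_array_equal(out, [0, 1, 0])


@unittest.skipUnless(HAVE_CUDA, "requires a CUDA-capable GPU")
class TestVendoredRangeLowering(unittest.TestCase):
    def test_range_iteration(self):
        @cuda.jit
        def kernel(out):
            i = cuda.grid(1)
            if i < out.size:
                total = 0
                for _ in range(10):
                    total += 1
                out[i] = total

        out = np.zeros(3, dtype=np.int64)
        kernel[1, 3](out)
        np.testing.assert_array_equal(out, [10, 10, 10])
```

A parallel TestVendoredTupleLowering class could follow the same shape for tuple iteration and indexing.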

@atmnp atmnp added 4 - Waiting on author Waiting for author to respond to review 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team 4 - Waiting on author Waiting for author to respond to review labels Sep 30, 2025
@atmnp atmnp closed this Oct 6, 2025
