
Contributor

@isVoid isVoid commented Feb 11, 2026

This PR adds high-level support for FP8 data types to Numba-CUDA.

Supported features include:

  • FP8 constructors from existing data types (elementwise and packed)
  • Conversion intrinsics that provide finer control over the saturation mode

Supported FP8 variants (elementwise):

  • fp8_[e5m2, e4m3, e8m0]

Supported packed FP8 variants:

  • fp8[x2, x4]_[e5m2, e4m3, e8m0]
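For readers less familiar with the three encodings, here is a minimal pure-Python sketch of how each 8-bit pattern maps to a value. It assumes the bit layouts commonly used for these formats (e5m2 with bias 15 and IEEE-style inf/NaN, e4m3 with bias 7 and a single NaN pattern per sign, e8m0 as an unsigned power-of-two exponent with bias 127). The helper names decode_fp8 and unpack_fp8x2 are hypothetical and are not part of the API added by this PR, and the low-byte-first element order in the packed helper is an assumption.

```python
import math

# name -> (exponent bits, mantissa bits, exponent bias); assumed layouts
FP8_LAYOUTS = {"e5m2": (5, 2, 15), "e4m3": (4, 3, 7)}


def decode_fp8(byte, fmt):
    """Decode one FP8 bit pattern (0..255) into a Python float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    if fmt == "e8m0":
        # No sign and no mantissa: the value is 2**(byte - 127); 0xFF is NaN.
        return math.nan if byte == 0xFF else 2.0 ** (byte - 127)
    exp_bits, man_bits, bias = FP8_LAYOUTS[fmt]
    sign = -1.0 if byte >> 7 else 1.0
    exp_all_ones = (1 << exp_bits) - 1
    man_all_ones = (1 << man_bits) - 1
    exp = (byte >> man_bits) & exp_all_ones
    man = byte & man_all_ones
    if fmt == "e5m2" and exp == exp_all_ones:
        # IEEE-style specials: zero mantissa is infinity, anything else is NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "e4m3" and exp == exp_all_ones and man == man_all_ones:
        # e4m3 has no infinity; only the all-ones pattern is NaN.
        return math.nan
    if exp == 0:
        # Zero and subnormals: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def unpack_fp8x2(word, fmt):
    """Split a 16-bit packed pair into two floats (low byte first, assumed)."""
    return decode_fp8(word & 0xFF, fmt), decode_fp8((word >> 8) & 0xFF, fmt)
```

Under these assumptions, decode_fp8(0x7E, "e4m3") gives the largest finite e4m3 value, and decode_fp8(0x7B, "e5m2") the largest finite e5m2 value. These are the bounds a saturating conversion would clamp to.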

This PR also adds tests for packed type bindings introduced in #686.

Closes #200

Contributor

greptile-apps bot commented Feb 11, 2026

Automatic reviews are disabled for this repository.


copy-pr-bot bot commented Feb 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

from numba.cuda.extending import register_jitable


@register_jitable
Contributor Author

The use of @register_jitable was recommended by code agents. Is this a good choice in today's Numba?

Contributor

It only makes sense if you want to call the function as a pure Python function.

Contributor

(So register_jitable just makes no sense in the context of Numba-CUDA)

Contributor Author

@isVoid isVoid Feb 12, 2026

This PR shows that register_jitable can enable jitting a function that is not directly callable from pure Python, as shown by:

@register_jitable
def bfloat16_to_e8m0(x, saturate, rounding):
    return _cvt_bfloat16raw_to_e8m0(
        _bfloat16_as_bfloat16_raw(x), saturate, rounding
    )

Here, _bfloat16_as_bfloat16_raw is written as a Numba intrinsic and cannot be called with these arguments as-is from pure Python.

I propose updating the docstring of register_jitable and keeping the function.

Contributor Author

isVoid commented Feb 12, 2026

/ok to test 894b79d

Development

Successfully merging this pull request may close these issues.

[FEA] Support fp8 data types