Co-authored-by: Graham Markall <535640+gmarkall@users.noreply.github.com>
…mba-cuda into fea-bfloat16-highlevel
/ok to test
I think we should have a separate PR that enables LTO by default whenever pynvjitlink is available and new enough for the current GPU. There may be some caveats and nuance to this behaviour, but the general approach should be to use LTO as much as possible, because it is such a performance win with any external code.
Closing: LTO is now on by default, so the bfloat16 bindings are already tested in LTO mode. This PR now adds little value.
In #245, we added bfloat16 API bindings, but we missed testing the bindings with lto=True. This PR adds those tests.