@qjia7 qjia7 commented Jul 31, 2025

This pull request extends the WebGPU execution provider to support int64 data type casting in the Cast operator, with conditional support based on whether graph capture is enabled. It refactors kernel registration to allow toggling int64 support and updates the shader code and kernel logic to handle int64 tensors efficiently.

It's part of the work to enable graph capture in phi4 #25868
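A rough illustration of what "toggling int64 support" during kernel registration could look like. This is only a sketch under stated assumptions, not the code from this PR: `TypeSet`, `CastTypes`, `KernelRegistry`, and `BuildRegistry` are hypothetical names, and the list of element types is illustrative.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the set of element types a kernel accepts.
using TypeSet = std::set<std::string>;

// Builds the type list for a Cast-like kernel. int64 is added only when the
// caller asks for it (assumption: the caller passes the graph-capture flag).
TypeSet CastTypes(bool enable_int64) {
  TypeSet types = {"float", "float16", "int32", "uint32", "bool"};
  if (enable_int64) {
    types.insert("int64");
  }
  return types;
}

// Hypothetical registry mapping an op name to the types its kernel accepts.
class KernelRegistry {
 public:
  void Register(const std::string& op, TypeSet types) {
    assert(!types.empty());  // a kernel must accept at least one type
    kernels_[op] = std::move(types);
  }

  bool Supports(const std::string& op, const std::string& type) const {
    auto it = kernels_.find(op);
    return it != kernels_.end() && it->second.count(type) > 0;
  }

 private:
  std::map<std::string, TypeSet> kernels_;
};

// The registry is built per session, so the int64 variant of Cast can be
// switched on or off by a runtime flag instead of being fixed at compile time.
KernelRegistry BuildRegistry(bool enable_graph_capture) {
  KernelRegistry registry;
  registry.Register("Cast", CastTypes(/*enable_int64=*/enable_graph_capture));
  return registry;
}
```

With this shape, `BuildRegistry(true).Supports("Cast", "int64")` holds, while the same query against `BuildRegistry(false)` does not, so the existing behavior is unchanged when graph capture is off.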

@qjia7 qjia7 requested review from fs-eire and guschmue July 31, 2025 09:51
@qjia7 qjia7 marked this pull request as draft July 31, 2025 09:58

qjia7 commented Jul 31, 2025

Sorry, please hold off on the review. I may have missed some cases.

@jywu-msft jywu-msft added the ep:WebGPU ort-web webgpu provider label Aug 2, 2025
This PR adds int64 support to Cast to avoid reading data back to the CPU
during execution for some models.

It should not cause a performance regression: whether the cast is from
int64 to int32 or from int32 to int64, the other type is usually supported by
WebGPU, which means the preceding or following op already runs on the GPU.
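To make the int64/int32 casts concrete, here is a CPU-side sketch of the word-level arithmetic such a cast needs on a device without native 64-bit integers. It assumes each int64 element is stored as two 32-bit words, low word first; that layout and all names below (`Int64Words`, `CastInt32ToInt64`, `CastInt64ToInt32`, `ToInt64`, `CastInt64BufferToInt32`) are assumptions for illustration, and the actual shader in this PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One int64 element viewed as two 32-bit words (assumed layout: low word first).
struct Int64Words {
  uint32_t lo;
  uint32_t hi;
};

// int32 -> int64: keep the bits in the low word and sign-extend into the high word.
Int64Words CastInt32ToInt64(int32_t v) {
  return {static_cast<uint32_t>(v), v < 0 ? 0xFFFFFFFFu : 0u};
}

// int64 -> int32: keep only the low word. Values outside the int32 range wrap.
int32_t CastInt64ToInt32(Int64Words w) {
  int32_t out;
  std::memcpy(&out, &w.lo, sizeof(out));
  return out;
}

// Reassembles the two words into an int64_t, used here only to check results.
int64_t ToInt64(Int64Words w) {
  uint64_t bits = (static_cast<uint64_t>(w.hi) << 32) | w.lo;
  int64_t out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Casts a whole buffer of int64 elements, given as interleaved lo/hi words.
std::vector<int32_t> CastInt64BufferToInt32(const std::vector<uint32_t>& words) {
  assert(words.size() % 2 == 0);  // two words per int64 element
  std::vector<int32_t> out;
  out.reserve(words.size() / 2);
  for (std::size_t i = 0; i < words.size(); i += 2) {
    out.push_back(CastInt64ToInt32({words[i], words[i + 1]}));
  }
  return out;
}
```

Both directions only touch 32-bit words, which is why the cast can stay on the GPU instead of forcing a readback.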
@qjia7 qjia7 marked this pull request as ready for review September 29, 2025 06:57

qjia7 commented Sep 30, 2025

@guschmue @fs-eire This PR is ready for review. Please take a look, thanks.
cc @sushraja-msft @xhcao

guschmue previously approved these changes Oct 9, 2025

xhcao commented Oct 11, 2025

Hi @qjia7, is it really reasonable to tie the use of Cast(tensor_int64) to enable_graph_capture? Sometimes a user only wants (or can only use) Cast(tensor_int64).
Why not add an extra string-list option to SessionOptions? For example, add enable_cast_int64 to the list when Cast(tensor_int64) is wanted, and query the list when registering the EP kernels. This would also extend easily to enable_pad_int64, enable_gathernd_int64, etc.
This is just my personal opinion. If the other reviewers think the current code is fine, that is OK with me too.
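The string-list idea suggested above could look roughly like the sketch below. It assumes a comma-separated option value; the helpers `ParseFeatureList` and `IsFeatureEnabled` are hypothetical, and no such option exists in SessionOptions as far as this thread shows.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Splits a comma-separated option value (assumed format), for example
// "enable_cast_int64,enable_pad_int64", into a set of feature names.
// Empty entries are skipped; surrounding whitespace is not trimmed.
std::set<std::string> ParseFeatureList(const std::string& value) {
  std::set<std::string> features;
  std::stringstream stream(value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty()) {
      features.insert(item);
    }
  }
  return features;
}

// Queried at kernel registration time to decide whether to register the
// int64 variant of a given op.
bool IsFeatureEnabled(const std::set<std::string>& features, const std::string& name) {
  assert(!name.empty());  // an empty feature name is a caller bug
  return features.count(name) > 0;
}
```

Each op would check its own name, so adding `enable_pad_int64` or `enable_gathernd_int64` later would need no new option, only a new string.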


qjia7 commented Oct 11, 2025

Hi Xinghua, this change targets graph capture and doesn't affect the existing logic. I understand that other cases may also need cast_int64 support in WebGPU, but I'm not sure a session option is the best solution, since many other ops may need the same kind of change. I also talked offline with @fs-eire, who is considering an alternative way to support int64 in GetCapability, so there are more factors to consider. I'd prefer to proceed with this PR to unblock progress. Even if the final solution involves adding a session option, it won't affect the main changes in this PR; only the dynamic registration conditions would need to be modified.

@qjia7 qjia7 requested a review from fs-eire October 11, 2025 09:56
@qjia7 qjia7 requested a review from fs-eire October 11, 2025 12:10
@fs-eire fs-eire merged commit f0015b9 into main Oct 15, 2025
95 of 100 checks passed
@fs-eire fs-eire deleted the cast_int64 branch October 15, 2025 08:52
fs-eire pushed a commit that referenced this pull request Oct 24, 2025
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025