
qjia7 (Contributor) commented Feb 5, 2025

Description

This PR applies DP4A to the generation (small-M) shader. It uses the same ideas as the previous generation shader, but with different data types.
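DP4A multiplies four packed signed 8-bit integers pairwise and accumulates the sum into a 32-bit integer. In WGSL this operation is exposed as the built-in `dot4I8Packed(a: u32, b: u32) -> i32`. The Python sketch below emulates that built-in and shows one plausible way to dot 4-bit weights with int8-quantized activations. The helper names, the symmetric per-block int8 activation scale, and the weight zero point of 8 are assumptions for illustration, not details taken from this PR's shader.

```python
def pack_i8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word, lowest byte first."""
    assert len(values) == 4 and all(-128 <= v <= 127 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_i8x4(word):
    """Inverse of pack_i8x4: recover four signed 8-bit integers."""
    out = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        out.append(b - 256 if b >= 128 else b)  # sign-extend the byte
    return out


def dot4_i8_packed(a, b):
    """Emulates DP4A (WGSL dot4I8Packed): 4 int8 products summed into an int32."""
    return sum(x * y for x, y in zip(unpack_i8x4(a), unpack_i8x4(b)))


def quantize_i8(xs):
    """Hypothetical symmetric int8 quantization of one activation block."""
    scale = max(abs(x) for x in xs) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in xs], scale


def dp4a_dot(a_vals, a_scale, w_nibbles, w_scale, zero_point=8):
    """Hypothetical block dot product: int8 activations x 4-bit weights."""
    assert len(a_vals) == len(w_nibbles) and len(a_vals) % 4 == 0
    w_vals = [n - zero_point for n in w_nibbles]  # 4-bit [0, 15] -> signed [-8, 7]
    acc = 0  # integer accumulator, as DP4A would use
    for i in range(0, len(a_vals), 4):
        acc += dot4_i8_packed(pack_i8x4(a_vals[i:i + 4]), pack_i8x4(w_vals[i:i + 4]))
    return acc * a_scale * w_scale  # dequantize once per block
```

The integer work stays in 32-bit accumulators and the floating-point scales are applied only once per block, which is where a DP4A path saves work compared with dequantizing every weight to floating point first.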

In easy mode, throughput improves from 42 to 45 tokens/s on NV and from 19 to 20 tokens/s on Meteor Lake.

Without this PR

NV

| Kernel | Time (ms) | Percentage (%) |
| --- | --- | --- |
| MatMulNBits | 14.28 | 71.49 |

Meteor Lake

| Kernel | Time (ms) | Percentage (%) |
| --- | --- | --- |
| MatMulNBits | 34.74 | 79.93 |

With this PR

NV

| Kernel | Time (ms) | Percentage (%) |
| --- | --- | --- |
| MatMulNBits\|DP4AMatMulNBitsSmallMProgram | 11.24 | 71.38 |

Meteor Lake

| Kernel | Time (ms) | Percentage (%) |
| --- | --- | --- |
| MatMulNBits\|DP4AMatMulNBitsSmallMProgram | 31.71 | 76.94 |
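The relative changes implied by the reported numbers can be checked with a few lines of arithmetic. This is a small sketch; the figures are copied from the description and tables above, and `pct_change` is a helper introduced here for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures copied from the PR description.
throughput = {"NV": (42, 45), "Meteor Lake": (19, 20)}             # tokens/s, easy mode
kernel_ms = {"NV": (14.28, 11.24), "Meteor Lake": (34.74, 31.71)}  # MatMulNBits time (ms)

for device, (before, after) in throughput.items():
    print(f"{device}: throughput {pct_change(before, after):+.1f}%")
for device, (before, after) in kernel_ms.items():
    print(f"{device}: MatMulNBits time {pct_change(before, after):+.1f}%")
```

This gives roughly +7.1% (NV) and +5.3% (Meteor Lake) in tokens/s, against a MatMulNBits time reduction of about 21.3% (NV) and 8.7% (Meteor Lake). The end-to-end gain is smaller than the kernel-level gain because the other kernels in the profile are unchanged.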

qjia7 force-pushed the matmulnbits-generation branch from 6b5f15e to bcb37fe on February 8, 2025 02:21
guschmue added the ep:WebGPU (ort-web webgpu provider) label on Feb 10, 2025
qjia7 (Contributor, Author) commented Mar 17, 2025

Replaced by #24064

qjia7 closed this on Mar 17, 2025