vulkan: skip all-negative-inf blocks in FA #17186
Merged
0cc4m merged 1 commit into ggml-org:master on Nov 15, 2025
Fixes #17033. See #17033 (comment).
Overhead for this check generally seems low enough. Perf for just one parallel prompt in llama-batched-bench is a little lower in coopmat2 mode, but I think it's OK. The scalar path is slower because the increased register usage decreases occupancy. I've filed an internal bug about that, but I'm not too worried about it, since I think it only affects Blackwell and the scalar path isn't used for prompt processing on NVIDIA. It would be good to spot-check perf on other HW.
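For readers unfamiliar with the idea, here is a rough CPU-side sketch in Python of what "skip all-negative-inf blocks" means for a blockwise (online-softmax) attention loop. It is not the Vulkan shader code from this PR. The function names, the single-query-row setup, and the additive mask that holds only `0.0` or `-inf` are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_row_reference(q, K, V, mask):
    """Plain softmax attention for one query row (hypothetical helper).

    Assumes at least one key is unmasked, so the maximum score is finite.
    """
    scores = [dot(q, k) + m for k, m in zip(K, mask)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, V)) / total
            for c in range(len(V[0]))]


def attention_row_blockwise(q, K, V, mask, block_size):
    """Blockwise online-softmax attention that skips fully masked blocks.

    Returns (output_row, number_of_blocks_skipped).
    """
    running_max = NEG_INF
    running_sum = 0.0
    acc = [0.0] * len(V[0])
    skipped = 0

    for start in range(0, len(K), block_size):
        end = min(start + block_size, len(K))

        # The check: a block whose mask is entirely -inf contributes
        # nothing to the softmax, so skip the K/V work for it.
        if all(m == NEG_INF for m in mask[start:end]):
            skipped += 1
            continue

        scores = [dot(q, K[j]) + mask[j] for j in range(start, end)]
        new_max = max(running_max, max(scores))  # finite: block has an unmasked key

        # Rescale previous partial results to the new running maximum.
        rescale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(weights)
        acc = [a * rescale + sum(w * V[start + i][c] for i, w in enumerate(weights))
               for c, a in enumerate(acc)]
        running_max = new_max

    if running_sum == 0.0:
        # Every block was masked; return zeros rather than dividing by zero.
        return acc, skipped
    return [a / running_sum for a in acc], skipped


if __name__ == "__main__":
    q = [1.0, 0.5]
    K = [[0.1 * i, 0.2 * i] for i in range(8)]
    V = [[float(i), float(-i)] for i in range(8)]
    mask = [NEG_INF] * 4 + [0.0, NEG_INF, 0.0, 0.0]  # first block fully masked
    out, skipped = attention_row_blockwise(q, K, V, mask, block_size=4)
    print(out, skipped)
    print(attention_row_reference(q, K, V, mask))
```

In this sketch the skip is a pure shortcut: the skipped block would only have added zero-weight terms, so the result matches the unblocked reference. The trade-off described above is that the check itself has a cost on the GPU (extra registers, a little extra work per block), which this Python version does not model.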