@JohannesGaessler
Collaborator

As pointed out in #3110 (comment), the recent PR #3110 increased VRAM usage. The cause is a condition for using mul_mat_q over cuBLAS that I added at some point for debugging purposes and then forgot to remove. This PR removes that condition, which fixes the increased VRAM usage for the output tensor.

@JohannesGaessler merged commit 89e8959 into ggml-org:master on Sep 11, 2023