Quantized model is not working properly when CUBLAS is ON #1661
Comments
At the same time we created :)
Tried syncing the latest ggml from llama.cpp, but no luck. Both talk-llama and whisper are behaving identically.
I've found that when I use main in llama.cpp,
The output logits of quantized models are near zero which is not correct. I dumped the
Do you know how to fix this bug? @slaren @ggerganov
I find it strange, if the error occurred during the
Okay, I've tested all the quantization modes, and none of them work properly on CUDA. I wonder if our quantization method has changed? All
As far as I can tell, quantized models work correctly both with the gpt-2 example in ggml and in llama.cpp. I am not sure what is different in whisper.cpp.
Can you test changing the
if (GGML_CUDA_SOURCES)
    message(STATUS "GGML CUDA sources found")
    if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
        # Only configure ggml CUDA architectures if not globally set
        if (NOT DEFINED GGML_CUDA_ARCHITECTURES)
            # Not overridden by user, so set defaults
            set(GGML_CUDA_ARCHITECTURES 52 61 70)
        endif()
        message(STATUS "GGML Configuring CUDA architectures ${GGML_CUDA_ARCHITECTURES}")
        set_property(TARGET ggml PROPERTY CUDA_ARCHITECTURES ${GGML_CUDA_ARCHITECTURES})
    endif()
    set_property(TARGET ggml PROPERTY CUDA_SELECT_NVCC_ARCH_FLAGS "Auto")
    if (NOT MSVC)
        target_link_libraries(ggml PUBLIC stdc++)
    endif()
endif()
Maybe only the part that sets
Is this the section you're referring to? Lines 522 to 526 in 9286d3f
Yes. Replace the first
# Only configure ggml CUDA architectures if not globally set
if (NOT DEFINED GGML_CUDA_ARCHITECTURES)
    # Not overridden by user, so set defaults
    set(GGML_CUDA_ARCHITECTURES 52 61 70)
endif()
message(STATUS "GGML Configuring CUDA architectures ${GGML_CUDA_ARCHITECTURES}")
set_property(TARGET whisper PROPERTY CUDA_ARCHITECTURES ${GGML_CUDA_ARCHITECTURES})
Wow, it works!
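For anyone who wants to confirm that the change took effect, a minimal sketch of a configure-time check follows. It assumes the CUDA sources are compiled as part of a target named `whisper`, as in the snippet above. The variable name `_whisper_cuda_archs` is made up for illustration. This is a diagnostic idea, not something taken from the project's build files.

```cmake
# Hypothetical diagnostic: place after the `whisper` target has been defined
# and after any set_property(... CUDA_ARCHITECTURES ...) call.
# `_whisper_cuda_archs` is an illustrative variable name, not a project variable.
get_target_property(_whisper_cuda_archs whisper CUDA_ARCHITECTURES)

# Prints the architecture list applied to the target (for example "52;61;70"),
# or a value ending in -NOTFOUND if the property was never set on it.
message(STATUS "whisper CUDA_ARCHITECTURES: ${_whisper_cuda_archs}")
```

If the printed list does not match what `GGML_CUDA_ARCHITECTURES` was set to, the property probably landed on a different target than the one that actually compiles the CUDA code.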
Discussed in #1656
Originally posted by Sing303 December 19, 2023
Now when I try to use quantized models with full GPU cuBLAS, it writes nonsense instead of recognizing the text: all kinds of symbols instead of words.