
Are the three linear layers (q, k, v) merged in TensorRT-LLM for the llama model? (potential bad influence on quantization accuracy) #964

Closed
brisker opened this issue Jan 25, 2024 · 5 comments
Labels: stale, triaged (Issue has been triaged by maintainers)

brisker commented Jan 25, 2024

here:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/hf_llama_convert.py#L37

https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/convert.py#L88 (the same output quantization scale for q, k, v)

It seems that the quantization scales of the three linear layers q, k, and v are merged and shared, which could have a negative influence on quantized model accuracy.

Will this be fixed in the future?
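
For context, here is a minimal numerical sketch (a hypothetical illustration, not TensorRT-LLM code) of why one shared output scale can lose accuracy relative to per-layer scales when the q, k, and v output ranges differ:

```python
import numpy as np

# Hypothetical example: when q, k, v are fused into one GEMM, a single
# int8 output scale must cover the union of their dynamic ranges.
rng = np.random.default_rng(0)
q_out = rng.normal(0.0, 1.0, 4096)  # assumed activation distributions;
k_out = rng.normal(0.0, 1.0, 4096)  # real ranges depend on the model
v_out = rng.normal(0.0, 8.0, 4096)  # one wide-range projection

def fake_quant_int8(x, scale):
    """Symmetric int8 quantize-dequantize with the given scale."""
    return np.clip(np.round(x / scale), -127, 127) * scale

def mse(x, scale):
    return np.mean((x - fake_quant_int8(x, scale)) ** 2)

# Per-layer scales: each projection uses its own absmax.
per_layer = sum(mse(x, np.abs(x).max() / 127) for x in (q_out, k_out, v_out))

# Merged scale: one absmax over the concatenated qkv output, as when the
# three layers share a single output quantization scale.
shared = max(np.abs(x).max() for x in (q_out, k_out, v_out)) / 127
merged = sum(mse(x, shared) for x in (q_out, k_out, v_out))

print(f"per-layer scales, total MSE: {per_layer:.6f}")
print(f"shared scale,     total MSE: {merged:.6f}")  # larger when ranges differ
```

Because the shared scale is dominated by the widest-range projection, the narrower q and k outputs lose effective int8 resolution, which is where the potential accuracy hit comes from.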

brisker changed the title Are the three linear layers (q, k, v) merged in TensorRT-LLM for the llama model? → Are the three linear layers (q, k, v) merged in TensorRT-LLM for the llama model? (potential bad influence on quantization accuracy) Jan 25, 2024
byshiue (Collaborator) commented Jan 26, 2024

In our experiments, accuracy remains good even when the scales are merged. Have you encountered any accuracy issues?

byshiue self-assigned this Jan 26, 2024
byshiue added the triaged (Issue has been triaged by maintainers) label Jan 26, 2024
brisker (Author) commented Jan 26, 2024

@byshiue
Yes, the accuracy drops too much. The details can be found here:

#967

brisker (Author) commented Jan 29, 2024

@byshiue
It seems that llama2 quantization (no matter whether SmoothQuant or KV-cache quantization) has some bugs that cause consistently bad accuracy, and whether the qkv scales are merged may not be the main reason (though it is still a contributing one).

byshiue (Collaborator) commented Jan 30, 2024

I think we could discuss this issue in #967. Do you agree?

nv-guomingz (Collaborator) commented

Hi @brisker, do you still have any further issues or questions? If not, we'll close this soon.
