enc_dec model results are not aligned with the HF model #612
Comments
Hi @NSun-S, thanks for posting this. Conclusion first: this is normal, and it is most likely a HF/PyTorch GEMM numerics difference. I was bothered by the same tiny numerical differences during my own development of enc-dec too. You're checking the …
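To make the "GEMM problem" point concrete, here is a minimal, self-contained sketch (illustrative only, not TRT-LLM or HF internals) showing that merely changing the reduction order of a matrix multiply already perturbs fp32 results:

```python
import torch

torch.manual_seed(0)
a = torch.randn(1, 4096)
b = torch.randn(4096, 4096)

# Mathematically identical products, computed with different accumulation orders:
full = a @ b                                              # single GEMM
split = a[:, :2048] @ b[:2048] + a[:, 2048:] @ b[2048:]   # split-K style partial sums

# fp32 addition is not associative, so the two results differ slightly.
print((full - split).abs().max())  # typically on the order of 1e-4, not 0.0
```

Different BLAS/TensorRT kernels choose different tilings and accumulation orders, so a small per-element deviation between backends is expected even at fp32, and it can grow as it propagates through the decoder.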
@symphonylyh Thank you very much!!! Your reply answers the questions that have been troubling me.
A little more explanation on this numerical analysis. The key takeaway: we are better off evaluating on real downstream tasks, to see whether and by how much such numerical differences affect output quality, rather than pursuing an exact match of logit values. Of course, it is sometimes hard to tell whether a discrepancy is an implementation bug or numerical deviation, but so far, based on our analysis and user feedback, we believe it is not an implementation bug in TRT-LLM's encoder-decoder models.
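As one concrete way to follow the "evaluate downstream" suggestion, the two backends could be compared with a corpus-level metric instead of raw logits. This is a hedged sketch: `hf_outputs`, `trt_outputs`, and `references` are placeholder names for generations and gold translations you collect yourself, and sacrebleu is just one metric choice:

```python
import sacrebleu  # pip install sacrebleu

# Placeholder data: generations from each backend plus gold references.
hf_outputs = ["Das Haus ist wunderbar ..."]
trt_outputs = ["Das Haus ist wunderbar ..."]
references = [["Das Haus ist wunderbar, strahlt zeitlosen Charme aus ..."]]

print("HF  BLEU:", sacrebleu.corpus_bleu(hf_outputs, references).score)
print("TRT BLEU:", sacrebleu.corpus_bleu(trt_outputs, references).score)
# If the two scores are essentially equal, the logit-level deviation is benign.
```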
@NSun-S, for Flan-T5 specifically, I found a fix that should be very relevant to your observation above: see #474 (comment). Please apply the fix locally and see if it solves your problem. The discussion above regarding numerical differences still holds, but I suspect this fix may resolve the issue you described.
@symphonylyh Thanks for your support!! We have tried the modification mentioned above, and the output does change somewhat. However, the difference caused by this numerical issue still exists and has not completely disappeared. I am willing to try any suggestions, and if there are any new findings I will post them in this issue.
@NSun-S just to confirm, you have commented out the …
@symphonylyh Thank you for the reminder. I confirm that I have made the corresponding modifications. I also noticed the difference between Flan-T5 and T5 while locating the issue. At least from some examples, this modification has not fundamentally solved the problem (as mentioned earlier, the differences appear in the encoder and continue to accumulate). Anyway, the original Flan-T5 did not use …
I tried to use trt-llm v0.6.1 to optimize an enc-dec model (Flan-T5-small). During usage, I observed that its output could not be aligned with the HuggingFace model under fp32 precision (nor under bf16). The following is the process to reproduce this phenomenon:
The prompt I used is the one you provided: "translate English to German: The house is wonderful, radiating timeless charm and offering a warm, inviting interior with beautiful details and a serene backyard."
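For reference, the HF-side baseline I compare against looks roughly like this (a sketch; the checkpoint name and generation settings are assumptions, and the TRT-LLM half is omitted because it depends on the engine you built from examples/enc_dec):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-small", torch_dtype=torch.float32
).eval()

prompt = ("translate English to German: The house is wonderful, radiating "
          "timeless charm and offering a warm, inviting interior with "
          "beautiful details and a serene backyard.")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```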
I observed some slight anomalies in the encoder output, as follows:
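The check itself was essentially an element-wise diff of the encoder hidden states. A sketch of how such a comparison can be run, where the TRT-LLM encoder output is assumed to have been dumped to a file (`trt_encoder_output.pt` is a hypothetical path):

```python
import torch

# `model` and `inputs` come from the HF snippet above.
with torch.no_grad():
    hf_enc = model.get_encoder()(**inputs).last_hidden_state

# Hypothetical dump of the TRT-LLM encoder output for the same input ids.
trt_enc = torch.load("trt_encoder_output.pt")

diff = (hf_enc - trt_enc).abs()
print("max abs diff :", diff.max().item())
print("mean abs diff:", diff.mean().item())
print("allclose(atol=1e-5):", torch.allclose(hf_enc, trt_enc, atol=1e-5))
```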
Although the output of the HF model is terrible, from my perspective the output of trt-llm should still be perfectly aligned with the HF model. The difference in output becomes even more pronounced in scenarios such as beam search and batched input.
Is this normal, or is there a known precision problem? I have read the related issues, but none of them solve my problem. Could you awesome folks give me some suggestions and possible solutions for this issue?