Trouble using flan-t5-xxl #474
Comments
@Vergissmeinicht we're investigating an issue in #500 and it's probably related. Did you observe the same incorrect pattern, where only the 1st token is correct? The error might be hidden in the t5-small case, so it didn't get caught earlier.
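The incorrect pattern described above (only the first generated token matching) can be checked mechanically by comparing the TRT token IDs against a HuggingFace reference run. A minimal stdlib-only sketch; the token lists here are made-up placeholders, not real flan-t5 output:

```python
def first_divergence(reference, generated):
    """Return the index of the first position where two token
    sequences differ, or None if one is a prefix of the other."""
    for i, (ref, gen) in enumerate(zip(reference, generated)):
        if ref != gen:
            return i
    return None

# Hypothetical example: only the first token matches,
# as in the reported failure pattern.
hf_tokens = [37, 9, 1712, 55, 1]
trt_tokens = [37, 3, 3, 3, 3]
print(first_divergence(hf_tokens, trt_tokens))  # → 1
```

If the divergence index is consistently 1 across prompts, that matches the symptom discussed in #500.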
I recommend trying the latest v0.6.1 release: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.6.1 since it contains several related bug fixes. Marking this as closed for now; please don't hesitate to reopen it if you still encounter the issue.
Problem solved! Many thanks.
@Vergissmeinicht Nice to hear!
@Vergissmeinicht Important note: I found a tiny but important fix for Flan-T5. Update: it has already been merged into the latest release.
@symphonylyh Thanks for the heads up! Just to confirm, does this change only need to happen when we perform inference, i.e. in https://github.com/NVIDIA/TensorRT-LLM/blob/rel/examples/enc_dec/run.py ? Or do we need to rebuild the engines or the image?
@shannonphu Yes, you need to rebuild the engine, because the change affects the build step rather than just run.py. Also, an update: the above fix has been merged into the latest release.
I've been trying to use flan-t5-xxl following the enc_dec example, but I failed to get correct output from TRT inference. The T5 (t5-small) and Flan-T5 (google/flan-t5-small) examples do produce correct output, so I wonder whether flan-t5-xxl is supported, or whether there's something wrong with my building and running code. Hope to get your response. Many thanks.