
Have trouble in using flan-t5-xxl #474

Closed
Vergissmeinicht opened this issue Nov 27, 2023 · 7 comments
Assignees
Labels
triaged Issue has been triaged by maintainers

Comments

@Vergissmeinicht

I've been trying to run flan-t5-xxl following the enc_dec example, but I failed to get correct output from TRT inference. The T5 (t5-small) and Flan-T5 (google/flan-t5-small) examples do produce correct output, so I wonder whether flan-t5-xxl is supported or whether something is wrong with my build and run code. Hope to get your response. Many thanks.

@jdemouth-nvidia jdemouth-nvidia added the triaged Issue has been triaged by maintainers label Nov 27, 2023
@symphonylyh
Collaborator

@Vergissmeinicht we're investigating an issue in #500 that is probably related. Did you observe the same incorrect pattern, where only the 1st token is correct? The error might be hidden in the t5-small case, which is why it wasn't caught earlier.

@symphonylyh
Collaborator

We recommend trying the latest v0.6.1 release: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.6.1 since it contains several related bug fixes. Marking as closed for now; please don't hesitate to reopen if you still encounter the issue.

@Vergissmeinicht
Author

> We recommend trying the latest v0.6.1 release: https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.6.1 since it contains several related bug fixes. Marking as closed for now; please don't hesitate to reopen if you still encounter the issue.

Problem solved! Many thanks.

@symphonylyh
Collaborator

@Vergissmeinicht Nice to hear!

@symphonylyh
Collaborator

symphonylyh commented Dec 13, 2023

@Vergissmeinicht important note: I found a tiny but important fix for Flan-T5.
As a temporary fix, please manually comment out these lines when you run Flan-T5 (but NOT T5!). T5 applies this rescaling while Flan-T5 does not, but the current v0.6.1 implementation always falls into the default=True path.
This will be fixed within a week in the v0.7 release and/or a weekly dev release, but if you have Flan-T5 deployments in flight it would be good to apply the local fix first. Thanks.

Update: the fix is already merged into the latest main and will be included in the v0.7 release soon!
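To illustrate the distinction described above, here is a minimal, hypothetical sketch (not TensorRT-LLM's actual code) of a rescale-before-LM-head flag. The function name, flag name, and hidden size are assumptions for illustration only; the point is that T5 (with tied embeddings) scales decoder output by 1/sqrt(d_model) before the LM head, while Flan-T5 must skip that step, so a flag that silently defaults to True produces wrong Flan-T5 logits.

```python
import math

D_MODEL = 512  # hypothetical hidden size, for illustration only


def lm_head_input(hidden_states, rescale_before_lm_head):
    """Return the hidden states to feed into the LM head.

    T5-style models rescale by 1/sqrt(d_model) before the (tied) LM head;
    Flan-T5-style models pass the hidden states through unchanged. The bug
    described in the comment above amounts to this flag always being True.
    """
    if rescale_before_lm_head:
        scale = D_MODEL ** -0.5
        return [h * scale for h in hidden_states]
    return hidden_states


# T5-style path: values are scaled down by 1/sqrt(512).
t5_out = lm_head_input([1.0, 2.0], rescale_before_lm_head=True)

# Flan-T5-style path: values are passed through unchanged.
flan_out = lm_head_input([1.0, 2.0], rescale_before_lm_head=False)
```

With the buggy default, Flan-T5 logits would be uniformly scaled down, which can shift the argmax after the first generated token and produce the garbled-output pattern reported in this thread.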

@shannonphu

@symphonylyh Thanks for the heads up! Just to confirm, does this change only need to happen when we perform inference, i.e. https://github.com/NVIDIA/TensorRT-LLM/blob/rel/examples/enc_dec/run.py ? Or do we need to rebuild the engines or image?

@symphonylyh
Collaborator

@shannonphu Yes, you need to rebuild the engine. The change is in model.py, which directly maps to what the engine computes, so the engines have to be rebuilt. If a change only touches run.py and not model.py, the engines can usually be reused.

Also, an update: the fix above has been merged into the latest main and will soon be in v0.7. You can simply update to main, rebuild the engines, and everything should work fine.
