Encountered an error in forward function: slice 712 exceeds buffer size 471 #1480
Comments
I am getting the same issue when trying speculative decoding (Medusa) with Vicuna. After some inference, I get an error about exceeding buffer size 2560.
I encountered this issue while using speculative decoding: '[TensorRT LM] [ERROR] Encountered an error in forward function: slice 501760 excesses buffer size 250880'. Version 0.9.0 dev20240222000 works normally.
Hi, thanks for reporting this issue. I haven't been able to reproduce on latest
I also just tested on 2xA30 and cannot reproduce using latest
Hi, this issue is reproduced by using I built it using max_batch = 24.
System Info
GPU: 2x A30
TensorRT-LLM version: v0.9.0
Model: vicuna 13B
Who can help?
@byshiue
Reproduction
Expected behavior
No error message.
actual behavior
additional notes
no