Hi @tloen, the issue should be addressed by this PR. Can you please try it and see if it solves the problem? Feel free to let us know if you have any more questions, thanks!
System Info
Who can help?
@kaiyux @byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Inside examples/run.py, wrap the generation step in a for loop so that generation runs several times, then run:
python run.py \
    --max_output_len=50 \
    --lookahead_config='[2,2,1]' \
    --tokenizer_dir=[DIR] \
    --engine_dir=[DIR]
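One way to make "nondeterminism after the first iteration" concrete is to run the same prompt several times and record which iterations diverge from the first output. The sketch below assumes greedy (non-sampling) decoding, where repeated runs on the same prompt should yield identical text. `find_divergent_iterations` and the `generate` callable are hypothetical names: in run.py you would pass a small wrapper around the existing generation call, not the stand-in shown here.

```python
from typing import Callable, List


def find_divergent_iterations(
    generate: Callable[[str], str], prompt: str, num_iters: int = 5
) -> List[int]:
    """Run `generate` repeatedly on one prompt and return the indices of
    iterations whose output differs from the first iteration's output.

    `generate` is a hypothetical wrapper around the real generation call.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    outputs = [generate(prompt) for _ in range(num_iters)]
    reference = outputs[0]
    return [i for i, out in enumerate(outputs) if out != reference]


if __name__ == "__main__":
    # Stand-in generator that mimics the reported symptom: correct on the
    # first call, different on every later call.
    calls = {"n": 0}

    def flaky_generate(prompt: str) -> str:
        calls["n"] += 1
        return prompt + " world" if calls["n"] == 1 else prompt + " w0rld"

    print(find_divergent_iterations(flaky_generate, "hello", num_iters=3))  # → [1, 2]
```

An empty list means every iteration matched the first one; with the lookahead settings above, the report suggests the list would be non-empty.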
Expected behavior
actual behavior
Outputs are nondeterministic and incorrect after the first iteration.
additional notes
The model uses the Llama architecture.
max_draft_len is 107.
The error does not occur when the number of verification branches (the third value in lookahead_config) is zero or the window size (the first value) is 1.
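To sweep configurations while narrowing this down, a small helper can parse the lookahead_config string and flag whether a setting falls in the regime where the failure was reported. This is a sketch assuming the `[window_size, ngram_size, verification_set_size]` ordering; `LookaheadConfig`, `parse_lookahead_config`, and `in_reported_failing_regime` are hypothetical names, and the regime check encodes only the observation in this report, not a confirmed root cause.

```python
import json
from typing import NamedTuple


class LookaheadConfig(NamedTuple):
    # Assumed ordering of the '[W,N,G]' string passed to --lookahead_config.
    window_size: int             # W
    ngram_size: int              # N
    verification_set_size: int   # G (number of verification branches)


def parse_lookahead_config(text: str) -> LookaheadConfig:
    """Parse a string such as '[2,2,1]' into a LookaheadConfig."""
    values = json.loads(text)
    if not (
        isinstance(values, list)
        and len(values) == 3
        and all(isinstance(v, int) for v in values)
    ):
        raise ValueError(f"expected [W, N, G], got {text!r}")
    return LookaheadConfig(*values)


def in_reported_failing_regime(cfg: LookaheadConfig) -> bool:
    # Per this report, the bug was NOT seen with G == 0 or W == 1.
    return cfg.verification_set_size > 0 and cfg.window_size > 1


if __name__ == "__main__":
    print(in_reported_failing_regime(parse_lookahead_config("[2,2,1]")))  # → True
```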