Describe the bug
After inference has been working cleanly and without problems for a while, this suddenly appears:
[...] generator.iterate()
^^^^^^^^^^^^^^^^^^^
File "exllamav2/exl2_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "exllamav2/generator/dynamic.py", line 973, in iterate
self.iterate_gen(results)
File "exllamav2/generator/dynamic.py", line 1213, in iterate_gen
job.receive_logits(job_logits)
File "exllamav2/generator/dynamic.py", line 1817, in receive_logits
ExLlamaV2Sampler.sample(
File "exllamav2/generator/sampler.py", line 434, in sample
ExLlamaV2Sampler.apply_dry(settings, tokenizer, sequence_ids, logits)
File "exllamav2/generator/sampler.py", line 272, in apply_dry
logits.scatter_add_(-1, indices, penalties)
RuntimeError: index 1000000000 is out of bounds for dimension 2 with size 131072
131072 is my total context/cache size. What is going wrong here? Thanks a lot!
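The error itself is easy to reproduce in isolation: scatter_add_ bounds-checks every index against the size of the target dimension, so any token id at or above that size raises exactly this RuntimeError. A minimal toy sketch (tiny vocab of 8 instead of 131072, standing in for an out-of-vocab token id):

```python
import torch

# Toy reproduction: vocab of size 8, but one index far outside it,
# mirroring how an out-of-vocab token id breaks scatter_add_.
logits = torch.zeros(1, 1, 8)                    # (batch, seq, vocab)
indices = torch.tensor([[[3, 1_000_000_000]]])   # second id has no logit slot
penalties = torch.ones(1, 1, 2)

try:
    logits.scatter_add_(-1, indices, penalties)
except RuntimeError as e:
    print(e)  # index 1000000000 is out of bounds for dimension 2 with size 8
```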
Reproduction steps
Not easy to reproduce; it happens after some (heavy) usage.
Expected behavior
generator.iterate() should operate without error
Logs
No response
Additional context
No response
Acknowledgements
I have looked for similar issues before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.
I'm not able to reproduce this, but I have an idea as to why it happens. The index of 1000000000 suggests it's trying to penalize an image token. Is this happening with a vision model?
Turns out it's just a special case I hadn't considered. When the same image appears multiple times in a context and you're using DRY, the sampler tries to apply a penalty to image tokens, and since they're not represented in the logits you get an out-of-bounds error. Should be fixed in dev with the latest commit.
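The fix described above amounts to not scattering penalties onto ids that have no slot in the logits. A minimal sketch of that idea (hypothetical helper and shapes for illustration, not the actual dev commit): mask out any ids at or above the vocab size and zero their penalties before the scatter.

```python
import torch

def apply_penalty_safe(logits: torch.Tensor,
                       indices: torch.Tensor,
                       penalties: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab). indices may contain special ids
    # (e.g. image tokens like 1_000_000_000) with no logit slot.
    vocab = logits.shape[-1]
    in_vocab = indices < vocab
    # Redirect out-of-vocab ids to index 0 and zero their penalty,
    # so the scatter_add_ is a no-op for them.
    safe_indices = torch.where(in_vocab, indices, torch.zeros_like(indices))
    safe_penalties = penalties * in_vocab.to(penalties.dtype)
    logits.scatter_add_(-1, safe_indices, safe_penalties)
    return logits
```

With this guard, a sequence containing repeated image tokens simply contributes no penalty for those ids instead of crashing.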
OS
Linux
GPU Library
CUDA 12.x
Python version
3.11
Pytorch version
2.5.1
Model
No response