
Caching for generation #95

Open

murbard opened this issue Dec 27, 2022 · 1 comment
murbard commented Dec 27, 2022

Currently, generation recomputes every activation each time a token is appended to the prompt. Normally one would cache the intermediate activations (the per-layer keys and values) to avoid recomputing them at every step. Caching doesn't compose as cleanly with the forward function, but that's precisely why a clean and simple implementation should be part of minGPT. It's surprising that PyTorch's native TransformerEncoder module doesn't offer this either.
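For reference, here is a minimal sketch of what a key/value cache could look like in a minGPT-style attention block. This is not minGPT's actual API: the class name `CachedCausalSelfAttention` and the `cache` argument are hypothetical, added only to illustrate the idea.

```python
import math
import torch
import torch.nn.functional as F
from torch import nn

class CachedCausalSelfAttention(nn.Module):
    """Causal self-attention with an optional key/value cache for generation.

    Hypothetical sketch: minGPT's real CausalSelfAttention does not take a cache.
    """

    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)

    def forward(self, x, cache=None):
        # x: (B, T, C). During cached generation T == 1 (only the newest token).
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        if cache is not None:
            # Prepend the keys/values computed at earlier generation steps.
            past_k, past_v = cache
            k = torch.cat([past_k, k], dim=2)
            v = torch.cat([past_v, v], dim=2)
        new_cache = (k, v)
        S = k.size(2)  # total length: cached positions + new tokens
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        if T > 1:
            # Causal mask, only needed when several new tokens are processed at
            # once (e.g. the initial prompt); a single appended token may
            # attend to everything already in the cache.
            mask = torch.tril(torch.ones(T, S, dtype=torch.bool, device=x.device),
                              diagonal=S - T)
            att = att.masked_fill(~mask, float('-inf'))
        y = F.softmax(att, dim=-1) @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y), new_cache
```

At generation time the loop would feed only the newest token through the model and thread one such cache per block through every call. Note the positional embedding for that token has to be taken at offset equal to the cache length rather than 0, which is part of why this doesn't drop into the existing forward signature cleanly.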

@karpathy (Owner)

agree, a good todo item
