Hello. First of all, thank you for your wonderful work.
I was curious if adding locally typical sampling was on the roadmap.
If not, would you be interested in a PR adding the functionality?
I think adding typical_p should be fairly straightforward, but I wanted to ask before starting work on it.
Personally, I'm having great success with typical_p in transformers and text-generation-inference.
I want to switch to vLLM but I can't go back to top_p.
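For reference, this is roughly the filtering step I have in mind. It is a minimal pure-Python sketch of locally typical sampling as I understand it from the paper and from the `TypicalLogitsWarper` in transformers. The function name `typical_p_filter` and the `min_tokens_to_keep` argument are my own illustration, not an existing vLLM API. The filter computes the entropy of the next-token distribution, ranks tokens by how far their surprisal (-log p) is from that entropy, and keeps the smallest set of the most "typical" tokens whose total probability reaches `typical_p`. All other tokens are masked to -inf before sampling.

```python
import math
from typing import List


def typical_p_filter(logits: List[float], typical_p: float,
                     min_tokens_to_keep: int = 1) -> List[float]:
    """Hypothetical sketch: mask tokens outside the locally typical set to -inf."""
    if not 0.0 < typical_p <= 1.0:
        raise ValueError("typical_p must be in (0, 1]")

    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    probs = [math.exp(lp) for lp in log_probs]

    # Entropy of the distribution; skip p == 0 to avoid 0 * -inf = nan.
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs) if p > 0.0)

    # Distance of each token's surprisal from the entropy (smaller = more typical).
    scores = [abs(-lp - entropy) for lp in log_probs]
    order = sorted(range(len(logits)), key=lambda i: scores[i])

    # Keep the most typical tokens until their probability mass reaches typical_p.
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= typical_p and len(keep) >= min_tokens_to_keep:
            break

    return [x if i in keep else -math.inf for i, x in enumerate(logits)]
```

With typical_p=1.0 nothing is filtered. With a uniform distribution every token is equally typical, so typical_p=0.5 over four tokens keeps two of them.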