I ran into this problem when I tried to have the model help me read this [arXiv paper](https://arxiv.org/pdf/2307.08621.pdf)! (Linux 5.19, Ubuntu, 16k fine-tuned model + RoPE scaling)