Proposed in https://github.com/Lightning-AI/lit-llama/pull/255. The only difference in logic is the instruction tuning, so we could add a flag for it, as in https://github.com/Lightning-AI/lit-llama/pull/278.
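
Rough sketch of what such a flag could look like. This is not taken from either PR: the `instruction_tuning` flag, the `build_text` helper, the example keys (`instruction`, `input`, `output`), and the prompt template are all hypothetical and only illustrate the idea of sharing one code path.

```python
def build_text(example: dict, instruction_tuning: bool = True) -> str:
    """Return the training text for one example.

    Hypothetical helper: the flag name, dict keys, and prompt
    template are assumptions, not the lit-llama implementation.
    """
    if not instruction_tuning:
        # Plain finetuning: train on the raw text only.
        return example["output"]

    # Instruction tuning: wrap the example in a prompt template.
    prompt = "### Instruction:\n" + example["instruction"] + "\n\n"
    if example.get("input"):
        prompt += "### Input:\n" + example["input"] + "\n\n"
    return prompt + "### Response:\n" + example["output"]
```

With something like this, both scripts would call the same function and only differ in the value of the flag.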