Implementation Details about the Student Model #8

hzf1174 · 2023-11-18T15:39:37Z

Hi Yuxin,

Thank you for your great work! In your paper, you mention that your method trains for 3 iterations, and that in each iteration the student model is trained for 3 epochs using an AdamW optimizer with a learning rate of 2e-5. Could you clarify whether, in each iteration, the student model starts from the same pre-trained LLaMA model or from the model trained in the previous iteration? Thank you for your clarification!

YJiangcm · 2023-11-20T01:33:11Z

Thanks for your interest in our work. In each iteration, the student model starts from the model trained in the previous iteration, not from the original pre-trained LLaMA model.
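For readers who want to see the difference between the two options in the question, here is a minimal sketch of the control flow. It is not the repository's training code: `load_pretrained`, `train_one_epoch`, `run`, the one-parameter toy model, the plain SGD update, and the toy datasets are all hypothetical stand-ins for the real LLaMA checkpoint and AdamW fine-tuning. Only the counts (3 iterations, 3 epochs per iteration) come from the thread.

```python
import copy

NUM_ITERATIONS = 3        # from the thread: 3 training iterations
EPOCHS_PER_ITERATION = 3  # from the thread: 3 epochs per iteration


def load_pretrained():
    """Hypothetical stand-in for loading the pre-trained LLaMA weights."""
    return {"w": 0.0}


def train_one_epoch(model, data, lr=0.1):
    """Toy SGD on y = w * x with squared error.

    Stand-in for one epoch of the real fine-tuning (AdamW, lr = 2e-5).
    """
    for x, y in data:
        grad = 2 * (model["w"] * x - y) * x
        model["w"] -= lr * grad
    return model


def run(datasets, warm_start=True):
    """Train for NUM_ITERATIONS and record the weight after each iteration.

    warm_start=True  -> continue from the previous iteration's student
                        (the behavior confirmed in the reply above).
    warm_start=False -> reset to the pre-trained weights every iteration.
    """
    pretrained = load_pretrained()
    student = copy.deepcopy(pretrained)
    history = []
    for iteration in range(NUM_ITERATIONS):
        if not warm_start:
            student = copy.deepcopy(pretrained)  # discard earlier progress
        for _ in range(EPOCHS_PER_ITERATION):
            train_one_epoch(student, datasets[iteration])
        history.append(student["w"])
    return history


# Hypothetical per-iteration training sets (the target is w = 2).
toy_datasets = [[(1.0, 2.0)] for _ in range(NUM_ITERATIONS)]
warm = run(toy_datasets, warm_start=True)
cold = run(toy_datasets, warm_start=False)
```

With warm starting, the student accumulates 9 epochs of updates by the end of the third iteration. With a reset, each iteration only ever sees 3 epochs from the pre-trained weights.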

hzf1174 changed the title ~~Implementation Details about the student model~~ Implementation Details about the Student Model Nov 18, 2023
