
About the training loss. #2

Open
tianshuocong opened this issue May 29, 2023 · 3 comments
Labels
question Further information is requested

Comments

@tianshuocong

Hi Yuxin!

Thanks for your great work!

When reading the paper, I was confused about the training loss of the student model. The paper says "we fine-tune our student model S by minimizing the cross-entropy loss." How is the CE loss used to fine-tune the model, and where is the code implementation for this part? Thank you very much!

Best wishes!

@YJiangcm
Owner

Thank you for your interest in our work.

In our research, we use the autoregressive language modeling objective to train the student model. This involves using the teacher model's responses to a set of instructions as the targets for the student model. Since the language modeling objective is itself a token-level cross-entropy loss, we refer to this objective as "cross-entropy loss" in our paper. We apologize for any ambiguity caused by this terminology. The primary goal of training the student model is to align its responses with those of the teacher model.
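Concretely, the target construction looks roughly like this (a simplified sketch rather than the exact code in src/train.py; the helper name here is made up, but -100 is the standard ignore index of PyTorch's cross-entropy loss):

```python
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the CE loss

def build_example(prompt_ids, response_ids):
    # Concatenate the instruction prompt with the teacher's response.
    input_ids = torch.tensor(prompt_ids + response_ids)
    # Mask the prompt positions so the cross-entropy is computed
    # only over the teacher-response tokens the student must imitate.
    labels = torch.tensor([IGNORE_INDEX] * len(prompt_ids) + response_ids)
    return input_ids, labels
```

During training, the masked positions contribute nothing to the loss, so the student is optimized only to reproduce the teacher's response tokens.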

The code is implemented in src/train.py, which follows a similar approach to instruction tuning in Stanford Alpaca.

I hope this addresses your concerns. If you have any further questions, please let me know.

@YJiangcm added the question (Further information is requested) label on May 29, 2023
@tianshuocong
Author

Hi!

Thanks for your prompt response!

Actually, I am still confused about the loss. First, on which line of src/train.py is the loss defined? Second, if the teacher model and the student model both output text, how can cross-entropy be used to calculate the loss?

Thank you very much!

@YJiangcm
Owner

Lines 116 to 142 of src/train.py define the dataset used for training, which contains the input as well as the label (target). The training loss itself is defined inside transformers.AutoModelForCausalLM and is computed automatically when we call transformers.Trainer.train(); you may check the related documentation. As for your second question: the teacher's text response is tokenized into target token ids, and the cross-entropy is computed between the student's predicted next-token distribution and those target ids, not between two strings of text.
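For example, here is a toy illustration (with a placeholder model, not our actual training script) of how passing labels to a causal LM makes the library compute the shifted token-level cross-entropy internally:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model just for illustration; any causal LM behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Instruction: say hi. Response: Hi there!"
enc = tokenizer(text, return_tensors="pt")

# When `labels` is given, the model internally shifts them by one position
# and computes the token-level cross-entropy (positions labeled -100 are ignored).
out = model(input_ids=enc.input_ids, labels=enc.input_ids)
print(out.loss)  # this scalar is exactly what transformers.Trainer minimizes
```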
