Description
I am using models like EleutherAI/gpt-j-6B and llama-7b-hf for text generation.
I have added special tokens to the vocabulary because I want structured output.
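For reference, the tokens were added along these lines (a minimal sketch, not the exact notebook code; `sshleifer/tiny-gpt2` is used here only as a small stand-in so the snippet runs quickly, the same pattern applies to GPT-J/LLaMA). The `resize_token_embeddings` call is the step that is easy to miss:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Structure markers used in the prompt/target format shown below
SPECIAL_TOKENS = [
    "<|begincontext|>", "<|endcontext|>",
    "<|begintarget|>", "<|endtarget|>",
    "<|begindsts|>", "<|enddsts|>",
    "<|begindst|>", "<|enddst|>",
    "<|beginintent|>", "<|endintent|>",
    "<|beginbelief|>", "<|endbelief|>",
    "<|beginuseraction|>", "<|enduseraction|>",
    "<|beginaction|>", "<|endaction|>",
    "<|beginresponse|>", "<|endresponse|>",
]

# Tiny stand-in model; swap in EleutherAI/gpt-j-6B etc. in practice
tokenizer = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")

num_added = tokenizer.add_tokens(SPECIAL_TOKENS, special_tokens=True)
# Without this resize, the new token ids index past the embedding matrix
model.resize_token_embeddings(len(tokenizer))

# Each marker should now encode to a single token id
ids = tokenizer.encode("<|begincontext|>", add_special_tokens=False)
print(num_added, len(ids))
```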
Prompt
"<|begincontext|>I want to make a restaurant reservation for 2 people at half past 11 in the morning.<|endcontext|>",
Target
"<|begintarget|><|begindsts|><|begindst|><|beginintent|>FindRestaurants<|endintent|><|beginbelief|><|endbelief|><|enddst|><|enddsts|><|beginuseraction|>INFORM_INTENT->Restaurants^intent~FindRestaurants<|enduseraction|><|beginaction|>REQUEST->Restaurants^city~<|endaction|><|beginresponse|>Do you have a specific which you want the eating place to be located at?<|endresponse|><|endtarget|>"
Here is an example Colab notebook:
https://colab.research.google.com/drive/16qKy92cGoNPWrlQ4zlvntVGeSgjrknVF?usp=sharing
I am able to train the model without any errors. However, when I perform inference, it does not produce any structured output; it just produces random text.
Here is a sample generation
<|endintent|> I\'ll make the reservation for 6 o"clock in the evening, for two people. I\'ll make the reservation for 6 o"clock in the evening, for two people. I\'ll make the reservation for 6 o"clock in the evening, for two people.
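One sanity check I can describe (a hypothetical sketch, with `sshleifer/tiny-gpt2` as a small stand-in for the fine-tuned checkpoint path): before generating, confirm that the reloaded model's embedding matrix covers the reloaded tokenizer's vocabulary. If the added structure tokens' embeddings were never saved or reloaded, the ids no longer line up and the output loses its structure:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint; in practice this would be the fine-tuned output dir
checkpoint = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Number of rows in the input-embedding matrix vs. tokenizer vocab size.
# If embed_rows < len(tokenizer), the added tokens have no trained
# embeddings at inference time, which yields unstructured generations.
embed_rows = model.get_input_embeddings().weight.shape[0]
print(embed_rows, len(tokenizer))
assert embed_rows >= len(tokenizer), "embedding matrix smaller than vocab"
```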
In my original code, when I train on a larger dataset and plot the train/eval loss, I can see both decrease to low values (train_loss = 0.2163, eval_loss = 0.2416). With such low losses, I am surprised that the generation has absolutely no structure. With a GPT-2 model, training for just a few steps on a small amount of data produces structured output.
Issue #326 discusses adding extra tokens to the vocabulary, which is similar to what I want to do.
Can you please give me some pointers on where I am going wrong?