Adding additional tokens to vocabulary #334

@adibMosharrof

I am using models like EleutherAI/gpt-j-6B and llama-7b-hf for text generation.
I have added special tokens to the vocabulary as I want a structured output.

Prompt:
"<|begincontext|>I want to make a restaurant reservation for 2 people at half past 11 in the morning.<|endcontext|>"

Target:
"<|begintarget|><|begindsts|><|begindst|><|beginintent|>FindRestaurants<|endintent|><|beginbelief|><|endbelief|><|enddst|><|enddsts|><|beginuseraction|>INFORM_INTENT->Restaurants^intent~FindRestaurants<|enduseraction|><|beginaction|>REQUEST->Restaurants^city~<|endaction|><|beginresponse|>Do you have a specific which you want the eating place to be located at?<|endresponse|><|endtarget|>"

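To make "structured output" concrete, here is a minimal sketch of how a target in this format could be parsed back into fields. It is not taken from the notebook: the name `extract_fields` is hypothetical, and the sketch assumes every field is wrapped in a matching `<|beginNAME|>` ... `<|endNAME|>` pair, as in the target above.

```python
import re

# Matches one <|beginNAME|> ... <|endNAME|> pair. The backreference \1 forces
# the closing token to carry the same name as the opening token.
FIELD_RE = re.compile(r"<\|begin(\w+)\|>(.*?)<\|end\1\|>", re.DOTALL)


def extract_fields(text):
    """Return {name: inner_text} for every begin/end pair, including nested ones."""
    fields = {}
    for match in FIELD_RE.finditer(text):
        name, inner = match.group(1), match.group(2)
        fields[name] = inner
        # Recurse so that fields nested inside this one are collected too.
        fields.update(extract_fields(inner))
    return fields
```

If a name occurs more than once (for example several `dst` blocks), later occurrences overwrite earlier ones in this sketch. A generation with no matching pairs yields an empty dictionary.
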
I have an example Colab notebook:
https://colab.research.google.com/drive/16qKy92cGoNPWrlQ4zlvntVGeSgjrknVF?usp=sharing

I am able to train the model without any errors.
However, when I perform inference, the model does not produce any structured output. It just produces what looks like random text.

Here is a sample generation:
<|endintent|> I\'ll make the reservation for 6 o"clock in the evening, for two people. I\'ll make the reservation for 6 o"clock in the evening, for two people. I\'ll make the reservation for 6 o"clock in the evening, for two people.
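
One way to quantify "no structure" is to check whether the begin/end tokens in a generation are properly paired and nested. The following is a minimal sketch under the assumption that all special tokens follow the `<|beginNAME|>` / `<|endNAME|>` naming used in this issue; the helper name `find_structure_errors` is hypothetical.

```python
import re

# Captures the kind ("begin" or "end") and the name of each special token.
TOKEN_RE = re.compile(r"<\|(begin|end)(\w+)\|>")


def find_structure_errors(text):
    """Return a list of pairing problems; an empty list means well nested."""
    errors = []
    open_names = []  # stack of names whose begin token has not been closed yet
    for kind, name in TOKEN_RE.findall(text):
        if kind == "begin":
            open_names.append(name)
        elif open_names and open_names[-1] == name:
            open_names.pop()
        else:
            errors.append(f"unexpected <|end{name}|>")
    # Anything still on the stack was opened but never closed.
    errors.extend(f"unclosed <|begin{name}|>" for name in reversed(open_names))
    return errors
```

Applied to the sample generation above, this reports a single stray `<|endintent|>` with no opening token, while a well-formed target returns an empty list.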

In my original code, when I train on a lot of data and plot the train/eval loss, both decrease to low values (train_loss = 0.2163, eval_loss = 0.2416). With losses this low, I am surprised that the generation has absolutely no structure. By contrast, a GPT-2 model trained for only a few steps on a small amount of data produces structured output.

Issue #326 discusses additional tokens in the vocabulary, which is similar to what I want to do.

Can you please give me some pointers on where I am going wrong?
