Description
Hello, I would be grateful if someone could answer this question clearly:
Can DialoGPT be fine-tuned on a model other than GPT-2? If so, how?
I tried to fine-tune GPT-J instead by changing line 195 of LSP_train.py from
model = load_model(GPT2LMHeadModel(config), args.init_checkpoint, args, verbose=True)
to
model = load_model(GPTJForCausalLM.from_pretrained('EleutherAI/gpt-j-6B'), args.init_checkpoint, args, verbose=True)
but I get this error:
  File "LSP_train.py", line 287, in <module>
    loss, ppl = model(input_ids, position_ids, token_ids, label_ids)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 832, in forward
    return_dict=return_dict,
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 589, in forward
    past_length = past_key_values[0][0].size(-2)
IndexError: dimension specified as -2 but tensor has no dimensions
The script fails with this error on both GPU and CPU, but it works fine with the GPT-2 model.
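One possible explanation, offered as a guess and not a confirmed diagnosis: the training loop passes its four tensors positionally, and the two forward methods may not take their arguments in the same order. The sketch below uses two stub functions whose parameter lists are assumptions, written from memory of DialoGPT's bundled GPT-2 model and of the Hugging Face GPT-J model, and may differ in the installed versions. It only shows which parameter each positional argument would bind to; it does not load any model.

```python
import inspect

# Stub with the parameter order ASSUMED for DialoGPT's bundled GPT-2 forward.
def dialogpt_gpt2_forward(input_ids, position_ids=None, token_type_ids=None,
                          lm_labels=None, past=None):
    pass

# Stub with the parameter order ASSUMED for the Hugging Face GPT-J forward.
def hf_gptj_forward(input_ids=None, past_key_values=None, attention_mask=None,
                    token_type_ids=None, position_ids=None, head_mask=None,
                    inputs_embeds=None, labels=None):
    pass

def bind_positional(fn, *args):
    """Return a dict mapping each parameter name of fn to the positional argument it receives."""
    return dict(inspect.signature(fn).bind(*args).arguments)

# The names used in the call on line 287 of LSP_train.py, as plain strings.
call_args = ("input_ids", "position_ids", "token_ids", "label_ids")

gpt2_binding = bind_positional(dialogpt_gpt2_forward, *call_args)
gptj_binding = bind_positional(hf_gptj_forward, *call_args)

for param, value in gptj_binding.items():
    print(f"GPT-J parameter {param!r} receives {value!r}")
```

If these signatures are right, position_ids ends up in past_key_values, which would match the failing line past_key_values[0][0].size(-2) in the traceback. Passing the tensors by keyword (position_ids=..., token_type_ids=..., labels=...) would avoid that particular mismatch. The Hugging Face model also returns an output object with a .loss attribute instead of a (loss, ppl) pair, so the unpacking on line 287 would likely need changes as well.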
Would appreciate any help!