Finetune on model other than GPT-2 #87

@raihan0824

Description

Hello, I would be grateful if someone could answer this question clearly:
Can DialoGPT be fine-tuned on a base model other than GPT-2, and if so, how?
I tried to fine-tune this model on GPT-J by changing LSP_train.py line 195 from
model = load_model(GPT2LMHeadModel(config), args.init_checkpoint, args, verbose=True)
to
model = load_model(GPTJForCausalLM.from_pretrained('EleutherAI/gpt-j-6B'), args.init_checkpoint, args, verbose=True)
but I get this error:
File "LSP_train.py", line 287, in <module>
    loss, ppl = model(input_ids, position_ids, token_ids, label_ids)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 832, in forward
    return_dict=return_dict,
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 589, in forward
    past_length = past_key_values[0][0].size(-2)
IndexError: dimension specified as -2 but tensor has no dimensions

The script fails with this error whether I run it on GPU or CPU, but it works fine with the GPT-2 model.
Would appreciate any help!
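For context, here is a minimal sketch of what the traceback suggests is going wrong (this is an assumption on my part, not a confirmed diagnosis): DialoGPT calls the model positionally as model(input_ids, position_ids, token_ids, label_ids), which matched the forward signature of the old GPT2LMHeadModel it was pinned to, while the modern GPTJForCausalLM.forward takes past_key_values as its second positional parameter. Under that assumption, the position_ids tensor lands in the past_key_values slot, and past_key_values[0][0].size(-2) then indexes into a plain tensor and fails. The stub functions below only mimic the two argument orders; the names and orders are taken from the two libraries involved, and everything else is illustrative:

```python
def old_gpt2_forward(input_ids, position_ids=None, token_type_ids=None,
                     lm_labels=None, past=None):
    """Argument order of the old GPT2LMHeadModel that DialoGPT was written against."""
    return {"position_ids": position_ids, "labels": lm_labels}


def gptj_forward(input_ids, past_key_values=None, attention_mask=None,
                 token_type_ids=None, position_ids=None, labels=None):
    """Leading argument order of the modern GPTJForCausalLM.forward."""
    return {"past_key_values": past_key_values, "position_ids": position_ids}


# Old-style positional call: position_ids silently lands in past_key_values,
# which is what the IndexError in modeling_gptj.py line 589 points at.
out = gptj_forward("input_ids", "position_ids", "token_ids", "label_ids")
assert out["past_key_values"] == "position_ids"  # the suspected mismatch

# Possible fix: call forward with keyword arguments so each tensor
# reaches the parameter it is meant for.
out = gptj_forward("input_ids", position_ids="position_ids",
                   token_type_ids="token_ids", labels="label_ids")
assert out["past_key_values"] is None
```

If this reading is right, the line 287 call in LSP_train.py would need to use keyword arguments (and the matching parameter names of the new transformers API) rather than the old positional order.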
