Finetune on model other than GPT-2 #87

@raihan0824

Description

Hello, I would be grateful if someone could answer this question clearly:
Can DialoGPT be fine-tuned on a base model other than GPT-2, and if so, how?
I tried to fine-tune this model on GPT-J by changing LSP_train.py line 195 from
model = load_model(GPT2LMHeadModel(config), args.init_checkpoint, args, verbose=True)
to
model = load_model(GPTJForCausalLM.from_pretrained('EleutherAI/gpt-j-6B'), args.init_checkpoint, args, verbose=True)
but I get this error:
File "LSP_train.py", line 287, in <module>
    loss, ppl = model(input_ids, position_ids, token_ids, label_ids)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 832, in forward
    return_dict=return_dict,
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
File "/opt/conda/envs/dialogpt/lib/python3.7/site-packages/transformers/models/gptj/modeling_gptj.py", line 589, in forward
    past_length = past_key_values[0][0].size(-2)
IndexError: dimension specified as -2 but tensor has no dimensions

The script fails with this error whether I run it on GPU or CPU, but it works fine with the GPT-2 model.
Would appreciate any help!
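For context, here is a minimal sketch of what the traceback suggests is going wrong (this is an assumption on my part, not a confirmed diagnosis): DialoGPT calls the model positionally as model(input_ids, position_ids, token_ids, label_ids), which matched the forward signature of the old GPT2LMHeadModel it was pinned to, while the modern GPTJForCausalLM.forward takes past_key_values as its second positional parameter. Under that assumption, the position_ids tensor lands in the past_key_values slot, and past_key_values[0][0].size(-2) then indexes into a plain tensor and fails. The stub functions below only mimic the two argument orders; the names and orders are taken from the two libraries involved, and everything else is illustrative:

```python
def old_gpt2_forward(input_ids, position_ids=None, token_type_ids=None,
                     lm_labels=None, past=None):
    """Argument order of the old GPT2LMHeadModel that DialoGPT was written against."""
    return {"position_ids": position_ids, "labels": lm_labels}


def gptj_forward(input_ids, past_key_values=None, attention_mask=None,
                 token_type_ids=None, position_ids=None, labels=None):
    """Leading argument order of the modern GPTJForCausalLM.forward."""
    return {"past_key_values": past_key_values, "position_ids": position_ids}


# Old-style positional call: position_ids silently lands in past_key_values,
# which is what the IndexError in modeling_gptj.py line 589 points at.
out = gptj_forward("input_ids", "position_ids", "token_ids", "label_ids")
assert out["past_key_values"] == "position_ids"  # the suspected mismatch

# Possible fix: call forward with keyword arguments so each tensor
# reaches the parameter it is meant for.
out = gptj_forward("input_ids", position_ids="position_ids",
                   token_type_ids="token_ids", labels="label_ids")
assert out["past_key_values"] is None
```

If this reading is right, the line 287 call in LSP_train.py would need to use keyword arguments (and the matching parameter names of the new transformers API) rather than the old positional order.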
