Error when converting sequential model to HF #1323

Open
SilverSulfide opened this issue Nov 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

SilverSulfide commented Nov 20, 2024

I am trying to convert a locally trained GPT-NeoX model to Hugging Face and run into the following error:

```
Detected MLP naming convention: new
  0%|          | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "gpt-neox/tools/ckpts/convert_neox_to_hf.py", line 906, in <module>
    main()
  File "gpt-neox/tools/ckpts/convert_neox_to_hf.py", line 856, in main
    hf_model = convert(
  File "gpt-neox/tools/ckpts/convert_neox_to_hf.py", line 609, in convert
    get_state(
  File "gpt-neox/tools/ckpts/convert_neox_to_hf.py", line 198, in get_state
    return [state_dict["module"][key] for state_dict in state_dicts]
  File "gpt-neox/tools/ckpts/convert_neox_to_hf.py", line 198, in <listcomp>
    return [state_dict["module"][key] for state_dict in state_dicts]
KeyError: 'sequential.2.input_layernorm.weight'
```
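For reference, the failing lookup happens under `state_dict["module"]`, so this is the minimal snippet I've been using to see which keys the checkpoint shard actually contains (the checkpoint path is illustrative; adjust it to your run and global step):

```python
import torch

# Illustrative path to one DeepSpeed model-states shard; adjust to your
# checkpoint directory / global step.
ckpt = torch.load(
    "checkpoints/global_step100000/mp_rank_00_model_states.pt",
    map_location="cpu",
)

# get_state() in convert_neox_to_hf.py indexes state_dict["module"][key],
# so list what that dict really contains for the layer that raised the
# KeyError (sequential.2 here).
for key in sorted(ckpt["module"]):
    if key.startswith("sequential.2."):
        print(key)
```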

The relevant config params are as follows:

```
{
  "tokenizer_type": "SPMTokenizer",
  "vocab_file": "./model.model",

  "num_layers": 16,
  "hidden_size": 1024,
  "intermediate_size": 4096,
  "num_attention_heads": 16,
  "seq_length": 256,

  "init_method": "small_init",
  "output_layer_init_method": "wang_init",
  "no_weight_tying": true,

  "activation": "gelu",
  "attention_config": [[["global"], 16]],

  "pos_emb": "rotary",
  "max_position_embeddings": 256,

  "train_micro_batch_size_per_gpu": 64,
  "gradient_accumulation_steps": 1,
  "num_nodes": 1,
  "train_iters": 100000,
  "lr_decay_style": "cosine",
  "lr_decay_iters": 38000,
  "warmup": 0.05,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001,
      "betas": [0.9, 0.95],
      "eps": 1.0e-8
    }
  },
  "deepspeed": true,

  "weight_decay": 0.1,
  "norm": "rms",
  "rms_norm_epsilon": 0.01,
  # "finetune": true,

  "bf16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "precision": "bfloat16",
  "fp32_allreduce": true,
  "distributed_backend": "nccl",

  "pipe_parallel_size": 0,
  "model_parallel_size": 1,

  "log_dir": "logs",
  "log_interval": 1,
  "tensorboard_dir": "test_neo"
}
```

Any help would be appreciated!

SilverSulfide added the bug label on Nov 20, 2024
iPRET commented Dec 18, 2024

Disclosure: I'm not a maintainer.
I think the problem might be that the model classes on the Hugging Face side are quite limited in what they support. If you're converting to GPTNeoXForCausalLM, I believe it doesn't support RMSNorm and expects you to use LayerNorm instead.
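If that's what's going on, it would also show up in the parameter names: if I remember right, NeoX's RMSNorm registers its parameter as `scale` rather than `weight`, which would explain why `sequential.2.input_layernorm.weight` is missing from the state dict. A rough, untested sketch of a key-rename workaround (the path is a placeholder; back the shard up first) would be something like:

```python
import torch

# Placeholder path to a DeepSpeed model-states shard; adjust to your run,
# and keep a backup copy before overwriting anything.
path = "checkpoints/global_step100000/mp_rank_00_model_states.pt"
ckpt = torch.load(path, map_location="cpu")
module = ckpt["module"]

# Alias every norm "scale" parameter to the "weight" name that the
# converter's get_state() looks up.
for key in list(module.keys()):
    if key.endswith(".scale"):
        module[key.replace(".scale", ".weight")] = module.pop(key)

torch.save(ckpt, path)
```

Note that this only renames tensors so the script can find them; GPTNeoXForCausalLM would still apply LayerNorm (with bias and mean subtraction) at inference, so the converted model's outputs wouldn't match an RMSNorm-trained one. Training with plain LayerNorm in the NeoX config (`"norm": "layernorm"`) and converting that checkpoint is probably the safer route.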
