
[Bug]: Running llama 7B with PaddleNLP on Paddle 3.0.0b1 fails: AttributeError: module 'paddle.base.libpaddle.eager.ops.legacy' has no attribute 'c_identity'. Did you mean: 'npu_identity'? #9211

Open
shang-mt opened this issue Sep 27, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@shang-mt

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 3.0.0b1
- paddlenlp: https://github.com/ZHUI/PaddleNLP/tree/sci/benchmark

Duplicate issues

  • I have searched the existing issues

Error description

Traceback (most recent call last):
  File "/workspace/PaddleNLP/llm/run_pretrain.py", line 597, in <module>
    main()
  File "/workspace/PaddleNLP/llm/run_pretrain.py", line 575, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 926, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 1950, in training_step
    return self.training_pipeline_step(model, inputs)
  File "/workspace/PaddleNLP/paddlenlp/trainer/trainer.py", line 2019, in training_pipeline_step
    loss = model.forward_backward_pipeline(inputs, self.scaler if self.do_grad_scaling else None)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py", line 543, in forward_backward_pipeline
    output_tensor = self._forward_step(input_tensor, micro_dataset)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/pipeline_parallel.py", line 810, in _forward_step
    output_tensor = self._layers.forward(input_tensor, chunk_id=chunk_id)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/parallel_layers/pp_layers.py", line 809, in forward
    input = self.forward_function(0, len(self.run_function))(input)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/meta_parallel/parallel_layers/pp_layers.py", line 785, in execute_func
    x = layer(x)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 1540, in forward
    logits = parallel_matmul(hidden_states, self.weight, tensor_parallel_output=tensor_parallel_output)
  File "/workspace/PaddleNLP/paddlenlp/transformers/llama/modeling.py", line 166, in parallel_matmul
    input_parallel = paddle.distributed.collective._c_identity(x, group=model_parallel_group)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/layers/mpu/mp_ops.py", line 100, in _c_identity
    return c_identity_eager.apply(tensor, group, skip_c_identity_dynamic)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/fleet/layers/mpu/mp_ops.py", line 34, in forward
    return _legacy_C_ops.c_identity(
AttributeError: module 'paddle.base.libpaddle.eager.ops.legacy' has no attribute 'c_identity'. Did you mean: 'npu_identity'?
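A minimal check, as a sketch assuming paddlepaddle-gpu 3.0.0b1 in eager mode, that reproduces the missing attribute without launching a distributed run:

# Sketch, assuming paddlepaddle-gpu 3.0.0b1: the legacy eager op module no
# longer exposes `c_identity`, which parallel_matmul still reaches through
# paddle.distributed.collective._c_identity (see the traceback above).
from paddle import _legacy_C_ops

print(hasattr(_legacy_C_ops, "c_identity"))    # expected: False on 3.0.0b1
print(hasattr(_legacy_C_ops, "npu_identity"))  # expected: True, hence the interpreter's hint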

Steps to reproduce & code

bash run_dist.sh  # launches llm/run_pretrain.py (per the traceback) with the JSON config below

{
"model_name_or_path": "facebook/llama-7b",
"tokenizer_name_or_path": "facebook/llama-7b",
"input_dir": "/workspace",
"output_dir": "/root/llama-7b",
"per_device_train_batch_size": 2,
"gradient_accumulation_steps": 256,
"per_device_eval_batch_size": 64,
"tensor_parallel_degree": 2,
"pipeline_parallel_degree": 2,
"pipeline_parallel_config": "disable_partial_send_recv",
"sharding_parallel_degree": -1,
"virtual_pp_degree": 1,
"sharding": "stage1",
"sequence_parallel": 1,
"adam_beta1": 0.9,
"adam_beta2": 0.95,
"use_flash_attention": true,
"use_fused_rms_norm": true,
"use_fused_rope": true,
"max_seq_length": 2048,
"learning_rate": 1e-04,
"initializer_range": 0.002,
"min_learning_rate": 1e-05,
"warmup_steps": 2000,
"logging_steps": 1,
"max_steps": 200000,
"save_steps": 200000,
"eval_steps": 2000,
"weight_decay": 0.1,
"max_grad_norm": 1.0,
"amp_master_grad": 1,
"fp16": true,
"fp16_opt_level": "O2",
"dataloader_num_workers": 1,
"continue_training": 0,
"do_train": true,
"do_eval": true,
"do_predict": true,
"disable_tqdm": true,
"recompute": false,
"distributed_dataloader": 0,
"recompute_granularity": "full",
"save_total_limit": 2,
"eval_accumulation_steps": 16
}
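Note that, as far as I can tell, this code path is only reached because "tensor_parallel_degree" is 2 here; with a degree of 1, parallel_matmul does not route through _c_identity. A hypothetical fail-fast guard (not part of PaddleNLP, just a sketch) that surfaces the problem before the first pipeline forward step:

# Hypothetical pre-flight check, not part of PaddleNLP: abort with a clear
# message instead of failing deep inside the first forward pass.
from paddle import _legacy_C_ops

tensor_parallel_degree = 2  # taken from the config above

if tensor_parallel_degree > 1 and not hasattr(_legacy_C_ops, "c_identity"):
    raise RuntimeError(
        "This paddle build lacks the legacy 'c_identity' op; "
        "parallel_matmul will raise AttributeError under tensor parallelism."
    )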
