
Minor hiccups getting the Llama3-8B example workflows from the README running on Apple silicon M3 #4341

@mapix

Description

  1. Fine-tuning hit a bf16 precision issue right at the start; adding fp16: false to the config yaml got me through the rest with no pain (a minimal sketch of the change is right below).
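For reference, a minimal sketch of that change against the README's Llama3-8B LoRA SFT training config. The surrounding keys are illustrative and may differ in your checkout; only the fp16: false line is the actual fix:

    ### model
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

    ### method
    stage: sft
    do_train: true
    finetuning_type: lora

    ### train (abridged)
    per_device_train_batch_size: 1
    num_train_epochs: 3.0
    fp16: false  # added: works around the precision issue on Apple silicon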
  2. Things get a bit more troublesome when merging the LoRA adapter for chat inference. Running
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

fails with the following error:

/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.gate_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.gate_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.up_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.up_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.down_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.down_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Traceback (most recent call last):
  File "/Users/mapix/miniconda/envs/llama-factory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 127, in run_chat
    chat_model = ChatModel()
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 43, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 58, in __init__
    self.model = load_model(
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/loader.py", line 160, in load_model
    model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/adapter.py", line 301, in init_adapter
    model = _setup_lora_tuning(
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/adapter.py", line 191, in _setup_lora_tuning
    model: "LoraModel" = PeftModel.from_pretrained(model, adapter, **init_kwargs)
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 475, in from_pretrained
    model.load_adapter(
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 1076, in load_adapter
    self._update_offload(offload_index, adapters_weights)
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 957, in _update_offload
    safe_module = dict(self.named_modules())[extended_prefix]
KeyError: 'base_model.model.model.model.layers.10.input_layernorm'

2.1 First, all those UserWarnings: you can either silence warnings globally, or, if your machine has enough RAM, drop the offload logic entirely by adding low_cpu_mem_usage: false to the yaml config; with that, this batch of warnings disappears (see the sketch below).
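A minimal sketch of the inference config with that flag added. The existing keys are shown as they appear in a typical examples/inference/llama3_lora_sft.yaml and may differ in yours; only the low_cpu_mem_usage line is new:

    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
    adapter_name_or_path: saves/llama3-8b/lora/sft
    template: llama3
    finetuning_type: lora
    low_cpu_mem_usage: false  # added: load everything into RAM instead of offloading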

2.2 As for the KeyError, I'm not entirely sure what causes it, but looking at the failing key, the word model appears three times in a row, while the keys in the dict only contain it twice. I worked around it by patching the peft source directly, in peft/peft_model.py:

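    # Workaround: the keys from named_modules() contain "model." twice here,
    # but prefix + block_id + safe_key builds a key that contains it three times,
    # so drop block_id when building the lookup key.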
    #extended_prefix = prefix + block_id + safe_key[:suffix_pos]
    extended_prefix = prefix + safe_key[:suffix_pos]

2.3 After that, an MPS compatibility problem shows up; setting an environment variable on the command line takes care of it.

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Final command:
PYTORCH_ENABLE_MPS_FALLBACK=1 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

I'm not entirely sure whether caching played a role: when I came back to write this up and reverted the code, the issue did not reproduce. Recording it here anyway in case someone else runs into the same pitfalls.

