
Minor hiccups getting the Llama3-8B example workflows from the README running on Apple silicon M3 #4341

@mapix

Description

  1. Fine-tuning hit a bf16 precision issue right at the start; adding fp16: false to the config yaml got me through the rest with no pain (a minimal sketch of the change is right below).
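For reference, a minimal sketch of that change against the README's Llama3-8B LoRA SFT training config. The surrounding keys are illustrative and may differ in your checkout; only the fp16: false line is the actual fix:

    ### model
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

    ### method
    stage: sft
    do_train: true
    finetuning_type: lora

    ### train (abridged)
    per_device_train_batch_size: 1
    num_train_epochs: 3.0
    fp16: false  # added: works around the precision issue on Apple silicon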
  2. Things get a bit more troublesome when merging the LoRA adapter for chat inference. Running
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

fails with the following error:

/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.gate_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.gate_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.up_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.up_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.down_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/torch/nn/modules/module.py:2026: UserWarning: for base_model.model.model.layers.31.mlp.down_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Traceback (most recent call last):
  File "/Users/mapix/miniconda/envs/llama-factory/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 127, in run_chat
    chat_model = ChatModel()
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 43, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 58, in __init__
    self.model = load_model(
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/loader.py", line 160, in load_model
    model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/adapter.py", line 301, in init_adapter
    model = _setup_lora_tuning(
  File "/Users/mapix/workspace/LLaMA-Factory/src/llamafactory/model/adapter.py", line 191, in _setup_lora_tuning
    model: "LoraModel" = PeftModel.from_pretrained(model, adapter, **init_kwargs)
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 475, in from_pretrained
    model.load_adapter(
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 1076, in load_adapter
    self._update_offload(offload_index, adapters_weights)
  File "/Users/mapix/miniconda/envs/llama-factory/lib/python3.10/site-packages/peft/peft_model.py", line 957, in _update_offload
    safe_module = dict(self.named_modules())[extended_prefix]
KeyError: 'base_model.model.model.model.layers.10.input_layernorm'

2.1 First, all those UserWarnings: you can either silence warnings globally, or, if your machine has enough RAM, drop the offload logic entirely by adding low_cpu_mem_usage: false to the yaml config; with that, this batch of warnings disappears (see the sketch below).
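A minimal sketch of the inference config with that flag added. The existing keys are shown as they appear in a typical examples/inference/llama3_lora_sft.yaml and may differ in yours; only the low_cpu_mem_usage line is new:

    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
    adapter_name_or_path: saves/llama3-8b/lora/sft
    template: llama3
    finetuning_type: lora
    low_cpu_mem_usage: false  # added: load everything into RAM instead of offloading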

2.2 As for the KeyError, I'm not entirely sure what causes it, but looking at the failing key, the word model appears three times in a row, while the keys in the dict only contain it twice. I worked around it by patching the peft source directly, in peft/peft_model.py:

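    # Workaround: the keys from named_modules() contain "model." twice here,
    # but prefix + block_id + safe_key builds a key that contains it three times,
    # so drop block_id when building the lookup key.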
    #extended_prefix = prefix + block_id + safe_key[:suffix_pos]
    extended_prefix = prefix + safe_key[:suffix_pos]

2.3 After that, an MPS compatibility problem shows up; setting an environment variable on the command line takes care of it.

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Final command:
PYTORCH_ENABLE_MPS_FALLBACK=1 llamafactory-cli chat examples/inference/llama3_lora_sft.yaml

I'm not entirely sure whether caching played a role: when I came back to write this up and reverted the code, the issue did not reproduce. Recording it here anyway in case someone else runs into the same pitfalls.

