
Qwen2VL + LoRA: files missing after merging the fine-tuned model (bug) #5749

Closed
1 task done
gxlover0625 opened this issue Oct 19, 2024 · 14 comments · Fixed by #5857
Labels
solved This problem has been already solved

Comments

@gxlover0625

Reminder

  • I have read the README and searched the existing issues.

System Info

[2024-10-19 11:31:58,444] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.10.134-010.ali5000.al8.x86_64-x86_64-with-glibc2.32
  • Python version: 3.10.15
  • PyTorch version: 2.3.1+cu118 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA H20
  • DeepSpeed version: 0.15.2

Reproduction

Following the official docs, I ran the Qwen2-VL + LoRA fine-tuning and merge commands:

llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml
llamafactory-cli export examples/merge_lora/qwen2vl_lora_sft.yaml

The training config examples/train_lora/qwen2vl_lora_sft.yaml is as follows:

### model
model_name_or_path: /home/admin/workspace/aop_lab/llm/qwen/Qwen2-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: paco_all  # video: mllm_video_demo
template: qwen2_vl
cutoff_len: 1024
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /home/admin/workspace/aop_lab/sft_results/paco_all/qwen2_vl-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

The merge config examples/merge_lora/qwen2vl_lora_sft.yaml is as follows:

### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: /home/admin/workspace/aop_lab/llm/qwen/Qwen2-VL-7B-Instruct
adapter_name_or_path: /home/admin/workspace/aop_lab/sft_results/paco_all/qwen2_vl-7b/lora/sft
template: qwen2_vl
finetuning_type: lora

### export
export_dir: /home/admin/workspace/aop_lab/save_models/paco_all/qwen2_vl_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false

Expected behavior

Bug

Fine-tuning ran without errors, and merging the LoRA adapter reported no errors either, but comparing the original model files with the merged model files shows that chat_template.json is missing. Details:

  • The original model folder:
    [screenshot of the original model directory listing]
  • The merged model folder:
    [screenshot of the merged model directory listing]
    Comparing the two screenshots, the merged folder is missing the following files:
  • chat_template.json
  • configuration.json
    and has gained the following files:
  • added_tokens.json
  • special_tokens_map.json
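
A quick way to reproduce this comparison programmatically; a minimal sketch, where both directory paths are hypothetical placeholders:

import os

# hypothetical paths: the original checkpoint and the merged export dir
base = set(os.listdir("/path/to/Qwen2-VL-7B-Instruct"))
merged = set(os.listdir("/path/to/merged_model"))

print("missing after merge:", sorted(base - merged))  # expect chat_template.json, configuration.json
print("added after merge:", sorted(merged - base))    # expect added_tokens.json, special_tokens_map.json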

chat_template.json is a critical file; without it the model cannot run inference and raises an error.
The official inference code:

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
from modelscope import snapshot_download  # weights are downloaded from ModelScope (see "Others" below)

model_dir = snapshot_download("qwen/Qwen2-VL-7B-Instruct")

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_dir)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

The error:

ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information

Others

Additional information

  1. Qwen2-VL is the 7B model, downloaded from ModelScope using the following code:
from modelscope import snapshot_download
model_dir = snapshot_download("qwen/Qwen2-VL-7B-Instruct")
@github-actions github-actions bot added the pending This problem is yet to be addressed label Oct 19, 2024
@gxlover0625
Author

Even after copying chat_template.json into the merged model folder, it still errors. I suspect the whole Processor is saved incorrectly; the only thing that works is forcibly setting processor.chat_template.
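
A minimal sketch of this forced assignment, assuming the base model's chat_template.json is available locally (both paths are hypothetical placeholders):

import json
from transformers import AutoProcessor

# processor saved with the merged checkpoint (its chat template is missing)
processor = AutoProcessor.from_pretrained("/path/to/merged_model")

# chat_template.json stores a single "chat_template" key holding the Jinja template
with open("/path/to/Qwen2-VL-7B-Instruct/chat_template.json") as f:
    processor.chat_template = json.load(f)["chat_template"]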

@aliencaocao
Contributor

Same problem here. For now I can only use the processor from the pretrained qwen/Qwen2-VL-7B-Instruct and load the weights from my own local files.
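
A minimal sketch of this workaround (the merged-model path is a hypothetical placeholder):

from modelscope import snapshot_download
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# processor (and its chat template) from the original pretrained checkpoint
base_dir = snapshot_download("qwen/Qwen2-VL-7B-Instruct")
processor = AutoProcessor.from_pretrained(base_dir)

# weights from the locally merged LoRA checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/path/to/merged_model", torch_dtype="auto", device_map="auto"
)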

@gxlover0625
Author

Same problem here. For now I can only use the processor from the pretrained qwen/Qwen2-VL-7B-Instruct and load the weights from my own local files.

Here is what I do now; it appears to run fine, but I'm not sure whether it is correct, i.e. whether the fine-tuned weights are actually loaded.

processor = AutoProcessor.from_pretrained("sft_dir")
processor.chat_template = "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"

@aliencaocao
Contributor

That works too. Your snippet has nothing to do with the weights; the weights are loaded with AutoModel.

@gxlover0625
Author

That works too. Your snippet has nothing to do with the weights; the weights are loaded with AutoModel.

Got it, thanks.

@aliencaocao
Contributor

I think you'd better reopen this issue; it's clearly a bug.

@gxlover0625 gxlover0625 reopened this Oct 21, 2024
@Wiselnn570

same issue

hiyouga added a commit that referenced this issue Oct 29, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Oct 29, 2024
@Syazvinski

same issue

@gxlover0625
Author

same issue

It has been fixed, and I have tried the new version of LLaMA-Factory for two-stage SFT; there were no problems.

@PangziZhang523

After training with LoRA and merging, running the official inference code (processor = AutoProcessor.from_pretrained(model_path)) still fails with: Exception: data did not match any variant of untagged enum ModelWrapper at line 757371 column 3

@gxlover0625
Author

After training with LoRA and merging, running the official inference code (processor = AutoProcessor.from_pretrained(model_path)) still fails with: Exception: data did not match any variant of untagged enum ModelWrapper at line 757371 column 3

I ran into this problem too. I later set up a fresh environment following the Qwen2-VL installation guide and it seems to have gone away. My transformers version is 4.45.0; I'm not sure whether that will solve your problem.

@PangziZhang523

Thanks for the reply. My transformers version is also 4.45.0.dev0. The official code runs inference on the official model just fine, but switching to the LoRA fine-tuned merged model triggers this problem; the traceback points to processor = AutoProcessor.from_pretrained(model_path). Also, the merged folder does contain chat_template.json.

@PangziZhang523

Is llamafactory-cli webchat examples/inference/qwen2_vl.yaml working normally for you? Why does my page look like this:
[screenshot of the broken webchat page]

@binhoul

binhoul commented Nov 18, 2024

After training with LoRA and merging, running the official inference code (processor = AutoProcessor.from_pretrained(model_path)) still fails with: Exception: data did not match any variant of untagged enum ModelWrapper at line 757371 column 3

Has this been resolved?
