Reminder
- I have read the above rules and searched the existing issues.
System Info
Package Version Editable project location
accelerate 1.7.0
aiofiles 24.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.12.15
aiosignal 1.4.0
annotated-types 0.7.0
anyio 4.10.0
attrs 25.3.0
audioread 3.0.1
av 15.1.0
bitsandbytes 0.47.0
brotli 1.1.0
certifi 2025.8.3
cffi 2.0.0b1
charset-normalizer 3.4.3
click 8.2.1
contourpy 1.3.3
cut-cross-entropy 25.1.1
cycler 0.12.1
datasets 3.6.0
decorator 5.2.1
diffusers 0.35.1
dill 0.3.8
docstring-parser 0.17.0
einops 0.8.1
fastapi 0.116.1
ffmpy 0.6.1
filelock 3.19.1
fire 0.7.1
fonttools 4.59.2
frozenlist 1.7.0
fsspec 2025.9.0
gradio 5.42.0
gradio-client 1.11.1
groovy 0.1.2
h11 0.16.0
hf-transfer 0.1.9
hf-xet 1.1.9
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.35.0rc0
idna 3.10
importlib-metadata 8.7.0
jieba 0.42.1
jinja2 3.1.6
joblib 1.5.2
kiwisolver 1.4.10rc0
lazy-loader 0.4
librosa 0.11.0
llamafactory 0.9.4.dev0 /data/works/LLaMA-Factory
llvmlite 0.45.0rc1
markdown-it-py 4.0.0
markupsafe 3.0.2
matplotlib 3.10.6
mdurl 0.1.2
modelscope 1.29.1
mpmath 1.3.0
msgpack 1.1.1
msgspec 0.19.0
multidict 6.6.4
multiprocess 0.70.16
networkx 3.5
nltk 3.9.1
numba 0.62.0rc1
numpy 2.3.3
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.4.1.4
nvidia-cufile-cu12 1.14.1.1
nvidia-curand-cu12 10.3.10.19
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-cusparselt-cu12 0.7.1
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvtx-cu12 12.9.79
omegaconf 2.4.0.dev3
orjson 3.11.3
packaging 25.0
pandas 2.3.2
peft 0.17.2.dev0
pillow 11.3.0
platformdirs 4.4.0
pooch 1.8.2
propcache 0.3.2
protobuf 6.32.0
psutil 7.0.0
pyarrow 21.0.0
pycparser 2.22
pydantic 2.10.6
pydantic-core 2.27.2
pydub 0.25.1
pygments 2.19.2
pyparsing 3.2.3
python-dateutil 2.9.0.post0
python-multipart 0.0.20
pytz 2025.2
pyyaml 6.0.2
regex 2025.9.1
requests 2.32.5
rich 13.9.4
rouge-chinese 1.0.3
ruff 0.12.11
safehttpx 0.1.6
safetensors 0.6.2
scikit-learn 1.7.1
scipy 1.16.1
semantic-version 2.10.0
sentencepiece 0.2.1
setuptools 80.9.0
shellingham 1.5.4
shtab 1.7.2
six 1.17.0
sniffio 1.3.1
soundfile 0.13.1
soxr 0.5.0.post1
sse-starlette 3.0.2
starlette 0.47.3
sympy 1.14.0
termcolor 3.1.0
threadpoolctl 3.6.0
tiktoken 0.11.0
tokenizers 0.21.4
tomlkit 0.13.3
torch 2.8.0+cu129
torchao 0.13.0
torchaudio 2.8.0+cu129
torchvision 0.23.0+cu129
tqdm 4.67.1
transformers 4.55.0
triton 3.4.0
trl 0.9.6
typer 0.17.3
typing-extensions 4.15.0
tyro 0.8.14
tzdata 2025.2
unsloth 2025.9.1
unsloth-zoo 2025.9.2
urllib3 2.5.0
uvicorn 0.35.0
websockets 15.0.1
wheel 0.45.1
xformers 0.0.32.post2
xxhash 3.5.0
yarl 1.20.1
zipp 3.23.0
Reproduction
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path /data/models/openbmb/MiniCPM-V-4_5 \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--template minicpm_v \
--flash_attn auto \
--dataset_dir /data/woli/dataset \
--dataset woli \
--cutoff_len 2048 \
--learning_rate 5e-05 \
--num_train_epochs 3.0 \
--max_samples 100000 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--max_grad_norm 1.0 \
--logging_steps 5 \
--save_steps 100 \
--warmup_steps 0 \
--packing False \
--enable_thinking True \
--report_to none \
--output_dir saves/MiniCPM-V-4_5/lora/train_2025-09-10-14-06-01 \
--bf16 True \
--plot_loss True \
--trust_remote_code True \
--ddp_timeout 180000000 \
--include_num_input_tokens_seen True \
--optim adamw_torch \
--quantization_bit 4 \
--quantization_method bnb \
--double_quantization True \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0 \
--lora_target all \
--freeze_vision_tower True \
--freeze_multi_modal_projector True \
--image_max_pixels 589824 \
--image_min_pixels 1024 \
--video_max_pixels 65536 \
--video_min_pixels 256
The error message is as follows:
File "/data/works/LLaMA-Factory/.venv/bin/llamafactory-cli", line 10, in <module>
sys.exit(main())
^^^^^^
File "/data/works/LLaMA-Factory/src/llamafactory/cli.py", line 151, in main
COMMAND_MAP[command]()
File "/data/works/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/data/works/LLaMA-Factory/src/llamafactory/train/tuner.py", line 72, in _training_function
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/data/works/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 96, in run_sft
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 2238, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 2582, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3796, in training_step
loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 108, in compute_loss
return super().compute_loss(model, inputs, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3884, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 818, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 806, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/peft/peft_model.py", line 1885, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 228, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-V-4_5/modeling_minicpmv.py", line 206, in forward
vllm_embedding, vision_hidden_states = self.get_vllm_embedding(data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-V-4_5/modeling_minicpmv.py", line 127, in get_vllm_embedding
vision_embedding = self.resampler(vision_embedding, tgt_sizes, all_temporal_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1879, in _call_impl
return inner()
^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1827, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-V-4_5/resampler.py", line 232, in forward
out = self.batch_attn_forward(q, k, v, pos_embed_temporal, temporal_ids, key_padding_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-V-4_5/resampler.py", line 274, in batch_attn_forward
out = self.attn(
^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/modules/activation.py", line 1380, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/functional.py", line 6191, in multi_head_attention_forward
return handle_torch_function(
^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/overrides.py", line 1747, in handle_torch_function
result = torch_func_method(public_api, types, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 397, in __torch_function__
return super().__torch_function__(func, types, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/works/LLaMA-Factory/.venv/lib/python3.11/site-packages/torch/nn/functional.py", line 6457, in multi_head_attention_forward
attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x4096 and 1x4194304)
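For reference, the shape mismatch can be reproduced in isolation. This is a minimal sketch, assuming (based on the traceback) that bitsandbytes quantized the resampler's nn.MultiheadAttention out_proj weight into a packed 4-bit parameter stored as a flat column tensor, which F.multi_head_attention_forward then passes straight to F.linear without dequantizing. The shapes below are taken from the error message; the packing interpretation is an assumption, not confirmed.

import torch
import torch.nn.functional as F

attn_output = torch.randn(128, 4096)       # mat1 in the error message
packed_out_proj = torch.randn(4194304, 1)  # stand-in for the packed 4-bit out_proj weight (hypothetical shape origin)

# F.linear computes attn_output @ packed_out_proj.T, i.e. (128x4096) @ (1x4194304),
# which raises: RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x4096 and 1x4194304)
F.linear(attn_output, packed_out_proj)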
Others
No response