LLaVA inference error: ValueError: Image features and image tokens do not match #3103

OverFlooded · 2025-02-13T16:58:59Z

Describe the bug
What the bug is, and how to reproduce, better with screenshots
Inference with Qwen works fine, but inference with LLaVA raises an error.
Model: swift/llava-v1.6-vicuna-7b-hf. Dataset: rlaif-v.

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here
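
The section above asks for version details. A minimal sketch for collecting them with the standard library follows. The package list is an assumption (`ms-swift` is taken to be the installed distribution name), and the script only reports what is installed; it does not diagnose the error.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "transformers", "ms-swift")):
    """Return a dict with interpreter/OS details and installed package versions."""
    info = {
        "python": platform.python_version(),
        "system": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            info[name] = "not installed"
    return info


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```

The printed lines can be pasted into the issue as-is. The CUDA version and GPU model are not covered here and would need to be added separately.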

Error message:

[WARNING:swift] 👆👆👆There are errors in the dataset, the data will be deleted
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83132/83132 [00:02<00:00, 31317.62 examples/s]
[INFO:swift] Dataset filtered, origin length: 83132, filtered dataset length: 82926
[INFO:swift] val_dataset: Dataset({
features: ['messages', 'rejected_response', 'images'],
num_rows: 20
})
0%| | 0/20 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]: File "/media/5/ofd/ms-swift/swift/cli/infer.py", line 5, in <module>
[rank0]: infer_main()
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 247, in infer_main
[rank0]: return SwiftInfer(args).main()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 82, in main
[rank0]: return super().main()
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/base.py", line 46, in main
[rank0]: result = self.run()
[rank0]: ^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 89, in run
[rank0]: result = self.infer_dataset()
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 226, in infer_dataset
[rank0]: resp_list = self.infer(
[rank0]: ^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 483, in infer
[rank0]: res += self._infer(
[rank0]: ^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 439, in _infer
[rank0]: return self._update_metrics(infer_func(**kwargs), metrics)
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 311, in _infer_full
[rank0]: output = dict(template.generate(self.model, **generate_kwargs))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/template/base.py", line 346, in generate
[rank0]: return model.generate(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
[rank0]: result = self._sample(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
[rank0]: outputs = self(**model_inputs, return_dict=True)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 865, in forward
[rank0]: raise ValueError(
[rank0]: ValueError: Image features and image tokens do not match: tokens: 1, features 2144
0%| | 0/20 [00:01<?, ?it/s]
[rank0]:[W214 00:48:08.601455932 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0214 00:48:09.019000 1659573 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 1659669) of binary: /usr/local/anaconda3/bin/python
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 923, in <module>
main()
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/media/5/ofd/ms-swift/swift/cli/infer.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2025-02-14_00:48:09
host : llwang-SYS-4028GR-TR
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1659669)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Code at the error location:

            n_image_tokens = (input_ids == self.config.image_token_index).sum().item()
            n_image_features = image_features.shape[0]
            if n_image_tokens != n_image_features:
                raise ValueError(
                    f"Image features and image tokens do not match: tokens: {n_image_tokens}, features {n_image_features}"
                )
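
The check quoted above counts how many entries of `input_ids` equal the image placeholder id and compares that count with the number of image feature rows. Read against the traceback, "tokens: 1, features 2144" means the prompt held a single placeholder while 2144 feature rows were produced. The sketch below reproduces that arithmetic with plain Python lists. It is not the transformers or ms-swift implementation: the placeholder id `32000`, the other token ids, and the helper names are assumptions for illustration only.

```python
IMAGE_TOKEN_INDEX = 32000  # assumed placeholder id, for illustration only


def count_image_tokens(input_ids, image_token_index=IMAGE_TOKEN_INDEX):
    """Count how many ids in the prompt are the image placeholder."""
    return sum(1 for token in input_ids if token == image_token_index)


def check_match(input_ids, n_image_features, image_token_index=IMAGE_TOKEN_INDEX):
    """Mirror the quoted check: placeholder count must equal feature rows."""
    n_image_tokens = count_image_tokens(input_ids, image_token_index)
    if n_image_tokens != n_image_features:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {n_image_tokens}, features {n_image_features}"
        )
    return n_image_tokens


def expand_image_tokens(input_ids, n_image_features, image_token_index=IMAGE_TOKEN_INDEX):
    """Hypothetical helper: repeat each placeholder once per feature row.

    Only meaningful for a prompt with a single image.
    """
    expanded = []
    for token in input_ids:
        if token == image_token_index:
            expanded.extend([image_token_index] * n_image_features)
        else:
            expanded.append(token)
    return expanded


# A made-up prompt with exactly one image placeholder.
prompt_ids = [1, IMAGE_TOKEN_INDEX, 319, 13563]

try:
    check_match(prompt_ids, 2144)
except ValueError as err:
    print(err)

# After expansion the counts agree and the check passes.
expanded_ids = expand_image_tokens(prompt_ids, 2144)
print(check_match(expanded_ids, 2144))
```

Under these assumptions, the check passes only when the placeholder has been repeated once per feature row before the model's forward pass. Which component is expected to perform that expansion in this setup is not established by the report.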

Shell command:

NPROC_PER_NODE=1 \
CUDA_VISIBLE_DEVICES=1 \
MODELSCOPE_CACHE="/media/5/ofd/.cache/models" \
MAX_PIXELS=1003520 \
MASTER_PORT=29600 \
swift infer \
    --model swift/llava-v1.6-vicuna-7b-hf \
    --infer_backend pt \
    --val_dataset /media/5/ofd/Work2/data/RLAIF_V_Dataset_83132.json#20 \
    --max_batch_size 1 \
    --max_new_tokens 512

Additional context
Add any other context about the problem here

#2460 describes the same problem, but in my case no training has been done yet; the error occurs at the inference stage alone.
