Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLava Infer 推理代码报错:ValueError: Image features and image tokens do not match #3103

Open
OverFlooded opened this issue Feb 13, 2025 · 0 comments

Comments

@OverFlooded
Copy link

OverFlooded commented Feb 13, 2025

Describe the bug
What the bug is, and how to reproduce, better with screenshots(描述bug以及复现过程,最好有截图)
qwen推理没问题,但是llava推理报错了
模型为 swift/llava-v1.6-vicuna-7b-hf 数据集为 rlaif-v

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here(在这里给出硬件信息和系统信息,如CUDA版本,系统,GPU型号和torch版本等)

报错信息:

[WARNING:swift] 👆👆👆There are errors in the dataset, the data will be deleted
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83132/83132 [00:02<00:00, 31317.62 examples/s]
[INFO:swift] Dataset filtered, origin length: 83132, filtered dataset length: 82926
[INFO:swift] val_dataset: Dataset({
features: ['messages', 'rejected_response', 'images'],
num_rows: 20
})
0%| | 0/20 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]: File "/media/5/ofd/ms-swift/swift/cli/infer.py", line 5, in
[rank0]: infer_main()
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 247, in infer_main
[rank0]: return SwiftInfer(args).main()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 82, in main
[rank0]: return super().main()
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/base.py", line 46, in main
[rank0]: result = self.run()
[rank0]: ^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 89, in run
[rank0]: result = self.infer_dataset()
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer.py", line 226, in infer_dataset
[rank0]: resp_list = self.infer(
[rank0]: ^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 483, in infer
[rank0]: res += self._infer(
[rank0]: ^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 439, in _infer
[rank0]: return self._update_metrics(infer_func(**kwargs), metrics)
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/infer/infer_engine/pt_engine.py", line 311, in _infer_full
[rank0]: output = dict(template.generate(self.model, **generate_kwargs))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/media/5/ofd/ms-swift/swift/llm/template/base.py", line 346, in generate
[rank0]: return model.generate(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
[rank0]: result = self._sample(
[rank0]: ^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
[rank0]: outputs = self(**model_inputs, return_dict=True)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/ofd/.local/lib/python3.11/site-packages/transformers/models/llava_next/modeling_llava_next.py", line 865, in forward
[rank0]: raise ValueError(
[rank0]: ValueError: Image features and image tokens do not match: tokens: 1, features 2144
0%| | 0/20 [00:01<?, ?it/s]
[rank0]:[W214 00:48:08.601455932 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E0214 00:48:09.019000 1659573 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 1659669) of binary: /usr/local/anaconda3/bin/python
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 923, in
main()
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ofd/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/media/5/ofd/ms-swift/swift/cli/infer.py FAILED


Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2025-02-14_00:48:09
host : llwang-SYS-4028GR-TR
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1659669)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

报错位置的代码:

n_image_tokens = (input_ids == self.config.image_token_index).sum().item()
            n_image_features = image_features.shape[0]
            if n_image_tokens != n_image_features:
                raise ValueError(
                    f"Image features and image tokens do not match: tokens: {n_image_tokens}, features {n_image_features}"
                )

sh代码:

NPROC_PER_NODE=1 \
CUDA_VISIBLE_DEVICES=1 \
MODELSCOPE_CACHE="/media/5/ofd/.cache/models" \
MAX_PIXELS=1003520 \
MASTER_PORT=29600 \
swift infer \
    --model swift/llava-v1.6-vicuna-7b-hf \
    --infer_backend pt \
    --val_dataset /media/5/ofd/Work2/data/RLAIF_V_Dataset_83132.json#20 \
    --max_batch_size 1 \
    --max_new_tokens 512

Additional context
Add any other context about the problem here(在这里补充其他信息)

#2460 跟我有一样的问题,但是我这里还没有经过任何训练,只是推理阶段就报错了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant