[Bug] cogvlm-chat-hf picture understanding ability is bad. #3121

zhulinJulia24 · 2025-02-08T08:09:26Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

cogvlm-chat-hf has poor image understanding: the description it returns does not match the input image.

The issue seems to have appeared before version 0.6.5.

Reproduction

Download cogvlm-chat-hf according to https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/multi_modal/cogvlm.md

Then run the following script:

from lmdeploy import pipeline
from lmdeploy.vl import load_image


if __name__ == "__main__":
    pipe = pipeline('THUDM/cogvlm-chat-hf')

    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
    response = pipe(('describe this image', image))
    print(response)

The response was expected to describe a tiger, but the returned content had nothing to do with tigers.

Environment

sys.platform: linux
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 90.1
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.20.1+cu118
LMDeploy: 0.7.0.post2+
transformers: 4.48.1
gradio: 5.13.1
fastapi: 0.115.7
pydantic: 2.10.6
triton: 3.1.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_2  mlx5_3  CPU Affinity    NUMA Affinity
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PXB     NODE    0-31,64-95      0
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PXB     NODE    0-31,64-95      0
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    NODE    PXB     0-31,64-95      0
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    NODE    PXB     0-31,64-95      0
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     SYS     32-63,96-127    1
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     SYS     32-63,96-127    1
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    SYS     SYS     32-63,96-127    1
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      SYS     SYS     32-63,96-127    1
mlx5_2  PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      NODE
mlx5_3  NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE     X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

Response(text=" This image is a screenshot of a webpage. The webpage appears to be a news or media site, as indicated by the headline and the layout. The headline reads 'BREAKING: Trump Indicted for Classified Documents', suggesting that the news is of significant importance. The layout includes a navigation bar at the top, a headline in bold, and a date indicating when the article was published. There are also social media sharing options available.", generate_token_len=93, input_token_len=299, finish_reason='stop', token_ids=[910, 1967, 338, 263, 17286, 310, 263, 24499, 29889, 450, 24499, 5692, 304, 367, 263, 9763, 470, 5745, 3268, 29892, 408, 18694, 491, 278, 2343, 1220, 322, 278, 5912, 29889, 450, 2343, 1220, 13623, 525, 29933, 1525, 22311, 4214, 29901, 27504, 1894, 18186, 363, 4134, 2164, 10854, 29879, 742, 26233, 393, 278, 9763, 338, 310, 7282, 13500, 29889, 450, 5912, 7805, 263, 11322, 2594, 472, 278, 2246, 29892, 263, 2343, 1220, 297, 14288, 29892, 322, 263, 2635, 23941, 746, 278, 4274, 471, 6369, 29889, 1670, 526, 884, 5264, 5745, 19383, 3987, 3625, 29889], logprobs=None, logits=None, last_hidden_state=None, index=0)
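The mismatch between the expected and actual output can be turned into a small automated check. This is only a sketch: `mentions_any` is a hypothetical helper, not part of lmdeploy, and `observed` is just the opening of the response text quoted above. A whole-word keyword match is a crude proxy for "the description is about a tiger", so treat it as a smoke test rather than a quality metric.

```python
import re


def mentions_any(text, keywords):
    """Return True if any keyword occurs in `text` as a whole word (case-insensitive).

    Hypothetical helper for illustration; not an lmdeploy API.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return any(keyword.lower() in words for keyword in keywords)


# Opening of the response text reported in this issue.
observed = (
    " This image is a screenshot of a webpage. The webpage appears to be "
    "a news or media site, as indicated by the headline and the layout."
)

# A correct description of tiger.jpeg would be expected to mention a tiger.
print(mentions_any(observed, ["tiger", "tigers"]))
```

In a real run, `response.text` from the reproduction script would be passed in place of `observed`.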


zhulinJulia24 · 2025-02-10T04:56:08Z

The same problem occurs with microsoft/Phi-3-vision-128k-instruct.
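To compare the two models on the same prompt and image, something along these lines might help. It is a sketch under assumptions: `describe_with_models` and `lmdeploy_describer` are hypothetical names, and the lmdeploy part only reuses the calls from the reproduction script (`pipeline`, `load_image`, and the `text` field of the response). Loading both models in one process may need a lot of GPU memory, so running them one at a time could be safer.

```python
def describe_with_models(model_ids, make_describer, prompt="describe this image"):
    """Run the same prompt through each model and collect the text answers.

    `make_describer(model_id)` must return a callable mapping a prompt to text.
    Hypothetical helper for illustration; not an lmdeploy API.
    """
    results = {}
    for model_id in model_ids:
        describe = make_describer(model_id)
        results[model_id] = describe(prompt)
    return results


def lmdeploy_describer(image_url):
    """Build a `make_describer` backed by lmdeploy, mirroring the reproduction script."""
    def make(model_id):
        # Imported lazily so the generic helper above has no hard dependency.
        from lmdeploy import pipeline
        from lmdeploy.vl import load_image

        pipe = pipeline(model_id)
        image = load_image(image_url)
        return lambda prompt: pipe((prompt, image)).text

    return make
```

Usage would look like `describe_with_models(['THUDM/cogvlm-chat-hf', 'microsoft/Phi-3-vision-128k-instruct'], lmdeploy_describer(url))`, where `url` is the tiger image URL from the reproduction script.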

lvhan028 assigned RunningLeon Feb 8, 2025

RunningLeon mentioned this issue Feb 12, 2025

Fix cogvlm and phi3vision #3137

Merged

lvhan028 added the awaiting response label Feb 12, 2025
