
feat: add llm_vllm_deepseek_ocr and llm_office_vllm_deepseek_ocr #205

Merged

weedge merged 8 commits into main from feat/vision-ocr on Oct 24, 2025
Conversation

@weedge (Collaborator) commented Oct 24, 2025

Note

  • vLLM deepseek_ocr only supports the Gundam mode: base_size = 1024, image_size = 640, crop_mode = True
  • although the vLLM deepseek_ocr implementation generates faster, it is not as stable as DeepSeek OCR with transformers and has a bug: repeated output during recognition
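The Gundam settings above can be captured in a small Python sketch (the dataclass and its field names are illustrative, not the actual achatbot API):

```python
from dataclasses import dataclass

# Hypothetical sketch of the single resolution mode vLLM DeepSeek-OCR
# supports ("Gundam"); values come from the note above, names are illustrative.
@dataclass(frozen=True)
class GundamMode:
    base_size: int = 1024   # global-view resolution
    image_size: int = 640   # per-crop resolution
    crop_mode: bool = True  # dynamic cropping enabled

mode = GundamMode()
print(mode)  # GundamMode(base_size=1024, image_size=640, crop_mode=True)
```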

feat:

  • add llm_vllm_deepseek_ocr and the officially vLLM-supported llm_office_vllm_deepseek_ocr
  • add llm_vllm_deepseek_ocr / llm_office_vllm_deepseek_ocr tests on Modal
IMAGE_GPU=L40s modal run src/llm/vllm/vlm/ocr_deepseek.py --task stream_infer
IMAGE_GPU=L40s OCR_TAG=llm_office_vllm_deepseek_ocr modal run src/llm/vllm/vlm/ocr_deepseek.py --task offline_infer
APP_NAME=achatbot IMAGE_GPU=L40s OCR_TAG=llm_vllm_deepseek_ocr modal run src/llm/vllm/vlm/ocr_deepseek.py --task achatbot_stream_infer
APP_NAME=achatbot IMAGE_GPU=L40s OCR_TAG=llm_office_vllm_deepseek_ocr modal run src/llm/vllm/vlm/ocr_deepseek.py --task achatbot_stream_infer
  • add the DeepSeek OCR vLLM implementation (registers DeepseekOCRForCausalLM (deepencoder + decoder) and DeepseekVLV2Processor)
  • deploy the deepseek_ocr vision OCR bot with fastapi_webrtc_vision_bot_serve
# deepseek single room bot
modal run src/download_models.py --repo-ids "FunAudioLLM/SenseVoiceSmall"
modal run src/download_models.py --repo-ids "deepseek-ai/DeepSeek-OCR" --revision "refs/pr/23"
modal volume put config ./config/bots/daily_ocr_vllm_vision_bot.json /bots/ -f
EXTRA_INDEX_URL=https://pypi.org/simple/ \
    SERVE_TYPE=room_bot \
    CONFIG_FILE=/root/.achatbot/config/bots/daily_ocr_vllm_vision_bot.json \
    ACHATBOT_VERSION=0.0.28.post1 \
    IMAGE_NAME=deepseek_ocr_vllm IMAGE_CONCURRENT_CN=1 IMAGE_GPU=L40s \
    modal serve src/fastapi_webrtc_vision_bot_serve.py
EXTRA_INDEX_URL=https://pypi.org/simple/ \
    SERVE_TYPE=room_bot \
    CONFIG_FILE=/root/.achatbot/config/bots/daily_ocr_vllm_vision_bot.json \
    ACHATBOT_VERSION=0.0.28.post2 \
    IMAGE_NAME=deepseek_ocr_office_vllm IMAGE_CONCURRENT_CN=1 IMAGE_GPU=L40s \
    modal serve src/fastapi_webrtc_vision_bot_serve.py

# run DailyOCRVisionBot with config
curl -XPOST "https://weedge--fastapi-webrtc-vision-deepseek-ocr-vllm-bot-srv-app-dev.modal.run/bot_join/chat-room/DailyOCRVisionBot"
curl -XPOST "https://weedge--fastapi-webrtc-vision-deepseek-ocr-office-vllm-bot-srv-app-dev.modal.run/bot_join/chat-room/DailyOCRVisionBot"
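The two curl calls above just POST to a bot_join route. A minimal Python sketch that builds the same URL (the helper name is hypothetical; the base URL and path segments are taken from the commands above):

```python
# Hypothetical helper mirroring the bot_join route used by the curl examples.
def bot_join_url(base: str, room: str, bot: str) -> str:
    """Build the bot_join endpoint URL for a given room and bot name."""
    return f"{base}/bot_join/{room}/{bot}"

url = bot_join_url(
    "https://weedge--fastapi-webrtc-vision-deepseek-ocr-vllm-bot-srv-app-dev.modal.run",
    "chat-room",
    "DailyOCRVisionBot",
)
print(url)
```

In practice the POST would be issued with any HTTP client (e.g. `requests.post(url)`); the sketch only constructs the URL.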

daily_ocr_vllm_vision_bot.json

{
  "chat_bot_name": "DailyOCRVisionBot",
  "handle_sigint": true,
  "is_background": false,
  "room_name": "chat-room",
  "room_url": "",
  "token": "",
  "services": {},
  "config": {
    "vad": {
      "tag": "silero_vad_analyzer",
      "args": {
        "start_secs": 0.0,
        "stop_secs": 0.0,
        "confidence": 0.7,
        "min_volume": 0.6,
        "onnx": true
      }
    },
    "asr": {
      "tag": "sense_voice_asr",
      "args": {
        "language": "zn",
        "model_name_or_path": "/root/.achatbot/models/FunAudioLLM/SenseVoiceSmall"
      }
    },
    "vision_ocr": {
      "tag": "llm_vllm_deepseek_ocr",
      "args": {
        "serv_args": {
          "model": "/root/.achatbot/models/deepseek-ai/DeepSeek-OCR",
          "enforce_eager": false,
          "trust_remote_code": true,
          "max_model_len": 8192,
          "tensor_parallel_size": 1,
          "enable_prefix_caching": false,
          "gpu_memory_utilization": 0.75
        }
      }
    },
    "tts": {
      "tag": "tts_edge",
      "args": {
        "voice_name": "zh-CN-YunxiNeural",
        "language": "zh",
        "gender": "Male"
      }
    }
  },
  "config_list": []
}
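As a quick sanity check, the pipeline sections this config declares (vad, asr, vision_ocr, tts) can be validated with a few lines of Python; this is a sketch over a trimmed copy of the JSON above, not the bot's actual config loader:

```python
import json

# Trimmed copy of daily_ocr_vllm_vision_bot.json (tags from the config above).
cfg = json.loads("""
{
  "chat_bot_name": "DailyOCRVisionBot",
  "config": {
    "vad": {"tag": "silero_vad_analyzer"},
    "asr": {"tag": "sense_voice_asr"},
    "vision_ocr": {"tag": "llm_vllm_deepseek_ocr"},
    "tts": {"tag": "tts_edge"}
  }
}
""")

# Every stage of the vision bot pipeline must be present.
required = {"vad", "asr", "vision_ocr", "tts"}
missing = required - cfg["config"].keys()
assert not missing, f"missing sections: {missing}"
print(cfg["config"]["vision_ocr"]["tag"])  # llm_vllm_deepseek_ocr
```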


Signed-off-by: weedge <weege007@gmail.com>
@gemini-code-assist (Contributor)
Summary of Changes

Hello @weedge, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's optical character recognition (OCR) capabilities by integrating Deepseek OCR models, leveraging vLLM for efficient inference. It introduces both a custom and an officially supported vLLM variant of Deepseek OCR, complete with robust vision encoding and image preprocessing pipelines. The changes also include a new testing task for the vision bot and updates to core interfaces to support asynchronous operations, ensuring a more performant and feature-rich multimodal experience.
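The dynamic cropping mentioned here pairs each oversized input with a grid of fixed-size crops plus a resized global view. An illustrative sketch of the tiling arithmetic (an assumption for clarity, not the PR's actual preprocessing code):

```python
import math

# Illustrative sketch: compute crop origins for an image larger than the
# per-crop size (image_size = 640 in Gundam mode). The tiling math is an
# assumption; the real preprocessor lives in src/thirdparty/deepseek_ocr_vllm.
def crop_origins(width: int, height: int, image_size: int = 640) -> list[tuple[int, int]]:
    cols = math.ceil(width / image_size)
    rows = math.ceil(height / image_size)
    return [(c * image_size, r * image_size) for r in range(rows) for c in range(cols)]

print(len(crop_origins(1280, 960)))  # 2 cols x 2 rows -> 4
```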

Highlights

  • Deepseek OCR Integration: Introduced two new vLLM-based Deepseek OCR models: llm_vllm_deepseek_ocr for a custom implementation and llm_office_vllm_deepseek_ocr for the officially vLLM-supported version. This significantly expands the OCR capabilities within the system.
  • New Vision Bot Test Task: Added a new achatbot_stream_infer task to ocr_deepseek.py for comprehensive testing of the Deepseek OCR vision bot within the achatbot framework, including session management and image processing.
  • Vision Encoder and Image Processing Modules: Incorporated a suite of new modules (deepencoder, model, process within src/thirdparty/deepseek_ocr_vllm) that implement the Deepseek OCR's vision encoder (utilizing SAM and CLIP-based components), an MLP projector, and advanced image processing logic, including dynamic cropping and tokenization for multimodal input.
  • Asynchronous Interface Update: Refactored the IVisionOCR interface to use async_generate instead of generate, ensuring better compatibility with asynchronous operations and improving overall system responsiveness for vision-related tasks.
  • Dependency and Version Management: Updated the achatbot project version to 0.0.28.post1 and introduced conditional transformers versioning (4.57.1 for llm_office_vllm_deepseek_ocr and 4.47.1 otherwise) to align with specific vLLM requirements.
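The async refactor described above (generate → async_generate) can be sketched as an async generator; the method name follows the PR summary, while the class body is purely illustrative:

```python
import asyncio
from typing import AsyncGenerator

# Hypothetical sketch of the IVisionOCR change: async_generate is an async
# generator that yields text chunks as inference streams them back.
class EchoOCR:
    async def async_generate(self, session=None, **kwargs) -> AsyncGenerator[str, None]:
        for chunk in ["Deep", "Seek-", "OCR"]:
            await asyncio.sleep(0)  # stand-in for awaiting model inference
            yield chunk

async def main() -> str:
    ocr = EchoOCR()
    return "".join([c async for c in ocr.async_generate()])

print(asyncio.run(main()))  # DeepSeek-OCR
```

Consumers iterate with `async for`, which lets the bot forward partial OCR text downstream without blocking the event loop.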

@gemini-code-assist bot left a comment
Code Review

This pull request adds support for DeepSeek OCR models using vLLM, including both a custom implementation and the officially supported version. The changes are well-structured, primarily introducing new files for the model implementation and testing scripts. My review focuses on improving type safety and reducing code duplication for better maintainability. Overall, the changes look good and the new feature is a valuable addition.

@weedge added labels: vllm, VLM (vision language model), ViT (vision transformer), CLIP (Contrastive Language-Image Pre-training) on Oct 24, 2025
@weedge weedge merged commit 1a96549 into main Oct 24, 2025
@weedge changed the title from "feat: add add llm_vllm_deepseek_ocr and llm_office_vllm_deepseek_ocr" to "feat: add llm_vllm_deepseek_ocr and llm_office_vllm_deepseek_ocr" on Oct 24, 2025