lazy load vllm.utils.serial_utils import tensor2base64 to avoid break. #30094

Closed
QiliangCui wants to merge 1 commit into vllm-project:main from QiliangCui:dev1204
Conversation

@QiliangCui
Contributor

@QiliangCui QiliangCui commented Dec 4, 2025

Purpose

Fix the vLLM-on-TPU loading issue.

After PR #29970, vLLM on TPU fails at load time with:

(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843] EngineCore failed to start.
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843] Traceback (most recent call last):
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/engine/core.py", line 834, in run_engine_core
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/engine/core.py", line 610, in __init__
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     super().__init__(
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     self._init_executor()
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/executor/uniproc_executor.py", line 46, in _init_executor
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     self.driver_worker.init_worker(all_kwargs=[kwargs])
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/worker/worker_base.py", line 255, in init_worker
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     worker_class = resolve_obj_by_qualname(
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]                    ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/utils/import_utils.py", line 122, in resolve_obj_by_qualname
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     module = importlib.import_module(module_name)
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap_external>", line 999, in exec_module
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]   File "/workspace/vllm/vllm/v1/worker/tpu_worker.py", line 41, in <module>
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843]     import torch_xla.core.xla_model as xm
(EngineCore_DP0 pid=309) ERROR 12-04 07:44:22 [core.py:843] ModuleNotFoundError: No module named 'torch_xla'

Lazy loading the vllm.utils.serial_utils import addresses this: the import chain that eventually pulls in torch_xla is no longer triggered at module load time.
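The pattern can be sketched as follows. This is a minimal, self-contained illustration, not the actual vLLM diff: `heavy_serializer` and `to_base64` are stand-ins for `vllm.utils.serial_utils` and `tensor2base64`, and the stand-in module is registered by hand so the sketch runs anywhere.

```python
import base64
import sys
import types


def encode_base64(data: bytes) -> str:
    # Deferred import: the (stand-in) heavy module is resolved on the
    # first call, not when this file is loaded. In vLLM, the fix moves
    # `from vllm.utils.serial_utils import tensor2base64` from module
    # level into the method that uses it, the same way.
    import heavy_serializer  # stand-in for vllm.utils.serial_utils
    return heavy_serializer.to_base64(data)


# Register a stand-in for the heavy dependency so this sketch is runnable
# without vLLM or torch_xla installed.
heavy = types.ModuleType("heavy_serializer")
heavy.to_base64 = lambda b: base64.b64encode(b).decode()
sys.modules["heavy_serializer"] = heavy

print(encode_base64(b"ok"))  # prints "b2s="
```

Because the import happens inside the function body, merely importing the module that defines `encode_base64` no longer executes the dependency chain; the chain runs only when encoding is actually requested.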

Test Plan

  1. Wait for the CI/CD tests.

  2. Manually load vLLM on TPU with:

vllm serve \
  --model=Qwen/Qwen2.5-7B-Instruct   \
  --download_dir /mnt/disks/persist \
  --tensor-parallel-size=1   \
  --swap-space=16   \
  --enable-chunked-prefill   \
  --max-model-len=128

With the fix, the model loads successfully.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…ing tpu.

Signed-off-by: Qiliang Cui <derrhein@gmail.com>
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Dec 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a module loading issue by lazy-loading the tensor2base64 utility. The change correctly moves the import statement from the module level into the encode_base64 method where it is used. This is a standard and appropriate approach to resolve import-related problems, preventing an undesirable import chain from being triggered at application startup. The implementation is sound and effectively resolves the issue described in the pull request.

@DarkLight1337
Member

Sorry for breaking this

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 5, 2025 04:44
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 5, 2025
@QiliangCui
Contributor Author

Thank you @DarkLight1337! No problem! We will add some tests in the vLLM main branch so we will know if a change impacts TPU.

Jun from the TPU team merged a fix in the TPU branch (vllm-project/tpu-inference#1251), so I don't need to update this for now.

@QiliangCui QiliangCui closed this Dec 5, 2025
auto-merge was automatically disabled December 5, 2025 15:31

Pull request was closed
