Use slow tokenizer for LLaMA #84

Merged
WoosukKwon merged 31 commits into main from tokenizer
May 9, 2023

Conversation

@WoosukKwon
Collaborator

Fixes #80
Should be merged after #82

This PR changes the frontends so that they no longer use the LLaMA fast tokenizer, which triggers a protobuf bug. We should use the slow (normal) tokenizer instead.
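
For illustration, here is a minimal sketch of how a frontend might select the slow tokenizer. It is not the code from this PR: the helper names `tokenizer_kwargs` and `get_tokenizer` are hypothetical, and detecting LLaMA by a substring match on the model name is an assumption. The only library behavior relied on is the `use_fast` argument of `AutoTokenizer.from_pretrained` in Hugging Face Transformers.

```python
from typing import Any


def tokenizer_kwargs(model_name: str) -> dict[str, Any]:
    """Return extra keyword arguments for AutoTokenizer.from_pretrained.

    Hypothetical helper: LLaMA models are detected by a case-insensitive
    substring match on the model name, which is an assumption of this sketch.
    """
    if "llama" in model_name.lower():
        # Request the slow (Python/SentencePiece) tokenizer instead of the
        # fast one for LLaMA models.
        return {"use_fast": False}
    # Other models keep the library default.
    return {}


def get_tokenizer(model_name: str):
    """Load a tokenizer, forcing the slow implementation for LLaMA."""
    # Imported lazily so tokenizer_kwargs can be used without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs(model_name))
```

Calling `get_tokenizer` with a LLaMA model name would then load the slow tokenizer, while other models are unaffected.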

@WoosukKwon WoosukKwon changed the title Do not use LLaMA fast tokenizer Use slow tokenizer for LLaMA May 8, 2023
@WoosukKwon WoosukKwon merged commit 85eb631 into main May 9, 2023
@WoosukKwon WoosukKwon deleted the tokenizer branch May 9, 2023 23:03
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Aug 15, 2024
Co-authored-by: Krzysztof Laskowski <klaskowski@habana.ai>
wuhuikx pushed a commit to wuhuikx/vllm that referenced this pull request Mar 27, 2025
### What this PR does / why we need it?

Changed default block_size in platform.py from 16 to 128, as Ascend
Devices have a better affinity for block size 128.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Apr 17, 2025
iwooook pushed a commit to moreh-dev/vllm that referenced this pull request Nov 29, 2025
…_devices (vllm-project#84)

Signed-off-by: Salar <skhorasgani@tenstorrent.com>
(cherry picked from commit 5999673)
tjtanaa pushed a commit to tjtanaa/vllm that referenced this pull request Jan 29, 2026
)

Signed-off-by: syedmba <syedmba7@connect.hku.hk>
jtechapps pushed a commit to jtechapps/vllm-1 that referenced this pull request Jan 29, 2026
vllm-project#84)

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

---------

Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
gaidandawang-afk pushed a commit to gaidandawang-afk/vllm that referenced this pull request Mar 16, 2026
vllm-project#84)

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

---------

Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
jianzs pushed a commit to jianzs/vllm that referenced this pull request Apr 2, 2026
* Milestone 1 of Internal Process-level Fault Tolerance

* Milestone 1 of Internal Process-level Fault Tolerance (vllm-project#61)

* feat(fault-tolerance): add class skeletons for fault tolerance

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* config: add configuration options for fault tolerance

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Add generate_identity and generate_identitys functions to generate a unique identity for a ZMQ ROUTER node

* add service startup configuration fault report addr

* add init WorkerGuard

* add engine_core_cmd_addr, fault_report_addr, client_cmd_addr, engine_core_identitys in EngineZmqAddresses
init engine_core_cmd_addr, fault_report_addr, client_cmd_addr in launch_core_engines func
add _report_engine_dead func in CoreEngineProcManager

* init ClientGuard
init EngineZmqAddresses engine_core_identitys

* init EngineCoreGuard

* change generate_identitys to generate_identity_group

* code typesetting is optimized

* code typesetting is optimized

* changed code format ensure every line < 88 chars

* changed code format ensure every line < 88 chars
fix error Value of type "dict[Any, Any] | None" is not indexable  [index]

* fix bug
Error: vllm/v1/engine/utils.py:122:89: E501 Line too long (117 > 88)
Error: vllm/v1/engine/utils.py:1059:9: F402 Import `uuid` from line 6 shadowed by loop variable

* fix
Error: vllm/v1/engine/utils.py:1045: error: Need type annotation for "uuids" (hint: "uuids: set[<type>] = ...")  [var-annotated]

* fix
error: Value of type "dict[Any, Any] | None" is not indexable  [index]

* fix
error: Value of type "dict[Any, Any] | None" is not indexable  [index]

Signed-off-by: a798347923 <2645302020@qq.com>

* add _send_msg in EngineCoreGuard

Signed-off-by: a798347923 <2645302020@qq.com>

* add import torch.cuda

* add _recv_cmd function docstring that clearly explains the meaning of the return value.

* changed recv_fault_msg to recv_msg
add ClientGuard __init__ func parameter types

* add engine monitor

Signed-off-by: TianZhuo <2770730562@qq.com>

* Delete requirements/test.txt~

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Delete vllm/v1/engine/core_client.py~

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* simplify _send_msg and _recv_cmd in EngineCoreGuard

* simplify recv_msg in ClientGuard

* engine: add fault tolerance features for EngineCore.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* engine: add timeout mechanism in retry.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* add engine monitor

* Delete vllm/v1/engine/exceptions.py~

Signed-off-by: 205150940 <112750056+205150940@users.noreply.github.com>

* update actor_index

* update enginedead flag

* handle fault and report exception

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix engine_actor

* fix engine_actor fault_info

* handle fault and report exception

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* delete num_identity

* changed try except

* fix debug error

* fix one bug.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* add fault_report_addr in FaultToleranceConfig

* add handle fault&get_fault_info api

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* remove fault_report_address in CoreEngineActorManager __init__

Signed-off-by: a798347923 <2645302020@qq.com>

* ruff format

Signed-off-by: a798347923 <2645302020@qq.com>

* add handle fault&get_fault_info api

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix one bug.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* add fault_report_port in FaultToleranceConfig

Signed-off-by: a798347923 <2645302020@qq.com>

* add zmq_addr concatenate with fault_report_addr and fault_report_port

Signed-off-by: a798347923 <2645302020@qq.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix some bug

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fault reporter bug fix

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* remove fault_report_addr in FaultToleranceConfig

Signed-off-by: a798347923 <2645302020@qq.com>

* refactor: relocate method serialization functions to serial_util.py

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* fix actor bug

* fix actor bug

* add engine_core_cmd_addr in FaultToleranceConfig

Signed-off-by: a798347923 <2645302020@qq.com>

* add and use _stop_worker_execution in EngineCoreGuard

Signed-off-by: a798347923 <2645302020@qq.com>

* add and use run in WorkerGuard

Signed-off-by: a798347923 <2645302020@qq.com>

* fix actor bug

* fix bug

* fix sentinel

* fix bug vllm/v1/engine/core.py:847: error: Missing positional argument "tp_size" in call to "EngineCoreGuard"

Signed-off-by: a798347923 <2645302020@qq.com>

* fix bug error: Missing positional arguments "length", "byteorder" in call to "to_bytes" of "int"

Signed-off-by: a798347923 <2645302020@qq.com>

* fix bug in fault tolerance mode

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix bug in fault tolerance mode

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* change fault_report_port to internal_fault_report_port
add external_fault_notify_port

Signed-off-by: a798347923 <2645302020@qq.com>

* change fault_report_port to internal_fault_report_port
add external_fault_notify_port

Signed-off-by: a798347923 <2645302020@qq.com>

* add _recv_cmd func
use deserialize_method_call and run_method in run func

Signed-off-by: a798347923 <2645302020@qq.com>

* Update core.py

fix bug error: Need type annotation for "kwargs" (hint: "kwargs: dict[<type>, <type>] = ...")

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* add self.ctx.term() in shutdown()

Signed-off-by: a798347923 <2645302020@qq.com>

* changed import deserialize_method_call,serialize_method_call

Signed-off-by: a798347923 <2645302020@qq.com>

* changed init worker_guard in init_device

Signed-off-by: a798347923 <2645302020@qq.com>

* Update core.py

add import serialize_method_call

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Update gpu_worker.py

changed init WorkerGuard in init_device

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Update gpu_worker.py

FIX BUG self.worker_guard: WorkerGuard|None = None

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Update gpu_worker.py

fix bug error: Argument 1 to "deserialize_method_call" has incompatible type "str | None"; expected "str"  [arg-type]

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Update gpu_worker.py

ruff format

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Update core.py

ruff-format

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* actively send exception information

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* actively send exception information

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* actively send exception information

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* change engine_core_cmd_addr(str) to engine_core_cmd_addrs(list[str]) in EngineZmqAddresses

Signed-off-by: a798347923 <2645302020@qq.com>

* change engine_core_cmd_addr(str) to engine_core_cmd_addrs(list[str]) in EngineZmqAddresses

Signed-off-by: a798347923 <2645302020@qq.com>

* Update utils.py

delete engine_core_cmd_addr in EngineZmqAddresses

Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>

* Remove redundant configuration: fault-pub-port

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Send pause instructions after receiving fault info in ClientGuard

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* change engine_core_guard_identities from dict[int, bytes] to list[bytes]

Signed-off-by: a798347923 <2645302020@qq.com>

* fix bug "only the worker guard of engine core 0 can receive messages sent from engine core guard

Signed-off-by: a798347923 <2645302020@qq.com>

* change local_rank to rank_in_group in WorkerGuard

Signed-off-by: a798347923 <2645302020@qq.com>

* changed del self.client_cmd_registry[int(unhealthy_engine.engine_id)]

Signed-off-by: a798347923 <2645302020@qq.com>

* add gloo communication timeout

* fix some bug

* add  stateless_process_group gloo_comm_timeout

* reconstruct fault receiver&fault handler

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix some bug

* reconstruct fault receiver&fault handler

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* reconstruct fault receiver&fault handler

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix return format

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix return format

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix return format

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* add abort request

* fix some bug

* fix some bug

* fix some bug

* add dt for client guard

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* add dt for client guard

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* add dt for client guard

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* Implementation of two types of pause: a soft one by using flag signals and a hard one by aborting nccl communicators.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Refine certain log forms and fix a minor bug in pause function.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Refactor and abstract the recv_msg logic in CG,ECG,WG.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Add and check method uuid when sending commands and receiving results.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Abstract the logic of sending instructions and waiting responses from FaultHandler

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Add options in EngineCoreGuard to recv execution results from WorkerGuard

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Support worker reinitialization after hard pause; add task queue in FaultHandler to ensure sequential task execution

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* resolve conflicts

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* add engine core ut

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* add engine core ut

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* Ensure WorkerGuard command execution returns result; fix missing set_device when TP>1

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* rename& format logger

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* rename& format logger

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* feat(nccl): enable non-blocking NCCL communicators to support ncclCommAbort

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* reinit dp_group

* fix bug

* fix bug

* fix bug

* fix bug (vllm-project#54)

* Move requests to waiting queue instead of abandoning them directly.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* add annotation

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* fix typos

Signed-off-by: fangyuchu <fangyuchu@qq.com>

---------

Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: a798347923 <2645302020@qq.com>
Signed-off-by: TianZhuo <2770730562@qq.com>
Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Signed-off-by: 205150940 <112750056+205150940@users.noreply.github.com>
Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Co-authored-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Co-authored-by: a798347923 <2645302020@qq.com>
Co-authored-by: TianZhuo <2770730562@qq.com>
Co-authored-by: 205150940 <112750056+205150940@users.noreply.github.com>
Co-authored-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>

* Fix DT and zmq socket closing issues, updated names per feedback and reinitialize dp_group with new port

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Improve documentation and logging in API server

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix hanging issue in DT; fix hang when aborting communicators from Python side; use queue.Queue for engine_exception_q

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Refactor fault tolerance modules by renaming classes to Sentinel and converting engine_registry to a dict

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* reject requests when engine is in fault status

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* clear batch_queue for async scheduling

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix incorrect initialization of worker_cmd_socket in multi-node setups

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Switch from field to Field

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Unify start_engine_core_monitor in MPClient and CoreEngineProcManager to reduce duplication

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault … (vllm-project#84)

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

* refactor(Sentinel): Abstract and refactor class to standardize fault tolerance logic

Signed-off-by: w00689259 <wangzhuo66@huawei.com>

---------

Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <389750525@qq.com>

* fix bug in tests

Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <389750525@qq.com>

* fix bug in tests

Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <389750525@qq.com>

* refactor: improve naming and add comments for readability

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Pass fault_tolerance_config through process group creation for future extensibility

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Switch to native preempt_request implementation

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Rename base_sentinel.py

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor(api_server): Split fault_tolerance interfaces into standalone files

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Use zmq poll for socket receive in Sentinel DT to avoid hanging

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Add shutdown-on-fault-tolerance-failure config option

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* ClientSentinel: add extra check to prevent repeated pause commands on error

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* feat(pause): apply pause with target index

Signed-off-by: zWaNg3 <389750525@qq.com>

* Add middleware for fault tolerance

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* fix engine_actor monitoring function bug

Signed-off-by: TianZhuo <2770730562@qq.com>

* fix engine_actor monitoring bug

Signed-off-by: TianZhuo <2770730562@qq.com>

* logger output format

Signed-off-by: TianZhuo <2770730562@qq.com>

* refactor(client_sentinel): support ClientSentinel-Client communication; refactor internal socket logic

Signed-off-by: zWaNg3 <389750525@qq.com>

* feat: add FaultToleranceRequest and FaultToleranceResult

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* feat: add EngineStatusType enum and support paused state

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Unify the logic of engine monitor for engine process manager and engine actor manager

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix the hanging issue of ClientSentinel in the shutdown

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor(client_sentinel): rename process_ft_requests_loop function and run function

Signed-off-by: zWaNg3 <389750525@qq.com>

* Use VllmConfig as the input of Sentinel Modules

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Remove redundant @DataClass from FaultToleranceConfig

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Move hardcoded vllm_fault topic string into FaultToleranceConfig

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Update corresponding tests to new ClientSentinel design.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Update engine core sentinel tests.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix incorrect device settings in the pause of worker sentinel.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Code cleanup and readability improvements

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Simplify FaultInfo and improve the readability

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Move sentinels into one file

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Remove recv_router_dealer_message

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Simplify the code in BaseSentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Simplify the code in EngineCoreSentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Introduce fault_tolerance utils and address dataclass

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor: split different sentinels into separate files

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor: split worker sentinel into v1/worker/sentinel for better plugin support and hardware adaptation

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* remove ThreadSafeDict

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Simplify the communication between client, client sentinel and engine core sentinel (vllm-project#137)

* refactor(client_sentinel): use core_client input_socket to broadcast ft_request

Signed-off-by: zWaNg3 <389750525@qq.com>

* refactor(client_sentinel): use core_client input_socket to broadcast ft_request

Signed-off-by: zWaNg3 <389750525@qq.com>

* refactor(client_sentinel): use core_client input_socket to broadcast ft_request

Signed-off-by: zWaNg3 <389750525@qq.com>

* add _send_utility_result in ClientSentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Processes fault-tolerant requests and forwards them to output.

Signed-off-by: yzchang-plus <1078477584@qq.com>

* replace uncertain code with TODO

Signed-off-by: yzchang-plus <1078477584@qq.com>

* add monitoring logic in client sentinel and implement thread-safe pause in monitoring.

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor(client_sentinel): send ft request using input_address

Signed-off-by: zWaNg3 <389750525@qq.com>

* refactor(client_sentinel): return ft result to client

Signed-off-by: zWaNg3 <389750525@qq.com>

* Use call_utility_async for interactions between client, client_sentinel and engine core sentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Rename engine recovery timeout config

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Remove upstream and downstream concept from the base sentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Support passing stateless dp port to retry

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Improve the shutdown of client sentinel

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Add try except for handle_fault in engine core

Signed-off-by: fangyuchu <fangyuchu@qq.com>

---------

Signed-off-by: zWaNg3 <389750525@qq.com>
Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: yzchang-plus <1078477584@qq.com>
Co-authored-by: zWaNg3 <389750525@qq.com>
Co-authored-by: yzchang-plus <1078477584@qq.com>

---------

Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: a798347923 <2645302020@qq.com>
Signed-off-by: TianZhuo <2770730562@qq.com>
Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Signed-off-by: 205150940 <112750056+205150940@users.noreply.github.com>
Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
Signed-off-by: yzchang-plus <1078477584@qq.com>
Co-authored-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Co-authored-by: a798347923 <2645302020@qq.com>
Co-authored-by: TianZhuo <2770730562@qq.com>
Co-authored-by: 205150940 <112750056+205150940@users.noreply.github.com>
Co-authored-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>
Co-authored-by: zWaNg3 <389750525@qq.com>
Co-authored-by: yzchang-plus <1078477584@qq.com>
Signed-off-by: fangyuchu <fangyuchu@qq.com>

* refactor(dt tests of sentinels): add dt tests for sentinels

Signed-off-by: zWaNg3 <389750525@qq.com>
Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Remove torch.cuda API call (vllm-project#148)

* Remove torch.cuda API call

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Remove unwanted shutdown

Signed-off-by: fangyuchu <fangyuchu@qq.com>

---------

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fault Tolerant EP: Implement fault-report

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* merge engine monitor codes

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Move FT router attachment point and simplify FaultInfo initialization logic

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Revise DT for Fault Report

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix incorrect count of engine core index

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Update engine process monitoring codes

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* [Bugfix] revise engine monitor logic on account of dead processes

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Improve the format of the fault report json

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Fix incorrect shutdown of engine manager

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* Avoid error logging in normal shutdown

Signed-off-by: fangyuchu <fangyuchu@qq.com>

* handle zmq error

Signed-off-by: fangyuchu <fangyuchu@qq.com>

---------

Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: a798347923 <2645302020@qq.com>
Signed-off-by: TianZhuo <2770730562@qq.com>
Signed-off-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Signed-off-by: 205150940 <112750056+205150940@users.noreply.github.com>
Signed-off-by: w00689259 <wangzhuo66@huawei.com>
Signed-off-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
Signed-off-by: yzchang-plus <1078477584@qq.com>
Co-authored-by: zWaNg3 <37772915+zWaNg3@users.noreply.github.com>
Co-authored-by: a798347923 <2645302020@qq.com>
Co-authored-by: TianZhuo <2770730562@qq.com>
Co-authored-by: 205150940 <112750056+205150940@users.noreply.github.com>
Co-authored-by: a798347923 <39047817+a798347923@users.noreply.github.com>
Co-authored-by: w00689259 <wangzhuo66@huawei.com>
Co-authored-by: zWaNg3 <389750525@qq.com>
Co-authored-by: yzchang-plus <1078477584@qq.com>
Signed-off-by: fangyuchu <fangyuchu@qq.com>
stecasta added a commit to stecasta/vllm that referenced this pull request Apr 30, 2026
`https://pytorch.org/docs/stable/objects.inv` started returning 404
because pytorch/docs PR vllm-project#84 (merged 2026-04-29 23:09 UTC) replaced
the previous filesystem symlinks under `stable/` with per-page HTML
redirect stubs. Non-HTML build artifacts such as `objects.inv`,
`searchindex.js` and `_static/version_switcher.json` were not stubbed
or copied over, so any tool that fetches them under `/stable/` 404s.

Every Read the Docs build of `latest` since 2026-04-29 23:59 UTC
aborts with:

    ERROR - mkdocstrings: Couldn't load inventory
       https://pytorch.org/docs/stable/objects.inv through handler
       'python': HTTP Error 404: Not Found
    Aborted with 1 errors, 20 warnings in strict mode!

Temporary fix: point at `pytorch.org/docs/2.11/objects.inv`, which is
what `/stable/` aliases to today (verified: `docs.pytorch.org/docs/stable/index.html`
is a JS redirect to `../2.11/index.html`, and 2.11 is the latest
PyTorch release on PyPI). This needs a one-line bump on the next
stable release. Revert to `/stable/` once the upstream stub script is
fixed to carry non-HTML assets through.
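
The URL change described above can be sketched as a small helper. This is only an illustration of the substitution, not the actual change, which is a one-line edit to the docs configuration. The function name `pin_inventory_url` is hypothetical.

```python
def pin_inventory_url(url: str, version: str) -> str:
    """Swap the '/docs/stable/' path segment for a pinned version.

    Hypothetical helper for illustration only. URLs that do not contain
    '/docs/stable/' are returned unchanged.
    """
    return url.replace("/docs/stable/", f"/docs/{version}/", 1)


# Pin the inventory to the release that /stable/ currently aliases to.
pinned = pin_inventory_url("https://pytorch.org/docs/stable/objects.inv", "2.11")
```

Reverting to `/stable/` later means restoring the original URL rather than calling the helper.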

Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>

Development

Successfully merging this pull request may close these issues.

Bug in LLaMA fast tokenizer

1 participant