Add quickstart guide #148
Merged
zhuohan123 merged 8 commits into main on Jun 17, 2023
WoosukKwon reviewed on Jun 14, 2023
WoosukKwon reviewed on Jun 14, 2023
WoosukKwon (Collaborator) approved these changes on Jun 14, 2023 and left a comment:
LGTM! Left minor comments.
WoosukKwon reviewed on Jun 15, 2023
WoosukKwon reviewed on Jun 17, 2023
Comment on lines +95 to +96
$ python -m vllm.entrypoints.openai.api_server \
$ --model facebook/opt-125m
Collaborator:
I think we can make it a single line?
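For reference, a minimal sketch of why the single-line form is equivalent, assuming the leading `$` on each quoted line is a console prompt marker and not part of the command. `echo` is used here only to print the argument list without starting the server, and the variable names `two_line` and `one_line` are illustrative:

```shell
# Two-line form: the trailing backslash makes the shell continue the
# command onto the next line, so both lines form one argument list.
two_line=$(echo python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m)

# Single-line form, as suggested in the review comment.
one_line=$(echo python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m)

# Both forms expand to the same arguments.
[ "$two_line" = "$one_line" ] && echo "equivalent"
```

Dropping `echo` from the single-line form runs the actual command, which requires vLLM to be installed.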
WoosukKwon (Collaborator) approved these changes on Jun 17, 2023 and left a comment:
Looks very good to me! Thanks!
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request on Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request on Jul 3, 2024
Summary: Copy benchmark results to EFS. Tag-along fixes: fix `if` statements so all `github-action-benchmark` jobs execute. Test: manual testing, A10g x 4 job: https://github.com/neuralmagic/nm-vllm/actions/runs/8458516524. Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request on Sep 30, 2024
…herry-pick-145-to-release [release] Start by updating the image
iwooook pushed a commit to moreh-dev/vllm that referenced this pull request on Nov 29, 2025
xinyu-intel added a commit to xinyu-intel/vllm that referenced this pull request on Mar 12, 2026
Support vLLM IR on XPU; test layernorm on xpu. Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
jianzs pushed a commit to jianzs/vllm that referenced this pull request on Apr 2, 2026
Milestone 1 of Internal Process-level Fault Tolerance
Close #106.