Add quickstart guide by zhuohan123 · Pull Request #148 · vllm-project/vllm

zhuohan123 · 2023-06-12T16:20:45Z

Close #106.

WoosukKwon

LGTM! Left minor comments

WoosukKwon · 2023-06-17T17:01:32Z

+    $ python -m vllm.entrypoints.openai.api_server \
+    $     --model facebook/opt-125m


I think we can make it a single line?
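For illustration, the single-line form the reviewer asks for is just the module path followed by --model on one line. The Python sketch below is not part of this PR: it assembles that argument list and renders it with the standard library's shlex.join. The helper name launch_command is hypothetical, and it only builds the string; actually running the command assumes vLLM is installed.

```python
import shlex


def launch_command(model: str = "facebook/opt-125m") -> str:
    """Return the OpenAI-compatible server launch command as one shell line.

    Hypothetical helper for illustration only: it builds the string and does
    not start the server.
    """
    argv = [
        "python",
        "-m",
        "vllm.entrypoints.openai.api_server",
        "--model",
        model,
    ]
    # shlex.join quotes any argument that needs it (e.g. one containing a
    # space), so the result can be pasted into a POSIX shell unchanged.
    return shlex.join(argv)


if __name__ == "__main__":
    print("$ " + launch_command())
    # prints: $ python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Printed with a "$ " prompt prefix, this is the one-line equivalent of the two-line command quoted in the diff, and it avoids the stray "$" on the continuation line.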

WoosukKwon

Looks very good to me! Thanks!


zhuohan123 added 2 commits June 10, 2023 21:57

[WIP] Quickstart guide

67b2a33

Add Simple FastAPI Server and OpenAI-Compatible Server

cac2757

zhuohan123 requested a review from WoosukKwon June 13, 2023 08:28

WoosukKwon reviewed Jun 14, 2023

Comment on docs/source/getting_started/quickstart.rst (outdated)

WoosukKwon reviewed Jun 14, 2023

Comment on docs/source/getting_started/quickstart.rst (outdated)

WoosukKwon approved these changes Jun 14, 2023

WoosukKwon reviewed Jun 15, 2023

Comment on docs/source/getting_started/quickstart.rst (outdated)

zhuohan123 added 6 commits June 17, 2023 09:16

Merge branch 'main' into quickstart-guide

a8c8d29

Modify first several paragraphs of the quickstart guide

3da1e0b

Merge branch 'main' into quickstart-guide

496f632

Rewrite offline batched inference

5dc55d4

Add quickstart guide

afd9141

Small fixes

fd8090c

WoosukKwon reviewed Jun 17, 2023

WoosukKwon approved these changes Jun 17, 2023

zhuohan123 merged commit bec7b2d into main Jun 17, 2023

zhuohan123 deleted the quickstart-guide branch June 18, 2023 07:22

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add quickstart guide (vllm-project#148)

cbd397d

dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Sep 30, 2024

Merge pull request vllm-project#148 from openshift-cherrypick-robot/c…

1fe2bc9

…herry-pick-145-to-release [release] Start by updating the image

iwooook pushed a commit to moreh-dev/vllm that referenced this pull request Nov 29, 2025

Fixed Mistral7B config.head_dim == None (vllm-project#148)

3fc3263

zhuohan123 merged 8 commits into main from quickstart-guide

zhuohan123 commented Jun 12, 2023

WoosukKwon · Jun 17, 2023 · edited

2 participants
