Your current environment
The output of `python collect_env.py`
INFO 05-07 08:36:01 [importing.py:17] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 05-07 08:36:01 [importing.py:28] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernelcompilation.
INFO 05-07 08:36:01 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-07 08:36:02 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 05-07 08:36:02 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 05-07 08:36:02 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 05-07 08:36:02 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 05-07 08:36:02 [__init__.py:44] plugin ascend loaded.
INFO 05-07 08:36:02 [__init__.py:230] Platform plugin ascend is activated
WARNING 05-07 08:36:04 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.35
Python version: 3.10.17 (main, Apr 30 2025, 16:00:31) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-136.12.0.88.4.ctl3.aarch64-aarch64-with-glibc2.35
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.51.3
[conda] Could not collect
vLLM Version: 0.8.5.post1
vLLM Ascend Version: 0.8.5rc1
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
VLLM_USE_V1=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2.1 Version: 24.1.rc2.1 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B3 | OK | 90.9 32 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 3368 / 65536 |
+===========================+===============+====================================================+
| 1 910B3 | OK | 89.2 29 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 3369 / 65536 |
+===========================+===============+====================================================+
| 2 910B3 | OK | 90.7 30 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3369 / 65536 |
+===========================+===============+====================================================+
| 3 910B3 | OK | 95.4 30 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 3369 / 65536 |
+===========================+===============+====================================================+
| 4 910B3 | OK | 90.8 37 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 3369 / 65536 |
+===========================+===============+====================================================+
| 5 910B3 | OK | 88.4 34 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 3369 / 65536 |
+===========================+===============+====================================================+
| 6 910B3 | OK | 95.8 35 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 3365 / 65536 |
+===========================+===============+====================================================+
| 7 910B3 | OK | 92.0 36 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 3368 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| No running processes found in NPU 4 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
When I run Qwen3-235B-A22B on 2 nodes with 16 x 910B3 NPUs, the model weights load fine, but as soon as I send a request the vLLM server crashes. The traceback and full log are below.
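For context, the failure reduces to `torch.cuda.set_device` being handed the NPU worker's device. A minimal snippet that reproduces the same `ValueError` in isolation (a sketch, assuming `torch-npu` is installed so that "npu" is a registered device type):

```python
import torch
import torch_npu  # noqa: F401  (registers the "npu" device type)

# ray_utils.setup_device_if_necessary passes the NPU worker's device
# straight into the CUDA-specific setter, which only accepts cuda devices:
torch.cuda.set_device(torch.device("npu", 0))
# ValueError: Expected a cuda device, but got: npu:0
```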
ERROR 05-07 08:23:58 [core.py:398] EngineCore encountered a fatal error.
ERROR 05-07 08:23:58 [core.py:398] Traceback (most recent call last):
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 389, in run_engine_core
ERROR 05-07 08:23:58 [core.py:398] engine_core.run_busy_loop()
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 413, in run_busy_loop
ERROR 05-07 08:23:58 [core.py:398] self._process_engine_step()
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 438, in _process_engine_step
ERROR 05-07 08:23:58 [core.py:398] outputs = self.step_fn()
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 203, in step
ERROR 05-07 08:23:58 [core.py:398] output = self.model_executor.execute_model(scheduler_output)
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/v1/executor/ray_distributed_executor.py", line 57, in execute_model
ERROR 05-07 08:23:58 [core.py:398] return refs[0].get()
ERROR 05-07 08:23:58 [core.py:398] File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 150, in get
ERROR 05-07 08:23:58 [core.py:398] return _process_return_vals(return_vals, True)
ERROR 05-07 08:23:58 [core.py:398] File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 27, in _process_return_vals
ERROR 05-07 08:23:58 [core.py:398] raise val.as_instanceof_cause()
ERROR 05-07 08:23:58 [core.py:398] ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.__ray_call__() (pid=18591, ip=172.19.0.28)
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 130, in execute_model_ray
ERROR 05-07 08:23:58 [core.py:398] self.setup_device_if_necessary()
ERROR 05-07 08:23:58 [core.py:398] File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 117, in setup_device_if_necessary
ERROR 05-07 08:23:58 [core.py:398] torch.cuda.set_device(self.worker.device)
ERROR 05-07 08:23:58 [core.py:398] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/__init__.py", line 476, in set_device
ERROR 05-07 08:23:58 [core.py:398] device = _get_device_index(device)
ERROR 05-07 08:23:58 [core.py:398] File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
ERROR 05-07 08:23:58 [core.py:398] raise ValueError(f"Expected a cuda device, but got: {device}")
ERROR 05-07 08:23:58 [core.py:398] ValueError: Expected a cuda device, but got: npu:0
INFO 05-07 08:23:58 [ray_distributed_executor.py:127] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
2025-05-07 08:23:58,217 INFO compiled_dag_node.py:2173 -- Tearing down compiled DAG
ERROR 05-07 08:23:58 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 05-07 08:23:58 [async_llm.py:399] Traceback (most recent call last):
ERROR 05-07 08:23:58 [async_llm.py:399] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 05-07 08:23:58 [async_llm.py:399] outputs = await engine_core.get_output_async()
ERROR 05-07 08:23:58 [async_llm.py:399] File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 05-07 08:23:58 [async_llm.py:399] raise self._format_exception(outputs) from None
ERROR 05-07 08:23:58 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 05-07 08:23:58 [async_llm.py:324] Request cmpl-5a2affcafa984ca3aaf71d064ee59067-0 failed (engine dead).
INFO: 127.0.0.1:57468 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 0d858f0749acd09331c7018001000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, a3570d045e221611145cd3f901000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, baae8c108247476f262c33bb01000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, ed5458ab9f87cc254621707201000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, cf4bf04aacbdeb2f50f5e61701000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, ba1c9a30620c9d316ae3941a01000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, fcf4d090a77e9838b4f08d2901000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 7db16050cdec7392f87ed58701000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 4b6fc5501e61fdf10b1cd55401000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 2e296490115d90d8ab17234601000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 0993a140aa7bd9cebdf5f81101000000)
2025-05-07 08:23:58,226 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, a4d6a013dd726f3ac1047ae101000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 89cad0caf7a239f3293ea30501000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, c19224d49013fc24e0de0ee201000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 2585f6f9494e325736ae915301000000)
2025-05-07 08:23:58,227 INFO compiled_dag_node.py:2178 -- Cancelling compiled worker on actor: Actor(RayWorkerWrapper, 3e4e31644b452faf13558dc801000000)
INFO: Shutting down
2025-05-07 08:23:58,267 INFO compiled_dag_node.py:2200 -- Waiting for worker tasks to exit
2025-05-07 08:23:58,269 INFO compiled_dag_node.py:2203 -- Teardown complete
Process EngineCore_0:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 400, in run_engine_core
raise e
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 389, in run_engine_core
engine_core.run_busy_loop()
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 413, in run_busy_loop
self._process_engine_step()
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 438, in _process_engine_step
outputs = self.step_fn()
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 203, in step
output = self.model_executor.execute_model(scheduler_output)
File "/vllm-workspace/vllm/vllm/v1/executor/ray_distributed_executor.py", line 57, in execute_model
return refs[0].get()
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 150, in get
return _process_return_vals(return_vals, True)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/experimental/compiled_dag_ref.py", line 27, in _process_return_vals
raise val.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.__ray_call__() (pid=18591, ip=172.19.0.28)
File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 130, in execute_model_ray
self.setup_device_if_necessary()
File "/vllm-workspace/vllm/vllm/executor/ray_utils.py", line 117, in setup_device_if_necessary
torch.cuda.set_device(self.worker.device)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/__init__.py", line 476, in set_device
device = _get_device_index(device)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
raise ValueError(f"Expected a cuda device, but got: {device}")
ValueError: Expected a cuda device, but got: npu:0
(raylet) [2025-05-07 08:23:58,299 C 18100 18100] (raylet) experimental_mutable_object_provider.cc:156: Check failed: object_manager_->WriteAcquire(info.local_object_id, total_data_size, nullptr, total_metadata_size, info.num_readers, object_backing_store) Status not OK: ChannelError: Channel closed.
(raylet) *** StackTrace Information ***
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xd18eb8) [0xaaaab3428eb8] ray::operator<<()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xd1b7e8) [0xaaaab342b7e8] ray::RayLog::~RayLog()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x456bb0) [0xaaaab2b66bb0] ray::core::experimental::MutableObjectProvider::HandlePushMutableObject()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x2421d0) [0xaaaab29521d0] ray::raylet::NodeManager::HandlePushMutableObject()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x2a4c60) [0xaaaab29b4c60] ray::rpc::ServerCallImpl<>::HandleRequestImpl()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d875c) [0xaaaab2de875c] EventTracker::RecordExecution()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d3e90) [0xaaaab2de3e90] std::_Function_handler<>::_M_invoke()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x6d4320) [0xaaaab2de4320] boost::asio::detail::completion_handler<>::do_complete()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf5630) [0xaaaab3405630] boost::asio::detail::scheduler::do_run_one()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf78c4) [0xaaaab34078c4] boost::asio::detail::scheduler::run()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0xcf7ec8) [0xaaaab3407ec8] boost::asio::io_context::run()
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x1ace44) [0xaaaab28bce44] main
(raylet) /lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff9e4d73fc]
(raylet) /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffff9e4d74cc] __libc_start_main
(raylet) /usr/local/python3.10.17/lib/python3.10/site-packages/ray/core/src/ray/raylet/raylet(+0x1ffbdc) [0xaaaab290fbdc]
(raylet)
(RayWorkerWrapper pid=4635, ip=172.19.0.29) [rank9]:[W507 08:22:16.546296931 compiler_depend.ts:28] Warning: The oprator of MoeInitRouting will be removed from Pytorch and switch to AscendSpeed after 630. (function operator()) [repeated 15x across cluster]
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [18336]
*** SIGTERM received at time=1746606238 on cpu 141 ***
PC: @ 0xffffb456ea9c (unknown) select
@ 0xfffde7359698 464 absl::lts_20230802::AbslFailureSignalHandler()
@ 0xffffb4a438ec 606883984 (unknown)
@ 0xffffb489d194 128 time_sleep
@ 0xffffb4749d1c 112 cfunction_vectorcall_O
@ 0xffffb46ae64c 48 _PyEval_EvalFrameDefault
@ 0xffffb47edf34 448 _PyEval_Vector
@ 0xffffb46a9f58 48 _PyEval_EvalFrameDefault
@ 0xffffb47edf34 448 _PyEval_Vector
@ 0xffffb489870c 48 atexit_callfuncs
@ 0xffffb482dc2c 64 Py_FinalizeEx
@ 0xffffb482ea54 80 Py_Exit
@ 0xffffb4833418 32 _PyErr_PrintEx
@ 0xffffb483409c 144 PyRun_SimpleStringFlags
@ 0xffffb485333c 32 Py_RunMain
@ 0xffffb4853d4c 224 Py_BytesMain
@ 0xffffb44b73fc 192 (unknown)
@ 0xffffb44b74cc 272 __libc_start_main
[2025-05-07 08:23:58,367 E 18484 18484] logging.cc:496: *** SIGTERM received at time=1746606238 on cpu 141 ***
[2025-05-07 08:23:58,367 E 18484 18484] logging.cc:496: PC: @ 0xffffb456ea9c (unknown) select
[2025-05-07 08:23:58,372 E 18484 18484] logging.cc:496: @ 0xfffde73596c0 464 absl::lts_20230802::AbslFailureSignalHandler()
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb4a438ec 606883984 (unknown)
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb489d194 128 time_sleep
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb4749d1c 112 cfunction_vectorcall_O
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb46ae64c 48 _PyEval_EvalFrameDefault
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb47edf34 448 _PyEval_Vector
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb46a9f58 48 _PyEval_EvalFrameDefault
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb47edf34 448 _PyEval_Vector
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb489870c 48 atexit_callfuncs
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb482dc2c 64 Py_FinalizeEx
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb482ea54 80 Py_Exit
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb4833418 32 _PyErr_PrintEx
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb483409c 144 PyRun_SimpleStringFlags
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb485333c 32 Py_RunMain
[2025-05-07 08:23:58,375 E 18484 18484] logging.cc:496: @ 0xffffb4853d4c 224 Py_BytesMain
[2025-05-07 08:23:58,377 E 18484 18484] logging.cc:496: @ 0xffffb44b73fc 192 (unknown)
[2025-05-07 08:23:58,377 E 18484 18484] logging.cc:496: @ 0xffffb44b74cc 272 __libc_start_main
Exception ignored in atexit callback: <function shutdown at 0xfffde57845e0>
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 1957, in shutdown
time.sleep(0.5)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 1539, in sigterm_handler
sys.exit(signum)
SystemExit: 15
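A possible root cause: `setup_device_if_necessary` in `vllm/executor/ray_utils.py` hard-codes `torch.cuda.set_device(self.worker.device)`, which rejects any non-CUDA device, so the Ascend platform plugin's `npu:0` device can never be bound when the Ray distributed executor is used. A device-agnostic dispatch along the following lines would avoid the crash; this is a hedged sketch, not vLLM's actual fix, and `set_device_by_type` is a hypothetical helper:

```python
import torch

def set_device_by_type(device: torch.device) -> None:
    """Hypothetical helper: bind the current worker to `device`,
    dispatching on the device type instead of assuming CUDA."""
    if device.type == "cuda":
        torch.cuda.set_device(device)
    elif device.type == "npu":
        # torch-npu exposes the torch.npu module once imported.
        torch.npu.set_device(device)
    else:
        raise ValueError(f"Unsupported device type: {device.type}")

# In ray_utils.setup_device_if_necessary, the call
#     torch.cuda.set_device(self.worker.device)
# would become
#     set_device_by_type(self.worker.device)
```

A cleaner route may be to go through the `Platform` abstraction that the ascend plugin already registers at startup, but the dispatch above illustrates the missing branch.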