Your current environment
The output of python collect_env.py
(gpt_oss_edit) [[email protected] /data/users/axia/gitrepos/vllm/vllm (fix-resp-response-api)]$ python collect_env.py
Collecting environment information...
==============================
System Info
==============================
OS : CentOS Stream 9 (x86_64)
GCC version : (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11)
Clang version : Could not collect
CMake version : version 4.1.0
Libc version : glibc-2.34
==============================
PyTorch Info
==============================
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.11 (main, Aug 14 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-11)] (64-bit runtime)
Python platform : Linux-6.4.3-0_fbk15_hardened_2630_gf27365f948db-x86_64-with-glibc2.34
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : Could not collect
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA H100
GPU 1: NVIDIA H100
GPU 2: NVIDIA H100
GPU 3: NVIDIA H100
GPU 4: NVIDIA H100
GPU 5: NVIDIA H100
GPU 6: NVIDIA H100
GPU 7: NVIDIA H100
Nvidia driver version : 550.90.07
cuDNN version : Probably one of the following:
/usr/lib64/libcudnn.so.9.10.1
/usr/lib64/libcudnn_adv.so.9.10.1
/usr/lib64/libcudnn_cnn.so.9.10.1
/usr/lib64/libcudnn_engines_precompiled.so.9.10.1
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.10.1
/usr/lib64/libcudnn_graph.so.9.10.1
/usr/lib64/libcudnn_heuristic.so.9.10.1
/usr/lib64/libcudnn_ops.so.9.10.1
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 368
On-line CPU(s) list: 0-367
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 1
Core(s) per socket: 368
Socket(s): 1
Stepping: 1
BogoMIPS: 4792.80
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm flush_l1d arch_capabilities
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 23 MiB (368 instances)
L1i cache: 23 MiB (368 instances)
L2 cache: 184 MiB (368 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-367
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] efficientnet_pytorch==0.7.1
[pip3] mypy==1.11.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] open_clip_torch==2.32.0
[pip3] pytorch-lightning==2.5.2
[pip3] pyzmq==27.1.0
[pip3] segmentation_models_pytorch==0.4.0
[pip3] sentence-transformers==3.2.1
[pip3] terratorch==1.0.2
[pip3] torch==2.8.0+cu128
[pip3] torchaudio==2.8.0+cu128
[pip3] torchgeo==0.7.0
[pip3] torchmetrics==1.7.4
[pip3] torchvision==0.23.0+cu128
[pip3] transformers==4.55.2
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.4.0
[pip3] tritonclient==2.51.0
[pip3] vector-quantize-pytorch==1.21.2
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.10.2rc2.dev171+g170129eb2 (git sha: 170129eb2)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 0-367 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 0-367 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 0-367 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 0-367 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 0-367 0 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 0-367 0 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 0-367 0 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X 0-367 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
CUDA_CACHE_PATH=/data/users/axia/.nv/ComputeCache
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
We're working on supporting streaming for the Responses API for GPT-OSS. With tool calling recently integrated into GPT-OSS streaming (#23386, #23927), I've noticed that when I run the request below, we are not cleaning up our context properly.
I think the issue is related to the following (a standalone sketch reproducing the error follows this list):
- In serving_responses, we create an async context manager: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_responses.py#L465
- This then initializes tool_sessions here: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/context.py#L349
- But when we exit, we don't clean up the tool sessions properly (maybe we enter two contexts but exit them in the wrong order, or clean them all up in the inner context, so the outer context raises this error)
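Here's a minimal standalone sketch of what I think is happening (this is not vLLM code; it only assumes anyio is installed): an anyio task group's cancel scope is entered in one asyncio task but exited from another, which produces the same RuntimeError as in the traceback below.

import asyncio
from contextlib import AsyncExitStack

import anyio


async def open_scope(stack: AsyncExitStack) -> None:
    # Enter an anyio task group (which owns a cancel scope) in this task,
    # but defer its exit to whoever later closes the stack.
    await stack.enter_async_context(anyio.create_task_group())


async def main() -> None:
    stack = AsyncExitStack()
    # Enter the cancel scope inside a separate asyncio task...
    await asyncio.create_task(open_scope(stack))
    # ...then exit it from the main task. anyio raises:
    # "Attempted to exit cancel scope in a different task than it was entered in"
    try:
        await stack.aclose()
    except RuntimeError as e:
        print(f"RuntimeError: {e}")


asyncio.run(main())

If the streaming path enters the tool session contexts in one task but the AsyncExitStack unwinds in another (or in the wrong order), we'd hit exactly this.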
I'm looking into making a PR to fix it, but wanted to share this bug with the general community in case anyone has ideas.
Thoughts
- I don't get this error in non-streaming mode
- If the client is not making a request with tools, we probably shouldn't initialize tool sessions at all? (rough sketch of this idea below)
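On the second point, a rough sketch of the lazy-init idea (all names here are hypothetical and illustrative, not vLLM's actual API): only enter tool session contexts when the request declares tools, and enter them on the AsyncExitStack owned by the task that will also close it.

from contextlib import AsyncExitStack


async def maybe_init_tool_sessions(request, tool_server, exit_stack: AsyncExitStack) -> dict:
    # Hypothetical helper: skip initialization entirely when the client sent no tools.
    if not getattr(request, "tools", None):
        return {}
    sessions = {}
    for tool in request.tools:
        # Entering each session on the caller's AsyncExitStack keeps enter and
        # exit in the same task, which anyio cancel scopes require.
        sessions[tool.name] = await exit_stack.enter_async_context(
            tool_server.new_session(tool.name)
        )
    return sessions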
Some related issues
modelcontextprotocol/python-sdk#79
modelcontextprotocol/python-sdk#521
Chainlit/chainlit#2182
cc @Jialin @yeqcharlotte @lacora @heheda12345
curl http://localhost:20001/v1/responses -H "Content-Type: application/json" -N -d '{
"model": "/data/users/axia/checkpoints/gpt-oss-120b",
"input": [
{
"role": "user",
"content": "Hello."
}
],
"stream": true
}'
INFO 09-24 15:43:19 [context.py:344] init session browser
INFO: 127.0.0.1:34478 - "GET /browser/sse HTTP/1.1" 200 OK
I0924 15:43:19.259000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/browser/sse "HTTP/1.1 200 OK"
INFO: 127.0.0.1:34490 - "POST /browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 HTTP/1.1" 202 Accepted
I0924 15:43:19.262000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:344] init session python
INFO: 127.0.0.1:34490 - "POST /browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 HTTP/1.1" 202 Accepted
I0924 15:43:19.275000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 "HTTP/1.1 202 Accepted"
INFO: 127.0.0.1:34504 - "GET /python/sse HTTP/1.1" 200 OK
I0924 15:43:19.278000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/python/sse "HTTP/1.1 200 OK"
INFO: 127.0.0.1:34516 - "POST /python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 HTTP/1.1" 202 Accepted
I0924 15:43:19.281000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:344] init session container
INFO: 127.0.0.1:34516 - "POST /python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 HTTP/1.1" 202 Accepted
I0924 15:43:19.294000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 "HTTP/1.1 202 Accepted"
INFO: 127.0.0.1:34520 - "GET /container/sse HTTP/1.1" 200 OK
I0924 15:43:19.297000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/container/sse "HTTP/1.1 200 OK"
INFO: 127.0.0.1:34524 - "POST /container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 HTTP/1.1" 202 Accepted
I0924 15:43:19.299000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 "HTTP/1.1 202 Accepted"
INFO: 127.0.0.1:34524 - "POST /container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 HTTP/1.1" 202 Accepted
I0924 15:43:19.343000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20c55ce80>
warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Unexpected error in server stream handler:
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20c53e1a0>
warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20d3c6a40>
warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
yield read_stream, write_stream
File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
cb_suppress = await cb(*exc_details)
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
raise RuntimeError(
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Exception Group Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
| yield read_stream, write_stream
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
| return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
| if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Exception Group Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
| yield read_stream, write_stream
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
| return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 778, in __aexit__
| return self.cancel_scope.__exit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
|
| During handling of the above exception, another exception occurred:
|
| Exception Group Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
| yield read_stream, write_stream
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
| return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 778, in __aexit__
| return self.cancel_scope.__exit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
+------------------------------------
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
| await self.gen.athrow(typ, value, traceback)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/tool_server.py", line 181, in new_session
| yield mcp_client
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
| await self.gen.athrow(typ, value, traceback)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
| async with anyio.create_task_group() as tg:
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
| if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
| yield read_stream, write_stream
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
| return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
| if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
| await self.gen.athrow(typ, value, traceback)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/tool_server.py", line 181, in new_session
| yield mcp_client
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
| await self.gen.athrow(typ, value, traceback)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
| async with anyio.create_task_group() as tg:
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
| if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
| yield read_stream, write_stream
| File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
| cb_suppress = await cb(*exc_details)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
| return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
| if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
| File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
| raise RuntimeError(
| RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "thrift.python.streaming/py_promise.pyx", line 103, in thrift.python.streaming.py_promise.runGenerator
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/facebook/inference_platform_sp/llm_predictor_gpu/openai_compatible/openai_compatible_service/thrift_services.py", line 195, in _fbthrift__stream_wrapper_createResponsesStream
async for item in stream_generator:
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/predictor/stream_predict_event.py", line 77, in extract_chunk_responses
async for event in events:
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/handler_lib.py", line 798, in stream_create_responses_internal
raise e
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/handler_lib.py", line 779, in stream_create_responses_internal
async for part in g:
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/predictor/vllm_predictor.py", line 534, in create_responses_stream
async for res in resp:
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/openai/serving_responses.py", line 1664, in responses_stream_generator
async with AsyncExitStack() as exit_stack:
File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 714, in __aexit__
raise exc_details[1]
File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
cb_suppress = await cb(*exc_details)
File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
await self.gen.athrow(typ, value, traceback)
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
async with anyio.create_task_group() as tg:
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
raise RuntimeError(
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in