
[Bug][gpt-oss] streaming/tools RuntimeError: Attempted to exit cancel scope in a different task than it was entered in #25697

@qandrew

Description

Your current environment

The output of `python collect_env.py`:
(gpt_oss_edit) [[email protected] /data/users/axia/gitrepos/vllm/vllm (fix-resp-response-api)]$ python collect_env.py
Collecting environment information...
==============================
        System Info
==============================
OS                           : CentOS Stream 9 (x86_64)
GCC version                  : (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11)
Clang version                : Could not collect
CMake version                : version 4.1.0
Libc version                 : glibc-2.34

==============================
       PyTorch Info
==============================
PyTorch version              : 2.8.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Aug 14 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-11)] (64-bit runtime)
Python platform              : Linux-6.4.3-0_fbk15_hardened_2630_gf27365f948db-x86_64-with-glibc2.34

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration :
GPU 0: NVIDIA H100
GPU 1: NVIDIA H100
GPU 2: NVIDIA H100
GPU 3: NVIDIA H100
GPU 4: NVIDIA H100
GPU 5: NVIDIA H100
GPU 6: NVIDIA H100
GPU 7: NVIDIA H100

Nvidia driver version        : 550.90.07
cuDNN version                : Probably one of the following:
/usr/lib64/libcudnn.so.9.10.1
/usr/lib64/libcudnn_adv.so.9.10.1
/usr/lib64/libcudnn_cnn.so.9.10.1
/usr/lib64/libcudnn_engines_precompiled.so.9.10.1
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.10.1
/usr/lib64/libcudnn_graph.so.9.10.1
/usr/lib64/libcudnn_heuristic.so.9.10.1
/usr/lib64/libcudnn_ops.so.9.10.1
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      52 bits physical, 57 bits virtual
Byte Order:                         Little Endian
CPU(s):                             368
On-line CPU(s) list:                0-367
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 9654 96-Core Processor
CPU family:                         25
Model:                              17
Thread(s) per core:                 1
Core(s) per socket:                 368
Socket(s):                          1
Stepping:                           1
BogoMIPS:                           4792.80
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm flush_l1d arch_capabilities
Virtualization:                     AMD-V
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          23 MiB (368 instances)
L1i cache:                          23 MiB (368 instances)
L2 cache:                           184 MiB (368 instances)
L3 cache:                           16 MiB (1 instance)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-367
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:           Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

==============================
Versions of relevant libraries
==============================
[pip3] efficientnet_pytorch==0.7.1
[pip3] mypy==1.11.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.27.3
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] open_clip_torch==2.32.0
[pip3] pytorch-lightning==2.5.2
[pip3] pyzmq==27.1.0
[pip3] segmentation_models_pytorch==0.4.0
[pip3] sentence-transformers==3.2.1
[pip3] terratorch==1.0.2
[pip3] torch==2.8.0+cu128
[pip3] torchaudio==2.8.0+cu128
[pip3] torchgeo==0.7.0
[pip3] torchmetrics==1.7.4
[pip3] torchvision==0.23.0+cu128
[pip3] transformers==4.55.2
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.4.0
[pip3] tritonclient==2.51.0
[pip3] vector-quantize-pytorch==1.21.2
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.10.2rc2.dev171+g170129eb2 (git sha: 170129eb2)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV18    NV18    NV18    NV18    NV18    NV18    NV18    0-367   0               N/A
GPU1    NV18     X      NV18    NV18    NV18    NV18    NV18    NV18    0-367   0               N/A
GPU2    NV18    NV18     X      NV18    NV18    NV18    NV18    NV18    0-367   0               N/A
GPU3    NV18    NV18    NV18     X      NV18    NV18    NV18    NV18    0-367   0               N/A
GPU4    NV18    NV18    NV18    NV18     X      NV18    NV18    NV18    0-367   0               N/A
GPU5    NV18    NV18    NV18    NV18    NV18     X      NV18    NV18    0-367   0               N/A
GPU6    NV18    NV18    NV18    NV18    NV18    NV18     X      NV18    0-367   0               N/A
GPU7    NV18    NV18    NV18    NV18    NV18    NV18    NV18     X      0-367   0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
CUDA_CACHE_PATH=/data/users/axia/.nv/ComputeCache
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY

🐛 Describe the bug

We're working on supporting streaming in the Responses API for GPT-OSS. With tool calling recently integrated into GPT-OSS streaming (#23386, #23927), I've noticed that when I run the request below, we are not cleaning up our tool-session context properly.

I think the issue is related to how the MCP tool sessions are torn down during streaming: the SSE client's task group (an anyio cancel scope) gets entered in one asyncio task but exited in another (see the related issues linked below).

I'm looking into making a PR to fix it, but wanted to share this bug with the community in case anyone has ideas.
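
To make the failure mode concrete, here is a minimal standalone sketch (not vLLM code; assumes anyio under plain asyncio): an anyio cancel scope must be exited by the same task that entered it, but an async generator that holds a task group open across a yield can be finalized from a different task.

```python
# Minimal repro sketch of the anyio constraint (not vLLM code): a cancel
# scope must be exited in the task that entered it, but async-generator
# cleanup can run in a different task.
import asyncio

import anyio


async def stream():
    # Entering the task group enters a cancel scope, which records the
    # asyncio task currently driving this generator as its host task.
    async with anyio.create_task_group():
        yield "chunk"


async def main():
    gen = stream()
    # Drive the generator up to the yield from a helper task, so the
    # cancel scope is entered in that task...
    await asyncio.create_task(gen.__anext__())
    # ...then finalize the generator from main(). The task group's
    # __aexit__ now runs in a different task, and anyio raises:
    # "Attempted to exit cancel scope in a different task than it was
    # entered in"
    await gen.aclose()


asyncio.run(main())
```

In our case the streaming path appears to hand the generator's cleanup (via the AsyncExitStack in serving_responses.py) to a task other than the one that entered sse_client's task group, which matches the traceback below.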

Thoughts

  • I don't get this error in non-streaming mode.
  • If the client is not making a request with tools, we probably shouldn't initialize tool sessions at all (see the sketch after this list)?
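
As a hypothetical sketch of that second thought (the `tool_server.new_session` context manager is the one from vllm/entrypoints/tool_server.py seen in the traceback; `request.tools` and the gating logic here are illustrative assumptions, not actual vLLM code):

```python
# Hypothetical sketch: only enter tool sessions for tools the request
# actually declares, so a plain "Hello" request creates nothing that
# streaming cleanup later has to tear down.
from contextlib import AsyncExitStack


async def init_tool_sessions(request, tool_server, exit_stack: AsyncExitStack):
    # Illustrative assumption: requested tool types are listed on the
    # request, as in the Responses API's "tools" field.
    requested = {tool.type for tool in (request.tools or [])}
    sessions = {}
    for name in ("browser", "python", "container"):
        if name not in requested:
            continue  # no session -> nothing to clean up on this path
        sessions[name] = await exit_stack.enter_async_context(
            tool_server.new_session(name)
        )
    return sessions
```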

Some related issues:
modelcontextprotocol/python-sdk#79
modelcontextprotocol/python-sdk#521
Chainlit/chainlit#2182

cc @Jialin @yeqcharlotte @lacora @heheda12345

curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Hello."
        }
    ],
    "stream": true
}'  
INFO 09-24 15:43:19 [context.py:344] init session browser
INFO:     127.0.0.1:34478 - "GET /browser/sse HTTP/1.1" 200 OK
I0924 15:43:19.259000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/browser/sse "HTTP/1.1 200 OK"
INFO:     127.0.0.1:34490 - "POST /browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 HTTP/1.1" 202 Accepted
I0924 15:43:19.262000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:344] init session python
INFO:     127.0.0.1:34490 - "POST /browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 HTTP/1.1" 202 Accepted
I0924 15:43:19.275000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/browser/messages/?session_id=706f8c1b16d04077ac46fa80b2ffcbb7 "HTTP/1.1 202 Accepted"
INFO:     127.0.0.1:34504 - "GET /python/sse HTTP/1.1" 200 OK
I0924 15:43:19.278000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/python/sse "HTTP/1.1 200 OK"
INFO:     127.0.0.1:34516 - "POST /python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 HTTP/1.1" 202 Accepted
I0924 15:43:19.281000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:344] init session container
INFO:     127.0.0.1:34516 - "POST /python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 HTTP/1.1" 202 Accepted
I0924 15:43:19.294000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/python/messages/?session_id=eed4c79ab2dc4976a68eae288b163271 "HTTP/1.1 202 Accepted"
INFO:     127.0.0.1:34520 - "GET /container/sse HTTP/1.1" 200 OK
I0924 15:43:19.297000 3208220 _client.py:1740] HTTP Request: GET http://localhost:57421/container/sse "HTTP/1.1 200 OK"
INFO:     127.0.0.1:34524 - "POST /container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 HTTP/1.1" 202 Accepted
I0924 15:43:19.299000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 "HTTP/1.1 202 Accepted"
INFO:     127.0.0.1:34524 - "POST /container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 HTTP/1.1" 202 Accepted
I0924 15:43:19.343000 3208220 _client.py:1740] HTTP Request: POST http://localhost:57421/container/messages/?session_id=f6da499579114e6fa2e5f671ede67786 "HTTP/1.1 202 Accepted"
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
INFO 09-24 15:43:19 [context.py:393] cleanup_session start, called tools: set()
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20c55ce80>
  warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Unexpected error in server stream handler:
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20c53e1a0>
  warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/streams/memory.py:183: ResourceWarning: Unclosed <MemoryObjectReceiveStream at 7fa20d3c6a40>
  warnings.warn(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
    yield read_stream, write_stream
  File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
    cb_suppress = await cb(*exc_details)
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
    return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
    if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
    raise RuntimeError(
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
During handling of the above exception, another exception occurred:
  + Exception Group Traceback (most recent call last):
  |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
    |     raise BaseExceptionGroup(
    | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | Traceback (most recent call last):
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
      |     yield read_stream, write_stream
      |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
      |     cb_suppress = await cb(*exc_details)
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
      |     return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
      |     if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
      |     raise RuntimeError(
      | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Exception Group Traceback (most recent call last):
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
      |     raise BaseExceptionGroup(
      | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
      +-+---------------- 1 ----------------
        | Exception Group Traceback (most recent call last):
        |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
        |     raise BaseExceptionGroup(
        | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
        +-+---------------- 1 ----------------
          | Traceback (most recent call last):
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
          |     yield read_stream, write_stream
          |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
          |     cb_suppress = await cb(*exc_details)
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
          |     return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 778, in __aexit__
          |     return self.cancel_scope.__exit__(exc_type, exc_val, exc_tb)
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
          |     raise RuntimeError(
          | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
          | 
          | During handling of the above exception, another exception occurred:
          | 
          | Exception Group Traceback (most recent call last):
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 767, in __aexit__
          |     raise BaseExceptionGroup(
          | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
          +-+---------------- 1 ----------------
            | Traceback (most recent call last):
            |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
            |     yield read_stream, write_stream
            |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
            |     cb_suppress = await cb(*exc_details)
            |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
            |     return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
            |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 778, in __aexit__
            |     return self.cancel_scope.__exit__(exc_type, exc_val, exc_tb)
            |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
            |     raise RuntimeError(
            | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
            +------------------------------------
          | 
          | During handling of the above exception, another exception occurred:
          | 
          | Traceback (most recent call last):
          |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
          |     await self.gen.athrow(typ, value, traceback)
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/tool_server.py", line 181, in new_session
          |     yield mcp_client
          |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
          |     cb_suppress = await cb(*exc_details)
          |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
          |     await self.gen.athrow(typ, value, traceback)
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
          |     async with anyio.create_task_group() as tg:
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
          |     if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
          |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
          |     raise RuntimeError(
          | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
        | 
        | During handling of the above exception, another exception occurred:
        | 
        | Traceback (most recent call last):
        |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
        |     yield read_stream, write_stream
        |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
        |     cb_suppress = await cb(*exc_details)
        |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
        |     return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
        |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
        |     if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
        |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
        |     raise RuntimeError(
        | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
      | 
      | During handling of the above exception, another exception occurred:
      | 
      | Traceback (most recent call last):
      |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
      |     await self.gen.athrow(typ, value, traceback)
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/tool_server.py", line 181, in new_session
      |     yield mcp_client
      |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
      |     cb_suppress = await cb(*exc_details)
      |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
      |     await self.gen.athrow(typ, value, traceback)
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
      |     async with anyio.create_task_group() as tg:
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
      |     if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
      |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
      |     raise RuntimeError(
      | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Traceback (most recent call last):
    |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 139, in sse_client
    |     yield read_stream, write_stream
    |   File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
    |     cb_suppress = await cb(*exc_details)
    |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/shared/session.py", line 218, in __aexit__
    |     return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
    |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
    |     if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
    |   File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
    |     raise RuntimeError(
    | RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "thrift.python.streaming/py_promise.pyx", line 103, in thrift.python.streaming.py_promise.runGenerator
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/facebook/inference_platform_sp/llm_predictor_gpu/openai_compatible/openai_compatible_service/thrift_services.py", line 195, in _fbthrift__stream_wrapper_createResponsesStream
    async for item in stream_generator:
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/predictor/stream_predict_event.py", line 77, in extract_chunk_responses
    async for event in events:
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/handler_lib.py", line 798, in stream_create_responses_internal
    raise e
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/handler_lib.py", line 779, in stream_create_responses_internal
    async for part in g:
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/smart/inference_platform_sp/llm_predictor_gpu/predictor/vllm_predictor.py", line 534, in create_responses_stream
    async for res in resp:
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/vllm/entrypoints/openai/serving_responses.py", line 1664, in responses_stream_generator
    async with AsyncExitStack() as exit_stack:
  File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 714, in __aexit__
    raise exc_details[1]
  File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 697, in __aexit__
    cb_suppress = await cb(*exc_details)
  File "/usr/local/fbcode/platform010/lib/python3.10/contextlib.py", line 217, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/mcp/client/sse.py", line 54, in sse_client
    async with anyio.create_task_group() as tg:
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 773, in __aexit__
    if self.cancel_scope.__exit__(type(exc), exc, exc.__traceback__):
  File "/data/users/axia/fbsource/buck-out/v2/gen/fbcode/cc651496ec52f07e/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/anyio/_backends/_asyncio.py", line 456, in __exit__
    raise RuntimeError(
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
