Fix WebGPU device destroyed on session release, breaking session recreation #27634

Merged
guschmue merged 5 commits into microsoft:main from nico-martin:fix/create-webgpu-device-after-release on Mar 24, 2026

Conversation

@nico-martin
Contributor

Description

We ran into unexpected behavior in Transformers.js v4: after calling InferenceSession.release() on a WebGPU session, creating a new WebGPU session fails with:

WebGPU device lost (2): Device was destroyed.

In Transformers.js we encourage the create -> release -> create pattern, because we expect an application to run for some time and possibly use multiple models, so it makes sense to unload a model once its job is done.
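
The create -> release -> create pattern can be sketched as a small helper. This is only an illustration: `withSession` and `ReleasableSession` are hypothetical names, not part of Transformers.js or ONNX Runtime, and the helper assumes nothing about the session beyond an asynchronous `release()` method.

```typescript
// Structural type covering only the method this helper needs.
// (Assumption: the real session object exposes an async release().)
interface ReleasableSession {
  release(): Promise<void>;
}

// Hypothetical helper: create a session, run one job, always release afterwards.
async function withSession<S extends ReleasableSession, R>(
  create: () => Promise<S>,
  job: (session: S) => Promise<R>,
): Promise<R> {
  const session = await create();
  try {
    return await job(session);
  } finally {
    // Release even if the job throws, so the model is unloaded either way.
    await session.release();
  }
}
```

In an application, `create` would wrap something like `InferenceSession.create(...)` with the WebGPU execution provider. Calling `withSession` twice in a row is the sequence that fails with the error above before this change.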

This appears to have been introduced in e03631ee528, which added the preserveDevice option with a default value of false. When the last session is released and preserveDevice is false, the C++ side destroys the WebGPU device, but the JavaScript reference in env.webgpu.device is never cleared, so it keeps pointing at a destroyed device.
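
The failure mode can be modelled without a browser. The sketch below is a simplified model, not the ONNX Runtime code: `FakeDevice`, `env`, `createSession`, `releaseSession`, and the `clearReference` flag are all hypothetical stand-ins chosen to show why a cached reference to a destroyed device breaks the next session.

```typescript
// Hypothetical stand-in for a GPU device: only tracks whether destroy() was called.
class FakeDevice {
  destroyed = false;
  destroy(): void {
    this.destroyed = true;
  }
}

// Hypothetical stand-in for env.webgpu: caches the device across sessions.
const env: { device?: FakeDevice } = {};
let activeSessions = 0;

function createSession(): FakeDevice {
  // Reuse the cached device if one exists; a stale reference is picked up here.
  const device = env.device ?? (env.device = new FakeDevice());
  if (device.destroyed) {
    throw new Error('WebGPU device lost: Device was destroyed.');
  }
  activeSessions++;
  return device;
}

function releaseSession(preserveDevice = false, clearReference = false): void {
  activeSessions--;
  if (activeSessions === 0 && !preserveDevice) {
    // Models the native side destroying the device on last release.
    env.device?.destroy();
    if (clearReference) {
      // Models the fix: drop the stale reference so the next session starts fresh.
      delete env.device;
    }
  }
}
```

With `clearReference` left at `false`, a second `createSession()` after `releaseSession()` throws; with it set to `true`, the second call gets a fresh device.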

Changes

Clear stale device reference when lost (backend-webgpu.ts)

  1. Made the device property configurable: true so that it can be deleted
  2. Added cleanup logic in dispose() that detects device loss via the device.lost promise
  3. When the device is lost (destroyed, driver crash, etc.), the stale env.webgpu.device reference is deleted

This allows subsequent session creation to acquire a fresh device instead of attempting to reuse a lost one.
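
The three changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `webgpuEnv`, `exposeDevice`, and `clearOnLoss` are hypothetical names, and `DeviceLike` is a structural type that mirrors only the `lost` promise of a WebGPU `GPUDevice`.

```typescript
// Minimal structural types covering only the WebGPU members used here.
interface DeviceLostInfo {
  reason: 'destroyed' | 'unknown';
  message: string;
}
interface DeviceLike {
  lost: Promise<DeviceLostInfo>;
}

// Hypothetical stand-in for env.webgpu.
const webgpuEnv: { device?: DeviceLike } = {};

// Expose the device as a read-only but deletable property. Without
// configurable: true, a later `delete` would throw in strict mode.
function exposeDevice(device: DeviceLike): void {
  Object.defineProperty(webgpuEnv, 'device', {
    value: device,
    writable: false,
    enumerable: true,
    configurable: true,
  });
}

// Drop the cached reference once the device reports loss, but only if the
// cache still points at that same device.
function clearOnLoss(device: DeviceLike): Promise<void> {
  return device.lost.then(() => {
    if (webgpuEnv.device === device) {
      delete webgpuEnv.device;
    }
  });
}
```

The identity check before `delete` is a design choice in this sketch: it avoids removing a newer device that may have been stored after the old one was lost.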

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Mar 12, 2026
guschmue previously approved these changes Mar 16, 2026
@guschmue
Contributor

Needs a tiny fix:
Error: Following source files are not formatted: (did you run "npm run format"?)
js/web/lib/wasm/jsep/backend-webgpu.ts

@guschmue
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue guschmue merged commit 2f66878 into microsoft:main Mar 24, 2026
89 checks passed
Labels

ep:WebGPU ort-web webgpu provider
