
Conversation

@qwu16
Contributor

@qwu16 qwu16 commented Jul 30, 2025

Description

Cache opSupportLimits in the WebNN backend instead of querying it from the lower layer each time, to improve performance. Also update the trace event in data transfer.

Motivation and Context

In the current implementation, every time the ensureTensor API is called to check an input/output tensor, the MLContext.opSupportLimits API is called to query the supported op capabilities from Chromium, and this call becomes a hotspot. Calling the API once when the session is created and caching the result avoids the frequent lower-layer API calls.
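
A minimal TypeScript sketch of the approach described above; the map name `opSupportLimitsBySessionId` and the simplified types are illustrative assumptions, not the exact code merged in this PR:

```typescript
// Sketch only; types and field names are simplified assumptions.
type MLOpSupportLimits = Record<string, unknown>;

interface MLContext {
  opSupportLimits(): MLOpSupportLimits;
}

class WebNNBackend {
  private mlContextBySessionId = new Map<number, MLContext>();
  // Cache the opSupportLimits result per session so later tensor checks
  // never have to call MLContext.opSupportLimits() again.
  private opSupportLimitsBySessionId = new Map<number, MLOpSupportLimits>();

  public registerMLContext(sessionId: number, mlContext: MLContext): void {
    this.mlContextBySessionId.set(sessionId, mlContext);
    // Query once at session creation; the cached object is reused on every
    // subsequent input/output tensor check.
    this.opSupportLimitsBySessionId.set(sessionId, mlContext.opSupportLimits());
  }

  public getOpSupportLimits(sessionId: number): MLOpSupportLimits | undefined {
    return this.opSupportLimitsBySessionId.get(sessionId);
  }
}
```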

@qwu16
Contributor Author

qwu16 commented Jul 30, 2025

@Honry PTAL, thanks!

Contributor

@Honry Honry left a comment


LGTM, thanks!

@qwu16
Contributor Author

qwu16 commented Jul 30, 2025

@fdwr PTAL, thanks~

@fdwr
Contributor

fdwr commented Jul 31, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

@fdwr
Contributor

fdwr commented Jul 31, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr
Contributor

fdwr commented Jul 31, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Jul 31, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@fdwr
Contributor

fdwr commented Jul 31, 2025

/azp run Test Linux CUDA x64 Release,Test Linux TensorRT x64 Release,web_Debug / build_onnxruntime_web,web_Release / build_onnxruntime_web

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

No pipelines are associated with this pull request.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@fdwr
Contributor

fdwr commented Jul 31, 2025

In the current implementation, every time the ensureTensor API is called to check an input/output tensor, the MLContext.opSupportLimits API is called to query the supported op capabilities from Chromium, and this call becomes a hotspot.

Are there any hotspots in the Chromium implementation that warrant improvement (reducing the need for this cache)? It appears the code after still goes through an MLOpSupportLimits, but just avoids the call to this.mlContextBySessionId.get(sessionId).opSupportLimits().

Contributor

@fdwr fdwr left a comment


👍

@qwu16
Contributor Author

qwu16 commented Jul 31, 2025

In the current implementation, every time the ensureTensor API is called to check an input/output tensor, the MLContext.opSupportLimits API is called to query the supported op capabilities from Chromium, and this call becomes a hotspot.

Are there any hotspots in the Chromium implementation that warrant improvement (reducing the need for this cache)? It appears the code after still goes through an MLOpSupportLimits, but just avoids the call to this.mlContextBySessionId.get(sessionId).opSupportLimits().

In the current Chromium implementation, the queried properties for the different backends are already cached in MLContext, so calling MLContext.opSupportLimits does not incur extra IPC between the renderer and GPU processes. However, in https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/ml/ml_context.cc;l=241, hundreds of op limit properties are set before the dictionary is returned, and that takes some time. This PR caches the MLOpSupportLimits object and avoids setting those hundreds of op limit properties on every call to MLContext::opSupportLimits.
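
For illustration, a before/after sketch of the hot-path check; the `input.dataTypes` shape follows the WebNN MLOpSupportLimits dictionary, and the function names and simplified types are hypothetical:

```typescript
// Illustrative only; not the exact ensureTensor code in this PR.
interface TensorLimits {
  dataTypes: string[];
}

interface OpSupportLimits {
  input: TensorLimits;
  output: TensorLimits;
}

// Before: every check re-runs opSupportLimits(), which populates hundreds of
// per-op properties in Blink before returning the dictionary.
function isDataTypeSupportedBefore(
  context: { opSupportLimits(): OpSupportLimits },
  dataType: string,
): boolean {
  return context.opSupportLimits().input.dataTypes.includes(dataType);
}

// After: the dictionary is built once per session and cached, so the check is
// only a property access plus an array lookup.
function isDataTypeSupportedAfter(cachedLimits: OpSupportLimits, dataType: string): boolean {
  return cachedLimits.input.dataTypes.includes(dataType);
}
```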

@fdwr fdwr merged commit a7bc727 into microsoft:main Jul 31, 2025
98 of 101 checks passed
@qwu16 qwu16 deleted the oplimit branch August 1, 2025 00:09
sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request Aug 11, 2025 (microsoft#25589)
