Open
Labels: improvement, model
Description
The profiling of Whisper-tiny models (encoder and decoder) revealed a significant inference slowdown due to certain operators not being delegated to the XNNPACK backend during the export stage.
Non-delegated operators account for roughly two-thirds of inference time in the decoder and about 40% in the encoder. The table below shows the decoder's profiling results: OPERATOR_CALL is the aggregate over all non-delegated operators, and DELEGATE_CALL is the aggregate over all delegated operators.
| Op Name | Total Time (ms) | Share (%) | Calls | Delegated | Delegated (%) |
|---|---|---|---|---|---|
| Method::execute | 138.49 | 100.00% | 1 | 0 | 0.00% |
| OPERATOR_CALL | 90.902 | 65.64% | 286 | 0 | 0.00% |
| native_call_mm.out | 70.643 | 51.01% | 1 | 0 | 0.00% |
| DELEGATE_CALL | 47.446 | 34.26% | 89 | 0 | 0.00% |
| Fully Connected (NC, F32) GEMM #1 | 29.328 | 21.18% | 40 | 40 | 100.00% |
| Batch Matrix Multiply (NC, F32) GEMM #1 | 8.133 | 5.87% | 16 | 16 | 100.00% |
| Transpose (ND, X32) #1 | 6.457 | 4.66% | 41 | 41 | 100.00% |
| native_call_where.self_out | 5.47 | 3.95% | 9 | 0 | 0.00% |
| native_call_eq.Scalar_out | 3.23 | 2.33% | 8 | 0 | 0.00% |
| native_call_expand_copy.out | 2.635 | 1.90% | 36 | 0 | 0.00% |
| native_call_gelu.out | 2.438 | 1.76% | 4 | 0 | 0.00% |
| Softmax (NC, F32) #1 | 1.318 | 0.95% | 8 | 8 | 100.00% |
| native_call_index.Tensor_out | 1.1 | 0.79% | 1 | 0 | 0.00% |
| native_call_slice_copy.Tensor_out | 0.838 | 0.61% | 24 | 0 | 0.00% |
| native_call_clone.out | 0.815 | 0.59% | 49 | 0 | 0.00% |
| native_call_full_like.out | 0.644 | 0.47% | 8 | 0 | 0.00% |
| native_call_view_copy.out | 0.618 | 0.45% | 1 | 0 | 0.00% |
| native_call_native_layer_norm.out | 0.598 | 0.43% | 13 | 0 | 0.00% |
| Transpose (ND, X32) #2 | 0.541 | 0.39% | 16 | 16 | 100.00% |
| Add (ND) #1 | 0.514 | 0.37% | 17 | 17 | 100.00% |
| native_call_mul.Scalar_out | 0.438 | 0.32% | 16 | 0 | 0.00% |
| native_call_any.out | 0.329 | 0.24% | 8 | 0 | 0.00% |
| native_call_logical_not.out | 0.317 | 0.23% | 16 | 0 | 0.00% |
| native_call_gt.Tensor_out | 0.204 | 0.15% | 1 | 0 | 0.00% |
| native_call_unsqueeze_copy.out | 0.092 | 0.07% | 11 | 0 | 0.00% |
| native_call_sub.out | 0.092 | 0.07% | 1 | 0 | 0.00% |
| native_call__to_dim_order_copy.out | 0.07 | 0.05% | 1 | 0 | 0.00% |
| native_call_ge.Scalar_out | 0.049 | 0.04% | 1 | 0 | 0.00% |
| native_call_embedding.out | 0.036 | 0.03% | 1 | 0 | 0.00% |
| Multiply (ND) #1 | 0.032 | 0.02% | 1 | 1 | 100.00% |
| native_call_arange.start_out | 0.016 | 0.01% | 4 | 0 | 0.00% |
| native_call_full.out | 0.009 | 0.01% | 1 | 0 | 0.00% |
| native_call_repeat.out | 0.008 | 0.01% | 1 | 0 | 0.00% |
| native_call_scalar_tensor.out | 0.004 | 0.00% | 1 | 0 | 0.00% |
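
For anyone who wants to re-derive the shares from the table above, here is a minimal Python sketch. The timings are copied by hand from the table (only a few per-operator rows are included), and the helper names `share` and `top_non_delegated` are hypothetical, not part of ExecuTorch or its profiling tools.

```python
# Times (ms) copied from the decoder profiling table above.
TOTAL_MS = 138.49  # Method::execute

AGGREGATES = {
    "OPERATOR_CALL": 90.902,  # aggregate over non-delegated operators
    "DELEGATE_CALL": 47.446,  # aggregate over XNNPACK-delegated operators
}

# (op name, total time in ms, delegated?) for a subset of per-operator rows.
ROWS = [
    ("native_call_mm.out", 70.643, False),
    ("Fully Connected (NC, F32) GEMM #1", 29.328, True),
    ("Batch Matrix Multiply (NC, F32) GEMM #1", 8.133, True),
    ("native_call_where.self_out", 5.47, False),
    ("native_call_eq.Scalar_out", 3.23, False),
    ("native_call_gelu.out", 2.438, False),
]


def share(time_ms, total_ms=TOTAL_MS):
    """Return time_ms as a percentage of total_ms, rounded to two decimals."""
    return round(100.0 * time_ms / total_ms, 2)


def top_non_delegated(rows, n=3):
    """Return the n slowest non-delegated (name, time_ms) pairs, slowest first."""
    candidates = [(name, t) for name, t, delegated in rows if not delegated]
    return sorted(candidates, key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, time_ms in AGGREGATES.items():
        print(f"{name}: {share(time_ms)}%")
    for name, time_ms in top_non_delegated(ROWS):
        print(f"{name}: {time_ms} ms ({share(time_ms)}%)")
```

Ranking the non-delegated rows this way puts `native_call_mm.out` first: a single call that takes about half of the total decoder time, which suggests it is the most valuable operator to get delegated.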
darthez