optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way #2521

jiajie-yang · 2024-09-26T07:15:55Z

Motivation

Practical tests on the Ascend platform show that the AscendOpsBackend.update_step_context() function has room for performance optimization, especially in the prefill stage with long input sequences (for example, inputs that contain many image tokens). The cause is that the original code computes kv_start_indices with a slow per-token loop.

The goal of this PR is to improve the performance of the AscendOpsBackend.update_step_context() function with more efficient code.

Modification

This PR only changes the lmdeploy/pytorch/backends/ascend/op_backend.py file. The main modification replaces the plain loop that computes kv_start_indices with a lookup based on slot_tables and slot_indices (inspired by atb_models/examples/server in the Ascend mindie-atb_models package).
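
To illustrate the idea, here is a minimal sketch of the two approaches. It is not the code from this PR: it uses NumPy instead of torch tensors, the function names and argument layout are hypothetical, and it assumes that block_offsets holds one row of physical block ids per sequence and that a token at logical position pos is stored in slot block_id * block_size + pos % block_size.

```python
import numpy as np


def kv_start_indices_loop(block_offsets, q_seqlens, kv_seqlens, block_size):
    """Baseline (hypothetical): derive the slot of every new token one at a time."""
    indices = []
    for blocks, q_len, kv_len in zip(block_offsets, q_seqlens, kv_seqlens):
        history_len = kv_len - q_len  # tokens already present in the KV cache
        for pos in range(history_len, kv_len):
            block_id = int(blocks[pos // block_size])
            indices.append(block_id * block_size + pos % block_size)
    return np.array(indices, dtype=np.int64)


def kv_start_indices_slot_table(block_offsets, q_seqlens, kv_seqlens, block_size):
    """Sketch of the slot-table idea: expand block ids once, then index a range."""
    block_offsets = np.asarray(block_offsets, dtype=np.int64)
    batch = block_offsets.shape[0]
    # Each block id expands into the block_size consecutive slots it owns,
    # giving one flat slot table per sequence: (batch, num_blocks * block_size).
    slot_tables = (
        block_offsets[:, :, None] * block_size + np.arange(block_size)
    ).reshape(batch, -1)
    chunks = []
    for i, (q_len, kv_len) in enumerate(zip(q_seqlens, kv_seqlens)):
        # Logical positions of the tokens written in this step.
        slot_indices = np.arange(kv_len - q_len, kv_len)
        chunks.append(slot_tables[i, slot_indices])
    return np.concatenate(chunks)
```

In this sketch the per-token Python work is replaced by one array expansion plus one indexed read per sequence, so the remaining loop runs once per request rather than once per token. That is the kind of saving that matters when a prefill request carries thousands of tokens.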

When running an 8B VL model on Ascend 910B, if the input length of a request reaches 2000+ tokens (including image tokens), the time spent in AscendOpsBackend.update_step_context() during the prefill stage drops from 160ms+ to about 80ms, which helps when TTFT requirements are strict.

jinminxi104 · 2024-09-26T08:29:49Z

LGTM, and it passed our CI on an Ascend device.

refactor: optimize performance of ascend backend's update_step_context()

fecef8f

lvhan028 requested a review from jinminxi104 September 26, 2024 07:28

lvhan028 added the improvement label Sep 26, 2024

jinminxi104 approved these changes Sep 26, 2024

lvhan028 merged commit 0323103 into InternLM:main Sep 26, 2024
3 of 5 checks passed
