
optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way #2521

Merged

Conversation

jiajie-yang
Copy link
Contributor

Motivation

Practical tests on the Ascend platform show that the AscendOpsBackend.update_step_context() function has room for performance optimization, especially in the prefill stage when the input sequence is long (for example, when it contains many image tokens), because the original code computes kv_start_indices with a slow per-token loop.

The goal of this PR is to improve the performance of the AscendOpsBackend.update_step_context() function with more efficient code.

Modification

This PR changes only the lmdeploy/pytorch/backends/ascend/op_backend.py file. The main modification optimizes the kv_start_indices calculation, replacing a plain loop with slot_tables and slot_indices (inspired by atb_models/examples/server in the Ascend mindie-atb_models package).
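To illustrate the idea, here is a minimal NumPy sketch of the loop-to-gather rewrite. The names (block_tables, seq_lens, block_size) and shapes are assumptions for illustration, not the actual lmdeploy internals: each block id is expanded into its slot range once to build slot_tables, and per-token slot_indices then gather the same values the original loop produced one at a time.

```python
import numpy as np

# Hypothetical inputs (assumed shapes, not lmdeploy's real data):
block_size = 4
# block_tables[i] lists the KV-cache blocks assigned to sequence i
block_tables = np.array([[2, 0, 5], [1, 3, 4]])  # (num_seqs, max_blocks)
seq_lens = np.array([6, 5])                      # tokens per sequence

# Naive per-token loop (the pattern being replaced):
loop_result = []
for seq, length in enumerate(seq_lens):
    for pos in range(length):
        block = block_tables[seq, pos // block_size]
        loop_result.append(block * block_size + pos % block_size)
loop_result = np.array(loop_result)

# Vectorized: expand every block id into its slot range once
# (slot_tables), then gather with per-token slot_indices.
slot_tables = (block_tables[:, :, None] * block_size
               + np.arange(block_size)).reshape(len(seq_lens), -1)
slot_indices = np.concatenate([np.arange(n) for n in seq_lens])
seq_ids = np.repeat(np.arange(len(seq_lens)), seq_lens)
kv_start_indices = slot_tables[seq_ids, slot_indices]

assert np.array_equal(kv_start_indices, loop_result)
```

The gather form replaces per-token Python-level arithmetic with a handful of array operations, which is where the prefill-stage speedup comes from when sequences are long.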

When running an 8B VL model on ascend910b with a request whose input length reaches 2000+ tokens (including image tokens), the prefill-stage time spent in AscendOpsBackend.update_step_context() drops from 160ms+ to about 80ms, which is beneficial when TTFT requirements are strict.

@jinminxi104
Copy link
Collaborator

LGTM and passed our ci on ascend device

@lvhan028 lvhan028 merged commit 0323103 into InternLM:main Sep 26, 2024
3 of 5 checks passed