Prerequisite work for supporting disaggregation: #68

Merged: 1 commit, May 1, 2024

Conversation

zhihaoshan-google
Contributor

  1. Add a transfer thread that transfers the KV cache.
  2. In interleaved mode, prioritize prefill and improve HBM utilization.
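
As a rough illustration of the first item, here is a minimal sketch of a dedicated transfer thread sitting between a prefill stage and a generate stage. It is not the code from this PR: the names `prefill_worker`, `transfer_worker`, `run_pipeline`, and `max_pending` are hypothetical, the "KV cache" is a placeholder list, and the bounded queue is only an assumption about how pending caches might be limited.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


def prefill_worker(requests, transfer_q):
    """Stand-in prefill: turn each prompt into a placeholder 'KV cache'."""
    for req_id, prompt in requests:
        kv_cache = [len(tok) for tok in prompt.split()]  # placeholder, not a real cache
        transfer_q.put((req_id, kv_cache))  # blocks while the bounded queue is full
    transfer_q.put(_STOP)


def transfer_worker(transfer_q, generate_q):
    """Transfer thread: hand prefill results over to the generate side."""
    while True:
        item = transfer_q.get()
        if item is _STOP:
            generate_q.put(_STOP)
            break
        # A real engine would copy the KV cache to the generate device here.
        generate_q.put(item)


def run_pipeline(requests, max_pending=2):
    """Run prefill and transfer on their own threads; collect what generate receives."""
    transfer_q = queue.Queue(maxsize=max_pending)  # assumed cap on pending caches
    generate_q = queue.Queue()
    threads = [
        threading.Thread(target=prefill_worker, args=(requests, transfer_q)),
        threading.Thread(target=transfer_worker, args=(transfer_q, generate_q)),
    ]
    for t in threads:
        t.start()
    received = []
    while True:
        item = generate_q.get()
        if item is _STOP:
            break
        received.append(item)
    for t in threads:
        t.join()
    return received
```

Calling `run_pipeline([(0, "a bb"), (1, "ccc")])` returns the requests in submission order, since there is one producer and one consumer per queue. Moving the hand-off onto its own thread means prefill does not wait on the generate loop, only on the queue bound.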

Collaborator

@JoeZijunZhou JoeZijunZhou left a comment

LGTM!

@zhihaoshan-google zhihaoshan-google merged commit a3546e8 into AI-Hypercomputer:main May 1, 2024
3 checks passed
jwyang-google pushed a commit that referenced this pull request May 6, 2024
1. Add transfer thread to transfer KV Cache.
2. For interleaved mode, prioritize prefill and improve the HBM
   utilization.

Co-authored-by: Zhihao Shan <[email protected]>
3 participants