Fix 1st token latency time #1091

Merged: regisss merged 1 commit into main from libintan/fix_1st_token_latency on Jun 28, 2024

Conversation

@libinta
Collaborator

@libinta libinta commented Jun 24, 2024

What does this PR do?

Move the first-token finish timestamp so that the reported first-token latency no longer includes the KV-cache padding time of the second generation step.
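To illustrate the idea, here is a minimal sketch of where the finish timestamp sits relative to the padding work. It is not the code changed in this PR: `timed_generate`, `prefill`, `pad_kv_cache`, `decode_step`, and `FakeClock` are hypothetical names, and the sketch assumes the KV-cache padding happens once, at the start of the second step.

```python
from typing import Callable, Dict


class FakeClock:
    """Deterministic stand-in for time.perf_counter (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def timed_generate(
    prefill: Callable[[], None],
    pad_kv_cache: Callable[[], None],
    decode_step: Callable[[], None],
    num_new_tokens: int,
    clock: Callable[[], float],
) -> Dict[str, float]:
    """Hypothetical generation loop that reports first-token latency."""
    start = clock()
    prefill()  # step 1: produces the first token

    # Take the finish timestamp here, before any work that belongs to step 2.
    first_token_latency = clock() - start

    for step in range(1, num_new_tokens):
        if step == 1:
            # Assumed: padding runs once at the start of the second step.
            # It counts towards the later tokens, not the first one.
            pad_kv_cache()
        decode_step()

    return {
        "first_token_latency": first_token_latency,
        "total_latency": clock() - start,
    }


clock = FakeClock()
result = timed_generate(
    prefill=lambda: clock.advance(1.0),
    pad_kv_cache=lambda: clock.advance(0.5),
    decode_step=lambda: clock.advance(0.25),
    num_new_tokens=3,
    clock=clock,
)
print(result)
```

If the timestamp were taken after `pad_kv_cache()` instead, the 0.5 s of padding in this toy setup would be attributed to the first token. In real measurements, `time.perf_counter` would replace `FakeClock`.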

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@libinta libinta added the run-test Run CI for PRs from external contributors label Jun 26, 2024
@regisss regisss merged commit d73f3c9 into main Jun 28, 2024
@regisss regisss deleted the libintan/fix_1st_token_latency branch June 28, 2024 18:17
regisss pushed a commit that referenced this pull request Jul 2, 2024

Labels

run-test Run CI for PRs from external contributors

4 participants