@merrymercy merrymercy commented Mar 2, 2025

  • Support penalty in overlap mode
  • Support chunked prefill + input logprob
  • Improve benchmark script and profiler
  • Rename "token_ids" to "output_ids" in the return value when using --skip-tokenizer-init
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: dhou-xai <[email protected]>
Co-authored-by: Hanming Lu <[email protected]>

@merrymercy merrymercy changed the title from "Support penalty; return logprob with chunked prefill; improve benchmark scripts" to "Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts" Mar 2, 2025
@merrymercy merrymercy merged commit 3f77ac7 into main Mar 3, 2025
20 of 23 checks passed
@merrymercy merrymercy deleted the lianmin/many-improve branch March 3, 2025 08:05
merrymercy added a commit that referenced this pull request Mar 3, 2025
… improve benchmark scripts (#3988)

zhaochenyang20 pushed a commit that referenced this pull request Mar 4, 2025
aoshen524 pushed a commit to aoshen524/sglang that referenced this pull request Mar 10, 2025
… improve benchmark scripts (sgl-project#3988)

@CSEEduanyu

Hi @merrymercy, why was repetition_penalty.py removed?

@XiaobingSuper

@merrymercy I have the same question: why was repetition_penalty.py removed? How does it work now when repetition_penalty is set?


@elvischenv elvischenv left a comment

Why was TPOT removed?

Comment on lines 1068 to 1076
print("{:<40} {:<10.2f}".format("P99 TTFT (ms):", metrics.p99_ttft_ms))
print(
    "{s:{c}^{n}}".format(s="Time per Output Token (excl. 1st token)", n=50, c="-")
)
print("{:<40} {:<10.2f}".format("Mean TPOT (ms):", metrics.mean_tpot_ms))
print("{:<40} {:<10.2f}".format("Median TPOT (ms):", metrics.median_tpot_ms))
print("{:<40} {:<10.2f}".format("P99 TPOT (ms):", metrics.p99_tpot_ms))
print("{s:{c}^{n}}".format(s="Inter-token Latency", n=50, c="-"))
print("{s:{c}^{n}}".format(s="Inter-Token Latency", n=50, c="-"))
print("{:<40} {:<10.2f}".format("Mean ITL (ms):", metrics.mean_itl_ms))
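
For context on the metric being asked about, here is a minimal sketch of how a "time per output token, excluding the first token" figure is commonly derived and printed with the same format strings as the snippet above. The helper `tpot_ms`, the sample numbers, and the choice to skip single-token requests are assumptions made for illustration; this is not taken from the benchmark script in this PR.

```python
from statistics import mean, median


def tpot_ms(latency_s: float, ttft_s: float, output_len: int) -> float:
    """Hypothetical helper: per-token time in ms after the first token."""
    if output_len <= 1:
        # No tokens after the first one, so the metric is undefined; return 0.0.
        return 0.0
    return (latency_s - ttft_s) / (output_len - 1) * 1000.0


# Made-up requests: (end-to-end latency in s, time to first token in s, output tokens)
samples = [(1.0, 0.2, 9), (2.0, 0.5, 16), (0.3, 0.3, 1)]

# Assumption: single-token requests are left out of the aggregate.
tpots = [tpot_ms(*s) for s in samples if s[2] > 1]

# "{s:{c}^{n}}" centers s in a field of width n, padded with the character c.
print("{s:{c}^{n}}".format(s="Time per Output Token (excl. 1st token)", n=50, c="-"))
# "{:<40}" left-aligns the label in 40 columns; "{:<10.2f}" prints two decimals.
print("{:<40} {:<10.2f}".format("Mean TPOT (ms):", mean(tpots)))
print("{:<40} {:<10.2f}".format("Median TPOT (ms):", median(tpots)))
```

Under this definition TPOT is one average per request, whereas ITL is measured between individual streamed chunks, so the two sections of the report are related but not interchangeable.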