Fix token usage with jump forward by comaniac · Pull Request #174 · sgl-project/sglang

comaniac · 2024-02-09T18:59:50Z

close #173

This PR fixes the incorrect token usage reported when jump forward is enabled. Specifically, we introduce a new field, orig_prompt_tokens, which is set when the first jump forward happens so that we know the original number of prompt tokens. When returning a response (a chunk in streaming mode or a complete response), we use the following equations to correct the token usage:

completion_tokens = curr_prompt_tokens - orig_prompt_tokens + completion_tokens
prompt_tokens = orig_prompt_tokens
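
As a worked illustration of the two equations, here is a minimal Python sketch. The function name `corrected_usage` and the example numbers are assumptions made for illustration and are not code from this PR. It also assumes, as the description implies, that a jump forward increases the current prompt token count beyond the original one.

```python
def corrected_usage(orig_prompt_tokens, curr_prompt_tokens, completion_tokens):
    """Return (prompt_tokens, completion_tokens) as they should be reported.

    Hypothetical helper: tokens that moved into the prompt after a jump
    forward are counted as completion tokens again.
    """
    moved_into_prompt = curr_prompt_tokens - orig_prompt_tokens
    return orig_prompt_tokens, moved_into_prompt + completion_tokens


# Example: the request started with 10 prompt tokens, the prompt grew to 14
# after a jump forward, and 3 more tokens were decoded.
prompt_tokens, completion_tokens = corrected_usage(10, 14, 3)
print(prompt_tokens, completion_tokens)  # 10 7
```

The total (prompt plus completion) is unchanged by the correction; only the split between the two counters changes.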

hnyls2002 · 2024-02-10T01:33:34Z

@comaniac Hi, I suggest initializing the orig_prompt_tokens when constructing the Req so that we can simplify the code.
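
A rough sketch of what this suggestion could look like follows. The `Req` class here is a simplified, hypothetical stand-in and not the real sglang class: its attributes (`input_ids`, `output_ids`) and the `usage` method are assumptions for illustration only.

```python
class Req:
    """Simplified, hypothetical stand-in for the real Req class."""

    def __init__(self, input_ids):
        self.input_ids = list(input_ids)
        self.output_ids = []
        # Recorded once at construction, so no "first jump forward"
        # check is needed later.
        self.orig_prompt_tokens = len(self.input_ids)

    def usage(self):
        # Apply the correction from the PR description.
        completion_tokens = (
            len(self.input_ids) - self.orig_prompt_tokens + len(self.output_ids)
        )
        return self.orig_prompt_tokens, completion_tokens


req = Req([101, 102, 103])
req.input_ids.extend([104, 105])  # assumed effect of a jump forward
req.output_ids.append(106)        # one regularly decoded token
print(req.usage())  # (3, 3)
```

Because the field always holds a valid value, the usage calculation does not need a separate branch for requests that never jumped forward.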

comaniac · 2024-02-10T03:04:18Z

Thanks, that makes sense. Meanwhile, can you add some comments back to explain why we need to calculate the token usage this way?

…sgl-project#174)

* added CI
* poslih
* poslih
* poslih
* poslih
* poslih
* fixed some broken tests
* fix broken test case and fix llama4.py
* precommit llama4.py
* add init to make test discoverable (sgl-project#173)
* Devops/ci (sgl-project#174)
* add init to make test discoverable
* lower atol rtol and fix random
* Devops/ci used normed tensor for testing, worked locally (sgl-project#175)
* add init to make test discoverable
* lower atol rtol and fix random
* pass locally... why fail on ci
* done
* remove requires grad
* Reduce qwen tp process count to 2 (sgl-project#177)
* add init to make test discoverable
* lower atol rtol and fix random
* pass locally... why fail on ci
* done
* remove requires grad
* fix qwen3 tp test
* remove dup
* Should pass CI without a problem (sgl-project#178)
* add init to make test discoverable
* lower atol rtol and fix random
* pass locally... why fail on ci
* done
* remove requires grad
* fix qwen3 tp test
* remove dup
* fix tp size
* recursively search for tests

---------

Co-authored-by: ZhengHSI <zhenghsi@qq.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>

Fix token usage with jump forward

aed223f

comaniac requested review from hnyls2002 and merrymercy and removed request for merrymercy February 9, 2024 18:59

init orig_prompt_tokens when init Req

c813cbe

Add comments

720ce3f

comaniac merged commit 4d303c4 into main Feb 10, 2024

comaniac deleted the cody/usage branch February 10, 2024 04:06

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Fix token usage with jump forward (sgl-project#174)

e097742

XiaobingSuper pushed a commit to XiaobingSuper/sglang that referenced this pull request Jan 23, 2026

Adjust the WAN model's Ulysses segmentation from frame_num to seq_len (…

ed5fe18

…sgl-project#174)

comaniac merged 3 commits into main from cody/usage
