Add request-level timestamp for when prefill finishes#14860

Merged
zhyncs merged 8 commits into sgl-project:main from scottjlee:sjl/1210-prefill-finish
Dec 17, 2025

Conversation

@scottjlee
Contributor

Motivation

Currently, the prefill_finished_ts field is derived from ReqState.first_token_time. However, for non-streaming requests, ReqState.first_token_time is never set, so we cannot rely on this value to determine when prefill finished. As a result, prefill_finished_ts is omitted from the request-level metrics for non-streaming requests.

Modifications

This PR adds a new explicit prefill_finished_ts field to the TimeStats object and updates the value once a request completes prefill. The prefill_finished_ts field is populated for both streaming and non-streaming requests.

Since this PR only adds the new metric field, it should have no impact on the model outputs or speed.
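Conceptually, the change can be sketched as below. This is a minimal illustration, not the actual sglang code: apart from `prefill_finished_ts` itself, the class layout and function name here are assumptions.

```python
import time
from dataclasses import dataclass


@dataclass
class TimeStats:
    # Hypothetical sketch of the new field. Note: most TimeStats fields use
    # time.perf_counter(); this one uses time.time() (epoch seconds) so it can
    # be compared directly with the other request-level *_ts metrics.
    prefill_finished_ts: float = 0.0


def on_prefill_finished(time_stats: TimeStats) -> None:
    # Record only the first time prefill completes for this request
    # (with chunked prefill, this point can be reached more than once).
    if time_stats.prefill_finished_ts == 0.0:
        time_stats.prefill_finished_ts = time.time()
```

Since the field defaults to 0.0, downstream code can treat a zero value as "never recorded" and omit it from the reported metrics.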

Log examples

  • Non-streaming:
[2025-12-11 01:42:24] Finish: obj=GenerateReqInput(validation_time=2.880580723285675e-05, received_time=1765417343.9740803, received_time_perf=14121634.43814618, rid='oBTZYgU-4YNCb4-980033869e849bc1', http_worker_ipc=None, video_data=None, return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, token_ids_logprob=None, return_text_in_logprobs=True, stream=False, log_metrics=True, return_hidden_states=False, modalities=[], session_params=None, lora_id=None, custom_logit_processor=None, bootstrap_host=None, bootstrap_port=None, bootstrap_room=None, bootstrap_pair_key=None, decode_tp_size=None, reasoning=False, data_parallel_rank=None, background=False, conversation_id=None, priority=None, extra_key=None, no_logs=False, custom_labels=None, return_bytes=False, return_entropy=False), out={'meta_info': {'id': 'oBTZYgU-4YNCb4-980033869e849bc1', 'finish_reason': {'type': 'stop', 'matched': 2}, 'prompt_tokens': 23, 'weight_version': 'default', 'total_retractions': 0, 'queue_time': 0.0006293673068284988, 'prefill_launch_delay': 0.0007045920938253403, 'prefill_launch_latency': 0.024291932582855225, 'prefill_finished_ts': 1765417344.0011635, 'completion_tokens': 32, 'cached_tokens': 0, 'spec_accept_rate': 0.42857142857142855, 'spec_accept_length': 2.2857142857142856, 'spec_verify_ct': 14, 'spec_accept_token_num': 18, 'spec_draft_token_num': 42, 'e2e_latency': 0.15244436264038086, 'request_received_ts': 1765417343.9740803, 'request_sent_to_scheduler_ts': 1765417343.9751327, 'decode_finished_ts': 1765417344.1265247, 'inference_time': 0.15041274577379227, 'response_sent_to_client_ts': 1765417344.1267433}}
  • Streaming:
[2025-12-11 01:42:46] Finish: obj=GenerateReqInput(validation_time=2.5197863578796387e-05, received_time=1765417366.1434186, received_time_perf=14121656.607484436, rid='oBTZYgU-4YNCb4-980033869e849bc1', http_worker_ipc=None, video_data=None, return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, token_ids_logprob=None, return_text_in_logprobs=True, stream=True, log_metrics=True, return_hidden_states=False, modalities=[], session_params=None, lora_id=None, custom_logit_processor=None, bootstrap_host=None, bootstrap_port=None, bootstrap_room=None, bootstrap_pair_key=None, decode_tp_size=None, reasoning=False, data_parallel_rank=None, background=False, conversation_id=None, priority=None, extra_key=None, no_logs=False, custom_labels=None, return_bytes=False, return_entropy=False), out={'meta_info': {'id': 'oBTZYgU-4YNCb4-980033869e849bc1', 'finish_reason': {'type': 'stop', 'matched': 2}, 'prompt_tokens': 23, 'weight_version': 'default', 'total_retractions': 0, 'queue_time': 0.0007969774305820465, 'prefill_launch_delay': 0.0005463473498821259, 'prefill_launch_latency': 0.011130349710583687, 'prefill_finished_ts': 1765417366.1574004, 'completion_tokens': 32, 'cached_tokens': 21, 'spec_accept_rate': 0.42857142857142855, 'spec_accept_length': 2.2857142857142856, 'spec_verify_ct': 14, 'spec_accept_token_num': 18, 'spec_draft_token_num': 42, 'e2e_latency': 0.14004898071289062, 'request_received_ts': 1765417366.1434186, 'request_sent_to_scheduler_ts': 1765417366.1445887, 'response_sent_to_client_ts': 1765417366.1584878, 'decode_finished_ts': 1765417366.2834675, 'inference_time': 0.13779237307608128, 'decode_throughput': 255.8506616612087}}

Accuracy Tests

Benchmarking and Profiling

Checklist


Comment on lines -1807 to -1812
# For embeddings, there's no separate prefill phase, so omit `prefill_finished_ts`.
if (
    not isinstance(recv_obj, BatchEmbeddingOutput)
    and state.first_token_time > 0
):
    meta_info["prefill_finished_ts"] = state.first_token_time
Contributor Author


We no longer need to rely on ReqState.first_token_time for prefill_finished_ts, since we track it directly now.

Collaborator


Shall we keep the `not isinstance(recv_obj, BatchEmbeddingOutput)` check?

Contributor Author


I think the whole if clause is no longer needed. Because this PR explicitly tracks prefill_finished_ts as a new field, which is added to meta_info with _add_metric_if_present() above, we shouldn't need this fallback to state.first_token_time.

@zhyncs
Collaborator

zhyncs commented Dec 11, 2025

/tag-and-rerun-ci

Collaborator

@ShangmingCai ShangmingCai left a comment


Logic LGTM. Do you have time to check on this PR, @sufeng-buaa?

):
    if req.is_chunked <= 0:
        if req.time_stats.prefill_finished_ts == 0.0:
            req.time_stats.prefill_finished_ts = time.time()
Collaborator


Currently, all timestamps collected in TimeStats are obtained via time.perf_counter(). Additionally, with this in place, should prefill_end_time_host be removed? It seems that prefill_end_time_host does not record the actual prefill end time.

Collaborator

@sufeng-buaa sufeng-buaa Dec 11, 2025


There doesn't seem to be a better way to maintain consistency with first_token_time (obtained via time.time()); it just feels a bit inconsistent. Maybe we should add a comment in TimeStats to clarify this?

Contributor Author


If I'm understanding correctly, prefill_end_time_host is used to calculate prefill_launch_latency, which specifically corresponds to the prefill kernel launch time. My newly created prefill_finished_ts timestamp, on the other hand, is more useful for seeing where prefill as a whole ends (encapsulating prefill_launch_delay, prefill_launch_latency, and other untracked operations in between).

Good point about the time.perf_counter() units for the other timestamps in TimeStats; I left a comment clarifying this difference. IMO we should still keep it consistent with first_token_time in this case, especially since the other _ts metrics reported at the request level use the same unit, making it easier to compare them directly.
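To illustrate the unit difference being discussed (a standalone sketch, not sglang code): `time.time()` returns wall-clock seconds since the Unix epoch, so it is directly comparable with the other request-level `*_ts` metrics, while `time.perf_counter()` is a monotonic clock with an arbitrary zero point, useful only for measuring durations within one process.

```python
import time

# Wall-clock epoch seconds: comparable across processes and against other
# absolute timestamps like request_received_ts or decode_finished_ts.
wall = time.time()

# Monotonic clock with an arbitrary origin (often system or process uptime):
# good for durations, meaningless as an absolute timestamp.
mono = time.perf_counter()

print(wall, mono)
```

Subtracting a perf_counter value from an epoch timestamp (or vice versa) yields garbage, which is why mixing the two units within one metrics object deserves a clarifying comment.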

Collaborator


LGTM

Comment on lines +110 to +112
if req.time_stats.prefill_finished_ts == 0.0:
    req.time_stats.prefill_finished_ts = time.time()
Collaborator


Not sure about whether this is the best place to record prefill finish time.

Collaborator

@ShangmingCai ShangmingCai left a comment


LGTM as long as CI passes.

@ShangmingCai
Collaborator

/rerun-failed-ci

@scottjlee
Contributor Author

/rerun-failed-ci

@zhyncs
Collaborator

zhyncs commented Dec 15, 2025

/rerun-failed-ci

@zhyncs zhyncs self-assigned this Dec 15, 2025
@scottjlee
Contributor Author

scottjlee commented Dec 16, 2025

/rerun-failed-ci try again

@scottjlee
Contributor Author

/rerun-failed-ci

3 similar comments
@scottjlee
Contributor Author

/rerun-failed-ci

@scottjlee
Contributor Author

/rerun-failed-ci

@scottjlee
Contributor Author

/rerun-failed-ci

@zhyncs
Collaborator

zhyncs commented Dec 17, 2025

@zhyncs zhyncs merged commit 169a75d into sgl-project:main Dec 17, 2025
395 of 424 checks passed