fix: prefill stats added to EngineCoreOutput #133
Conversation
Code Review
This pull request replaces the num_cached_tokens and num_external_computed_tokens fields in EngineCoreOutput with a new PrefillStats struct, providing a more granular breakdown of prefill token sources. The metrics tracking logic has been updated to use this new structure, and the vllm:prompt_tokens_recomputed metric was removed. Feedback suggests adding a fallback in the metrics tracker that records the total prompt tokens when the prefill_stats object is absent, keeping cumulative counters consistent with the request histograms.
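A rough sketch of what the new struct could look like, based on the fields this PR removes; the field names and the `total()` helper below are assumptions for illustration, not the actual vLLM definition:

```rust
// Hypothetical breakdown of prefill token sources; field names are
// assumptions inferred from the PR description, not the real definition.
#[derive(Debug, Default, Clone)]
pub struct PrefillStats {
    /// Tokens computed fresh during this prefill.
    pub num_new_tokens: u64,
    /// Tokens served from the local prefix cache.
    pub num_cached_tokens: u64,
    /// Tokens computed externally (e.g. a remote prefill worker).
    pub num_external_computed_tokens: u64,
}

impl PrefillStats {
    /// Total prompt tokens across all sources.
    pub fn total(&self) -> u64 {
        self.num_new_tokens + self.num_cached_tokens + self.num_external_computed_tokens
    }
}

fn main() {
    let stats = PrefillStats {
        num_new_tokens: 10,
        num_cached_tokens: 5,
        num_external_computed_tokens: 3,
    };
    assert_eq!(stats.total(), 18);
    println!("total prompt tokens = {}", stats.total());
}
```

With a `total()` helper like this, the fallback discussed below could also be expressed as recording `prefill_stats.total()` instead of `self.prompt_len`, keeping both code paths on the same data source.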
```rust
if let Some(prefill_stats) = &output.prefill_stats {
    record_prompt_tokens(&self.model_name, engine_index, prefill_stats);
}
```
If prefill_stats is missing from the first output (e.g., due to an unexpected backend response or protocol mismatch), the vllm:prompt_tokens counter will not be incremented at all, while the vllm:request_prompt_tokens histogram in record_finished will still be updated using self.prompt_len. This creates a discrepancy between cumulative counters and request histograms. Consider providing a fallback that records the total prompt tokens even if the breakdown is missing.
Suggested change:

```diff
 if let Some(prefill_stats) = &output.prefill_stats {
     record_prompt_tokens(&self.model_name, engine_index, prefill_stats);
-}
+} else {
+    metrics()
+        .prompt_tokens
+        .get_or_create(&engine_labels(&self.model_name, engine_index))
+        .inc_by(self.prompt_len as u64);
+}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c72e817046
Due to vllm-project/vllm#37460