[WIP][Metrics] Extend Prometheus with multi-modal SLOs, cross-stage transfer, and per-replica labels#3576
LHXuuu wants to merge 13 commits into
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: vraiti <vraiti@redhat.com>
Introduce vllm_omni/metrics/definitions.py as the single source of truth for metric family names, label sets, histogram buckets, and RTF formulas. Server-side prometheus.py and bench-side MultiModalsBenchmarkMetrics now consume the same constants, eliminating the dual-track naming drift between /metrics output and bench CLI reports. Pre-work for the upcoming RFC extensions: per-modality audio/image/video families (G1/G2), the cross-stage transfer family (G3), and the OmniPrometheusStatLogger wrap that relabels engine into stage+replica (G7). All refactors are name-only; PR vllm-project#3362's existing 10 prometheus tests pass unchanged.
Phase 2.1 of multi-replica observability (RFC §3.2.7): introduce three wrapper metric classes (_RelabelGauge / _RelabelCounter / _RelabelHistogram) that intercept labelnames at family creation and translate .labels(engine=idx, ...) kwargs into stage/replica via a process-level engine→(stage, replica) map. These are the ingredients for OmniPrometheusStatLogger (Phase 2.2), which will populate the engine map and slot the wrappers into upstream's _gauge_cls / _counter_cls / _histogram_cls hooks. Shipped standalone with 18 unit tests so the relabel logic can be vetted before the StatLogger subclass lands.
…l mixin
Phase 2.1 only intercepted the .labels(engine=int_idx, ...) kwarg form. Upstream PrometheusStatLogger has four .labels() call sites for engine-bearing families:
- loggers.py:510, kwarg form, int engine (gauge_engine_sleep_state)
- loggers.py:646, positional, str engine (counter_prompt_tokens_by_source)
- loggers.py:679, positional, str engine (counter_request_success_base)
- loggers.py:1056, kwarg form, str engine (info_gauge via metrics_info)
Track the original engine label index at family creation so positional .labels() can splice (stage, replica) at the right offset, and accept both int and str engine values in either form. Replaces the narrow _rewrite_label_kwargs helper with _engine_to_stage_replica, which is shared by both code paths. Adds tests for positional engine at a middle index (str + int values), str-form kwarg engine, and child-metric non-recursion.
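The interception described above can be sketched as follows. This is a minimal illustration, not the real implementation: _ENGINE_INDEX_MAP and the wrapper name follow the commit text, but the stub class stands in for prometheus_client.Counter, and the exact rewrite mechanics are assumptions.

```python
_ENGINE_INDEX_MAP = {}  # engine_idx -> (stage, replica); populated by the StatLogger


class _StubCounter:
    """Minimal stand-in for a prometheus_client Counter family."""

    def __init__(self, name, labelnames):
        self.name = name
        self.labelnames = list(labelnames)
        self.children = {}

    def labels(self, *args, **kwargs):
        # Key children by label values in labelnames order.
        values = list(args) + [kwargs[k] for k in self.labelnames if k in kwargs]
        return self.children.setdefault(tuple(values), object())


class _RelabelCounter(_StubCounter):
    """Rewrite `engine` -> (`stage`, `replica`) at family creation and on
    every .labels() call, for both kwarg and positional forms."""

    def __init__(self, name, labelnames):
        labelnames = list(labelnames)
        # Remember where `engine` sat so positional calls can splice later.
        self._engine_idx = (
            labelnames.index("engine") if "engine" in labelnames else None)
        if self._engine_idx is not None:
            labelnames[self._engine_idx:self._engine_idx + 1] = ["stage", "replica"]
        super().__init__(name, labelnames)

    def labels(self, *args, **kwargs):
        if "engine" in kwargs:
            # Kwarg form: .labels(engine=idx, ...) with int or str idx.
            stage, replica = _ENGINE_INDEX_MAP[int(kwargs.pop("engine"))]
            kwargs["stage"], kwargs["replica"] = stage, replica
        elif self._engine_idx is not None and args:
            # Positional form: splice (stage, replica) at the original offset.
            args = list(args)
            stage, replica = _ENGINE_INDEX_MAP[int(args[self._engine_idx])]
            args[self._engine_idx:self._engine_idx + 1] = [stage, replica]
        return super().labels(*args, **kwargs)


_ENGINE_INDEX_MAP.update({0: ("thinker", "0"), 1: ("talker", "0")})
c = _RelabelCounter("vllm:num_requests", ["model_name", "engine"])
c.labels(model_name="m", engine=1)  # kwarg form, int engine
c.labels("m", "1")                  # positional form, str engine, same child
```

Both call forms land on the same (model_name, stage, replica) child, which is the property the four upstream call sites rely on.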
Phase 2.2b of multi-replica observability (RFC §3.2.7). Subclass upstream
PrometheusStatLogger to expose ~37 vllm:* metric families with
{stage, replica} labels instead of the single {engine} label.
Three pieces of glue:
1. Class-level slot overrides (_gauge_cls / _counter_cls / _histogram_cls)
plug the relabel mixin wrappers into upstream's family-construction path
so labelnames are rewritten as families are created.
2. A property descriptor on per_engine_labelvalues intercepts upstream's
loggers.py:433 self-assignment and rewrites each [model_name, str(idx)]
tuple to [model_name, stage, replica] so downstream create_metric_per_engine
feeds 3-element labelvalues into the 3-label families.
3. __init__ accepts a stage_replica_map: dict[int, tuple[str, str]] and
populates the process-level _ENGINE_INDEX_MAP that the wrappers consult
on every .labels() call. Map is cleared first so re-init in the same
process (tests, orchestrator restart) starts clean.
Dynamic add/remove of replicas at runtime is intentionally out of scope.
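Glue piece 2 can be sketched as a plain property descriptor. This is a simplified stand-in (the real class subclasses upstream PrometheusStatLogger; the map contents and class name here are illustrative):

```python
class OmniLoggerSketch:
    """Intercept upstream's self-assignment of per_engine_labelvalues and
    rewrite each [model_name, str(idx)] entry to [model_name, stage, replica]."""

    # Illustrative engine_idx -> (stage, replica) map.
    _ENGINE_INDEX_MAP = {0: ("thinker", "0"), 1: ("talker", "0")}

    @property
    def per_engine_labelvalues(self):
        return self._per_engine_labelvalues

    @per_engine_labelvalues.setter
    def per_engine_labelvalues(self, values):
        # Upstream assigns {idx: [model_name, str(idx)]}; a property is a
        # data descriptor, so the plain `self.x = ...` assignment in the
        # base-class __init__ routes through this setter.
        rewritten = {}
        for idx, (model_name, _engine) in values.items():
            stage, replica = self._ENGINE_INDEX_MAP[idx]
            rewritten[idx] = [model_name, stage, replica]
        self._per_engine_labelvalues = rewritten


log = OmniLoggerSketch()
log.per_engine_labelvalues = {0: ["qwen3-omni", "0"], 1: ["qwen3-omni", "1"]}
```

Downstream code reading the property then sees 3-element labelvalues matching the 3-label families.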
Phase 2.3 of multi-replica observability (RFC §3.2.7) — completes the
end-to-end path so the ~37 vllm:* metric families now expose
per-(stage, replica) series.
Three targeted edits to orchestrator.py:
1. Build a flat stage_replica_map at __init__ by walking stage_pools and
assigning each (stage_id, replica_id) tuple a monotonically increasing
engine_idx via prefix-sum across pools. The reverse map
_stage_replica_to_engine_idx is consulted at record() time to translate
the orchestrator's (stage_id, replica_id) loop variables back into the
flat engine_idx upstream expects.
2. Swap PrometheusStatLogger for OmniPrometheusStatLogger and pass
stage_replica_map instead of the old range(num_stages) engine_indexes.
3. record() now passes the per-replica flat engine_idx instead of just
stage_id, so multi-replica deployments emit distinct
{stage, replica} series per replica rather than aggregating them.
The stage label currently uses str(stage_id); migrating to semantic names
("thinker", "talker", "diffusion") is a separate config schema change
deferred to a later iteration.
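The prefix-sum construction in edit 1 can be sketched like this, where replicas_per_stage is an assumed simplification of walking stage_pools:

```python
def build_stage_replica_map(replicas_per_stage):
    """Assign each (stage_id, replica_id) a monotonically increasing flat
    engine_idx, and keep the reverse map for record()-time lookups."""
    forward, reverse, engine_idx = {}, {}, 0
    for stage_id, num_replicas in enumerate(replicas_per_stage):
        for replica_id in range(num_replicas):
            forward[engine_idx] = (str(stage_id), str(replica_id))
            reverse[(stage_id, replica_id)] = engine_idx
            engine_idx += 1
    return forward, reverse


# Stage 0 has 1 replica, stage 1 has 2: three flat engine indexes in total.
fwd, rev = build_stage_replica_map([1, 2])
```

With this shape, record() translates its (stage_id, replica_id) loop variables through the reverse map into the flat engine_idx upstream expects.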
Phase 3.1 of multi-modal observability (RFC G1/G2). Per-modality
business-semantic Prometheus families with {model_name, stage, replica}
labels:
audio: ttfp_seconds, duration_seconds, rtf, frames (Counter)
image: ttfp_seconds, num (Counter), generation_time_seconds
video: generation_time_seconds
video_duration_seconds and video_rtf are intentionally deferred —
diffusion video pipelines (i2v / t2v / cogvideo / hunyuan / wan)
expose num_frames + fps in heterogeneous shapes and a clean abstraction
is out of scope for this iteration.
Text-path metrics (TTFT/ITL/TPOT) are NOT here — they come from the
upstream vllm:*{stage="thinker", ...} families exposed by the Phase 2
OmniPrometheusStatLogger wrap. RFC §3.2.6 single-source naming via
definitions.py constants; RTF families use RTF_BUCKETS, time families
use SECONDS_BUCKETS. observe APIs accept stage/replica at call time
since one OmniModalityMetrics instance per pipeline serves all
(stage, replica) combinations.
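A sketch of the observe-API shape described above, with a plain list standing in for the real Prometheus families. The RTF formula shown (generation time divided by audio content duration) is an assumption about the definitions.py formula, and all names are illustrative:

```python
class OmniModalityMetricsSketch:
    """One instance per pipeline: model_name bound at __init__,
    stage/replica supplied at observe time."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.observations = []  # (family, labels, value) in place of real families

    def observe_audio(self, frames, sample_rate, generation_time_s, stage, replica):
        labels = (self.model_name, stage, replica)
        duration_s = frames / sample_rate        # audio content duration
        rtf = generation_time_s / duration_s     # real-time factor (assumed direction)
        self.observations.append(("omni_audio_frames", labels, frames))
        self.observations.append(("omni_audio_duration_seconds", labels, duration_s))
        self.observations.append(("omni_audio_rtf", labels, rtf))


m = OmniModalityMetricsSketch("qwen3-omni")
# 48000 frames at 24 kHz = 2 s of audio, generated in 1 s -> RTF 0.5.
m.observe_audio(frames=48000, sample_rate=24000, generation_time_s=1.0,
                stage="talker", replica="0")
```

Because stage/replica arrive per call, a single instance can serve every (stage, replica) combination in the pipeline.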
Phase 3.2 of multi-modal observability (RFC G1/G2). Emit per-modality Prometheus observations at request finalize for 7 of the 8 modality families:
audio: frames (Counter), duration_seconds, rtf
image: num (Counter), generation_time_seconds, ttfp_seconds
video: generation_time_seconds
audio_ttfp_seconds is intentionally NOT here — it requires a first-packet timestamp from the streaming path and lands separately in Phase 3.3.
Five connected edits:
1. definitions.py — add resolve_audio_sample_rate(multimodal_output) helper
   that mirrors the (audio_sample_rate / sample_rate / sampling_rate / sr)
   fallback chain already used by serving_chat.py for the OpenAI audio
   response, plus DEFAULT_AUDIO_SAMPLE_RATE=24000.
2. modality.py — add observe_modality_at_finalize() module-level routing
   function. Extracted from omni_base so unit tests can exercise the routing
   logic without importing the heavy AsyncOmniBase stack.
3. orchestrator.py — add replica_id to the _route_output() signature and to
   the "type=output" / "type=stage_metrics" queue dicts so the downstream
   finalize hook in omni_base can emit per-replica labels. The error path at
   line 867 intentionally omits replica_id; the modality routing function
   defensively skips when it is None.
4. omni_base.py — instantiate self.mod_metrics (OmniModalityMetrics)
   alongside self.prom_metrics; call observe_modality_at_finalize() inside
   the existing e2e_done finalize guard so it fires once per request and
   inherits the try/except isolation.
5. tests/metrics/test_modality.py — 9 routing tests covering all four
   output_type branches, sample-rate resolution, ttfp clamping on clock skew,
   and defensive skips for None replica_id / stage_metrics. Uses a stub
   mod_metrics that records call signatures so we don't need a live
   PrometheusRegistry for routing assertions.
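The fallback chain from edit 1 might look like this. This is a sketch: the attribute probe order comes from the commit text, but getattr-based probing and the falsy-value check are assumptions about the real helper.

```python
from types import SimpleNamespace

DEFAULT_AUDIO_SAMPLE_RATE = 24000


def resolve_audio_sample_rate(multimodal_output):
    """Probe the output object for a sample rate under the known attribute
    names, falling back to the 24 kHz default when none is set."""
    for attr in ("audio_sample_rate", "sample_rate", "sampling_rate", "sr"):
        value = getattr(multimodal_output, attr, None)
        if value:  # skip missing, None, and zero
            return int(value)
    return DEFAULT_AUDIO_SAMPLE_RATE


# Usage: whichever attribute is present first in the chain wins.
rate = resolve_audio_sample_rate(SimpleNamespace(sampling_rate=16000))
```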
…ervation
Phase 3.3 of multi-modal observability (RFC G1) — completes the 8th modality family. Unlike the other 7 (finalize-path), audio_ttfp must be measured at the moment the first audio packet leaves the engine, not at request finalize.
Five edits:
1. ClientRequestState — add request_arrival_ts (wall-clock anchor for the
   TTFP delta) and first_audio_ts (once-per-request guard).
2. async_omni.generate() — populate req_state.request_arrival_ts =
   wall_start_ts right after creating the ClientRequestState, before the
   orchestrator accepts the request.
3. modality.observe_audio_first_packet() — module-level helper that computes
   max(now_ts - arrival_ts, 0.0) and emits audio_ttfp_seconds. Defensively
   skips when replica_id is None or arrival_ts == 0 (uninitialized). The
   once-per-request guard lives in the caller.
4. serving_chat.chat_completion_stream_generator() — the HTTP SSE audio
   branch checks req_state.first_audio_ts and observes on the first hit.
   Looks up the replica via stage_pools[stage_id].get_bound_replica_id() so
   the metric carries the right per-replica label.
5. serving_video_stream._process_query_engine() — the WebSocket path inserts
   the same guard + observe right where t_first_audio is already set, so the
   existing first-packet detection logic is reused.
Caller-side guards (req_state.first_audio_ts is None) prevent re-observing on subsequent audio chunks for the same request_id. Both hook sites defensively skip if request_states or stage_pools is missing. 4 new helper unit tests (valid inputs / replica=None skip / arrival_ts=0 skip / clock-skew clamp to 0). Total 26 modality tests pass.
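The clamp-and-skip logic from edit 3 reduces to a few lines. This sketch returns the delta for illustration; the real helper emits to the audio_ttfp_seconds family instead of returning a value.

```python
def audio_ttfp(now_ts, arrival_ts, replica_id):
    """Compute time-to-first-packet for the audio stream.

    Returns None (defensive skip, no observation emitted) when the replica
    is unknown or the arrival timestamp was never stamped; clamps negative
    deltas from clock skew to 0.0.
    """
    if replica_id is None or arrival_ts == 0:
        return None
    return max(now_ts - arrival_ts, 0.0)
```

The once-per-request guard (first_audio_ts) stays in the caller, so this helper is pure and trivially unit-testable.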
Phase 4 of multi-modal observability (RFC G3). Four Histogram families
with {model_name, from_stage, from_replica, to_stage, to_replica} labels,
emitted from the existing TransferEdgeStats accumulators in
OrchestratorAggregator:
transfer_size_bytes -- BYTES_BUCKETS, observed at TX hook
transfer_tx_time_ms -- MS_BUCKETS, observed at TX hook
transfer_rx_decode_time_ms -- MS_BUCKETS, observed at RX hook
transfer_in_flight_time_ms -- MS_BUCKETS, observed at RX hook
Each .observe_*() corresponds to one physical transfer event (one chunk
hop) so the histogram tracks per-transfer distribution rather than
request-aggregated sums.
Two non-trivial design choices, both deviating from RFC §3.2.6:
1. Add model_name to the label set (RFC lists only the four stage/replica
labels). Rationale: aligns transfer with the rest of the omni_*
families (audio_*, image_*, video_*, num_requests_*) so PromQL joins
on model_name work uniformly. Cardinality cost is zero for typical
single-model deployments. To be amended back into the RFC as D11.
2. Resolve from_replica / to_replica by consulting the sticky-routing
binding the orchestrator already maintains, rather than plumbing
replica ids through TransferEdgeStats / StageRequestStats / connector
adapters. Phase 2 (PR vllm-project#2396) gave us
stage_pool.get_bound_replica_id(request_id), which returns the replica
each request is bound to within a stage; we look it up at emit time.
Five files would need touching otherwise (TransferEdgeStats,
StageRequestStats, stage_pool.build_stage_metrics, adapter.on_forward
call site, request bookkeeping). To be amended back into the RFC as D12.
Five edits:
1. definitions.py — TRANSFER_LABELS prepends model_name.
2. transfer.py — new OmniTransferMetrics(model_name) class with the four
Histogram families and one observe method per family. Same wrapper
pattern as OmniPrometheusMetrics (PR vllm-project#3362) and OmniModalityMetrics
(Phase 3): bind model_name at __init__, accept stage/replica at
observe time.
3. stats.py — OrchestratorAggregator gains two optional kwargs at
__init__ (transfer_emitter, replica_resolver). record_transfer_tx
and record_transfer_rx call thin _emit_transfer_tx / _emit_transfer_rx
helpers after the existing accumulation, which fail-safe (skip emit)
when either dep is missing or the resolver returns None for either
side. The TransferEdgeStats accumulation path is unchanged so
existing log/aggregation consumers stay intact.
4. omni_base.py — instantiate self.transfer_metrics alongside
self.prom_metrics / self.mod_metrics.
5. async_omni.py — wire transfer_emitter and replica_resolver into the
per-request OrchestratorMetrics construction. New
_resolve_transfer_replica helper looks up
self.engine.stage_pools[s].get_bound_replica_id(rid).
15 unit tests cover the 4-family registration, observe APIs, bucket
selection, multi-edge cardinality, plus the OrchestratorAggregator emit
hooks (tx + rx full emit, defensive skips for emitter=None /
resolver=None / one-side-resolves-to-None, rx skips stage 0). Stub
emitter records call signatures so the routing logic is asserted
without standing up a Prometheus registry.
Note: TX-side hook (record_transfer_tx) is emit-ready but
adapter.try_send_via_connector — its only caller — currently has no
upstream invocation in the main code path (likely plugin-driven or
about-to-land). RX side (record_transfer_rx) is the active path today;
TX side will start emitting as soon as the connector pipeline activates.
…al{finished_reason}
Phase 5 of multi-modal observability (RFC G6). Collapse the two
PR-vllm-project#3362-era pipeline-level counters
vllm:omni_num_requests_success
vllm:omni_num_requests_fail
into a single per-reason Counter:
vllm:omni_requests_success_total{model_name, finished_reason}
with finished_reason ∈ {stop, length, abort, ...} mirroring upstream
vllm:request_success_total. The previous "fail" path now maps to
finished_reason="abort" so the Pipeline-level success-rate dashboard
becomes one query (sum by (finished_reason) ...) instead of needing
two metrics joined.
Breaking change: dashboards / alerts that reference the old
num_requests_success / num_requests_fail counter names must migrate
to the new requests_success_total{finished_reason} family.
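The call-time label binding and the abort mapping can be sketched as below, with a plain dict standing in for the real Prometheus Counter family:

```python
SUCCESS_LABELS = ("model_name", "finished_reason")


class CompletionCounterSketch:
    """Single per-reason counter: finished_reason is bound per call, and
    the legacy fail path maps to finished_reason="abort"."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.counts = {}  # (model_name, finished_reason) -> count

    def request_succeeded(self, finished_reason="stop"):
        key = (self.model_name, finished_reason)
        self.counts[key] = self.counts.get(key, 0) + 1

    def request_failed(self):
        # Back-compat no-arg signature for the cleanup call site.
        self.request_succeeded(finished_reason="abort")


c = CompletionCounterSketch("qwen3-omni")
c.request_succeeded()                          # finished_reason="stop"
c.request_succeeded(finished_reason="length")
c.request_failed()                             # maps to "abort"
```

With one family, the success-rate dashboard collapses to a single sum by (finished_reason) query instead of joining two metrics.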
Four edits:
1. definitions.py — drop NUM_REQUESTS_SUCCESS, NUM_REQUESTS_FAIL.
Add REQUESTS_SUCCESS (passed without _total since Counter
auto-suffixes at exposition).
2. prometheus.py — replace _success_family + _fail_family with a
single _completion_family using SUCCESS_LABELS = ("model_name",
"finished_reason"). The label is bound per-call rather than at
__init__ time.
- OmniPrometheusMetrics.request_succeeded() gains a
finished_reason="stop" default kwarg.
- OmniPrometheusMetrics.request_failed() keeps its no-arg signature
for back-compat with the cleanup call site, internally mapping
to finished_reason="abort".
3. omni_base.py — at the request-finalize hook, extract
finish_reason from engine_outputs.outputs[0].finish_reason
(vLLM CompletionOutput convention) and pass it through to
request_succeeded(). Falls back to "stop" when no completion
output is present (defensive — e.g. diffusion stages).
4. tests/metrics/test_prometheus.py — refresh _PIPELINE_METRICS,
exercise three reason buckets (stop x 2 + length x 1 + abort x 1)
in the scrape fixture, replace the old success/fail-count assertions
with per-reason ones, and adjust the histogram-count assertions
(e2e/queue go from 2 to 3 since the abort path now only inc's the
counter and doesn't observe latency).
PR vllm-project#3362 introduced docs/usage/metrics.md and docs/design/metrics.md covering its own pipeline-level + diffusion families. The follow-up work in this branch (G1 audio, G2 image/video, G3 cross-stage transfer, G6 success/fail merge into requests_success_total{finished_reason}, G7 OmniPrometheusStatLogger per-replica wrap) wasn't reflected. Refresh both docs to the final state.
usage/metrics.md (+117 / -38):
- Request Tracking table swaps num_requests_success / num_requests_fail for the merged requests_success_total{finished_reason} family.
- New sections for Modality (audio / image / video) and Cross-Stage Transfer with the per-modality / per-edge metric tables.
- vLLM Engine Metrics section gains a before/after example showing how the G7 wrap reshapes engine -> stage + replica, with a note that the ~37 upstream families gain per-replica visibility automatically.
- Diffusion Engine Metrics section clarifies that omni-side diffusion families bypass the wrap (engine label = stage_id, not relabelled).
- Pipeline Type availability matrix gains modality / transfer / per-replica rows.
- Naming Convention section explains the RFC "co-position, different name" three-modality TTFP convention.
design/metrics.md (+260 / -74):
- Objectives section calls out the per-replica + modality + transfer goals introduced after PR vllm-project#3362.
- Architecture diagram replaces the original two-component picture with the four-component one (OmniPrometheusMetrics + OmniModalityMetrics + OmniTransferMetrics + OmniPrometheusStatLogger).
- Data Flow grows from two paths to four: Pipeline-level, Modality (finalize hook + audio_ttfp streaming hook), Cross-stage transfer (sticky-routing replica_resolver), and the wrapped per-engine path. The transfer subsection notes the TX-side hook is wired but currently inactive pending try_send_via_connector being called from the main code path.
- New "OmniPrometheusStatLogger Wrap (G7)" section explains the four upstream impedance points and the three coordinated mechanisms (class-level metric class slots, per_engine_labelvalues property descriptor, _RelabelMixin.labels() override) used to handle them. Also documents the stage_replica_map construction and the deferred dynamic add/remove decision.
- Metric Definitions table refreshed end-to-end: the pipeline-level row now lists requests_success_total{finished_reason}; new modality and transfer subtables added; the transfer subtable carries a note that model_name was added on top of the RFC-listed four stage/replica labels for cross-omni consistency.
- Logging vs. Prometheus section expanded to mention OmniModality / OmniTransfer alongside OmniPrometheusMetrics, and notes that OrchestratorAggregator.record_transfer_tx/rx fans out to both the log accumulator and the Prometheus emit hook in one method body.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56afe670b8
total = len(self.request_states)
self.prom_metrics.set_running(running)
self.prom_metrics.set_waiting(max(0, total - running))
Recompute waiting gauge after request-state cleanup
total = len(self.request_states) is sampled before the completed request is removed from self.request_states, while the orchestrator may already have decremented _running_counter; for a single completed request this yields running=0, total=1 and publishes omni_num_requests_waiting=1 even though the system is idle. Because _log_summary_and_cleanup() pops the request after this point without refreshing gauges, the stale waiting value can persist until another request arrives.
pass
...
_vllm_prometheus.unregister_vllm_metrics = _noop_unregister_vllm_metrics
Keep collector cleanup to avoid duplicate metric registration
Replacing unregister_vllm_metrics() with a no-op means previously registered vllm:* collectors are never removed, so creating another engine in the same process (e.g., tests that instantiate multiple engines, hot-reload, or multi-tenant hosting) can fail at startup with duplicated Prometheus timeseries registration errors. This patch should preserve upstream cleanup for vllm:* families and only protect vllm:omni_* collectors from removal.
…mismatch
Phase 2.4 — two bugs in the G7 wrap surfaced when bringing the server
up against a real PrometheusStatLogger / qwen3-omni deployment, both
caused by paths the unit tests didn't reach against the stub upstream:
1. Helper-class label-count mismatch (orchestrator init crash with
`ValueError: Incorrect label count`).
Upstream PrometheusStatLogger.__init__ instantiates three sub-helper
collectors at loggers.py:438-446:
self.spec_decoding_prom = self._spec_decoding_cls(
vllm_config.speculative_config, labelnames, per_engine_labelvalues)
self.kv_connector_prom = self._kv_connector_cls(
vllm_config, labelnames, per_engine_labelvalues)
self.perf_metrics_prom = self._perf_metrics_cls(
vllm_config, labelnames, per_engine_labelvalues)
Each helper takes raw `labelnames` as a constructor argument and
builds its internal Counter/Gauge/Histogram families via its own
class-level `_counter_cls` / `_gauge_cls` / `_histogram_cls` slots.
The slot overrides on OmniPrometheusStatLogger only reach families
created via *its* slots, so the helpers still build 2-label
families and then crash when create_metric_per_engine fans the
rewritten 3-element per_engine_labelvalues into them.
Fix: subclass each helper (`_OmniPerfMetricsProm`,
`_OmniSpecDecodingProm`, `_OmniKVConnectorProm`) and override
their cls slots to the relabel-mixin wrappers, then assign these
subclasses to `_perf_metrics_cls` / `_spec_decoding_cls` /
`_kv_connector_cls` on OmniPrometheusStatLogger so the helpers
construct families through the same labelname-rewrite path.
2. Double-rewrite in mixin's positional-args path (also surfaces as
`Incorrect label count` after the helpers were patched).
Phase 2.2b's per_engine_labelvalues property setter already rewrites
each entry from [model_name, str(idx)] to [model_name, stage,
replica]. create_metric_per_engine then fans those into
`metric.labels(*values)` — i.e. the wrapper's positional .labels()
path receives 3 already-rewritten values. The old splice logic
would then re-interpret args[engine_label_index=1] (now "stage")
as an engine_idx and splice (stage, replica) again, blowing the
label count to 4.
Fix: detect `len(args) == len(self._labelnames)` at the start of
the positional path and short-circuit to passthrough — the caller
has already shaped the values to the rewritten family. The legacy
path (e.g. counter_request_success.labels(model_name, str(idx),
reason) at loggers.py:679) keeps working because its arg count is
short by one relative to the rewritten labelnames, so the splice
still triggers as before.
Tests:
- New `TestHelperClassWraps` asserts the four cls-slot overrides on
OmniPrometheusStatLogger and the Counter/Gauge/Histogram slots on
each helper subclass.
- New `TestDoubleRewriteGuard` covers both branches: a 2-label original
family receiving 3 already-rewritten values (passthrough) and a
3-label original (engine in middle) receiving 3 ORIGINAL-shape
values (splice path still fires).
- Updated the docstring on `test_labels_positional_passthrough` to
call out that it specifically exercises the new guard.
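The guard from fix 2 reduces to an arg-count check ahead of the splice. This sketch extracts it as a free function for illustration; in the real mixin the check sits at the top of the positional .labels() path.

```python
def splice_or_passthrough(args, labelnames, engine_label_index, engine_map):
    """Rewrite positional label values for a relabelled family.

    If the caller already supplied one value per rewritten labelname (the
    per_engine_labelvalues setter has shaped them), pass through untouched.
    Legacy original-shape calls are one value short, so the splice of
    (stage, replica) at the recorded engine offset still fires.
    """
    args = list(args)
    if len(args) == len(labelnames):
        return args  # already rewritten upstream; do not re-interpret
    stage, replica = engine_map[int(args[engine_label_index])]
    args[engine_label_index:engine_label_index + 1] = [stage, replica]
    return args
```

The passthrough branch is exactly what stops the old logic from re-reading "stage" as an engine_idx and blowing the label count to 4.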
Purpose
Implements RFC #3545 on top of PR #3362's /metrics baseline, layering seven incremental goals:
- audio_ttfp_seconds / audio_duration_seconds / audio_rtf / audio_frames with labels {stage, replica, model_name}. audio_ttfp is "time to first audio packet/frame," semantically distinct from upstream vllm:time_to_first_token_seconds{stage="talker"} ("time to first audio token (code)") — different units, exposed independently.
- image_ttfp_seconds / image_num / image_generation_time_seconds / video_duration_seconds / video_rtf / video_generation_time_seconds. Introduces "video content duration vs. generation time" and recognizes that the image path has no RTF concept, so its stage time is exposed directly. PR #3362 (Add Prometheus /metrics Support)'s four ms-level diffusion timings are reused, not duplicated.
- transfer_size_bytes / transfer_tx_time_ms / transfer_rx_decode_time_ms / transfer_in_flight_time_ms with labels {model_name, from_stage, from_replica, to_stage, to_replica}. The _ms suffix matches the source fields on TransferEdgeStats.*_ms and follows the same style as PR #3362's diffusion_*_time_ms.
- MultiModalsBenchmarkMetrics: a shared definition module is imported by both server and bench.
- vllm:time_to_first_token_seconds (text — reuse upstream) / vllm:omni_audio_ttfp_seconds / vllm:omni_image_ttfp_seconds. No single metric with a modality label.
- vllm:omni_requests_success_total{finished_reason=...}, pipeline-global, with finished_reason mirroring upstream completion values (stop / length / abort / ...).
- OmniPrometheusStatLogger wraps the upstream PrometheusStatLogger. It uses engine_indexes = range(num_stages × num_replicas) for flat numbering and reshapes the single engine label into a stage + replica pair. All ~37 vllm:* families automatically gain per-(stage, replica) dimensions — including text-path TTFT / ITL / TPOT / e2e, which therefore do not need to be re-implemented on the omni side.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md