
[WIP][Metrics] Extend Prometheus with multi-modal SLOs, cross-stage transfer, and per-replica labels#3576

Open
LHXuuu wants to merge 13 commits into vllm-project:main from LHXuuu:support_prometheus

Conversation

Contributor

@LHXuuu LHXuuu commented May 13, 2026

Purpose

Implements RFC #3545 on top of PR #3362's /metrics baseline, layering seven incremental goals:

  • G1 (Audio) — Add audio_ttfp_seconds / audio_duration_seconds / audio_rtf / audio_frames with labels {stage, replica, model_name}. audio_ttfp is "time to first audio packet/frame," semantically distinct from upstream vllm:time_to_first_token_seconds{stage="talker"} ("time to first audio token (code)") — different units, exposed independently.
  • G2 (Visual) — Add image_ttfp_seconds / image_num / image_generation_time_seconds / video_duration_seconds / video_rtf / video_generation_time_seconds. Introduces "video content duration vs. generation time" and recognizes that the image path has no RTF concept, so its stage time is exposed directly. PR #3362's four ms-level diffusion timings are reused, not duplicated.
  • G3 (Transfer) — Add transfer_size_bytes / transfer_tx_time_ms / transfer_rx_decode_time_ms / transfer_in_flight_time_ms with labels {model_name, from_stage, from_replica, to_stage, to_replica}. The _ms suffix matches the source fields on TransferEdgeStats.*_ms and follows the same style as PR #3362's diffusion_*_time_ms.
  • G4 (Single source of truth with bench CLI) — New metric names / units / formulas match MultiModalsBenchmarkMetrics; a shared definition module is imported by both server and bench.
  • G5 (Naming) — "Co-position, different-name" across the three modalities for time-to-first-output: vllm:time_to_first_token_seconds (text — reuse upstream) / vllm:omni_audio_ttfp_seconds / vllm:omni_image_ttfp_seconds. No single metric with a modality label.
  • G6 (Pipeline completion Counter) — Add vllm:omni_requests_success_total{finished_reason=...}, Pipeline-global, with finished_reason mirroring upstream completion values (stop / length / abort / ...).
  • G7 (Stage replica) — OmniPrometheusStatLogger wraps the upstream PrometheusStatLogger. It uses engine_indexes = range(num_stages × num_replicas) for flat numbering and reshapes the single engine label into a stage + replica pair. All ~37 vllm:* families automatically gain per-(stage, replica) dimensions — including text-path TTFT / ITL / TPOT / e2e, which therefore do not need to be re-implemented on the omni side.
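As a rough illustration of the G7 flat-numbering scheme (the helper name below is hypothetical, not the PR's API):

```python
def build_engine_index_map(num_stages: int, num_replicas: int) -> dict[int, tuple[str, str]]:
    """Map each flat engine index in range(num_stages * num_replicas)
    to a (stage, replica) label pair, row-major by stage."""
    return {
        stage * num_replicas + replica: (str(stage), str(replica))
        for stage in range(num_stages)
        for replica in range(num_replicas)
    }

# 2 stages x 2 replicas -> 4 flat engine indexes;
# flat index 3 belongs to stage 1, replica 1.
mapping = build_engine_index_map(2, 2)
```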

Test Plan

Test Result



vraiti and others added 12 commits May 13, 2026 12:15
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: vraiti <vraiti@redhat.com>
Introduce vllm_omni/metrics/definitions.py as the single source of
truth for metric family names, label sets, histogram buckets, and RTF
formulas. Server-side prometheus.py and bench-side MultiModalsBenchmark
Metrics now consume the same constants, eliminating the dual-track
naming drift between /metrics output and bench CLI reports.

Pre-work for the upcoming RFC extensions: per-modality audio/image/
video families (G1/G2), cross-stage transfer family (G3), and the
OmniPrometheusStatLogger wrap that relabels engine into stage+replica
(G7). All refactors are name-only; PR vllm-project#3362's existing 10 prometheus
tests pass unchanged.
Phase 2.1 of multi-replica observability (RFC §3.2.7): introduce three
wrapper metric classes (_RelabelGauge / _RelabelCounter / _RelabelHistogram)
that intercept labelnames at family creation and translate
.labels(engine=idx, ...) kwargs into stage/replica via a process-level
engine→(stage, replica) map.

These are the ingredients for OmniPrometheusStatLogger (Phase 2.2),
which will populate the engine map and slot the wrappers into upstream's
_gauge_cls / _counter_cls / _histogram_cls hooks. Shipped standalone
with 18 unit tests so the relabel logic can be vetted before the
StatLogger subclass lands.
…l mixin

Phase 2.1 only intercepted .labels(engine=int_idx, ...) kwarg form.
Upstream PrometheusStatLogger has four .labels() call sites for
engine-bearing families:

  loggers.py:510  kwarg form, int engine    (gauge_engine_sleep_state)
  loggers.py:646  positional, str engine    (counter_prompt_tokens_by_source)
  loggers.py:679  positional, str engine    (counter_request_success_base)
  loggers.py:1056 kwarg form, str engine    (info_gauge via metrics_info)

Track the original engine label index at family creation so positional
.labels() can splice (stage, replica) at the right offset, and accept
both int and str engine values in either form. Replaces the narrow
_rewrite_label_kwargs helper with _engine_to_stage_replica which is
shared by both code paths.

Adds tests for positional engine at middle index (str + int values),
str-form kwarg engine, and child metric non-recursion.
Phase 2.2b of multi-replica observability (RFC §3.2.7). Subclass upstream
PrometheusStatLogger to expose ~37 vllm:* metric families with
{stage, replica} labels instead of the single {engine} label.

Three pieces of glue:

1. Class-level slot overrides (_gauge_cls / _counter_cls / _histogram_cls)
   plug the relabel mixin wrappers into upstream's family-construction path
   so labelnames are rewritten as families are created.

2. A property descriptor on per_engine_labelvalues intercepts upstream's
   loggers.py:433 self-assignment and rewrites each [model_name, str(idx)]
   tuple to [model_name, stage, replica] so downstream create_metric_per_engine
   feeds 3-element labelvalues into the 3-label families.

3. __init__ accepts a stage_replica_map: dict[int, tuple[str, str]] and
   populates the process-level _ENGINE_INDEX_MAP that the wrappers consult
   on every .labels() call. Map is cleared first so re-init in the same
   process (tests, orchestrator restart) starts clean.

Dynamic add/remove of replicas at runtime is intentionally out of scope.
Phase 2.3 of multi-replica observability (RFC §3.2.7) — completes the
end-to-end path so the ~37 vllm:* metric families now expose
per-(stage, replica) series.

Three targeted edits to orchestrator.py:

1. Build a flat stage_replica_map at __init__ by walking stage_pools and
   assigning each (stage_id, replica_id) tuple a monotonically increasing
   engine_idx via prefix-sum across pools. The reverse map
   _stage_replica_to_engine_idx is consulted at record() time to translate
   the orchestrator's (stage_id, replica_id) loop variables back into the
   flat engine_idx upstream expects.

2. Swap PrometheusStatLogger for OmniPrometheusStatLogger and pass
   stage_replica_map instead of the old range(num_stages) engine_indexes.

3. record() now passes the per-replica flat engine_idx instead of just
   stage_id, so multi-replica deployments emit distinct
   {stage, replica} series per replica rather than aggregating them.

The stage label currently uses str(stage_id); migrating to semantic names
("thinker", "talker", "diffusion") is a separate config schema change
deferred to a later iteration.
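The prefix-sum construction in edit 1 can be sketched like this (simplifying `stage_pools` to a list of per-stage replica counts; the function name is hypothetical):

```python
def build_stage_replica_map(replica_counts: list[int]) -> dict[int, tuple[str, str]]:
    """Assign each (stage_id, replica_id) a monotonically increasing flat
    engine_idx by walking the pools in order (a prefix sum over counts)."""
    mapping: dict[int, tuple[str, str]] = {}
    engine_idx = 0
    for stage_id, num_replicas in enumerate(replica_counts):
        for replica_id in range(num_replicas):
            mapping[engine_idx] = (str(stage_id), str(replica_id))
            engine_idx += 1
    return mapping

# stage 0 has 1 replica, stage 1 has 2 -> three flat indexes
m = build_stage_replica_map([1, 2])
# reverse map consulted at record() time: (stage, replica) -> engine_idx
reverse = {v: k for k, v in m.items()}
```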
Phase 3.1 of multi-modal observability (RFC G1/G2). Per-modality
business-semantic Prometheus families with {model_name, stage, replica}
labels:

  audio: ttfp_seconds, duration_seconds, rtf, frames (Counter)
  image: ttfp_seconds, num (Counter), generation_time_seconds
  video: generation_time_seconds

video_duration_seconds and video_rtf are intentionally deferred —
diffusion video pipelines (i2v / t2v / cogvideo / hunyuan / wan)
expose num_frames + fps in heterogeneous shapes and a clean abstraction
is out of scope for this iteration.

Text-path metrics (TTFT/ITL/TPOT) are NOT here — they come from the
upstream vllm:*{stage="thinker", ...} families exposed by the Phase 2
OmniPrometheusStatLogger wrap. RFC §3.2.6 single-source naming via
definitions.py constants; RTF families use RTF_BUCKETS, time families
use SECONDS_BUCKETS. observe APIs accept stage/replica at call time
since one OmniModalityMetrics instance per pipeline serves all
(stage, replica) combinations.
Phase 3.2 of multi-modal observability (RFC G1/G2). Emit per-modality
Prometheus observations at request finalize for 7 of the 8 modality
families:

  audio: frames (Counter), duration_seconds, rtf
  image: num (Counter), generation_time_seconds, ttfp_seconds
  video: generation_time_seconds

audio_ttfp_seconds is intentionally NOT here — it requires a
first-packet timestamp from the streaming path and lands separately
in Phase 3.3.

Five connected edits:

1. definitions.py — add resolve_audio_sample_rate(multimodal_output)
   helper that mirrors the (audio_sample_rate / sample_rate /
   sampling_rate / sr) fallback chain already used by serving_chat.py
   for the OpenAI audio response, plus DEFAULT_AUDIO_SAMPLE_RATE=24000.

2. modality.py — add observe_modality_at_finalize() module-level
   routing function. Extracted from omni_base so unit tests can
   exercise the routing logic without importing the heavy AsyncOmniBase
   stack.

3. orchestrator.py — add replica_id to _route_output() signature and
   into the "type=output" / "type=stage_metrics" queue dicts so the
   downstream finalize hook in omni_base can emit per-replica labels.
   The error path at line 867 intentionally omits replica_id; the
   modality routing function defensive-skips when None.

4. omni_base.py — instantiate self.mod_metrics (OmniModalityMetrics)
   alongside self.prom_metrics; call observe_modality_at_finalize()
   inside the existing e2e_done finalize guard so it fires once per
   request and inherits the try/except isolation.

5. tests/metrics/test_modality.py — 9 routing tests covering all
   four output_type branches, sample-rate resolution, ttfp clamping
   on clock skew, and defensive skips for None replica_id /
   stage_metrics. Uses a stub mod_metrics that records call signatures
   so we don't need a live PrometheusRegistry for routing assertions.
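A sketch of the fallback chain from edit 1, assuming attribute-style access on the multimodal output (the real helper in definitions.py may differ in detail):

```python
from types import SimpleNamespace

DEFAULT_AUDIO_SAMPLE_RATE = 24000  # default cited in the commit message

def resolve_audio_sample_rate(multimodal_output) -> int:
    """Return the first present, truthy rate along the fallback chain
    (audio_sample_rate / sample_rate / sampling_rate / sr), else the default."""
    for attr in ("audio_sample_rate", "sample_rate", "sampling_rate", "sr"):
        value = getattr(multimodal_output, attr, None)
        if value:
            return int(value)
    return DEFAULT_AUDIO_SAMPLE_RATE

output = SimpleNamespace(sampling_rate=16000)
rate = resolve_audio_sample_rate(output)
```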
…ervation

Phase 3.3 of multi-modal observability (RFC G1) — completes the 8th
modality family. Unlike the other 7 (finalize-path), audio_ttfp must
be measured at the moment the first audio packet leaves the engine,
not at request finalize.

Five edits:

1. ClientRequestState — add request_arrival_ts (wall-clock anchor for
   the TTFP delta) and first_audio_ts (once-per-request guard).

2. async_omni.generate() — populate req_state.request_arrival_ts =
   wall_start_ts right after creating the ClientRequestState, before
   the orchestrator accepts the request.

3. modality.observe_audio_first_packet() — module-level helper that
   computes max(now_ts - arrival_ts, 0.0) and emits audio_ttfp_seconds.
   Defensive-skips when replica_id is None or arrival_ts == 0
   (uninitialized). The once-per-request guard lives in the caller.

4. serving_chat.chat_completion_stream_generator() — HTTP SSE audio
   branch checks req_state.first_audio_ts and observes on first hit.
   Looks up replica via stage_pools[stage_id].get_bound_replica_id()
   so the metric carries the right per-replica label.

5. serving_video_stream._process_query_engine() — WebSocket path
   inserts the same guard + observe right where t_first_audio is
   already set, so the existing first-packet detection logic is reused.

Caller-side guards (req_state.first_audio_ts is None) prevent re-observe
on subsequent audio chunks for the same request_id. Both hook sites
defensive-skip if request_states or stage_pools is missing.

4 new helper unit tests (valid inputs / replica=None skip / arrival_ts=0
skip / clock-skew clamp to 0). Total 26 modality tests pass.
Phase 4 of multi-modal observability (RFC G3). Four Histogram families
with {model_name, from_stage, from_replica, to_stage, to_replica} labels,
emitted from the existing TransferEdgeStats accumulators in
OrchestratorAggregator:

  transfer_size_bytes          -- BYTES_BUCKETS, observed at TX hook
  transfer_tx_time_ms          -- MS_BUCKETS,    observed at TX hook
  transfer_rx_decode_time_ms   -- MS_BUCKETS,    observed at RX hook
  transfer_in_flight_time_ms   -- MS_BUCKETS,    observed at RX hook

Each .observe_*() corresponds to one physical transfer event (one chunk
hop) so the histogram tracks per-transfer distribution rather than
request-aggregated sums.

Two non-trivial design choices, both deviating from RFC §3.2.6:

1. Add model_name to the label set (RFC lists only the four stage/replica
   labels). Rationale: aligns transfer with the rest of the omni_*
   families (audio_*, image_*, video_*, num_requests_*) so PromQL joins
   on model_name work uniformly. Cardinality cost is zero for typical
   single-model deployments. To be amended back into the RFC as D11.

2. Resolve from_replica / to_replica by consulting the sticky-routing
   binding the orchestrator already maintains, rather than plumbing
   replica ids through TransferEdgeStats / StageRequestStats / connector
   adapters. Phase 2 (PR vllm-project#2396) gave us
   stage_pool.get_bound_replica_id(request_id), which returns the replica
   each request is bound to within a stage; we look it up at emit time.
   Five files would need touching otherwise (TransferEdgeStats,
   StageRequestStats, stage_pool.build_stage_metrics, adapter.on_forward
   call site, request bookkeeping). To be amended back into the RFC as D12.

Five edits:

1. definitions.py — TRANSFER_LABELS prepends model_name.

2. transfer.py — new OmniTransferMetrics(model_name) class with the four
   Histogram families and one observe method per family. Same wrapper
   pattern as OmniPrometheusMetrics (PR vllm-project#3362) and OmniModalityMetrics
   (Phase 3): bind model_name at __init__, accept stage/replica at
   observe time.

3. stats.py — OrchestratorAggregator gains two optional kwargs at
   __init__ (transfer_emitter, replica_resolver). record_transfer_tx
   and record_transfer_rx call thin _emit_transfer_tx / _emit_transfer_rx
   helpers after the existing accumulation, which fail-safe (skip emit)
   when either dep is missing or the resolver returns None for either
   side. The TransferEdgeStats accumulation path is unchanged so
   existing log/aggregation consumers stay intact.

4. omni_base.py — instantiate self.transfer_metrics alongside
   self.prom_metrics / self.mod_metrics.

5. async_omni.py — wire transfer_emitter and replica_resolver into the
   per-request OrchestratorMetrics construction. New
   _resolve_transfer_replica helper looks up
   self.engine.stage_pools[s].get_bound_replica_id(rid).

15 unit tests cover the 4-family registration, observe APIs, bucket
selection, multi-edge cardinality, plus the OrchestratorAggregator emit
hooks (tx + rx full emit, defensive skips for emitter=None /
resolver=None / one-side-resolves-to-None, rx skips stage 0). Stub
emitter records call signatures so the routing logic is asserted
without standing up a Prometheus registry.

Note: TX-side hook (record_transfer_tx) is emit-ready but
adapter.try_send_via_connector — its only caller — currently has no
upstream invocation in the main code path (likely plugin-driven or
about-to-land). RX side (record_transfer_rx) is the active path today;
TX side will start emitting as soon as the connector pipeline activates.
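The fail-safe emit pattern in edit 3 can be sketched as follows (a toy stand-in, not the real OrchestratorAggregator):

```python
class AggregatorSketch:
    """Accumulation always runs; Prometheus emission is skipped when the
    emitter or resolver is missing, or when either side's replica cannot
    be resolved."""

    def __init__(self, transfer_emitter=None, replica_resolver=None):
        self.transfer_emitter = transfer_emitter
        self.replica_resolver = replica_resolver
        self.accumulated = []  # stands in for the TransferEdgeStats path

    def record_transfer_tx(self, request_id, from_stage, to_stage, size_bytes, tx_ms):
        self.accumulated.append((from_stage, to_stage, size_bytes, tx_ms))
        if self.transfer_emitter is None or self.replica_resolver is None:
            return  # fail-safe: keep accumulating, skip emit
        from_replica = self.replica_resolver(from_stage, request_id)
        to_replica = self.replica_resolver(to_stage, request_id)
        if from_replica is None or to_replica is None:
            return  # one side unresolved -> skip emit
        self.transfer_emitter(from_stage, from_replica, to_stage, to_replica,
                              size_bytes, tx_ms)
```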
…al{finished_reason}

Phase 5 of multi-modal observability (RFC G6). Collapse the two
PR-vllm-project#3362-era pipeline-level counters

  vllm:omni_num_requests_success
  vllm:omni_num_requests_fail

into a single per-reason Counter:

  vllm:omni_requests_success_total{model_name, finished_reason}

with finished_reason ∈ {stop, length, abort, ...} mirroring upstream
vllm:request_success_total. The previous "fail" path now maps to
finished_reason="abort" so the Pipeline-level success-rate dashboard
becomes one query (sum by (finished_reason) ...) instead of needing
two metrics joined.
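For instance, a success-rate panel over the merged family could use a single query of roughly this shape (illustrative PromQL):

```promql
sum by (finished_reason) (
  rate(vllm:omni_requests_success_total[5m])
)
```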

Breaking change: dashboards / alerts that reference the old
num_requests_success / num_requests_fail counter names must migrate
to the new requests_success_total{finished_reason} family.

Four edits:

1. definitions.py — drop NUM_REQUESTS_SUCCESS, NUM_REQUESTS_FAIL.
   Add REQUESTS_SUCCESS (passed without _total since Counter
   auto-suffixes at exposition).

2. prometheus.py — replace _success_family + _fail_family with a
   single _completion_family using SUCCESS_LABELS = ("model_name",
   "finished_reason"). The label is bound per-call rather than at
   __init__ time.

   - OmniPrometheusMetrics.request_succeeded() gains a
     finished_reason="stop" default kwarg.
   - OmniPrometheusMetrics.request_failed() keeps its no-arg signature
     for back-compat with the cleanup call site, internally mapping
     to finished_reason="abort".

3. omni_base.py — at the request-finalize hook, extract
   finish_reason from engine_outputs.outputs[0].finish_reason
   (vLLM CompletionOutput convention) and pass it through to
   request_succeeded(). Falls back to "stop" when no completion
   output is present (defensive — e.g. diffusion stages).

4. tests/metrics/test_prometheus.py — refresh _PIPELINE_METRICS,
   exercise three reason buckets (stop x 2 + length x 1 + abort x 1)
   in the scrape fixture, replace the old success/fail-count assertions
   with per-reason ones, and adjust the histogram-count assertions
   (e2e/queue go from 2 to 3 since the abort path now only increments the
   counter and doesn't observe latency).
PR vllm-project#3362 introduced docs/usage/metrics.md and docs/design/metrics.md
covering its own pipeline-level + diffusion families. The follow-up
work in this branch (G1 audio, G2 image/video, G3 cross-stage transfer,
G6 success/fail merge into requests_success_total{finished_reason},
G7 OmniPrometheusStatLogger per-replica wrap) wasn't reflected. Refresh
both docs to the final state.

usage/metrics.md (+117 / -38):

- Request Tracking table swaps num_requests_success / num_requests_fail
  for the merged requests_success_total{finished_reason} family.
- New sections for Modality (audio / image / video) and Cross-Stage
  Transfer with the per-modality / per-edge metric tables.
- vLLM Engine Metrics section gains a before/after example showing how
  the G7 wrap reshapes engine -> stage + replica, with a note that the
  ~37 upstream families gain per-replica visibility automatically.
- Diffusion Engine Metrics section clarifies that omni-side diffusion
  families bypass the wrap (engine label = stage_id, not relabelled).
- Pipeline Type availability matrix gains modality / transfer /
  per-replica rows.
- Naming Convention section explains the RFC "co-position, different
  name" three-modality TTFP convention.

design/metrics.md (+260 / -74):

- Objectives section calls out the per-replica + modality + transfer
  goals introduced after PR vllm-project#3362.
- Architecture diagram replaces the original two-component picture with
  the four-component one (OmniPrometheusMetrics + OmniModalityMetrics +
  OmniTransferMetrics + OmniPrometheusStatLogger).
- Data Flow grows from two paths to four: Pipeline-level, Modality
  (finalize hook + audio_ttfp streaming hook), Cross-stage transfer
  (sticky-routing replica_resolver), and the wrapped per-engine path.
  The transfer subsection notes the TX-side hook is wired but currently
  inactive pending try_send_via_connector being called from the main
  code path.
- New "OmniPrometheusStatLogger Wrap (G7)" section explains the four
  upstream impedance points and the three coordinated mechanisms
  (class-level metric class slots, per_engine_labelvalues property
  descriptor, _RelabelMixin.labels() override) used to handle them.
  Also documents the stage_replica_map construction and the deferred
  dynamic add/remove decision.
- Metric Definitions table refreshed end-to-end: pipeline-level row
  now lists requests_success_total{finished_reason}; new modality and
  transfer subtables added; transfer subtable carries a note that
  model_name was added on top of the RFC-listed four
  stage/replica labels for cross-omni consistency.
- Logging vs. Prometheus section expanded to mention OmniModality /
  OmniTransfer alongside OmniPrometheusMetrics, and notes that
  OrchestratorAggregator.record_transfer_tx/rx fans out to both the
  log accumulator and the Prometheus emit hook in one method body.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56afe670b8


Comment on lines +493 to +495
total = len(self.request_states)
self.prom_metrics.set_running(running)
self.prom_metrics.set_waiting(max(0, total - running))

P1: Recompute waiting gauge after request-state cleanup

total = len(self.request_states) is sampled before the completed request is removed from self.request_states, while the orchestrator may already have decremented _running_counter; for a single completed request this yields running=0, total=1 and publishes omni_num_requests_waiting=1 even though the system is idle. Because _log_summary_and_cleanup() pops the request after this point without refreshing gauges, the stale waiting value can persist until another request arrives.


Comment on vllm_omni/patch.py
pass


_vllm_prometheus.unregister_vllm_metrics = _noop_unregister_vllm_metrics

P1: Keep collector cleanup to avoid duplicate metric registration

Replacing unregister_vllm_metrics() with a no-op means previously registered vllm:* collectors are never removed, so creating another engine in the same process (e.g., tests that instantiate multiple engines, hot-reload, or multi-tenant hosting) can fail at startup with duplicated Prometheus timeseries registration errors. This patch should preserve upstream cleanup for vllm:* families and only protect vllm:omni_* collectors from removal.


Collaborator

hsliuustc0106 commented May 14, 2026

How is this PR related to #3069? cc @bjf-frz @fake0fan

…mismatch

Phase 2.4 — two bugs in the G7 wrap surfaced when bringing the server
up against a real PrometheusStatLogger / qwen3-omni deployment, both
caused by paths the unit tests didn't reach against the stub upstream:

1. Helper-class label-count mismatch (orchestrator init crash with
   `ValueError: Incorrect label count`).

   Upstream PrometheusStatLogger.__init__ instantiates three sub-helper
   collectors at loggers.py:438-446:

       self.spec_decoding_prom = self._spec_decoding_cls(
           vllm_config.speculative_config, labelnames, per_engine_labelvalues)
       self.kv_connector_prom = self._kv_connector_cls(
           vllm_config, labelnames, per_engine_labelvalues)
       self.perf_metrics_prom = self._perf_metrics_cls(
           vllm_config, labelnames, per_engine_labelvalues)

   Each helper takes raw `labelnames` as a constructor argument and
   builds its internal Counter/Gauge/Histogram families via its own
   class-level `_counter_cls` / `_gauge_cls` / `_histogram_cls` slots.
   The slot overrides on OmniPrometheusStatLogger only reach families
   created via *its* slots, so the helpers still build 2-label
   families and then crash when create_metric_per_engine fans the
   rewritten 3-element per_engine_labelvalues into them.

   Fix: subclass each helper (`_OmniPerfMetricsProm`,
   `_OmniSpecDecodingProm`, `_OmniKVConnectorProm`) and override
   their cls slots to the relabel-mixin wrappers, then assign these
   subclasses to `_perf_metrics_cls` / `_spec_decoding_cls` /
   `_kv_connector_cls` on OmniPrometheusStatLogger so the helpers
   construct families through the same labelname-rewrite path.

2. Double-rewrite in mixin's positional-args path (also surfaces as
   `Incorrect label count` after the helpers were patched).

   Phase 2.2b's per_engine_labelvalues property setter already rewrites
   each entry from [model_name, str(idx)] to [model_name, stage,
   replica]. create_metric_per_engine then fans those into
   `metric.labels(*values)` — i.e. the wrapper's positional .labels()
   path receives 3 already-rewritten values. The old splice logic
   would then re-interpret args[engine_label_index=1] (now "stage")
   as an engine_idx and splice (stage, replica) again, blowing the
   label count to 4.

   Fix: detect `len(args) == len(self._labelnames)` at the start of
   the positional path and short-circuit to passthrough — the caller
   has already shaped the values to the rewritten family. The legacy
   path (e.g. counter_request_success.labels(model_name, str(idx),
   reason) at loggers.py:679) keeps working because its arg count is
   short by one relative to the rewritten labelnames, so the splice
   still triggers as before.
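The guard in fix 2 reduces to an arity check before the splice (a sketch with hypothetical names):

```python
def relabel_positional(args, labelnames, engine_label_index, engine_map):
    """Positional .labels() path with the double-rewrite guard: if the
    caller already supplies one value per rewritten labelname, pass
    through unchanged; otherwise splice (stage, replica) in place of
    the engine value (int or str)."""
    args = tuple(args)
    if len(args) == len(labelnames):
        return args  # already rewritten by the per_engine_labelvalues setter
    stage, replica = engine_map[int(args[engine_label_index])]
    return args[:engine_label_index] + (stage, replica) + args[engine_label_index + 1:]

labelnames = ("model_name", "stage", "replica")
engine_map = {0: ("thinker", "0")}
# legacy 2-value call (model_name, engine): splice fires
legacy = relabel_positional(("m", "0"), labelnames, 1, engine_map)
# already-rewritten 3-value call: passthrough, no second splice
rewritten = relabel_positional(("m", "thinker", "0"), labelnames, 1, engine_map)
```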

Tests:

- New `TestHelperClassWraps` asserts the four cls-slot overrides on
  OmniPrometheusStatLogger and the Counter/Gauge/Histogram slots on
  each helper subclass.
- New `TestDoubleRewriteGuard` covers both branches: a 2-label original
  family receiving 3 already-rewritten values (passthrough) and a
  3-label original (engine in middle) receiving 3 ORIGINAL-shape
  values (splice path still fires).
- Updated the docstring on `test_labels_positional_passthrough` to
  call out that it specifically exercises the new guard.