
[Feat]: add upstream request span and trace context propagation for distributed tracing #852

Merged
Xunzhuo merged 1 commit into vllm-project:main from HanFa:main on Dec 17, 2025

Conversation

@HanFa
Contributor

@HanFa commented Dec 17, 2025

This change implements the semantic_router.upstream.request span that was previously defined but never used. The span tracks the full lifecycle of requests forwarded to vLLM backends, starting when the routing decision is made and ending when response headers are received. Key attributes including model name, endpoint address, and HTTP status code are recorded on the span.
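The span lifecycle described above can be sketched as follows. This is a simplified, stdlib-only illustration of the pattern (start the span at routing time, record attributes, end it when response headers arrive), not the router's actual OpenTelemetry code; the type and attribute names here are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"time"
)

// upstreamSpan is a simplified stand-in for an OpenTelemetry span. It
// starts when the routing decision is made and ends when response
// headers are received, mirroring the lifecycle described above.
type upstreamSpan struct {
	name       string
	start, end time.Time
	attrs      map[string]string
}

// startUpstreamSpan opens the span and records the routing attributes.
func startUpstreamSpan(model, endpoint string) *upstreamSpan {
	return &upstreamSpan{
		name:  "semantic_router.upstream.request",
		start: time.Now(),
		attrs: map[string]string{
			"model":    model,
			"endpoint": endpoint,
		},
	}
}

// endWithStatus records the upstream HTTP status and closes the span.
func (s *upstreamSpan) endWithStatus(code int) {
	s.attrs["http.status_code"] = fmt.Sprintf("%d", code)
	s.end = time.Now()
}

func main() {
	span := startUpstreamSpan("Qwen/Qwen2.5-7B-Instruct", "127.0.0.1:8000")
	// ... request is forwarded to the vLLM backend here ...
	span.endWithStatus(200)
	fmt.Println(span.name, span.attrs["http.status_code"])
}
```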

Additionally, this change enables distributed trace correlation between the semantic router and vLLM by injecting W3C Trace Context headers (traceparent, tracestate; https://www.w3.org/TR/trace-context/) into upstream requests.
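For reference, the traceparent header that gets injected has the fixed W3C format version-traceid-parentid-flags. A minimal sketch of building one (the router itself would use the OpenTelemetry propagator rather than hand-assembling the header; this only illustrates the wire format):

```go
package main

import "fmt"

// buildTraceparent assembles a W3C Trace Context `traceparent` value:
// version "00", a 32-hex-char trace ID, a 16-hex-char parent span ID,
// and 2 hex flag chars ("01" = sampled, "00" = not sampled).
func buildTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

func main() {
	h := buildTraceparent(
		"4bf92f3577b34da6a3ce929d0e0e4736", // trace ID, shared across services
		"00f067aa0ba902b7",                 // span ID of the upstream request span
		true,
	)
	fmt.Println(h)
	// prints: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
}
```

Because vLLM extracts this header on the receiving side, its engine spans share the same trace ID as the router's spans, which is what makes the end-to-end trace in the screenshot possible.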

An example trace screenshot:

[image: example trace]

with corresponding config

server:
  port: 50051
  http_port: 8080
  metrics_port: 9190

# Embedding model for semantic cache and classification
bert_model:
  model_id: models/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

# vLLM Endpoints
vllm_endpoints:
  - name: "local-gpu"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Model configuration
model_config:
  "auto":
    preferred_endpoints: ["local-gpu"]
  "Qwen/Qwen2.5-7B-Instruct":
    reasoning_family: "qwen3"
    preferred_endpoints: ["local-gpu"]

# Classifier - enables classification span
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: false
    threshold: 0.7
    use_cpu: true
    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

# Prompt Guard - enables jailbreak_detection span
prompt_guard:
  enabled: true
  use_modernbert: false
  model_id: "models/lora_jailbreak_classifier_bert-base-uncased_model"
  threshold: 0.7
  use_cpu: true
  jailbreak_mapping_path: "models/lora_jailbreak_classifier_bert-base-uncased_model/jailbreak_type_mapping.json"

# Semantic Cache - enables cache.lookup span
semantic_cache:
  enabled: true
  backend_type: "memory"
  similarity_threshold: 0.8
  max_entries: 1000
  ttl_seconds: 3600
  eviction_policy: "fifo"
  use_hnsw: true
  hnsw_m: 16
  hnsw_ef_construction: 200
  embedding_model: "bert"

# Categories for classification
categories:
  - name: math
    description: "Mathematics and quantitative reasoning"
    mmlu_categories: ["math"]
  - name: computer_science
    description: "Computer science and programming"
    mmlu_categories: ["computer_science"]
  - name: other
    description: "General knowledge and miscellaneous topics"
    mmlu_categories: ["other"]

# Decisions - enables routing.decision and system_prompt.injection spans
strategy: "priority"
decisions:
  - name: "math_decision"
    description: "Mathematics and quantitative reasoning"
    priority: 100
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "math"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a mathematics expert. Provide step-by-step solutions."
      - type: "pii"
        configuration:
          enabled: true
          pii_types_allowed: []
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.85

  - name: "cs_decision"
    description: "Computer science and programming"
    priority: 100
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "computer_science"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: false
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a computer science expert. Provide clear code examples."
      - type: "pii"
        configuration:
          enabled: true
          pii_types_allowed: []

  - name: "general_decision"
    description: "General knowledge fallback"
    priority: 50
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "other"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: false
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a helpful assistant."
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.75

default_model: "Qwen/Qwen2.5-7B-Instruct"

# Reasoning family configuration
reasoning_families:
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"

default_reasoning_effort: high

# Observability - all tracing enabled
observability:
  metrics:
    enabled: true
  tracing:
    enabled: true
    provider: "opentelemetry"
    exporter:
      type: "otlp"
      endpoint: "localhost:4317"
      insecure: true
    sampling:
      type: "always_on"
      rate: 1.0
    resource:
      service_name: "semantic-router"
      service_version: "demo"
      deployment_environment: "local"

Note that the vLLM engine side should have OTEL enabled:

OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317 \
OTEL_SERVICE_NAME=vllm \
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-7B-Instruct \
--otlp-traces-endpoint http://localhost:4317

FIX #853

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand the code.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

@netlify

netlify Bot commented Dec 17, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit af35b14
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/6942690b10e17a0008139cfd
😎 Deploy Preview https://deploy-preview-852--vllm-semantic-router.netlify.app

…tributed tracing

Signed-off-by: Fang Han <fhan0520@gmail.com>
@github-actions
Contributor

github-actions Bot commented Dec 17, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/processor_req_body.go
  • src/semantic-router/pkg/extproc/processor_req_header.go
  • src/semantic-router/pkg/extproc/processor_res_header.go
  • src/semantic-router/pkg/observability/tracing/tracing_test.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Member

@Xunzhuo Xunzhuo left a comment


LGTM thanks

@Xunzhuo Xunzhuo merged commit 787144d into vllm-project:main Dec 17, 2025
33 of 34 checks passed


Development

Successfully merging this pull request may close these issues.

feature: Add distributed tracing correlation between semantic router and vLLM engine

4 participants