
[Feat]: add upstream request span and trace context propagation for distributed tracing #852

Merged
Xunzhuo merged 1 commit into vllm-project:main from HanFa:main on Dec 17, 2025

Conversation

@HanFa
Contributor

@HanFa commented Dec 17, 2025

This change implements the semantic_router.upstream.request span that was previously defined but never used. The span tracks the full lifecycle of requests forwarded to vLLM backends, starting when the routing decision is made and ending when response headers are received. Key attributes including model name, endpoint address, and HTTP status code are recorded on the span.
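The span lifecycle described above can be sketched as follows. This is a simplified, stdlib-only illustration of the pattern (start the span at routing time, record attributes, end it when response headers arrive), not the router's actual OpenTelemetry code; the type and attribute names here are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"time"
)

// upstreamSpan is a simplified stand-in for an OpenTelemetry span. It
// starts when the routing decision is made and ends when response
// headers are received, mirroring the lifecycle described above.
type upstreamSpan struct {
	name       string
	start, end time.Time
	attrs      map[string]string
}

// startUpstreamSpan opens the span and records the routing attributes.
func startUpstreamSpan(model, endpoint string) *upstreamSpan {
	return &upstreamSpan{
		name:  "semantic_router.upstream.request",
		start: time.Now(),
		attrs: map[string]string{
			"model":    model,
			"endpoint": endpoint,
		},
	}
}

// endWithStatus records the upstream HTTP status and closes the span.
func (s *upstreamSpan) endWithStatus(code int) {
	s.attrs["http.status_code"] = fmt.Sprintf("%d", code)
	s.end = time.Now()
}

func main() {
	span := startUpstreamSpan("Qwen/Qwen2.5-7B-Instruct", "127.0.0.1:8000")
	// ... request is forwarded to the vLLM backend here ...
	span.endWithStatus(200)
	fmt.Println(span.name, span.attrs["http.status_code"])
}
```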

Additionally, this change enables distributed trace correlation between the semantic router and vLLM by injecting W3C Trace Context headers (traceparent, tracestate; https://www.w3.org/TR/trace-context/) into upstream requests.
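For reference, the traceparent header that gets injected has the fixed W3C format version-traceid-parentid-flags. A minimal sketch of building one (the router itself would use the OpenTelemetry propagator rather than hand-assembling the header; this only illustrates the wire format):

```go
package main

import "fmt"

// buildTraceparent assembles a W3C Trace Context `traceparent` value:
// version "00", a 32-hex-char trace ID, a 16-hex-char parent span ID,
// and 2 hex flag chars ("01" = sampled, "00" = not sampled).
func buildTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

func main() {
	h := buildTraceparent(
		"4bf92f3577b34da6a3ce929d0e0e4736", // trace ID, shared across services
		"00f067aa0ba902b7",                 // span ID of the upstream request span
		true,
	)
	fmt.Println(h)
	// prints: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
}
```

Because vLLM extracts this header on the receiving side, its engine spans share the same trace ID as the router's spans, which is what makes the end-to-end trace in the screenshot possible.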

An example trace screenshot:

[image: example trace]

with corresponding config

server:
  port: 50051
  http_port: 8080
  metrics_port: 9190

# Embedding model for semantic cache and classification
bert_model:
  model_id: models/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

# vLLM Endpoints
vllm_endpoints:
  - name: "local-gpu"
    address: "127.0.0.1"
    port: 8000
    weight: 1

# Model configuration
model_config:
  "auto":
    preferred_endpoints: ["local-gpu"]
  "Qwen/Qwen2.5-7B-Instruct":
    reasoning_family: "qwen3"
    preferred_endpoints: ["local-gpu"]

# Classifier - enables classification span
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    threshold: 0.6
    use_cpu: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: false
    threshold: 0.7
    use_cpu: true
    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

# Prompt Guard - enables jailbreak_detection span
prompt_guard:
  enabled: true
  use_modernbert: false
  model_id: "models/lora_jailbreak_classifier_bert-base-uncased_model"
  threshold: 0.7
  use_cpu: true
  jailbreak_mapping_path: "models/lora_jailbreak_classifier_bert-base-uncased_model/jailbreak_type_mapping.json"

# Semantic Cache - enables cache.lookup span
semantic_cache:
  enabled: true
  backend_type: "memory"
  similarity_threshold: 0.8
  max_entries: 1000
  ttl_seconds: 3600
  eviction_policy: "fifo"
  use_hnsw: true
  hnsw_m: 16
  hnsw_ef_construction: 200
  embedding_model: "bert"

# Categories for classification
categories:
  - name: math
    description: "Mathematics and quantitative reasoning"
    mmlu_categories: ["math"]
  - name: computer_science
    description: "Computer science and programming"
    mmlu_categories: ["computer_science"]
  - name: other
    description: "General knowledge and miscellaneous topics"
    mmlu_categories: ["other"]

# Decisions - enables routing.decision and system_prompt.injection spans
strategy: "priority"
decisions:
  - name: "math_decision"
    description: "Mathematics and quantitative reasoning"
    priority: 100
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "math"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a mathematics expert. Provide step-by-step solutions."
      - type: "pii"
        configuration:
          enabled: true
          pii_types_allowed: []
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.85

  - name: "cs_decision"
    description: "Computer science and programming"
    priority: 100
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "computer_science"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: false
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a computer science expert. Provide clear code examples."
      - type: "pii"
        configuration:
          enabled: true
          pii_types_allowed: []

  - name: "general_decision"
    description: "General knowledge fallback"
    priority: 50
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "other"
    modelRefs:
      - model: "Qwen/Qwen2.5-7B-Instruct"
        use_reasoning: false
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a helpful assistant."
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.75

default_model: "Qwen/Qwen2.5-7B-Instruct"

# Reasoning family configuration
reasoning_families:
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"

default_reasoning_effort: high

# Observability - all tracing enabled
observability:
  metrics:
    enabled: true
  tracing:
    enabled: true
    provider: "opentelemetry"
    exporter:
      type: "otlp"
      endpoint: "localhost:4317"
      insecure: true
    sampling:
      type: "always_on"
      rate: 1.0
    resource:
      service_name: "semantic-router"
      service_version: "demo"
      deployment_environment: "local"

Note that the vLLM engine side should have OTEL enabled:

OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4317 \
OTEL_SERVICE_NAME=vllm \
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-7B-Instruct \
--otlp-traces-endpoint http://localhost:4317

FIX #853

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign-off your commit by using -s when doing git commit
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist (Click to Expand)

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand the code.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

@netlify

netlify Bot commented Dec 17, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit af35b14
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/6942690b10e17a0008139cfd
😎 Deploy Preview https://deploy-preview-852--vllm-semantic-router.netlify.app

…tributed tracing

Signed-off-by: Fang Han <fhan0520@gmail.com>
@github-actions
Contributor

github-actions Bot commented Dec 17, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/processor_req_body.go
  • src/semantic-router/pkg/extproc/processor_req_header.go
  • src/semantic-router/pkg/extproc/processor_res_header.go
  • src/semantic-router/pkg/observability/tracing/tracing_test.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Member

@Xunzhuo Xunzhuo left a comment


LGTM thanks

@Xunzhuo Xunzhuo merged commit 787144d into vllm-project:main Dec 17, 2025
33 of 34 checks passed


Development

Successfully merging this pull request may close these issues.

feature: Add distributed tracing correlation between semantic router and vLLM engine

4 participants