
Bug: Response model field does not match routing decision #430

@yossiovadia

Description


Summary

The semantic router correctly classifies prompts and routes requests to the appropriate model endpoint (Model-A or Model-B), but the model field in the response JSON does not reflect the router's decision. Instead, it contains whatever model name the vLLM endpoint that served the request reported.

Impact

  • Severity: Medium
  • Component: ExtProc response handling
  • User Impact: API consumers cannot tell from the standard model field which model the semantic router actually selected; they must inspect the custom x-vsr-selected-model header or the logs instead

Steps to Reproduce

  1. Deploy semantic router with Model-A and Model-B configured
  2. Configure routing with categories that should route to Model-A (e.g., economics with score 1.0)
  3. Send a request that should route to Model-A:
curl -X POST "http://<envoy-url>/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain marginal utility in economics"}],
    "max_tokens": 20
  }'
  4. Check the response - the model field shows Model-B (incorrect)
  5. Check the logs - routing logs show "selected_model":"Model-A" (correct)

Expected Behavior

The response JSON should contain:

{
  "model": "Model-A",
  "choices": [...],
  "usage": {...}
}

The model field should match what the semantic router decided, as shown in:

  • Routing decision logs: "selected_model":"Model-A"
  • Custom header: x-vsr-selected-model: Model-A
  • Context: ctx.RequestModel = "Model-A"

Actual Behavior

The response JSON contains:

{
  "model": "Model-B",
  "choices": [...],
  "usage": {...}
}

The model field comes from the vLLM endpoint's response and does not reflect the router's decision.

Evidence from Logs

// Classification (correct)
{"msg":"Classified as category: economics (mmlu=economics)"}

// Model selection (correct)
{"msg":"Selected model Model-A for category economics with score 1.0000"}

// Routing decision (correct)
{"msg":"Routing to model: Model-A"}
{"msg":"routing_decision","selected_model":"Model-A","category":"economics","selected_endpoint":"127.0.0.1:8000"}

Yet the API response returns "model": "Model-B".

Root Cause Analysis

File: src/semantic-router/pkg/extproc/response_handler.go
Function: handleResponseBody() (lines 182-296)

Current Code Flow

  1. Response body is received from vLLM endpoint (line 186)
  2. Response is parsed into an openai.ChatCompletion struct (lines 215-216)
  3. Usage statistics are extracted for metrics (lines 220-270)
  4. Cache is updated with original response (lines 273-282)
  5. Original response is returned unchanged (lines 284-293)
// Line 284-293: Current code
response := &ext_proc.ProcessingResponse{
    Response: &ext_proc.ProcessingResponse_ResponseBody{
        ResponseBody: &ext_proc.BodyResponse{
            Response: &ext_proc.CommonResponse{
                Status: ext_proc.CommonResponse_CONTINUE,
            },
        },
    },
}
return response, nil

The Problem

The parsed.Model field from the vLLM endpoint response is never updated to match ctx.RequestModel (which contains the router's decision).

Proposed Fix

After parsing the response (line 216), update the model field and re-marshal:

// Parse tokens from the response JSON using OpenAI SDK types
var parsed openai.ChatCompletion
parseErr := json.Unmarshal(responseBody, &parsed)
if parseErr != nil {
    observability.Errorf("Error parsing tokens from response: %v", parseErr)
    metrics.RecordRequestError(ctx.RequestModel, "parse_error")
}

// FIX: Update model field to match routing decision.
// Guarding on parseErr ensures a failed parse never overwrites the
// original body with a re-marshaled zero-valued struct.
if parseErr == nil && ctx.RequestModel != "" && parsed.Model != ctx.RequestModel {
    observability.Infof("Updating response model field from '%s' to '%s'", parsed.Model, ctx.RequestModel)
    parsed.Model = ctx.RequestModel

    // Re-marshal with updated model field
    modifiedBody, err := json.Marshal(parsed)
    if err != nil {
        observability.Errorf("Error re-marshaling response with updated model: %v", err)
        // Fall back to the original, unmodified response body
    } else {
        responseBody = modifiedBody
    }
}

// Continue with existing token extraction...
promptTokens := int(parsed.Usage.PromptTokens)
completionTokens := int(parsed.Usage.CompletionTokens)
// ... rest of the code

Then at the end, return the modified response body:

// Return the modified response body
response := &ext_proc.ProcessingResponse{
    Response: &ext_proc.ProcessingResponse_ResponseBody{
        ResponseBody: &ext_proc.BodyResponse{
            Response: &ext_proc.CommonResponse{
                Status: ext_proc.CommonResponse_CONTINUE,
                BodyMutation: &ext_proc.BodyMutation{
                    Mutation: &ext_proc.BodyMutation_Body{
                        Body: responseBody,
                    },
                },
            },
        },
    },
}
return response, nil

Testing Strategy

Unit Tests

  1. Test response model field rewriting when the routing decision differs from the endpoint response (see the sketch after this list)
  2. Test fallback behavior when JSON unmarshaling/marshaling fails
  3. Test that non-JSON responses are handled gracefully
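
A minimal sketch of test 1, exercising the rewrite logic in isolation. The chatResponse struct, the package name, and the literals are illustrative stand-ins, not the project's actual types:

package extproc_test

import (
    "encoding/json"
    "testing"
)

// chatResponse is a minimal stand-in for openai.ChatCompletion,
// modeling only the field under test.
type chatResponse struct {
    Model string `json:"model"`
}

func TestModelFieldRewrittenToRoutingDecision(t *testing.T) {
    original := []byte(`{"model":"Model-B"}`)
    requestModel := "Model-A" // stands in for ctx.RequestModel

    var parsed chatResponse
    if err := json.Unmarshal(original, &parsed); err != nil {
        t.Fatalf("parse error: %v", err)
    }

    // Same condition as the proposed fix: rewrite only on a mismatch.
    if requestModel != "" && parsed.Model != requestModel {
        parsed.Model = requestModel
    }

    modified, err := json.Marshal(parsed)
    if err != nil {
        t.Fatalf("re-marshal failed: %v", err)
    }
    if string(modified) != `{"model":"Model-A"}` {
        t.Errorf("got %s, want model rewritten to Model-A", modified)
    }
}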

Integration Tests

  1. Send request that routes to Model-A, verify response contains "model": "Model-A"
  2. Send request that routes to Model-B, verify response contains "model": "Model-B"
  3. Verify custom header x-vsr-selected-model matches response model field (see the sketch after this list)
  4. Test with streaming responses (should not modify SSE chunks)
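
A sketch of test 3, assuming a running Envoy + router deployment; the envoyURL constant (host and port) is a test-environment assumption, and the header name is the one documented above:

package integration_test

import (
    "encoding/json"
    "io"
    "net/http"
    "strings"
    "testing"
)

// Assumed test endpoint; adjust to the deployment under test.
const envoyURL = "http://localhost:8801/v1/chat/completions"

func TestModelFieldMatchesSelectedModelHeader(t *testing.T) {
    body := `{"model":"auto","messages":[{"role":"user","content":"Explain marginal utility in economics"}],"max_tokens":20}`
    resp, err := http.Post(envoyURL, "application/json", strings.NewReader(body))
    if err != nil {
        t.Fatalf("request failed: %v", err)
    }
    defer resp.Body.Close()

    raw, err := io.ReadAll(resp.Body)
    if err != nil {
        t.Fatalf("reading body failed: %v", err)
    }
    var parsed struct {
        Model string `json:"model"`
    }
    if err := json.Unmarshal(raw, &parsed); err != nil {
        t.Fatalf("invalid JSON response: %v", err)
    }

    // The body's model field should agree with the router's own header.
    selected := resp.Header.Get("x-vsr-selected-model")
    if parsed.Model != selected {
        t.Errorf("model field %q does not match x-vsr-selected-model %q", parsed.Model, selected)
    }
}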

E2E Tests

  1. Deploy with real vLLM endpoints
  2. Test all categories route to correct models
  3. Verify response model field matches routing logs
  4. Verify cached responses also have correct model field

Additional Considerations

  1. Streaming responses: The fix should only apply to non-streaming responses (already handled by the ctx.IsStreamingResponse check at line 190; see the guard sketch after this list)
  2. Cache consistency: Because the fix rewrites responseBody right after parsing (step 2), before the cache update (step 4 in the code flow above), cached responses will store the corrected model field as well
  3. Performance: JSON re-marshaling adds minimal overhead compared to model inference time
  4. Backwards compatibility: This is a bug fix that makes the API more correct, not a breaking change
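
For completeness, a minimal sketch of how the rewrite composes with the streaming guard from consideration 1. The early return at line 190 should already short-circuit streaming responses before the body is parsed, so this explicit check is belt-and-braces; variable names follow the proposed fix above:

// Apply the model rewrite only to buffered, non-streaming responses.
// parseErr, parsed, ctx, and responseBody are as in the proposed fix.
if !ctx.IsStreamingResponse && parseErr == nil &&
    ctx.RequestModel != "" && parsed.Model != ctx.RequestModel {
    parsed.Model = ctx.RequestModel
    if modifiedBody, err := json.Marshal(parsed); err == nil {
        responseBody = modifiedBody
    }
}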

Related Code

  • Response headers already include x-vsr-selected-model (lines 80-88)
  • Request context tracks ctx.RequestModel throughout routing (set at line 952 in request_handler.go)
  • Metrics already use ctx.RequestModel for tracking (lines 224-270)

Verification

After the fix:

# Send request
curl -X POST "http://<envoy-url>/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain marginal utility in economics"}],
    "max_tokens": 20
  }' | jq '.model'

# Expected output: "Model-A"
# Header should also show: x-vsr-selected-model: Model-A

Assignee: @yovadia
Labels: bug, extproc, response-handling
Priority: Medium
Milestone: Next Release
