fix(test): resolve A2A guardrail test regressions from #20619#20625
shin-bot-litellm wants to merge 2 commits into main
Conversation
Update tests to match new implementation:
1. test_invoke_agent_a2a_adds_litellm_data:
- Mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic
instead of add_litellm_data_to_request (which is no longer used)
2. test_process_input_messages_* (MCP guardrail handler):
- Handler now processes mcp_tool_name/mcp_arguments instead of messages
- Passes GenericGuardrailAPIInputs(tools=[...]) to guardrails
- Updated tests to verify new tool-based guardrail behavior
- Renamed test_process_input_messages_skips_when_no_messages to
test_process_input_messages_skips_when_no_tool_name
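As a rough sketch of the tool-based behavior the renamed tests now verify (note: `build_tool_inputs` and `FakeGuardrail` below are illustrative stand-ins for the handler and guardrail, not litellm APIs):

```python
# Minimal sketch of the tool-based guardrail flow the updated tests assert on.
# build_tool_inputs and FakeGuardrail are hypothetical stand-ins, not litellm APIs.

def build_tool_inputs(data):
    """Mimic the handler: skip when no tool name, else build a tools=[...] payload."""
    tool_name = data.get("mcp_tool_name") or data.get("name")  # 'name' alias support
    if not tool_name:
        return None  # handler skips guardrails entirely
    return {
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": tool_name,
                    "description": data.get("description", ""),
                },
            }
        ]
    }


class FakeGuardrail:
    """Captures the inputs a handler would pass to apply_guardrail()."""

    def __init__(self):
        self.last_inputs = None

    def apply_guardrail(self, inputs, request_data):
        self.last_inputs = inputs
        return request_data


guardrail = FakeGuardrail()
data = {"mcp_tool_name": "weather", "description": "Get weather for a city"}
inputs = build_tool_inputs(data)
if inputs is not None:
    guardrail.apply_guardrail(inputs, data)

assert guardrail.last_inputs["tools"][0]["function"]["name"] == "weather"
assert build_tool_inputs({}) is None  # no tool name -> handler skips
```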
Greptile Summary
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py | Updates MCP guardrail handler tests to assert tool-based inputs; current assertions treat GenericGuardrailAPIInputs.tools items as dicts, which will fail when they are Pydantic/typed objects. |
| tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py | Updates A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic; test is somewhat brittle because it doesn’t assert the call args / use of returned (data, logging_obj) in the downstream asend_message call. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant T as test_a2a_endpoints.py
    participant E as invoke_agent_a2a()
    participant P as ProxyBaseLLMRequestProcessing
    participant G as common_processing_pre_call_logic()
    participant A as asend_message()
    T->>E: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    E->>P: "ProxyBaseLLMRequestProcessing(data=body)"
    P->>G: "common_processing_pre_call_logic(request, settings, auth, ...)"
    G-->>P: "(data, logging_obj)"
    P-->>E: "(data, logging_obj)"
    E->>A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"
    participant M as test_mcp_guardrail_handler.py
    participant H as MCPGuardrailTranslationHandler
    participant C as CustomGuardrail.apply_guardrail()
    M->>H: "process_input_messages(data)"
    H->>H: "Build MCPTool(name/description)"
    H->>H: "transform_mcp_tool_to_openai_tool(mcp_tool)"
    H->>C: "apply_guardrail(inputs=GenericGuardrailAPIInputs(tools=[tool_def]), request_data=data)"
    C-->>H: "guardrail result (ignored)"
    H-->>M: "return original data"
```
```python
# Guardrail should receive tool definition in inputs
assert "tools" in guardrail.last_inputs
assert len(guardrail.last_inputs["tools"]) == 1

tool = guardrail.last_inputs["tools"][0]
assert tool["type"] == "function"
assert tool["function"]["name"] == "weather"
assert tool["function"]["description"] == "Get weather for a city"
```
Wrong type for tools
MCPGuardrailTranslationHandler.process_input_messages() builds GenericGuardrailAPIInputs(tools=[...]), so guardrail.last_inputs["tools"][0] is a ChatCompletionToolParam/Pydantic model (attribute access), not a dict. Indexing like tool["type"]/tool["function"]["name"] will raise at runtime when the handler passes real objects.
Consider asserting via attribute access (e.g. tool.type, tool.function.name) or converting to a dict via .model_dump() first before indexing.
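A sketch of both robust assertion styles the review suggests. The `ToolParam`/`ToolFunction` dataclasses below are hypothetical stand-ins for the typed `ChatCompletionToolParam` object (with `asdict()` playing the role of Pydantic's `model_dump()`):

```python
from dataclasses import dataclass, asdict

# Hypothetical typed stand-ins for the ChatCompletionToolParam the review mentions.
@dataclass
class ToolFunction:
    name: str
    description: str

@dataclass
class ToolParam:
    type: str
    function: ToolFunction

tool = ToolParam(type="function", function=ToolFunction("weather", "Get weather for a city"))

# Option 1: assert via attribute access on the typed object directly.
assert tool.type == "function"
assert tool.function.name == "weather"

# Option 2: normalize to a plain dict first (model_dump() for Pydantic models,
# asdict() for dataclasses), so index-style assertions keep working.
tool_dict = asdict(tool)
assert tool_dict["function"]["description"] == "Get weather for a city"
```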
```python
async def mock_common_processing(
    request,
    general_settings,
    user_api_key_dict,
    proxy_logging_obj,
    proxy_config,
    route_type,
    version,
):
    # Get the data from the processor instance via closure
    data = mock_processor_instance.data
    # Simulate what common_processing_pre_call_logic does
    data["proxy_server_request"] = {
        "url": "http://localhost:4000/a2a/test-agent",
        "method": "POST",
        "headers": {},
        "body": dict(data),
    }
    captured_data.update(data)
    return data, MagicMock()  # Returns (data, logging_obj)
```
Mock return shape mismatch
invoke_agent_a2a() expects common_processing_pre_call_logic to return (data, logging_obj) (see a2a_endpoints.py:300-308). In this test, mock_common_processing() returns data, MagicMock() but the mocked processor is a plain MagicMock() and your side_effect reads mock_processor_instance.data instead of using the actual call args; if ProxyBaseLLMRequestProcessing ever changes to pass/clone data internally, this test can pass while behavior breaks.
Prefer validating the actual arguments passed to common_processing_pre_call_logic (request/general_settings/user_api_key_dict/route_type/version) and assert the returned data is what asend_message() receives via metadata/proxy_server_request. Right now the test doesn’t prove the endpoint uses the returned data tuple correctly.
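One way to implement the review's suggestion, as a runnable sketch. The `endpoint` coroutine below is a hypothetical stand-in for `invoke_agent_a2a()`; the point is asserting on recorded call args and on what `asend_message` actually receives, rather than on closure state:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

# Hypothetical stand-in for invoke_agent_a2a(): it must forward the
# (data, logging_obj) tuple returned by common_processing_pre_call_logic.
async def endpoint(processor, asend_message, request):
    data, logging_obj = await processor.common_processing_pre_call_logic(
        request=request, route_type="asend_message", version="v1"
    )
    return await asend_message(
        metadata=data.get("metadata"),
        proxy_server_request=data.get("proxy_server_request"),
    )

processor = MagicMock()
processed = {"metadata": {"k": "v"}, "proxy_server_request": {"method": "POST"}}
processor.common_processing_pre_call_logic = AsyncMock(
    return_value=(processed, MagicMock())
)
asend_message = AsyncMock(return_value="ok")

asyncio.run(endpoint(processor, asend_message, request=MagicMock()))

# Assert on the actual call args instead of closure state...
_, kwargs = processor.common_processing_pre_call_logic.call_args
assert kwargs["route_type"] == "asend_message"
# ...and that the returned data is what asend_message() receives.
assert asend_message.call_args.kwargs["metadata"] == {"k": "v"}
```

This shape keeps the test green only when the endpoint really threads the returned tuple through to `asend_message`.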
- Use dict() conversion for TypedDict access (safer for future changes)
- Use .get() for safer dict key access
- Improve A2A test to validate call arguments more explicitly
- Add comments explaining TypedDict vs Pydantic model distinction
@greptile can you review again? Addressed your feedback:
Greptile Summary
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py | Updates MCP guardrail handler tests to assert tool-definition inputs (tools=[...]) and alias/missing-args behavior; assertions still depend on tools being dict-castable but otherwise align with handler. |
| tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py | Reworks A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic, but currently asserts the wrong route_type compared to implementation, causing a test failure. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as A2A test
    participant Endpoint as invoke_agent_a2a
    participant Proc as ProxyBaseLLMRequestProcessing
    participant A2A as asend_message
    Test->>Endpoint: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    Endpoint->>Proc: "__init__(data=body+model+provider)"
    Endpoint->>Proc: "common_processing_pre_call_logic(..., route_type, version)"
    Proc-->>Endpoint: "(data, logging_obj)"
    Endpoint->>A2A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"
    A2A-->>Endpoint: "response"
    Endpoint-->>Test: "JSONResponse(response.model_dump())"
    participant MCPTest as MCP tests
    participant MCPHandler as MCPGuardrailTranslationHandler
    participant Guardrail as apply_guardrail
    MCPTest->>MCPHandler: "process_input_messages(data, guardrail)"
    alt "mcp_tool_name missing"
        MCPHandler-->>MCPTest: "return data"
    else "tool name present"
        MCPHandler->>Guardrail: "apply_guardrail(inputs.tools=[tool_def], request_data=data)"
        Guardrail-->>MCPHandler: "result"
        MCPHandler-->>MCPTest: "return data"
    end
```
```python
async def mock_common_processing(
    request,
    general_settings,
    user_api_key_dict,
    proxy_logging_obj,
    proxy_config,
    route_type,
    version,
):
    # Capture the actual arguments passed to common_processing_pre_call_logic
    processing_call_args["request"] = request
    processing_call_args["general_settings"] = general_settings
    processing_call_args["user_api_key_dict"] = user_api_key_dict
    processing_call_args["route_type"] = route_type
    processing_call_args["version"] = version

    # Get the data from the processor instance
    data = mock_processor_instance.data

    # Simulate what common_processing_pre_call_logic does
    data["proxy_server_request"] = {
        "url": "http://localhost:4000/a2a/test-agent",
        "method": "POST",
        "headers": {},
        "body": dict(data),
    }
    captured_data.update(data)

    # Store the returned data to verify endpoint uses it
    returned_data.update(data)
    mock_logging_obj = MagicMock()
    return data, mock_logging_obj
```
Wrong route_type asserted
invoke_agent_a2a() passes route_type="asend_message" into common_processing_pre_call_logic (litellm/proxy/agent_endpoints/a2a_endpoints.py:300-308). This test asserts route_type == "a2a_request", so it will fail (or force an incorrect behavior) even though the endpoint is correct. Update the expectation to match the actual route type used by the endpoint.
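In the test's own terms (assuming the same `processing_call_args` capture dict used above), the fix is a one-line change to the expectation:

```python
# processing_call_args is the capture dict from the test (assumed shape);
# here populated the way the endpoint would, per the review.
processing_call_args = {"route_type": "asend_message", "version": "v1"}

# Before (fails): the endpoint never passes "a2a_request"
# assert processing_call_args["route_type"] == "a2a_request"

# After: match the route type invoke_agent_a2a() actually uses
assert processing_call_args["route_type"] == "asend_message"
```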
Automated patch bundle from next-100 unresolved backlog expansion.
Generated due to limited direct branch-write access; please apply/cherry-pick the minimal edits below.

## PR #20625 — Unresolved thread summary
Minimal patch proposals
Regression Fix

Failing Job: litellm_proxy_unit_testing_part2
Caused By: PR #20619
Author: @krrishdholakia

What Broke

PR #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support") changed the implementation of:

A2A endpoints (`a2a_endpoints.py`):
- Switched from `add_litellm_data_to_request` to using `ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic`
- `test_invoke_agent_a2a_adds_litellm_data` was mocking the old function path

MCP guardrail handler (`handler.py`):
- Moved from processing a `messages` array to processing `mcp_tool_name`/`mcp_arguments`
- Passes tool definitions (`GenericGuardrailAPIInputs(tools=[...])`) instead of text content
- `test_process_input_messages_skips_when_no_messages` and `test_process_input_messages_updates_content` expected the old message-based behavior

This Fix

Updates the test expectations to match the new implementation:
- `test_invoke_agent_a2a_adds_litellm_data`: mocks `ProxyBaseLLMRequestProcessing` and its `common_processing_pre_call_logic` method
- MCP guardrail handler tests:
  - Renamed `test_process_input_messages_skips_when_no_messages` → `test_process_input_messages_skips_when_no_tool_name`
  - Verify the handler processes `mcp_tool_name` correctly
  - Verify `name` alias support

Testing

Tests verify:
- The endpoint calls `ProxyBaseLLMRequestProcessing`
- Guardrails receive tool definitions when `mcp_tool_name` is present
- The handler skips when `mcp_tool_name` is missing