fix(test): resolve A2A guardrail test regressions from #20619 by shin-bot-litellm · Pull Request #20625 · BerriAI/litellm

shin-bot-litellm · 2026-02-07T02:20:56Z

Regression Fix

Failing Job: litellm_proxy_unit_testing_part2
Caused By: PR #20619
Author: @krrishdholakia

What Broke

PR #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support") changed the implementation of:

A2A endpoints (a2a_endpoints.py):
- Changed from using add_litellm_data_to_request to using ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic
- Test test_invoke_agent_a2a_adds_litellm_data was mocking the old function path
MCP guardrail handler (handler.py):
- Changed from processing messages array to processing mcp_tool_name/mcp_arguments
- Now passes tool definitions (GenericGuardrailAPIInputs(tools=[...])) instead of text content
- Tests test_process_input_messages_skips_when_no_messages and test_process_input_messages_updates_content expected the old message-based behavior

This Fix

Updates the test expectations to match the new implementation:

test_invoke_agent_a2a_adds_litellm_data:
- Now mocks ProxyBaseLLMRequestProcessing and its common_processing_pre_call_logic method
- Verifies the processor is instantiated and called correctly
MCP guardrail handler tests:
- Renamed test_process_input_messages_skips_when_no_messages → test_process_input_messages_skips_when_no_tool_name
- Updated to verify the handler processes mcp_tool_name correctly
- Added test for name alias support
- Updated mock guardrail to work with tool-based inputs
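
The A2A part of this change comes down to patching an async method on the processor class instead of a module-level function. A minimal sketch of that pattern follows, assuming hypothetical stand-ins (`RequestProcessor`, `invoke_agent`) rather than the real litellm classes, whose signatures are not reproduced here. The actual test mocks the whole class; this sketch patches only the method, which is a simpler variant of the same idea.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch


class RequestProcessor:
    """Hypothetical stand-in for ProxyBaseLLMRequestProcessing."""

    def __init__(self, data):
        self.data = data

    async def common_processing_pre_call_logic(self, **kwargs):
        raise RuntimeError("real pre-call logic should not run in a unit test")


async def invoke_agent(body):
    """Hypothetical stand-in for the endpoint under test."""
    processor = RequestProcessor(data=body)
    # The endpoint unpacks a (data, logging_obj) tuple.
    data, _logging_obj = await processor.common_processing_pre_call_logic(
        route_type="asend_message"
    )
    return data


def run_example():
    fake_result = ({"message": "hi", "metadata": {"user": "u1"}}, MagicMock())
    # Extra keyword arguments to patch.object are forwarded to AsyncMock.
    with patch.object(
        RequestProcessor,
        "common_processing_pre_call_logic",
        new_callable=AsyncMock,
        return_value=fake_result,
    ) as mock_logic:
        data = asyncio.run(invoke_agent({"message": "hi"}))
    mock_logic.assert_awaited_once_with(route_type="asend_message")
    return data
```

Returning a two-element tuple from the mock matters here: a mock that returned only `data` would fail at the unpacking step in the endpoint.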

Testing

Tests verify:

✅ A2A endpoint integrates with ProxyBaseLLMRequestProcessing
✅ MCP handler calls guardrail when mcp_tool_name is present
✅ MCP handler skips guardrail when mcp_tool_name is missing
✅ MCP handler passes tool definition to guardrail
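
The three MCP checks above can be illustrated with a recording test double. This is a sketch under stated assumptions: `RecordingGuardrail` and this `process_input_messages` function are hypothetical simplifications, and the real handler builds its tool definition through litellm helpers that are not shown here.

```python
import asyncio


class RecordingGuardrail:
    """Test double that remembers the inputs it was asked to check."""

    def __init__(self):
        self.last_inputs = None

    async def apply_guardrail(self, inputs, request_data):
        self.last_inputs = inputs
        return inputs


async def process_input_messages(data, guardrail):
    """Hypothetical handler: guard the tool definition, return data unchanged."""
    # Accept either the canonical key or the shorter alias.
    tool_name = data.get("mcp_tool_name") or data.get("name")
    if not tool_name:
        return data  # no tool call to inspect, so the guardrail is skipped
    tool_def = {"type": "function", "function": {"name": tool_name}}
    await guardrail.apply_guardrail(inputs={"tools": [tool_def]}, request_data=data)
    return data


def run_example():
    skipped, called, aliased = (RecordingGuardrail() for _ in range(3))
    asyncio.run(process_input_messages({"mcp_arguments": {}}, skipped))
    asyncio.run(process_input_messages({"mcp_tool_name": "weather"}, called))
    asyncio.run(process_input_messages({"name": "weather"}, aliased))
    return skipped.last_inputs, called.last_inputs, aliased.last_inputs
```

Recording the inputs on the double, rather than asserting inside it, keeps the skip case easy to check: `last_inputs` simply stays `None`.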

Update tests to match new implementation:

1. test_invoke_agent_a2a_adds_litellm_data:
- Mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic instead of add_litellm_data_to_request (which is no longer used)
2. test_process_input_messages_* (MCP guardrail handler):
- Handler now processes mcp_tool_name/mcp_arguments instead of messages
- Passes GenericGuardrailAPIInputs(tools=[...]) to guardrails
- Updated tests to verify new tool-based guardrail behavior
- Renamed test_process_input_messages_skips_when_no_messages to test_process_input_messages_skips_when_no_tool_name

vercel · 2026-02-07T02:21:01Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Feb 7, 2026 2:43am

CLAassistant · 2026-02-07T02:21:02Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

greptile-apps · 2026-02-07T02:23:10Z

Greptile Overview

Greptile Summary

Updates two unit test suites to match the guardrail plumbing introduced by #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support"): A2A now uses ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic, and MCP guardrail translation now sends tool definitions (GenericGuardrailAPIInputs(tools=[...])) rather than message text.
A2A test swaps mocking from add_litellm_data_to_request to the request-processor class/method and checks it is invoked.
MCP handler tests are rewritten around mcp_tool_name/mcp_arguments and add coverage for name/arguments aliases.
One test issue remains: MCP assertions treat tool objects as dicts and will fail when tools[] items are typed/Pydantic objects; A2A assertions are also a bit weak (don’t validate call args / propagation of returned (data, logging_obj) into asend_message).

Confidence Score: 3/5

This PR is likely safe once the failing MCP test assertions are corrected to match the actual tool object types passed to guardrails.
Changes are test-only and align with the updated production behavior, but at least one updated test appears to make incorrect assumptions about GenericGuardrailAPIInputs.tools element types (dict vs typed object), which can cause the unit job to keep failing or become flaky across pydantic versions.
tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py

Important Files Changed

Filename	Overview
tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py	Updates MCP guardrail handler tests to assert tool-based inputs; current assertions treat `GenericGuardrailAPIInputs.tools` items as dicts, which will fail when they are Pydantic/typed objects.
tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py	Updates A2A endpoint test to mock `ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic`; test is somewhat brittle because it doesn’t assert the call args / use of returned `(data, logging_obj)` in the downstream `asend_message` call.

Sequence Diagram

sequenceDiagram
    participant T as test_a2a_endpoints.py
    participant E as invoke_agent_a2a()
    participant P as ProxyBaseLLMRequestProcessing
    participant G as common_processing_pre_call_logic()
    participant A as asend_message()

    T->>E: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    E->>P: "ProxyBaseLLMRequestProcessing(data=body)"
    P->>G: "common_processing_pre_call_logic(request, settings, auth, ...)"
    G-->>P: "(data, logging_obj)"
    P-->>E: "(data, logging_obj)"
    E->>A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"

    participant M as test_mcp_guardrail_handler.py
    participant H as MCPGuardrailTranslationHandler
    participant C as CustomGuardrail.apply_guardrail()

    M->>H: "process_input_messages(data)"
    H->>H: "Build MCPTool(name/description)"
    H->>H: "transform_mcp_tool_to_openai_tool(mcp_tool)"
    H->>C: "apply_guardrail(inputs=GenericGuardrailAPIInputs(tools=[tool_def]), request_data=data)"
    C-->>H: "guardrail result (ignored)"
    H-->>M: "return original data"

2 files reviewed, 2 comments

greptile-apps · 2026-02-07T02:23:14Z

...t_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py

+    # Guardrail should receive tool definition in inputs
+    assert "tools" in guardrail.last_inputs
+    assert len(guardrail.last_inputs["tools"]) == 1
+
+    tool = guardrail.last_inputs["tools"][0]
+    assert tool["type"] == "function"
+    assert tool["function"]["name"] == "weather"
+    assert tool["function"]["description"] == "Get weather for a city"


Wrong type for tools

MCPGuardrailTranslationHandler.process_input_messages() builds GenericGuardrailAPIInputs(tools=[...]), so guardrail.last_inputs["tools"][0] is a ChatCompletionToolParam/Pydantic model (attribute access), not a dict. Indexing like tool["type"]/tool["function"]["name"] will raise at runtime when the handler passes real objects.

Consider asserting via attribute access (e.g. tool.type, tool.function.name) or converting to a dict via .model_dump() first before indexing.
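
Which access style is safe depends on what the list items actually are at runtime. A minimal sketch, assuming the items are TypedDict instances, which is how the follow-up commit in this thread treats them. `ToolParam`, `FunctionDef`, and `tool_as_dict` are hypothetical names for illustration, not litellm types.

```python
from typing import TypedDict


class FunctionDef(TypedDict):
    name: str
    description: str


class ToolParam(TypedDict):
    type: str
    function: FunctionDef


# Calling a TypedDict class builds an ordinary dict; the type exists only for checkers.
tool = ToolParam(
    type="function",
    function=FunctionDef(name="weather", description="Get weather for a city"),
)


def tool_as_dict(item):
    """Normalize a tool to a plain dict whether it is dict-like or a model."""
    if hasattr(item, "model_dump"):  # Pydantic v2 models expose model_dump()
        return item.model_dump()
    return dict(item)


normalized = tool_as_dict(tool)
```

With a TypedDict, `tool["type"]` works and `tool.type` raises `AttributeError`; with a Pydantic model the situation is reversed. Normalizing first lets the assertions use one access style in either case.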

greptile-apps · 2026-02-07T02:23:15Z

tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py

+    async def mock_common_processing(
+        request,
+        general_settings,
+        user_api_key_dict,
+        proxy_logging_obj,
+        proxy_config,
+        route_type,
+        version,
+    ):
+        # Get the data from the processor instance via closure
+        data = mock_processor_instance.data
+        # Simulate what common_processing_pre_call_logic does
        data["proxy_server_request"] = {
            "url": "http://localhost:4000/a2a/test-agent",
            "method": "POST",
            "headers": {},
            "body": dict(data),
        }
        captured_data.update(data)
-        return data
+        return data, MagicMock()  # Returns (data, logging_obj)



Mock return shape mismatch

invoke_agent_a2a() expects common_processing_pre_call_logic to return (data, logging_obj) (see a2a_endpoints.py:300-308). In this test, mock_common_processing() returns data, MagicMock() but the mocked processor is a plain MagicMock() and your side_effect reads mock_processor_instance.data instead of using the actual call args; if ProxyBaseLLMRequestProcessing ever changes to pass/clone data internally, this test can pass while behavior breaks.

Prefer validating the actual arguments passed to common_processing_pre_call_logic (request/general_settings/user_api_key_dict/route_type/version) and assert the returned data is what asend_message() receives via metadata/proxy_server_request. Right now the test doesn’t prove the endpoint uses the returned data tuple correctly.
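
The suggestion above can be sketched as two checks: inspect the arguments the pre-call method was awaited with, then confirm the returned data is what reaches the downstream call. This is an illustration only. The `invoke_agent` function below is a hypothetical stand-in with injected dependencies, not the real `invoke_agent_a2a()` signature.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def invoke_agent(processor, send_message, request):
    """Hypothetical endpoint: run pre-call logic, then forward its output."""
    data, _logging_obj = await processor.common_processing_pre_call_logic(
        request=request, route_type="asend_message"
    )
    return await send_message(
        metadata=data.get("metadata"),
        proxy_server_request=data.get("proxy_server_request"),
    )


def run_example():
    returned = {
        "metadata": {"user_api_key": "hashed"},
        "proxy_server_request": {"method": "POST"},
    }
    processor = MagicMock()
    processor.common_processing_pre_call_logic = AsyncMock(
        return_value=(returned, MagicMock())
    )
    send_message = AsyncMock(return_value="ok")
    request = object()

    result = asyncio.run(invoke_agent(processor, send_message, request))

    # 1. The endpoint passed the expected arguments into pre-call processing.
    call = processor.common_processing_pre_call_logic.await_args
    assert call.kwargs["request"] is request
    assert call.kwargs["route_type"] == "asend_message"

    # 2. The data returned by pre-call processing is what went downstream.
    send_message.assert_awaited_once_with(
        metadata=returned["metadata"],
        proxy_server_request=returned["proxy_server_request"],
    )
    return result
```

Because the mock's return value is a distinct object from the request body, the second assertion fails if the endpoint ignores the returned tuple and forwards its original input instead.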

- Use dict() conversion for TypedDict access (safer for future changes)
- Use .get() for safer dict key access
- Improve A2A test to validate call arguments more explicitly
- Add comments explaining TypedDict vs Pydantic model distinction

shin-bot-litellm · 2026-02-07T02:42:28Z

@greptile can you review again

Addressed your feedback:

test_mcp_guardrail_handler.py: Added dict() conversion for TypedDict access to be defensive against future changes. Using .get() for safer key access.
test_a2a_endpoints.py: Improved test to explicitly capture and validate the arguments passed to common_processing_pre_call_logic, including route_type and version. Also validates that the returned data is passed downstream to asend_message.

greptile-apps · 2026-02-07T02:44:23Z

Greptile Overview

Greptile Summary

Updates MCP guardrail translation tests to reflect the new tool-based GenericGuardrailAPIInputs(tools=[...]) behavior (including name/arguments alias support and missing-args handling).
Updates A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic instead of the removed add_litellm_data_to_request call path.
Overall intent is to align unit tests with the refactors introduced in #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support") and fix the regressions in litellm_proxy_unit_testing_part2.

Confidence Score: 3/5

This PR is mostly safe to merge once the failing A2A unit test expectation is corrected.
Changes are limited to test updates, but test_invoke_agent_a2a_adds_litellm_data currently asserts a route_type value that disagrees with the real implementation, which will break CI.
tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py

Important Files Changed

Filename	Overview
tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py	Updates MCP guardrail handler tests to assert tool-definition inputs (`tools=[...]`) and alias/missing-args behavior; assertions still depend on tools being dict-castable but otherwise align with handler.
tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py	Reworks A2A endpoint test to mock `ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic`, but currently asserts the wrong `route_type` compared to implementation, causing a test failure.

Sequence Diagram

sequenceDiagram
    participant Test as A2A test
    participant Endpoint as invoke_agent_a2a
    participant Proc as ProxyBaseLLMRequestProcessing
    participant A2A as asend_message

    Test->>Endpoint: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    Endpoint->>Proc: "__init__(data=body+model+provider)"
    Endpoint->>Proc: "common_processing_pre_call_logic(..., route_type, version)"
    Proc-->>Endpoint: "(data, logging_obj)"
    Endpoint->>A2A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"
    A2A-->>Endpoint: "response"
    Endpoint-->>Test: "JSONResponse(response.model_dump())"

    participant MCPTest as MCP tests
    participant MCPHandler as MCPGuardrailTranslationHandler
    participant Guardrail as apply_guardrail

    MCPTest->>MCPHandler: "process_input_messages(data, guardrail)"
    alt "mcp_tool_name missing"
        MCPHandler-->>MCPTest: "return data"
    else "tool name present"
        MCPHandler->>Guardrail: "apply_guardrail(inputs.tools=[tool_def], request_data=data)"
        Guardrail-->>MCPHandler: "result"
        MCPHandler-->>MCPTest: "return data"
    end

2 files reviewed, 1 comment

greptile-apps · 2026-02-07T02:44:28Z

tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py

+    async def mock_common_processing(
+        request,
+        general_settings,
+        user_api_key_dict,
+        proxy_logging_obj,
+        proxy_config,
+        route_type,
+        version,
+    ):
+        # Capture the actual arguments passed to common_processing_pre_call_logic
+        processing_call_args["request"] = request
+        processing_call_args["general_settings"] = general_settings
+        processing_call_args["user_api_key_dict"] = user_api_key_dict
+        processing_call_args["route_type"] = route_type
+        processing_call_args["version"] = version
+
+        # Get the data from the processor instance
+        data = mock_processor_instance.data
+
+        # Simulate what common_processing_pre_call_logic does
        data["proxy_server_request"] = {
            "url": "http://localhost:4000/a2a/test-agent",
            "method": "POST",
            "headers": {},
            "body": dict(data),
        }
-        captured_data.update(data)
-        return data
+
+        # Store the returned data to verify endpoint uses it
+        returned_data.update(data)
+        mock_logging_obj = MagicMock()
+        return data, mock_logging_obj


Wrong route_type asserted

invoke_agent_a2a() passes route_type="asend_message" into common_processing_pre_call_logic (litellm/proxy/agent_endpoints/a2a_endpoints.py:300-308). This test asserts route_type == "a2a_request", so it will fail (or force an incorrect behavior) even though the endpoint is correct. Update the expectation to match the actual route type used by the endpoint.

giulio-leone · 2026-03-02T14:36:28Z

Automated patch bundle from next-100 unresolved backlog expansion.
Generated due to limited direct branch-write access; please apply or cherry-pick the minimal edits below.

PR #20625: litellm_fix_a2a_test_regression (3 unresolved)

Unresolved thread summary

T1 tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py:57 — Wrong type for tools
T2 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:57 — Mock return shape mismatch
T3 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:56 — Wrong route_type asserted

Minimal patch proposals

T1 tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py:57
- Edit steps:
  1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
  2. Keep the test deterministic and verify it fails before / passes after the patch intent.
T2 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:57
- Edit steps:
  1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
  2. Keep the test deterministic and verify it fails before / passes after the patch intent.
T3 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:56
- Edit steps:
  1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
  2. Keep the test deterministic and verify it fails before / passes after the patch intent.

fix(test): address Greptile review comments

c23c041

shin-bot-litellm wants to merge 2 commits into main from litellm_fix_a2a_test_regression
