fix(test): resolve A2A guardrail test regressions from #20619 #20625

Open
shin-bot-litellm wants to merge 2 commits into main from litellm_fix_a2a_test_regression

@shin-bot-litellm
Contributor

Regression Fix

Failing Job: litellm_proxy_unit_testing_part2
Caused By: PR #20619
Author: @krrishdholakia

What Broke

PR #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support") changed the implementation of:

  1. A2A endpoints (a2a_endpoints.py):

    • Changed from using add_litellm_data_to_request to using ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic
    • Test test_invoke_agent_a2a_adds_litellm_data was mocking the old function path
  2. MCP guardrail handler (handler.py):

    • Changed from processing messages array to processing mcp_tool_name/mcp_arguments
    • Now passes tool definitions (GenericGuardrailAPIInputs(tools=[...])) instead of text content
    • Tests test_process_input_messages_skips_when_no_messages and test_process_input_messages_updates_content expected the old message-based behavior

This Fix

Updates the test expectations to match the new implementation:

  1. test_invoke_agent_a2a_adds_litellm_data:

    • Now mocks ProxyBaseLLMRequestProcessing and its common_processing_pre_call_logic method
    • Verifies the processor is instantiated and called correctly
  2. MCP guardrail handler tests:

    • Renamed test_process_input_messages_skips_when_no_messages to test_process_input_messages_skips_when_no_tool_name
    • Updated to verify the handler processes mcp_tool_name correctly
    • Added test for name alias support
    • Updated mock guardrail to work with tool-based inputs
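The mocking approach described for test_invoke_agent_a2a_adds_litellm_data can be sketched as follows. This is a minimal illustration, not the PR's actual test: RequestProcessor and invoke_agent are hypothetical stand-ins for ProxyBaseLLMRequestProcessing and the endpoint, and patch.dict(globals(), ...) is used only to keep the sketch self-contained (a real test would more likely patch the dotted import path of the class).

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch


class RequestProcessor:
    """Hypothetical stand-in for ProxyBaseLLMRequestProcessing."""

    def __init__(self, data):
        self.data = data

    async def common_processing_pre_call_logic(self, **kwargs):
        raise RuntimeError("real pre-call logic should not run in a unit test")


async def invoke_agent(body):
    """Hypothetical endpoint: build the processor, then run pre-call logic."""
    processor = RequestProcessor(data=body)
    data, _logging_obj = await processor.common_processing_pre_call_logic(
        route_type="asend_message"
    )
    return data


async def run_test():
    # The instance returned by the mocked class exposes an awaitable method
    # that yields the (data, logging_obj) tuple the endpoint unpacks.
    mock_instance = MagicMock()
    mock_instance.common_processing_pre_call_logic = AsyncMock(
        return_value=({"model": "agent", "metadata": {"k": "v"}}, MagicMock())
    )
    mock_cls = MagicMock(return_value=mock_instance)

    # Swap the class for the duration of the call only.
    with patch.dict(globals(), {"RequestProcessor": mock_cls}):
        result = await invoke_agent({"model": "agent"})

    # Verify the processor was instantiated and its method awaited.
    mock_cls.assert_called_once_with(data={"model": "agent"})
    mock_instance.common_processing_pre_call_logic.assert_awaited_once()
    return result


result = asyncio.run(run_test())
```

Mocking the class rather than a module-level function means the test has to configure two layers: the constructor call and the awaited method on the resulting instance.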

Testing

Tests verify:

  • ✅ A2A endpoint integrates with ProxyBaseLLMRequestProcessing
  • ✅ MCP handler calls guardrail when mcp_tool_name is present
  • ✅ MCP handler skips guardrail when mcp_tool_name is missing
  • ✅ MCP handler passes tool definition to guardrail
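As a rough sketch of the three MCP behaviors listed above, the following uses a hypothetical handler function and a recording guardrail. It is not the real MCPGuardrailTranslationHandler: the "mcp_tool_description" key and the plain-dict tool shape are assumptions made for illustration only.

```python
import asyncio


class RecordingGuardrail:
    """Hypothetical guardrail that records the inputs it receives."""

    def __init__(self):
        self.call_count = 0
        self.last_inputs = None

    async def apply_guardrail(self, inputs, request_data):
        self.call_count += 1
        self.last_inputs = inputs
        return inputs


async def process_input(data, guardrail):
    """Hypothetical handler: skip without a tool name, else send one tool definition."""
    tool_name = data.get("mcp_tool_name") or data.get("name")  # "name" is the alias
    if not tool_name:
        return data  # nothing to check, guardrail is not called
    tool_def = {
        "type": "function",
        "function": {
            "name": tool_name,
            # Assumed key name; the real handler may read the description elsewhere.
            "description": data.get("mcp_tool_description", ""),
        },
    }
    await guardrail.apply_guardrail(inputs={"tools": [tool_def]}, request_data=data)
    return data  # the request data is returned unchanged


skipped = RecordingGuardrail()
asyncio.run(process_input({"mcp_arguments": {"city": "Paris"}}, skipped))

called = RecordingGuardrail()
asyncio.run(
    process_input(
        {"mcp_tool_name": "weather", "mcp_tool_description": "Get weather for a city"},
        called,
    )
)
```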

Update tests to match new implementation:

1. test_invoke_agent_a2a_adds_litellm_data:
   - Mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic
     instead of add_litellm_data_to_request (which is no longer used)

2. test_process_input_messages_* (MCP guardrail handler):
   - Handler now processes mcp_tool_name/mcp_arguments instead of messages
   - Passes GenericGuardrailAPIInputs(tools=[...]) to guardrails
   - Updated tests to verify new tool-based guardrail behavior
   - Renamed test_process_input_messages_skips_when_no_messages to
     test_process_input_messages_skips_when_no_tool_name
@vercel

vercel bot commented Feb 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 7, 2026 2:43am

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps
Contributor

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

  • Updates two unit test suites to match the guardrail plumbing introduced in #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support"): A2A now uses ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic, and MCP guardrail translation now sends tool definitions (GenericGuardrailAPIInputs(tools=[...])) rather than message text.
  • A2A test swaps mocking from add_litellm_data_to_request to the request-processor class/method and checks it is invoked.
  • MCP handler tests are rewritten around mcp_tool_name/mcp_arguments and add coverage for name/arguments aliases.
  • One test issue remains: MCP assertions treat tool objects as dicts and will fail when tools[] items are typed/Pydantic objects; A2A assertions are also a bit weak (don’t validate call args / propagation of returned (data, logging_obj) into asend_message).

Confidence Score: 3/5

  • This PR is likely safe once the failing MCP test assertions are corrected to match the actual tool object types passed to guardrails.
  • Changes are test-only and align with the updated production behavior, but at least one updated test appears to make incorrect assumptions about GenericGuardrailAPIInputs.tools element types (dict vs typed object), which can cause the unit job to keep failing or become flaky across pydantic versions.
  • tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py

Important Files Changed

Filename Overview
tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py Updates MCP guardrail handler tests to assert tool-based inputs; current assertions treat GenericGuardrailAPIInputs.tools items as dicts, which will fail when they are Pydantic/typed objects.
tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py Updates A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic; test is somewhat brittle because it doesn’t assert the call args / use of returned (data, logging_obj) in the downstream asend_message call.

Sequence Diagram

sequenceDiagram
    participant T as test_a2a_endpoints.py
    participant E as invoke_agent_a2a()
    participant P as ProxyBaseLLMRequestProcessing
    participant G as common_processing_pre_call_logic()
    participant A as asend_message()

    T->>E: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    E->>P: "ProxyBaseLLMRequestProcessing(data=body)"
    P->>G: "common_processing_pre_call_logic(request, settings, auth, ...)"
    G-->>P: "(data, logging_obj)"
    P-->>E: "(data, logging_obj)"
    E->>A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"

    participant M as test_mcp_guardrail_handler.py
    participant H as MCPGuardrailTranslationHandler
    participant C as CustomGuardrail.apply_guardrail()

    M->>H: "process_input_messages(data)"
    H->>H: "Build MCPTool(name/description)"
    H->>H: "transform_mcp_tool_to_openai_tool(mcp_tool)"
    H->>C: "apply_guardrail(inputs=GenericGuardrailAPIInputs(tools=[tool_def]), request_data=data)"
    C-->>H: "guardrail result (ignored)"
    H-->>M: "return original data"

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

Comment on lines +50 to +57
# Guardrail should receive tool definition in inputs
assert "tools" in guardrail.last_inputs
assert len(guardrail.last_inputs["tools"]) == 1

tool = guardrail.last_inputs["tools"][0]
assert tool["type"] == "function"
assert tool["function"]["name"] == "weather"
assert tool["function"]["description"] == "Get weather for a city"

Wrong type for tools

MCPGuardrailTranslationHandler.process_input_messages() builds GenericGuardrailAPIInputs(tools=[...]), so guardrail.last_inputs["tools"][0] is a ChatCompletionToolParam/Pydantic model (attribute access), not a dict. Indexing like tool["type"]/tool["function"]["name"] will raise at runtime when the handler passes real objects.

Consider asserting via attribute access (e.g. tool.type, tool.function.name) or converting to a dict via .model_dump() first before indexing.
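One way to make such assertions independent of the element type is to normalize each tool to a plain dict before indexing. The sketch below is illustrative: ToolParam and FunctionDef are local TypedDicts written for this example, ModelLikeTool is a hypothetical stand-in for a Pydantic-style object exposing model_dump(), and as_plain_dict is a helper name invented here.

```python
from typing import Any, Dict, TypedDict


class FunctionDef(TypedDict):
    name: str
    description: str


class ToolParam(TypedDict):
    type: str
    function: FunctionDef


class ModelLikeTool:
    """Hypothetical stand-in for a Pydantic-style object with model_dump()."""

    def __init__(self, payload: Dict[str, Any]):
        self._payload = payload

    def model_dump(self) -> Dict[str, Any]:
        return dict(self._payload)


def as_plain_dict(tool: Any) -> Dict[str, Any]:
    """Return a dict view of a tool, whichever of the two shapes it has."""
    if isinstance(tool, dict):
        # A TypedDict instance is an ordinary dict at runtime.
        return dict(tool)
    if hasattr(tool, "model_dump"):
        return tool.model_dump()
    raise TypeError(f"unsupported tool type: {type(tool).__name__}")


typed_tool = ToolParam(
    type="function",
    function=FunctionDef(name="weather", description="Get weather for a city"),
)
model_tool = ModelLikeTool(
    {"type": "function", "function": {"name": "weather", "description": ""}}
)

typed_name = as_plain_dict(typed_tool)["function"]["name"]
model_name = as_plain_dict(model_tool)["function"]["name"]
```

Because a TypedDict is only a typing construct, indexing such a value with tool["type"] works directly; the helper matters only if the element type turns out to be a model object.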

Comment on lines +25 to 45
async def mock_common_processing(
    request,
    general_settings,
    user_api_key_dict,
    proxy_logging_obj,
    proxy_config,
    route_type,
    version,
):
    # Get the data from the processor instance via closure
    data = mock_processor_instance.data
    # Simulate what common_processing_pre_call_logic does
    data["proxy_server_request"] = {
        "url": "http://localhost:4000/a2a/test-agent",
        "method": "POST",
        "headers": {},
        "body": dict(data),
    }
    captured_data.update(data)
    return data, MagicMock()  # Returns (data, logging_obj)

Mock return shape mismatch

invoke_agent_a2a() expects common_processing_pre_call_logic to return (data, logging_obj) (see a2a_endpoints.py:300-308). In this test, mock_common_processing() returns data, MagicMock() but the mocked processor is a plain MagicMock() and your side_effect reads mock_processor_instance.data instead of using the actual call args; if ProxyBaseLLMRequestProcessing ever changes to pass/clone data internally, this test can pass while behavior breaks.

Prefer validating the actual arguments passed to common_processing_pre_call_logic (request/general_settings/user_api_key_dict/route_type/version) and assert the returned data is what asend_message() receives via metadata/proxy_server_request. Right now the test doesn’t prove the endpoint uses the returned data tuple correctly.
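A test along the lines suggested here might capture the awaited call arguments and check that the returned data, not the raw request body, reaches the downstream call. This is a minimal sketch under stated assumptions: endpoint is a hypothetical simplification of invoke_agent_a2a, and only route_type="asend_message" is taken from the review thread.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def endpoint(processor, send_message):
    """Hypothetical endpoint: forward what pre-call logic returns."""
    data, _logging_obj = await processor.common_processing_pre_call_logic(
        route_type="asend_message"
    )
    return await send_message(
        metadata=data.get("metadata"),
        proxy_server_request=data.get("proxy_server_request"),
    )


async def run_test():
    returned = {
        "metadata": {"trace": "abc"},
        "proxy_server_request": {"method": "POST"},
    }
    processor = MagicMock()
    processor.common_processing_pre_call_logic = AsyncMock(
        return_value=(returned, MagicMock())
    )
    send_message = AsyncMock(return_value="ok")

    await endpoint(processor, send_message)

    # Inspect the real call arguments instead of reading state off the mock.
    pre_call_kwargs = processor.common_processing_pre_call_logic.await_args.kwargs
    send_kwargs = send_message.await_args.kwargs
    return pre_call_kwargs, send_kwargs


pre_call_kwargs, send_kwargs = asyncio.run(run_test())
```

Asserting on await_args ties the test to what the endpoint actually passed, so it fails if the endpoint stops forwarding the returned tuple.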

- Use dict() conversion for TypedDict access (safer for future changes)
- Use .get() for safer dict key access
- Improve A2A test to validate call arguments more explicitly
- Add comments explaining TypedDict vs Pydantic model distinction
@shin-bot-litellm
Contributor Author

@greptile can you review again

Addressed your feedback:

  1. test_mcp_guardrail_handler.py: Added dict() conversion for TypedDict access to be defensive against future changes. Using .get() for safer key access.
  2. test_a2a_endpoints.py: Improved test to explicitly capture and validate the arguments passed to common_processing_pre_call_logic, including route_type and version. Also validates that the returned data is passed downstream to asend_message.

@greptile-apps
Contributor

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

  • Updates MCP guardrail translation tests to reflect the new tool-based GenericGuardrailAPIInputs(tools=[...]) behavior (including name/arguments alias support and missing-args handling).
  • Updates A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic instead of the removed add_litellm_data_to_request call path.
  • Overall intent is to align unit tests with the refactors introduced in #20619 ("Add http support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support") to fix regressions in litellm_proxy_unit_testing_part2.

Confidence Score: 3/5

  • This PR is mostly safe to merge once the failing A2A unit test expectation is corrected.
  • Changes are limited to test updates, but test_invoke_agent_a2a_adds_litellm_data currently asserts a route_type value that disagrees with the real implementation, which will break CI.
  • tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py

Important Files Changed

Filename Overview
tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py Updates MCP guardrail handler tests to assert tool-definition inputs (tools=[...]) and alias/missing-args behavior; assertions still depend on tools being dict-castable but otherwise align with handler.
tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py Reworks A2A endpoint test to mock ProxyBaseLLMRequestProcessing.common_processing_pre_call_logic, but currently asserts the wrong route_type compared to implementation, causing a test failure.

Sequence Diagram

sequenceDiagram
    participant Test as A2A test
    participant Endpoint as invoke_agent_a2a
    participant Proc as ProxyBaseLLMRequestProcessing
    participant A2A as asend_message

    Test->>Endpoint: "invoke_agent_a2a(agent_id, request, user_api_key_dict)"
    Endpoint->>Proc: "__init__(data=body+model+provider)"
    Endpoint->>Proc: "common_processing_pre_call_logic(..., route_type, version)"
    Proc-->>Endpoint: "(data, logging_obj)"
    Endpoint->>A2A: "asend_message(..., metadata=data.metadata, proxy_server_request=data.proxy_server_request)"
    A2A-->>Endpoint: "response"
    Endpoint-->>Test: "JSONResponse(response.model_dump())"

    participant MCPTest as MCP tests
    participant MCPHandler as MCPGuardrailTranslationHandler
    participant Guardrail as apply_guardrail

    MCPTest->>MCPHandler: "process_input_messages(data, guardrail)"
    alt "mcp_tool_name missing"
        MCPHandler-->>MCPTest: "return data"
    else "tool name present"
        MCPHandler->>Guardrail: "apply_guardrail(inputs.tools=[tool_def], request_data=data)"
        Guardrail-->>MCPHandler: "result"
        MCPHandler-->>MCPTest: "return data"
    end

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 1 comment

Comment on lines +26 to +56
async def mock_common_processing(
    request,
    general_settings,
    user_api_key_dict,
    proxy_logging_obj,
    proxy_config,
    route_type,
    version,
):
    # Capture the actual arguments passed to common_processing_pre_call_logic
    processing_call_args["request"] = request
    processing_call_args["general_settings"] = general_settings
    processing_call_args["user_api_key_dict"] = user_api_key_dict
    processing_call_args["route_type"] = route_type
    processing_call_args["version"] = version

    # Get the data from the processor instance
    data = mock_processor_instance.data

    # Simulate what common_processing_pre_call_logic does
    data["proxy_server_request"] = {
        "url": "http://localhost:4000/a2a/test-agent",
        "method": "POST",
        "headers": {},
        "body": dict(data),
    }
    captured_data.update(data)

    # Store the returned data to verify endpoint uses it
    returned_data.update(data)
    mock_logging_obj = MagicMock()
    return data, mock_logging_obj

Wrong route_type asserted

invoke_agent_a2a() passes route_type="asend_message" into common_processing_pre_call_logic (litellm/proxy/agent_endpoints/a2a_endpoints.py:300-308). This test asserts route_type == "a2a_request", so it will fail (or force an incorrect behavior) even though the endpoint is correct. Update the expectation to match the actual route type used by the endpoint.

@giulio-leone

Automated patch bundle from next-100 unresolved backlog expansion.
Generated due to limited direct branch-write access; please apply/cherry-pick the minimal edits below.

PR #20625 litellm_fix_a2a_test_regression (3 unresolved)

Unresolved thread summary

  • T1 tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py:57 — Wrong type for tools
  • T2 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:57 — Mock return shape mismatch
  • T3 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:56 — Wrong route_type asserted

Minimal patch proposals

  • T1 tests/test_litellm/proxy/_experimental/mcp_server/guardrail_translation/test_mcp_guardrail_handler.py:57
    • Edit steps:
      1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
      2. Keep the test deterministic and verify it fails before / passes after the patch intent.
  • T2 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:57
    • Edit steps:
      1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
      2. Keep the test deterministic and verify it fails before / passes after the patch intent.
  • T3 tests/test_litellm/proxy/agent_endpoints/test_a2a_endpoints.py:56
    • Edit steps:
      1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
      2. Keep the test deterministic and verify it fails before / passes after the patch intent.
