
fix: integration test cases#671

Merged
akshaydeo merged 2 commits into main from 10-23-fix_integration_test_cases
Nov 25, 2025
Conversation

@TejasGhatte
Collaborator

@TejasGhatte TejasGhatte commented Oct 23, 2025

Summary

Increased API timeout values and added model listing tests for Anthropic, Google, and OpenAI integrations.

Changes

  • Increased default timeout from 30 to 120 seconds in Anthropic and OpenAI client configurations
  • Updated timeout values in streaming content collection functions from 30 to 120 seconds
  • Added null check for Anthropic response content before checking its length
  • Added new test cases to verify model listing functionality for all three providers:
    • test_14_list_models for Anthropic
    • test_15_list_models for Google
    • test_31_list_models for OpenAI

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Run the integration tests for the affected providers:

# Run Anthropic integration tests
pytest tests/integrations/tests/integrations/test_anthropic.py -v

# Run Google integration tests
pytest tests/integrations/tests/integrations/test_google.py -v

# Run OpenAI integration tests
pytest tests/integrations/tests/integrations/test_openai.py -v

Breaking changes

  • Yes
  • No

Related issues

Addresses timeout issues with API calls and adds test coverage for model listing functionality.

Security considerations

No security implications.

Checklist

  • I added/updated tests where appropriate
  • I verified the CI pipeline passes locally if applicable

@TejasGhatte TejasGhatte mentioned this pull request Oct 23, 2025
Collaborator Author

TejasGhatte commented Oct 23, 2025

@coderabbitai
Contributor

coderabbitai Bot commented Oct 23, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added support for Bedrock and Cohere providers.
    • Introduced Gemini 2.5 models with TTS capabilities.
    • Enhanced speech synthesis with multi-speaker and voice customization support.
    • Added transcription capabilities across providers.
    • Improved audio format detection and validation.
  • Improvements

    • Refined streaming and response handling for better compatibility.
    • Enhanced tool call tracking with ID support.
    • Restructured test infrastructure for cross-provider testing.


Walkthrough

Reworks provider/config mapping to provider-centric models, adds broad multimodal (speech/transcription/audio) support and utilities, introduces extensive test reorganization (new/removed integration tests and tooling), expands Gemini/Vertex speech/types, and updates Bifrost routing and response converters for speech/transcription and streaming.

Changes

Cohort / File(s) Change summary
Tests — integrations overhaul
tests/integrations/...
Large reorganization: removed legacy Makefile/requirements/pytest.ini and many one-off tests; added new pyproject/config.json/.python-version, dummy credentials, new integration test utilities and constants, provider-centric config, and many parameterized/cross-provider test modules; tests/integrations/tests/integrations/test_google.py was removed in one commit and a Google test module was also added elsewhere (diff shows both adds/removes across PR).
Tests — utils & parametrize
tests/integrations/tests/utils/common.py
tests/integrations/tests/utils/parametrize.py
tests/integrations/tests/utils/config_loader.py
Added rich response/tool/image/audio helpers, provider-voice utilities, streaming collectors, audio generation helpers, and provider-centric config loader API plus cross-provider parametrization helpers.
Transports / Router / Bifrost HTTP
transports/bifrost-http/integrations/genai.go
transports/bifrost-http/router.go
transports/bifrost-http/handlers/*.go
RouteConfig extended for Speech/Transcription detection and converters; added SpeechResponseConverter/TranscriptionResponseConverter fields; improved MIME detection and request branching; ensured auth_config fallback in handlers and adjusted session handler nil behavior.
Core — Gemini provider (types, chat, speech, transcription, utils)
core/providers/gemini/types.go
core/providers/gemini/chat.go
core/providers/gemini/speech.go
core/providers/gemini/transcription.go
core/providers/gemini/utils.go
core/providers/gemini/gemini.go
Added IsSpeech/IsTranscription flags, blob base64 marshal/unmarshal, renamed JSON schema fields, added many JSON tag changes; improved chat streaming/tool-call correlation; added Gemini<->Bifrost speech/transcription converters and context-driven PCM→WAV handling; new helpers to lowercase schema types and normalize generation config.
Core — New/changed audio utilities & testutils
core/providers/utils/audio.go
core/internal/testutil/*.go
Added PCMConfig, DefaultGeminiPCMConfig and ConvertPCMToWAV; added audio validation helpers, SaveAndValidateAudio, DetectAudioFormat, provider-default audio format helper and streaming buffering/validation in test helpers.
Core — Provider adaptations
core/providers/anthropic/responses.go
core/providers/cohere/responses.go
core/providers/bedrock/responses.go
core/providers/vertex/types.go
Streaming content index propagation, thinking param flexibility, function-call id handling, cohere function-call id mapping, bedrock message-role switch, and many new Vertex voice/pronunciation types.
Config & CI
tests/integrations/config.yml
.github/workflows/*.yml
Switched tests config to provider-centric schema (providers, provider_api_keys, provider_scenarios, scenario_capabilities); added providers (bedrock, cohere) and Gemini 2.5 entries; CI workflows updated with check-skip jobs and expanded publish flow.
Tooling & Makefiles
Makefile
transports/bifrost-http/.air.debug.toml
Added DEBUG flag, install-delve target, test-integrations target, .env loading; added Delve + air debug config.
Misc (go mod, logging, docs, UI, gitignore)
core/go.mod
core/schemas/utils.go
docs/*
ui/*
.gitignore
.gitattributes
Added/updated module deps, switched JSON marshaler to sonic in JsonifyInput, docs/image URL change and style tweak, UI logo/layout adjustments, added python ignore patterns and marked dummy credential file as generated.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Router as Bifrost Router
  participant RouteCfg as RouteConfig
  participant Provider as Gemini/Other
  participant Converter as Speech/Transcription Converter
  Note over Client,Router: High-level request flow (multimodal)
  Client->>Router: HTTP request (chat / speech / transcription / embedding)
  Router->>RouteCfg: extractAndSetModelFromURL(url, body)
  Note right of RouteCfg: detect IsSpeech / IsTranscription via MIME & params
  alt IsSpeech
    RouteCfg->>Provider: Build SpeechRequest (ToGeminiSpeechRequest)
    Provider-->>Converter: BifrostSpeechResponse
    Converter->>Router: SpeechResponseConverter(ctx, resp)
    Router->>Client: audio payload (wav/mp3) or converted object
  else IsTranscription
    RouteCfg->>Provider: Build TranscriptionRequest (ToBifrostTranscriptionRequest)
    Provider-->>Converter: BifrostTranscriptionResponse
    Converter->>Router: TranscriptionResponseConverter(...)
    Router->>Client: text transcription
  else Chat/Embedding
    RouteCfg->>Provider: Build Chat/Embedding Request
    Provider-->>Router: Response (streaming allowed)
    Router->>Client: JSON / streaming chunks (includes tool-call ids and content_index)
  end

Notes:

  • Converter box represents per-route SpeechResponseConverter/TranscriptionResponseConverter hooks.
  • Streaming and tool-call events emit ContentIndex to correlate parts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Potential attention areas:

  • Gemini types JSON tag/name renames (types.go) — verify external API compatibility and marshal/unmarshal behavior.
  • Speech/transcription conversion paths and PCM↔WAV handling across gemini <-> bifrost (speech.go, utils/audio.go, testutils).
  • RouteConfig changes: ensure new Speech/Transcription converters are correctly wired and non-breaking for existing integrations (genai.go, router.go).
  • Large test reorganization and provider-centric config (tests/*) — validate CI/test invocation and secrets handling (dummy credentials file flagged).
  • Schema marshaling change to sonic in core/schemas/utils.go — check performance/semantics and error paths.

Poem

"A rabbit hops on code so bright,
I stitch the audio into light.
Messages, voices, streaming art —
I bind the parts and play my part.
Carrots for tests, a tiny cheer,
New routes and sounds — the release is near!" 🐇🎧

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 78.47%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Title check — ❓ Inconclusive: the title 'fix: integration test cases' is vague and generic, failing to convey the specific changes (timeout increases and model listing tests). Consider a more specific title like 'fix: increase integration test timeouts and add model listing tests'.
✅ Passed checks (1 passed)
  • Description check — ✅ Passed: the PR description covers the main changes (timeouts, null check, model listing tests) and follows the template structure, though the Summary section is brief and could give more context on why the changes were needed.


@TejasGhatte TejasGhatte marked this pull request as ready for review October 23, 2025 14:13
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8618bf8 and 4fedb68.

📒 Files selected for processing (3)
  • tests/integrations/tests/integrations/test_anthropic.py (4 hunks)
  • tests/integrations/tests/integrations/test_google.py (1 hunks)
  • tests/integrations/tests/integrations/test_openai.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (1)
  • collect_streaming_content (783-865)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (1)
  • collect_streaming_content (783-865)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1386-1397)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py

1058-1058: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

589-589: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_google.py

506-506: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (8)
tests/integrations/tests/integrations/test_anthropic.py (3)

78-78: Timeout increase looks good.

The increased timeout (30→120 seconds) for the Anthropic client is appropriate for integration tests and aligns with similar changes across OpenAI and other providers.


486-486: Good defensive check for None content.

Adding the null check before evaluating length prevents potential AttributeError if final_response.content is None.


564-564: Streaming timeout increases are appropriate.

The extended timeouts for streaming operations are consistent with the client configuration and help prevent premature timeouts during integration tests.

Also applies to: 582-582

tests/integrations/tests/integrations/test_openai.py (3)

166-166: Timeout increase is appropriate.

The increased timeout (30→120 seconds) aligns with the timeout changes across all provider integration tests and helps prevent premature timeouts.


473-473: Streaming timeout increases look good.

The extended timeouts for both basic streaming and tool-based streaming are consistent with the overall timeout strategy.

Also applies to: 491-491


573-573: Transcription streaming timeout increase is reasonable.

Audio transcription streaming may require additional processing time, making the 120-second timeout appropriate.

tests/integrations/tests/integrations/test_google.py (2)

505-510: Well-implemented model listing test.

This test correctly includes the @skip_if_no_api_key("google") decorator and uses a consistent approach with the Anthropic test (limiting to 5 results and asserting exactly 5). Good work!

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.


514-535: Well-structured helper function with proper error handling.

The extract_google_function_calls helper follows the pattern established in other provider test files and includes appropriate defensive checks and error handling.

Comment thread tests/integrations/tests/integrations/test_anthropic.py Outdated
Comment thread tests/integrations/tests/test_openai.py
@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 4fedb68 to ed5714f Compare October 23, 2025 16:03
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/schemas/providers/anthropic/responses.go (3)

328-336: Bug: tool_use start uses ToolUseID instead of ID (breaks call/result association).

For content_block_start of a tool_use, the identifier is in ContentBlock.ID. ToolUseID is only present on tool_result to reference the earlier tool_use. Using ToolUseID here will produce nil IDs and prevent correlating tool_result with its call.

Apply this diff:

 case AnthropicContentBlockTypeToolUse:
   // This is a function call starting - create function call message

   item := &schemas.ResponsesMessage{
-    ID:   chunk.ContentBlock.ToolUseID,
+    ID:   chunk.ContentBlock.ID,
     Type: schemas.Ptr(schemas.ResponsesMessageTypeFunctionCall),
     ResponsesToolMessage: &schemas.ResponsesToolMessage{
-      CallID:    chunk.ContentBlock.ToolUseID,
+      CallID:    chunk.ContentBlock.ID,
       Name:      chunk.ContentBlock.Name,
       Arguments: schemas.Ptr(""), // Arguments will be filled by deltas
     },
   }

373-383: Use Arguments field for function-call JSON deltas (mapper mismatch).

ToAnthropicResponsesStreamResponse reads bifrostResp.Arguments for FunctionCallArgumentsDelta, but here we set Delta. Populate Arguments to keep round‑trip symmetry and avoid dropped deltas.

Apply this diff:

 case AnthropicStreamDeltaTypeInputJSON:
   // Function call arguments delta
   if chunk.Delta.PartialJSON != nil && *chunk.Delta.PartialJSON != "" {
     return &schemas.BifrostResponsesStreamResponse{
       Type:           schemas.ResponsesStreamResponseTypeFunctionCallArgumentsDelta,
       SequenceNumber: sequenceNumber,
       OutputIndex:    schemas.Ptr(0),
       ContentIndex:   chunk.Index,
-      Delta:          chunk.Delta.PartialJSON,
+      Arguments:      chunk.Delta.PartialJSON,
     }, nil, false
   }

318-346: Fix field mapping errors and populate Arguments field in FunctionCallArgumentsDelta events.

Two critical issues require correction:

  1. Lines 329, 332 (responses.go): Within case AnthropicContentBlockTypeToolUse, use chunk.ContentBlock.ID instead of chunk.ContentBlock.ToolUseID. Per the type definitions, ID is the tool_use identifier, while ToolUseID is for tool_result blocks.

  2. Line 381 (responses.go): When handling FunctionCallArgumentsDelta, populate the Arguments field instead of Delta. The struct defines Arguments *string for this purpose, and other parts of the codebase (mux.go line 915) correctly set streamResp.Arguments. Change Delta: chunk.Delta.PartialJSON to Arguments: chunk.Delta.PartialJSON.

  3. Line 702 (chat.go): Same fix as #1—use chunk.ContentBlock.ID instead of chunk.ContentBlock.ToolUseID within the tool_use case.

♻️ Duplicate comments (2)
tests/integrations/tests/integrations/test_anthropic.py (1)

589-594: Missing @skip_if_no_api_key decorator.

This test should include the @skip_if_no_api_key("anthropic") decorator to gracefully skip when the API key is unavailable, consistent with all other tests in this file.

Apply this diff:

+    @skip_if_no_api_key("anthropic")
     def test_14_list_models(self, anthropic_client, test_config):

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (1)

1058-1064: Missing @skip_if_no_api_key decorator and inconsistent test approach.

Two issues to address:

  1. Like other test methods, this should use @skip_if_no_api_key("openai") to gracefully skip when the API key is unavailable.

  2. This test's approach differs from the Anthropic and Google equivalents:

    • Anthropic (line 592): models.list(limit=5) → asserts len(response.data) == 5
    • Google (line 517): models.list(config={"page_size": 5}) → asserts len(response) == 5
    • OpenAI (line 1062): models.list() → asserts len(response.data) > 0

For consistency and determinism, align with the other providers.

Apply this diff:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

🧹 Nitpick comments (7)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider using proper logging instead of print.

While this debug output can be helpful for troubleshooting, consider using Python's logging module for consistency with production code practices.

Apply this diff if you want to use proper logging:

-        print(error)
+        import logging
+        logging.debug(f"Error from invalid role test: {error}")

Alternatively, if the print was added only for temporary debugging, consider removing it before merging.

core/schemas/providers/anthropic/responses.go (4)

471-507: SSE conformance: ensure index for tool_use starts and include empty content on message_start.

  • content_block_start events should include an index; default to 0 if absent.
  • message_start payload should include an empty content array to mirror Anthropic’s schema.

Apply this diff:

 case schemas.ResponsesStreamResponseTypeOutputItemAdded:
   // Check if this is a function call (tool use) message
   if bifrostResp.Item != nil && bifrostResp.Item.Type != nil && *bifrostResp.Item.Type == schemas.ResponsesMessageTypeFunctionCall {
     // Convert function call to tool_use content_block_start event
     streamResp.Type = AnthropicStreamEventTypeContentBlockStart
-    if bifrostResp.ContentIndex != nil {
-      streamResp.Index = bifrostResp.ContentIndex
-    }
+    if bifrostResp.ContentIndex != nil {
+      streamResp.Index = bifrostResp.ContentIndex
+    } else {
+      zero := 0
+      streamResp.Index = &zero
+    }

     contentBlock := &AnthropicContentBlock{
       Type: AnthropicContentBlockTypeToolUse,
     }

     if bifrostResp.Item.ResponsesToolMessage != nil {
       if bifrostResp.Item.ResponsesToolMessage.CallID != nil {
         contentBlock.ID = bifrostResp.Item.ResponsesToolMessage.CallID
       }
       if bifrostResp.Item.ResponsesToolMessage.Name != nil {
         contentBlock.Name = bifrostResp.Item.ResponsesToolMessage.Name
       }
     }

     streamResp.ContentBlock = contentBlock
   } else {
     // Regular message start event
     streamResp.Type = AnthropicStreamEventTypeMessageStart
     if bifrostResp.Item != nil {
       // Create message start event
       streamMessage := &AnthropicMessageResponse{
         Type: "message",
-        Role: string(schemas.ResponsesInputMessageRoleAssistant),
+        Role: string(schemas.ResponsesInputMessageRoleAssistant),
+        Content: []AnthropicContentBlock{},
       }
       if bifrostResp.Item.ID != nil {
         streamMessage.ID = *bifrostResp.Item.ID
       }
       streamResp.Message = streamMessage
     }
   }

1265-1279: Guard against nil msg.Content before dereference.

Several branches dereference msg.Content without a prior nil check, risking a panic for messages lacking content.

Apply this diff:

-      case schemas.ResponsesMessageTypeMessage:
-        // Regular text message
-        if msg.Content.ContentStr != nil {
+      case schemas.ResponsesMessageTypeMessage:
+        // Regular text message
+        if msg.Content != nil && msg.Content.ContentStr != nil {
           contentBlocks = append(contentBlocks, AnthropicContentBlock{
             Type: "text",
             Text: msg.Content.ContentStr,
           })
-        } else if msg.Content.ContentBlocks != nil {
+        } else if msg.Content != nil && msg.Content.ContentBlocks != nil {
           // Convert content blocks
           for _, block := range msg.Content.ContentBlocks {
             anthropicBlock := convertContentBlockToAnthropic(block)
             if anthropicBlock != nil {
               contentBlocks = append(contentBlocks, *anthropicBlock)
             }
           }
         }

509-596: Optional: expand ContentPartAdded mapping to support images when streaming.

Currently handles only text blocks. If Anthropic emits image content blocks mid‑stream, mirror them here for completeness.


416-425: Minor: include stop_reason in message_delta when known.

If upstream carries stop_reason in the Bifrost event, add it to the message_delta for better parity. Safe to defer.

core/schemas/providers/gemini/chat.go (2)

497-501: Normalize non‑streaming assistant role to Gemini's 'model'.

Gemini Content.Role allows only 'user' or 'model'. Mapping 'assistant' → 'model' avoids invalid role values.

-          candidate.Content = &Content{
-            Parts: parts,
-            Role:  string(choice.ChatNonStreamResponseChoice.Message.Role),
-          }
+          role := string(choice.ChatNonStreamResponseChoice.Message.Role)
+          if role == string(schemas.ChatMessageRoleAssistant) {
+            role = "model"
+          }
+          candidate.Content = &Content{
+            Parts: parts,
+            Role:  role,
+          }

Please confirm Gemini consumption points expect only 'user'|'model'; adjust if there’s any downstream relying on 'assistant'.


418-462: Reduce duplication: extract shared tool‑call → Part conversion.

Both branches build FunctionCall Parts from tool calls. Consider a small helper:

  • toFunctionCallPart(idPtr, fnNamePtr, argsJSON string) (*Part, bool)

This trims repeated JSON parsing and ID/name wiring and centralizes error handling.

I can draft the helper and update both branches if you want.

Also applies to: 463-501

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fedb68 and ed5714f.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
  • ResponsesInputMessageRoleAssistant (304-304)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (13)
tests/integrations/tests/integrations/test_anthropic.py (4)

78-78: LGTM: Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds for the Anthropic client is appropriate for integration tests and aligns with the PR objective to fix timeout issues.


486-486: LGTM: Defensive null check prevents potential AttributeError.

The added null check appropriately handles the case where Anthropic returns empty content when tool results are sufficient, as documented in the surrounding comment.


564-564: LGTM: Streaming timeout increases improve test reliability.

The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with the PR's objective to address timeout issues in integration tests.

Also applies to: 582-582


540-550: Based on my verification, I found a critical inconsistency: the test behavior contradicts both the Bifrost codebase and the pattern across other provider integrations.

Key Findings:

  1. Inconsistent Test Patterns: While test_anthropic.py expects a successful response without exceptions, test_openai.py, test_litellm.py, and test_google.py all use pytest.raises(Exception) for the same INVALID_ROLE_MESSAGES test data.

  2. No Role Normalization in Bifrost: Examination of the Bifrost Anthropic provider code shows direct pass-through of roles without validation or normalization:

    • Role: schemas.ChatMessageRole(msg.Role) (line 54 in core/schemas/providers/anthropic/chat.go)
    • No filtering, validation, or normalization logic present
  3. Misleading Comment: The comment "bifrost handles invalid roles internally" is not supported by the Bifrost codebase. The code performs no special handling for invalid roles like "tester".

The test is likely incorrect and should be aligned with the other provider tests. The Anthropic API (like other LLM APIs) rejects invalid message roles; Bifrost does not normalize them as the comment suggests.

tests/integrations/tests/integrations/test_openai.py (3)

166-166: LGTM: Timeout increase improves integration test resilience.

The timeout increase from 60 to 120 seconds aligns with the PR objective to address timeout issues in integration tests.


474-474: LGTM: Streaming timeout increases address test stability.

The timeout increases from 30 to 120 seconds for streaming tests are appropriate and consistent with similar changes across other provider tests.

Also applies to: 492-492


574-574: LGTM: Transcription streaming timeout increase is appropriate.

The timeout increase from 60 to 120 seconds for transcription streaming is reasonable given that transcription operations can be more time-intensive.

tests/integrations/tests/integrations/test_google.py (4)

147-152: Excellent defensive improvements to image loading.

The additions are all best practices:

  • User-Agent header prevents 403 errors from bot detection
  • Timeout prevents indefinite hanging
  • raise_for_status() ensures HTTP errors are caught early

These changes improve test reliability significantly.


469-483: LGTM: Improved streaming content extraction with proper fallback.

The enhanced logic properly navigates Google GenAI's nested streaming structure (candidates → content → parts → text) with a fallback to direct chunk.text access for compatibility. The reduced minimum content length from 10 to 5 appears to reflect observed streaming behavior.


514-519: LGTM: Well-implemented model listing test.

This test correctly includes the @skip_if_no_api_key decorator and uses a deterministic approach with page_size: 5 and an exact count assertion, consistent with best practices.

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.


523-544: Clarify intent for unused helper function.

The extract_google_function_calls helper is well-structured with proper defensive programming, but verification confirms it's not called anywhere in the codebase. Existing tests at lines 236-239 and 316-319 access response.function_calls directly instead.

Either refactor the existing tests to use this helper for consistency, or remove it if it's not intended for future use or external consumption.

core/schemas/providers/anthropic/responses.go (1)

266-268: Good fix: ensure JSON emits [] instead of null for content.

Setting Content to an empty slice avoids nulls and aligns better with clients expecting arrays. LGTM.

core/schemas/providers/gemini/chat.go (1)

425-453: Remove the proposed diff; the nil deref risk does not exist, but role normalization is needed only in the non-streaming path.

The review comment incorrectly identifies a nil pointer dereference risk. In the code, toolCall.Function is an embedded value field (ChatAssistantMessageToolCallFunction), not a pointer, so it cannot be nil—accessing toolCall.Function.Arguments and toolCall.Function.Name is always safe.

However, there is a legitimate issue in the non-streaming path at line 499: the code outputs string(choice.ChatNonStreamResponseChoice.Message.Role) directly, which will be "assistant" for ChatMessageRoleAssistant. Since Gemini expects roles to map "assistant" to "model", only line 499 needs normalization. The streaming path already defaults to "model" and does not require the suggested guard. The json.Unmarshal errors being silently ignored is a minor quality issue (not critical for streaming partial JSON).

Likely an incorrect or invalid review comment.
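The normalization needed at line 499 amounts to a single role mapping; a minimal sketch (function name is illustrative, not from the codebase):

```python
def normalize_role_for_gemini(role: str) -> str:
    # Gemini content roles are "user" and "model"; an OpenAI-style
    # "assistant" role must be mapped before building the request.
    return "model" if role == "assistant" else role
```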

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from ed5714f to 21dd884 Compare October 23, 2025 18:10
@TejasGhatte TejasGhatte force-pushed the 10-17-feat_added_list_models_request branch from 8618bf8 to 40d6d8f Compare October 23, 2025 18:10

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

1059-1064: Missing decorator and non-deterministic assertion.

Two issues need addressing:

  1. Missing @skip_if_no_api_key("openai") decorator: All other tests in this suite use this decorator to gracefully skip when the API key is unavailable. This test should follow the same pattern.

  2. Non-deterministic assertion: The test uses assert len(response.data) > 0, which makes the test non-deterministic. For consistency with the Anthropic (line 592-594) and Google (line 517-519) tests, use a fixed limit for deterministic validation.

Apply this diff:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The OpenAI Python SDK does support the limit parameter for models.list() (default 20, valid range 1-100). The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider using structured logging instead of print statements.

While printing the error aids debugging, consider using Python's logging module for better control and formatting in test output.

-        print(error)
+        import logging
+        logging.debug(f"Error in test_12_error_handling_invalid_roles: {error}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5714f and 21dd884.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (2)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (12)
core/schemas/providers/anthropic/responses.go (1)

266-268: LGTM: Consistent empty slice initialization.

Setting Content to an empty slice instead of leaving it nil ensures consistent JSON serialization (empty array [] instead of null). This is a good practice for API responses.

tests/integrations/tests/integrations/test_anthropic.py (5)

78-78: LGTM! Timeout increase addresses integration test reliability.

The 4x timeout increase aligns with the PR's goal of fixing integration test timeout issues and is consistently applied across all provider tests.


486-491: LGTM! Proper null-safety for known Anthropic behavior.

The null check correctly handles cases where Anthropic returns empty content after tool results, which is documented as valid behavior.
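A minimal sketch of the defensive read, with `SimpleNamespace` standing in for the real Anthropic response object:

```python
from types import SimpleNamespace

def response_text(response) -> str:
    """Safely read text from an Anthropic-style response whose content
    may be None or an empty list after a tool result."""
    content = getattr(response, "content", None)
    if not content:  # covers both None and []
        return ""
    return "".join(getattr(block, "text", "") or "" for block in content)
```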


540-550: LGTM! Test correctly reflects bifrost's error handling behavior.

The test now validates that bifrost handles invalid roles internally rather than propagating errors to the provider. This is a significant behavioral change from the original test expectation but aligns with bifrost's role as a translation/normalization layer.


564-564: LGTM! Streaming timeout increases improve test reliability.

The 4x timeout increase for streaming operations is consistent with the broader timeout adjustments in this PR and should reduce flaky test failures.

Also applies to: 582-582


589-594: LGTM! Model listing test is well-structured and deterministic.

The test correctly uses the @skip_if_no_api_key decorator and requests a fixed number of models (5) for deterministic validation, consistent with the Google provider's test approach.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

tests/integrations/tests/integrations/test_openai.py (2)

166-166: LGTM! Timeout increase improves test reliability.

Consistent with the timeout adjustments across all provider tests in this PR.


474-474: LGTM! Streaming timeout increases address test flakiness.

The timeout increases across all streaming scenarios (chat, tools, transcription) should improve test reliability for slower network conditions or provider throttling.

Also applies to: 492-492, 574-574

tests/integrations/tests/integrations/test_google.py (4)

147-152: LGTM! Robust image loading with proper error handling.

The improvements add three important safety measures:

  1. User-Agent header prevents 403 errors from servers that block default clients
  2. 30-second timeout prevents indefinite hangs
  3. raise_for_status() ensures HTTP errors are caught early

469-479: LGTM! More robust streaming text extraction.

The improved extraction logic properly traverses the Google GenAI response structure (candidates → content → parts → text) with defensive attribute checks, while maintaining a fallback for compatibility. This should handle edge cases better than the previous implementation.


483-483: Relaxed content threshold—verify this is appropriate.

The minimum content length was reduced from 10 to 5 characters. While this makes the test more tolerant of short responses, 5 characters is quite minimal. Ensure this threshold still validates meaningful streaming content and won't pass on truncated or error responses.


514-519: LGTM! Deterministic model listing test with proper decorator.

The test correctly uses the @skip_if_no_api_key decorator and requests a fixed page size for deterministic validation, consistent with the Anthropic test approach. The Google SDK uses config={"page_size": 5} syntax, which is appropriate for this provider.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

Comment thread core/providers/gemini/chat.go
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Consider removing debug print statement.

The print(error) statement appears to be for debugging purposes. While helpful during development, consider removing it or replacing it with proper logging for production test code.

Apply this diff if you want to remove the debug statement:

         # Verify the error is properly caught and contains role-related information
         error = exc_info.value
-        print(error)
         assert_valid_error_response(error, "tester")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5714f and 21dd884.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
core/schemas/providers/gemini/chat.go (3)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (517-519)
  • ChatNonStreamResponseChoice (511-514)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (949-957)
  • Part (982-1006)
  • FunctionCall (1078-1088)
ui/lib/types/logs.ts (1)
  • Function (139-144)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (2)
  • ResponsesMessageTypeFunctionCall (268-268)
  • ResponsesToolMessage (431-451)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
tests/integrations/tests/integrations/test_anthropic.py (5)

78-78: LGTM! Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds aligns with the PR objectives to fix integration test timeout issues.


486-491: Good defensive null check.

This properly handles the case where Anthropic returns an empty content array when a tool result is sufficient, preventing potential AttributeErrors.


564-564: LGTM! Extended streaming timeouts.

The timeout increases from 30 to 120 seconds for streaming operations align with the PR objectives and ensure tests don't fail prematurely on slower network connections or API responses.

Also applies to: 582-582


589-594: LGTM! Model listing test added with proper decorator.

The test correctly uses the @skip_if_no_api_key decorator and validates that exactly 5 models are returned when limited to 5. This aligns with similar tests in other provider integrations.


540-550: Test behavior is inconsistent with other provider tests and the claim about Bifrost is unverified.

The Anthropic test_12 does not follow the established pattern used across OpenAI, Google, and LiteLLM integrations. While all other provider tests expect exceptions for invalid roles and call assert_valid_error_response(), the Anthropic test expects a successful response without validation. Additionally:

  • The test does not use convert_to_anthropic_messages() like other tests in the same file
  • The comment claims "bifrost handles invalid roles internally" but this is not verified against Bifrost's actual implementation
  • The same INVALID_ROLE_MESSAGES fixture with role="tester" is expected to raise exceptions in all other providers

Either verify that Bifrost's Anthropic integration has documented, provider-specific role handling that differs from OpenAI/Google/LiteLLM, or align the test with the established error-handling pattern used elsewhere.

tests/integrations/tests/integrations/test_openai.py (3)

166-166: LGTM! Timeout increase improves test reliability.

The timeout increase from 30 to 120 seconds addresses timeout issues mentioned in the PR objectives.


474-474: LGTM! Extended streaming timeouts.

The timeout increases from 30 to 120 seconds for streaming operations are consistent with changes in other provider tests and address the PR's timeout-related objectives.

Also applies to: 492-492, 574-574


1059-1064: Missing decorator and inconsistent test pattern.

This test has two issues:

  1. Missing @skip_if_no_api_key("openai") decorator – Unlike all other test methods in this file, this test lacks the decorator to skip when the API key is unavailable.

  2. Inconsistent assertion pattern – This test uses assert len(response.data) > 0 while the equivalent tests in other providers use deterministic assertions with a fixed limit:

    • Anthropic (line 592): models.list(limit=5) → assert len(response.data) == 5
    • Google (line 517): models.list(config={"page_size": 5}) → assert len(response) == 5

The past review thread indicates you mentioned "list is not supported in openai sdk," but the OpenAI Python SDK does support the limit parameter for models.list().

Apply this diff to align with other provider tests:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.

core/schemas/providers/anthropic/responses.go (2)

266-268: Good defensive initialization.

Setting Content to an empty slice instead of leaving it nil prevents potential nil pointer dereferences in downstream code that expects a slice.


471-507: LGTM! Improved function call handling in streaming responses.

The code now correctly differentiates between function call messages and regular messages in the OutputItemAdded event:

  • Function calls emit a ContentBlockStart event with a ToolUse content block
  • Regular messages emit a MessageStart event with an assistant role

This aligns with Anthropic's streaming event model and ensures proper handling of tool-use blocks.
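The branching can be sketched in Python; field names in the dicts below are illustrative stand-ins for the Go structs, not the actual schema:

```python
def stream_event_for(item: dict) -> dict:
    """Illustrative dispatch: function calls open a content_block_start
    event with a tool_use block; everything else opens a message."""
    if item.get("type") == "function_call":
        return {
            "type": "content_block_start",
            "content_block": {
                "type": "tool_use",
                "id": item.get("call_id"),
                "name": item.get("name"),
            },
        }
    return {"type": "message_start", "message": {"role": "assistant"}}
```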

tests/integrations/tests/integrations/test_google.py (4)

147-152: Excellent improvements to image loading reliability.

The additions address common issues when fetching remote images:

  • User-Agent header prevents 403 errors from servers like Wikipedia
  • timeout=30 prevents indefinite hangs
  • raise_for_status() ensures bad HTTP responses are caught early

469-483: LGTM! Robust streaming content extraction.

The updated parsing correctly navigates Google GenAI's nested streaming structure (candidates -> content -> parts -> text) with a compatibility fallback to chunk.text. The reduced minimum content length assertion (>5 vs >10) is more realistic for streaming scenarios.


514-519: LGTM! Model listing test properly implemented.

The test correctly includes the @skip_if_no_api_key decorator and uses page_size=5 to request exactly 5 models, then asserts the count matches. This is consistent with the Anthropic provider's test pattern.


523-544: LGTM! Helper function with proper error handling.

The new extract_google_function_calls helper safely extracts function call metadata with:

  • Proper type checking using hasattr
  • Exception handling for AttributeError and TypeError
  • Warning messages for debugging

Comment thread core/providers/gemini/chat.go
@TejasGhatte TejasGhatte changed the base branch from 10-17-feat_added_list_models_request to graphite-base/671 October 23, 2025 18:47
@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 21dd884 to d1ad6c4 Compare October 23, 2025 18:49
@TejasGhatte TejasGhatte changed the base branch from graphite-base/671 to 10-17-feat_added_list_models_request October 23, 2025 18:49
@TejasGhatte TejasGhatte changed the base branch from 10-17-feat_added_list_models_request to graphite-base/671 October 23, 2025 18:59
@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from d1ad6c4 to 06eadf7 Compare October 24, 2025 04:31
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)

486-491: Fix critical logic error in tool call argument parsing.

Both the if and else branches set argsMap to an empty map, which discards successfully parsed arguments. This breaks tool call argument handling for non-streaming responses.

Apply this diff to fix the logic:

 					argsMap := make(map[string]interface{})
 					if toolCall.Function.Arguments != "" {
 						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+							// Keep argsMap empty on error
 							argsMap = map[string]interface{}{}
-						} else {
-							argsMap = map[string]interface{}{}
 						}
 					}
♻️ Duplicate comments (1)
core/schemas/providers/gemini/chat.go (1)

443-460: Handle unmarshaling errors for tool call arguments.

This issue was already identified in previous reviews. Line 447 calls json.Unmarshal without checking the error, which can silently produce empty argsMap if the JSON is malformed.
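A Python analogue of the fix, making the error path explicit instead of silently ignoring it (function name is illustrative):

```python
import json

def parse_tool_args(raw: str) -> dict:
    """Parse tool-call arguments, handling malformed JSON explicitly
    instead of silently producing an empty map."""
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # log-and-continue is also viable; the point is to handle the
        # error deliberately rather than drop it on the floor
        return {}
    return parsed if isinstance(parsed, dict) else {}
```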

🧹 Nitpick comments (4)
tests/integrations/tests/integrations/test_google.py (3)

515-515: Remove unused parameter.

The test_config parameter is not used in this test method.

As per static analysis hints.

Apply this diff:

-    def test_15_list_models(self, google_client, test_config):
+    def test_15_list_models(self, google_client):

517-519: Assertion may be too strict.

The assertion assert len(response) == 5 assumes the API will return exactly 5 models. If fewer models are available or if the API behavior changes, this test will fail. Consider using assert len(response) <= 5 or assert len(response) > 0 instead.

Apply this diff:

         response = google_client.models.list(config={"page_size": 5})
         assert response is not None
-        assert len(response) == 5
+        assert len(response) > 0, "Should return at least one model"
+        assert len(response) <= 5, "Should not exceed requested page_size"

523-544: Well-implemented helper function with good error handling.

The function correctly extracts function calls with proper type checking and tolerant error handling. The defensive programming approach is appropriate for parsing potentially variable response structures.

One minor suggestion: consider using Python's logging module instead of print for the warning message to allow better control over test output verbosity.

Optional improvement:

+import logging
+
+logger = logging.getLogger(__name__)
+
 def extract_google_function_calls(response: Any) -> List[Dict[str, Any]]:
     """Extract function calls from Google GenAI response format with proper type checking"""
     function_calls = []
 
     # Type check for Google GenAI response
     if not hasattr(response, "function_calls") or not response.function_calls:
         return function_calls
 
     for fc in response.function_calls:
         if hasattr(fc, "name") and hasattr(fc, "args"):
             try:
                 function_calls.append(
                     {
                         "name": fc.name,
                         "arguments": dict(fc.args) if fc.args else {},
                     }
                 )
             except (AttributeError, TypeError) as e:
-                print(f"Warning: Failed to extract Google function call: {e}")
+                logger.warning(f"Failed to extract Google function call: {e}")
                 continue
 
     return function_calls
tests/integrations/tests/integrations/test_openai.py (1)

458-458: Remove debug print statement.

This debug print statement should be removed or replaced with proper logging using pytest's built-in output capture.

Apply this diff to remove the debug statement:

-        print(error)
         assert_valid_error_response(error, "tester")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21dd884 and 06eadf7.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (9-9)
  • Content (868-876)
  • Part (882-906)
  • FunctionCall (978-988)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (5)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (232-232)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (230-230)
  • AnthropicMessageResponse (194-203)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (272-272)
  • ResponsesToolMessage (435-455)
  • ResponsesInputMessageRoleAssistant (308-308)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (12)
tests/integrations/tests/integrations/test_google.py (2)

469-479: Well-implemented streaming content extraction.

The enhanced parsing logic correctly handles Google GenAI's nested streaming response structure (candidates → content → parts → text) with proper attribute checks. The fallback to chunk.text maintains backward compatibility.


147-152: No change needed; the 30-second image-fetch timeout is intentional.

The review comment misinterprets the timeout changes. The codebase appropriately uses different timeout values for different operation types: timeout=120 is used for streaming content collection in test_openai.py and test_anthropic.py, while timeout=30 is used for simple HTTP GET requests like image fetching in load_image_from_url. A 30-second timeout for a direct image fetch is reasonable and consistent with the PR's objectives of increasing timeouts specifically for longer-running streaming operations. The code change does not require adjustment.

Likely an incorrect or invalid review comment.

tests/integrations/tests/integrations/test_openai.py (4)

166-166: Timeout increase looks good.

The increase from 30 to 120 seconds is appropriate for integration tests and consistent with the PR's objective to address timeout issues.


474-474: Streaming timeout increases look good.

The increased timeouts for streaming operations are appropriate for integration tests and align with the PR's objective to address timeout issues.

Also applies to: 492-492


574-574: Transcription streaming timeout increase looks good.

The increased timeout is appropriate for transcription operations which can take longer than chat completions.


1058-1064: Critical: Missing decorator and inconsistent test approach.

Despite the previous review feedback and your comment about adding the skip decorator, the @skip_if_no_api_key("openai") decorator is still missing from this test method. This will cause the test to fail with an error (rather than skip gracefully) when the API key is unavailable in CI/CD environments.

Additionally, contrary to your comment that "list is not supported in openai sdk," the OpenAI Python SDK does support the limit parameter for models.list(). The previous review included web search results confirming this, showing that the method accepts a limit parameter with a valid range of 1–100 (default 20).

For consistency with the Anthropic test (line 591: models.list(limit=5)) and Google test (line 508: models.list(config={"page_size": 5})), this test should use a deterministic assertion rather than just checking > 0.

Apply this diff to add the decorator and align with other provider tests:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) == 5

This ensures:

  • The test is skipped when no API key is available (consistent with all other tests)
  • The test is deterministic by requesting exactly 5 models (consistent with test_14_list_models for Anthropic and test_15_list_models for Google)

Note: The static analysis hint about unused test_config is a false positive—it's a standard pytest fixture pattern.

Likely an incorrect or invalid review comment.

tests/integrations/tests/integrations/test_anthropic.py (4)

78-78: LGTM! Timeout increase addresses integration test stability.

The timeout increase from 30 to 120 seconds aligns with the PR objectives to address timeout issues in integration tests. This is consistently applied across streaming tests as well.


486-491: LGTM! Defensive null check for Anthropic response content.

The added null check properly handles cases where Anthropic returns empty content when the tool result is sufficient, preventing potential AttributeError exceptions.


540-550: LGTM! Test updated to reflect bifrost's internal error handling.

The test correctly validates that bifrost handles invalid roles internally without raising exceptions, which is the expected behavior per the inline comment.


589-623: LGTM! Comprehensive model listing test with pagination.

The new test properly validates model listing functionality with pagination parameters (after_id, before_id). The test includes appropriate guards for pagination edge cases and follows the established patterns in the test suite.

Note: The static analysis warning about unused test_config is a false positive—it's a standard pytest fixture pattern.

core/schemas/providers/anthropic/responses.go (2)

266-268: LGTM! Defensive initialization ensures consistent response format.

Setting Content to an empty slice when no content blocks exist ensures consistent JSON serialization (empty array vs null), which is good practice for API responses.
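A minimal standalone sketch of why this matters, using a hypothetical message type rather than the repo's actual schema: Go's encoding/json serializes a nil slice as null but an initialized empty slice as [], so initializing Content keeps the wire format consistent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is a hypothetical stand-in for the provider response type;
// the real converter initializes Content the same way before serialization.
type message struct {
	Content []string `json:"content"`
}

func marshal(m message) string {
	b, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(message{}))                    // nil slice serializes as {"content":null}
	fmt.Println(marshal(message{Content: []string{}})) // initialized slice serializes as {"content":[]}
}
```

Clients that iterate over content unconditionally break on null but handle [] fine, which is why the defensive initialization is good practice.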


471-507: LGTM! Proper differentiation of function calls from regular messages.

The refactored logic correctly handles function calls by emitting ContentBlockStart events with tool_use type, while regular messages continue to emit MessageStart events. The extraction of tool message details (CallID, Name) is handled appropriately.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/schemas/providers/gemini/chat.go (1)

487-491: Critical: Tool call arguments are always empty (non-streaming path).

Lines 487-491 contain inverted logic that discards parsed tool call arguments. Both the error and success branches set argsMap to an empty map, meaning:

  • If unmarshaling succeeds, the parsed data is immediately overwritten with an empty map
  • If unmarshaling fails, an empty map is used (and the error is silently ignored)

This breaks all tool/function calling in non-streaming responses.

Apply this diff to fix the logic:

 					argsMap := make(map[string]interface{})
 					if toolCall.Function.Arguments != "" {
-						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
-							argsMap = map[string]interface{}{}
-						} else {
-							argsMap = map[string]interface{}{}
-						}
+						if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+							// Log the error or handle appropriately
+							// argsMap remains empty on error
+						}
+						// On success, argsMap already contains the parsed data
 					}
♻️ Duplicate comments (2)
core/schemas/providers/gemini/chat.go (1)

447-447: Missing error handling for JSON unmarshaling (streaming path).

Line 447 calls json.Unmarshal without checking the returned error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty, potentially passing incorrect tool call arguments downstream.

tests/integrations/tests/integrations/test_openai.py (1)

1059-1064: Make model listing deterministic; request exactly one page.

Use a page size to avoid environment-dependent counts and align with Anthropic/Google tests.

OpenAI list methods support a limit parameter for pagination; using it here makes the test stable. (github.com)

-        response = openai_client.models.list()
-        assert response.data is not None
-        assert len(response.data) > 0
+        response = openai_client.models.list(limit=5)
+        assert response.data is not None
+        assert 0 < len(response.data) <= 5
🧹 Nitpick comments (2)
tests/integrations/tests/integrations/test_openai.py (1)

456-460: Avoid noisy prints in tests.

Use assertion messages or logging instead of print(error) to keep CI output clean.

-        print(error)
+        # Optionally log at debug if needed:
+        # import logging; logging.getLogger(__name__).debug("OpenAI error: %s", error)
tests/integrations/tests/integrations/test_google.py (1)

515-519: Make the page-size assertion tolerant.

models.list(config={"page_size": 5}) is correct, but some environments may return fewer than 5 items. Prefer <= 5 and > 0 to avoid flakiness. (github.com)

-        assert response is not None
-        assert len(response) == 5
+        assert response is not None
+        assert 0 < len(response) <= 5
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21dd884 and 06eadf7.

📒 Files selected for processing (6)
  • core/schemas/providers/anthropic/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (5 hunks)
  • tests/integrations/tests/integrations/test_google.py (3 hunks)
  • tests/integrations/tests/integrations/test_openai.py (6 hunks)
  • tests/integrations/tests/utils/common.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/integrations/tests/utils/common.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/schemas/providers/anthropic/responses.go
🧰 Additional context used
🧬 Code graph analysis (4)
tests/integrations/tests/integrations/test_google.py (1)
tests/integrations/tests/utils/common.py (1)
  • skip_if_no_api_key (1366-1377)
tests/integrations/tests/integrations/test_anthropic.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (9-9)
  • Content (868-876)
  • Part (882-906)
  • FunctionCall (978-988)
tests/integrations/tests/integrations/test_openai.py (1)
tests/integrations/tests/utils/common.py (2)
  • collect_streaming_content (763-845)
  • skip_if_no_api_key (1366-1377)
🪛 Ruff (0.14.1)
tests/integrations/tests/integrations/test_google.py

515-515: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

590-590: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

1060-1060: Unused method argument: test_config

(ARG002)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (42)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (9)
tests/integrations/tests/integrations/test_openai.py (3)

163-168: LGTM: increased client timeout to 120s.

Matches the PR’s reliability goal and uses the config default cleanly.


472-476: LGTM: extend streaming collector timeouts to 120s.

Reduces test flakiness on slower runs.

Also applies to: 490-494


571-576: LGTM: transcription streaming timeout to 120s.

tests/integrations/tests/integrations/test_anthropic.py (4)

75-81: LGTM: increased client timeout to 120s.


484-491: Good defensive check on empty content.

Prevents None/empty content access during tool-result-only paths.


561-566: LGTM: extend streaming collector timeouts to 120s.

Also applies to: 581-584


589-624: Nice, thorough pagination test.

Covers limit, paging forward/backward, and pagination metadata. Tolerant <= avoids flakiness.

tests/integrations/tests/integrations/test_google.py (2)

147-153: LGTM: hardened image fetch (UA, 30s timeout, raise_for_status).


467-484: LGTM: robust streaming parsing and assertion.

Traverses candidates→content→parts safely; relaxed length avoids brittle failures.

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 06eadf7 to b027c49 Compare November 3, 2025 02:50
@TejasGhatte TejasGhatte changed the base branch from graphite-base/671 to main November 3, 2025 02:51
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/schemas/providers/gemini/utils.go (1)

85-104: Consider logging errors in schema conversion.

The function silently returns empty maps on marshal/unmarshal errors (lines 89, 94) and type assertion failures (line 103). This could hide schema conversion issues and result in missing required properties downstream.

Consider adding logging to improve observability:

 func convertSchemaToMap(schema *Schema) map[string]interface{} {
 	// Convert map[string]*Schema to map[string]interface{} using JSON marshaling
 	data, err := sonic.Marshal(schema.Properties)
 	if err != nil {
+		// TODO: Add logging here - log.Warn("failed to marshal schema properties", "error", err)
 		return make(map[string]interface{})
 	}
 
 	var properties map[string]interface{}
 	if err := sonic.Unmarshal(data, &properties); err != nil {
+		// TODO: Add logging here - log.Warn("failed to unmarshal schema properties", "error", err)
 		return make(map[string]interface{})
 	}
 
 	result := convertTypeToLowerCase(properties)
 
 	// Type assert back to map[string]interface{}
 	if resultMap, ok := result.(map[string]interface{}); ok {
 		return resultMap
 	}
+	// TODO: Add logging here - log.Warn("convertTypeToLowerCase returned unexpected type")
 	return make(map[string]interface{})
 }
transports/bifrost-http/integrations/utils.go (1)

503-515: Honor SpeechResponseConverter instead of always returning raw audio.

The new RouteConfig.SpeechResponseConverter is never used—handleNonStreamingRequest still writes raw MP3 bytes. For routes like GenAI we now configure a converter that expects a Gemini GenerateContentResponse; with the current code those requests get binary audio instead of JSON, breaking API compatibility.

Please invoke the converter when it’s provided (and fall back to the legacy raw-audio response only when it’s nil). That keeps OpenAI working while letting GenAI return the expected payload.

 	case bifrostReq.SpeechRequest != nil:
 		speechResponse, bifrostErr := g.client.SpeechRequest(*bifrostCtx, bifrostReq.SpeechRequest)
 		if bifrostErr != nil {
 			g.sendError(ctx, config.ErrorConverter, bifrostErr)
 			return
 		}
 
-		ctx.Response.Header.Set("Content-Type", "audio/mpeg")
-		ctx.Response.Header.Set("Content-Disposition", "attachment; filename=speech.mp3")
-		ctx.Response.Header.Set("Content-Length", strconv.Itoa(len(speechResponse.Audio)))
-		ctx.Response.SetBody(speechResponse.Audio)
-		return
+		if config.PostCallback != nil {
+			if err := config.PostCallback(ctx, req, speechResponse); err != nil {
+				g.sendError(ctx, config.ErrorConverter, newBifrostError(err, "failed to execute post-request callback"))
+				return
+			}
+		}
+
+		if speechResponse == nil {
+			g.sendError(ctx, config.ErrorConverter, newBifrostError(nil, "Bifrost response is nil after post-request callback"))
+			return
+		}
+
+		if config.SpeechResponseConverter != nil {
+			response, err = config.SpeechResponseConverter(speechResponse)
+		} else {
+			ctx.Response.Header.Set("Content-Type", "audio/mpeg")
+			ctx.Response.Header.Set("Content-Disposition", "attachment; filename=speech.mp3")
+			ctx.Response.Header.Set("Content-Length", strconv.Itoa(len(speechResponse.Audio)))
+			ctx.Response.SetBody(speechResponse.Audio)
+			return
+		}
♻️ Duplicate comments (1)
core/schemas/providers/gemini/chat.go (1)

443-460: Handle JSON unmarshaling errors for tool call arguments.

Line 447 calls json.Unmarshal without checking the error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty, potentially causing incorrect tool call arguments.

Apply this diff to add error handling:

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Log or skip this tool call on error
+								continue
+							}
 						}
 						if toolCall.Function.Name != nil {
 							fc := &FunctionCall{
 								Name: *toolCall.Function.Name,
 								Args: argsMap,
 							}
 							if toolCall.ID != nil {
 								fc.ID = *toolCall.ID
 							}
 							parts = append(parts, &Part{FunctionCall: fc})
 						}
 					}
 				}
🧹 Nitpick comments (9)
core/schemas/providers/cohere/responses.go (1)

246-246: Consider defining an enum constant for consistency.

For consistency with the pattern established at lines 172 and 211, consider defining a constant like schemas.ResponsesMessageTypeFunctionCallOutput and using it here instead of the string literal.

core/schemas/providers/gemini/utils.go (1)

106-137: Consider adding recursion depth limit for robustness.

The recursive function correctly handles maps, slices, and primitives to lowercase all "type" fields. However, it lacks a recursion depth limit, which could theoretically cause stack overflow on pathological inputs with deeply nested structures.

While function parameter schemas are typically shallow, consider adding a defensive depth limit:

// convertTypeToLowerCase recursively converts all 'type' fields to lowercase in a schema
func convertTypeToLowerCase(schema interface{}) interface{} {
	return convertTypeToLowerCaseWithDepth(schema, 0, 100)
}

func convertTypeToLowerCaseWithDepth(schema interface{}, depth, maxDepth int) interface{} {
	if depth > maxDepth {
		return schema // Return unchanged if max depth exceeded
	}
	
	switch v := schema.(type) {
	case map[string]interface{}:
		newMap := make(map[string]interface{})
		for key, value := range v {
			if key == "type" {
				if strValue, ok := value.(string); ok {
					newMap[key] = strings.ToLower(strValue)
				} else {
					newMap[key] = value
				}
			} else {
				newMap[key] = convertTypeToLowerCaseWithDepth(value, depth+1, maxDepth)
			}
		}
		return newMap
	case []interface{}:
		newSlice := make([]interface{}, len(v))
		for i, item := range v {
			newSlice[i] = convertTypeToLowerCaseWithDepth(item, depth+1, maxDepth)
		}
		return newSlice
	default:
		return v
	}
}
tests/integrations/tests/utils/provider_capabilities.py (1)

109-112: Consider using list comprehension for cleaner code.

The loop can be simplified to a list comprehension for better readability and Pythonic style.

Apply this diff to refactor the function:

 def get_available_providers() -> List[str]:
     """
     Get list of providers that are available (have API keys configured).
     
     Returns:
         List of available provider names
     """
-    available = []
-    for provider in PROVIDER_CAPABILITY_MODELS.keys():
-        if is_provider_available(provider):
-            available.append(provider)
-    
-    return available
+    return [
+        provider
+        for provider in PROVIDER_CAPABILITY_MODELS.keys()
+        if is_provider_available(provider)
+    ]
core/schemas/providers/gemini/speech.go (1)

25-33: Consider adding a separator for concatenated text.

The current implementation concatenates text parts directly without any separator, which could result in words running together. Consider adding a space or newline between parts.

Apply this diff:

 	// Extract text input from contents
 	var textInput string
 	for _, content := range request.Contents {
 		for _, part := range content.Parts {
 			if part.Text != "" {
+				if textInput != "" {
+					textInput += " "
+				}
 				textInput += part.Text
 			}
 		}
 	}
tests/integrations/tests/utils/parametrize.py (1)

18-30: Use Optional for nullable provider filters.

Ruff is flagging the implicit Optional defaults here. Switching to Optional[List[str]] (and importing Optional) clears the warning and keeps the annotations accurate.

-from typing import List, Tuple
+from typing import List, Optional, Tuple
 ...
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: Optional[List[str]] = None,
+    exclude_providers: Optional[List[str]] = None,
tests/integrations/tests/integrations/test_anthropic.py (1)

593-612: Fix unused variable in tuple unpacking.

The variable content_tools is unpacked but never used (line 606). When you don't need all values from a tuple, use underscore prefix for unused variables.

Apply this diff:

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
-                    collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
-                )
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                    collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
+                )
tests/integrations/tests/integrations/test_openai.py (3)

518-539: Fix unused variable in tuple unpacking.

Line 531 has the same issue as in test_anthropic.py: content_tools is unpacked but never used. This should be prefixed with underscore for consistency.

Apply this diff:

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
-                    collect_streaming_content(stream_with_tools, "openai", timeout=300)
-                )
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                    collect_streaming_content(stream_with_tools, "openai", timeout=300)
+                )

1098-1104: Consider using limit parameter for deterministic testing.

The past review comments suggest using .list(limit=5) for consistency with Anthropic (line 618) and Google (line 567) tests. The current non-deterministic assertion len(response.data) > 0 could pass with varying result counts.

For consistency and determinism, consider:

-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) <= 5  # May return fewer if not enough models

However, if the past discussion concluded that OpenAI doesn't support the limit parameter, the current implementation is acceptable—just document why it differs from other providers.


1266-1353: Fix unused variable; otherwise LGTM.

The streaming tests (36-37) provide good coverage of the Responses API streaming functionality with proper event type validation. However, line 1336 has an unused variable issue.

Apply this diff to fix the unused variable:

-        content, chunk_count, tool_calls_detected, event_types = (
-            collect_responses_streaming_content(stream, timeout=300)
-        )
+        _content, chunk_count, tool_calls_detected, event_types = (
+            collect_responses_streaming_content(stream, timeout=300)
+        )

The tests properly validate:

  • Event type presence and variety
  • Tool call detection
  • Chunk count expectations
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06eadf7 and b027c49.

📒 Files selected for processing (19)
  • core/schemas/providers/anthropic/responses.go (3 hunks)
  • core/schemas/providers/cohere/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/gemini/speech.go (1 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/types.go (3 hunks)
  • core/schemas/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (3 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (14 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (2 hunks)
  • tests/integrations/tests/integrations/test_openai.py (14 hunks)
  • tests/integrations/tests/utils/common.py (10 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • tests/integrations/tests/utils/provider_capabilities.py (1 hunks)
  • tests/integrations/tests/utils/provider_test_scenarios.py (1 hunks)
  • transports/bifrost-http/handlers/server.go (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/utils.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (14)
transports/bifrost-http/integrations/utils.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (17-21)
core/schemas/providers/gemini/speech.go (4)
core/schemas/providers/gemini/types.go (5)
  • GeminiGenerationRequest (53-68)
  • GenerationConfig (626-692)
  • SpeechConfig (842-851)
  • VoiceConfig (821-824)
  • PrebuiltVoiceConfig (815-818)
core/schemas/speech.go (3)
  • BifrostSpeechRequest (9-15)
  • SpeechParameters (28-37)
  • SpeechVoiceInput (39-42)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (47-47)
  • Vertex (40-40)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/provider_test_scenarios.py (2)
  • get_providers_for_scenario (258-279)
  • get_model_for_scenario (282-300)
core/schemas/providers/gemini/transcription.go (4)
core/schemas/providers/gemini/types.go (2)
  • GeminiGenerationRequest (53-68)
  • FileData (1021-1029)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-9)
  • TranscriptionParameters (27-36)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (47-47)
  • Vertex (40-40)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (2)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (11-11)
  • Content (872-880)
  • Part (886-910)
  • FunctionCall (1032-1042)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (6)
  • AnthropicThinking (51-54)
  • AnthropicContentBlock (129-139)
  • AnthropicStreamEventTypeContentBlockStart (244-244)
  • AnthropicContentBlockTypeToolUse (123-123)
  • AnthropicStreamEventTypeMessageStart (242-242)
  • AnthropicMessageResponse (206-215)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (272-272)
  • ResponsesToolMessage (435-455)
  • ResponsesInputMessageRoleAssistant (308-308)
core/schemas/providers/cohere/responses.go (1)
core/schemas/responses.go (1)
  • ResponsesMessageTypeFunctionCall (272-272)
tests/integrations/tests/integrations/test_google.py (4)
tests/integrations/tests/utils/common.py (7)
  • assert_valid_transcription_response (1074-1103)
  • assert_valid_streaming_transcription_response (1309-1352)
  • assert_valid_speech_response (1022-1071)
  • collect_streaming_content (855-940)
  • collect_streaming_transcription_content (1405-1466)
  • generate_test_audio (969-999)
  • assert_valid_embedding_response (1106-1167)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
core/schemas/providers/gemini/utils.go (1)
core/schemas/providers/gemini/types.go (1)
  • Type (773-773)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
tests/integrations/tests/utils/provider_capabilities.py (3)
  • get_model_for_provider (66-81)
  • is_provider_available (84-99)
  • get_available_providers (102-114)
tests/integrations/tests/integrations/test_anthropic.py (4)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/common.py (4)
  • assert_valid_chat_response (504-530)
  • assert_valid_image_response (550-599)
  • collect_streaming_content (855-940)
  • skip_if_no_api_key (1490-1501)
transports/bifrost-http/integrations/genai.go (5)
core/schemas/bifrost.go (3)
  • BifrostRequest (130-140)
  • SpeechRequest (90-90)
  • TranscriptionRequest (92-92)
transports/bifrost-http/integrations/utils.go (2)
  • SpeechResponseConverter (128-128)
  • TranscriptionResponseConverter (130-130)
core/schemas/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (139-162)
core/schemas/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/schemas/providers/gemini/types.go (5)
  • GeminiGenerationRequest (53-68)
  • GenerationConfig (626-692)
  • ModalityAudio (711-711)
  • SpeechConfig (842-851)
  • FileData (1021-1029)
transports/bifrost-http/handlers/server.go (1)
plugins/maxim/main.go (1)
  • Init (62-92)
tests/integrations/tests/integrations/test_openai.py (4)
tests/integrations/tests/utils/common.py (8)
  • get_content_string (1743-1750)
  • convert_to_responses_tools (1505-1516)
  • assert_valid_responses_response (1519-1571)
  • assert_responses_has_tool_calls (1574-1594)
  • collect_responses_streaming_content (1597-1651)
  • assert_valid_text_completion_response (1679-1698)
  • collect_text_completion_streaming_content (1701-1741)
  • skip_if_no_api_key (1490-1501)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
🪛 Ruff (0.14.2)
tests/integrations/tests/utils/parametrize.py

18-18: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


19-19: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

tests/integrations/tests/utils/common.py

1623-1623: Avoid specifying long messages outside the exception class

(TRY003)


1645-1647: Avoid specifying long messages outside the exception class

(TRY003)


1660-1660: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1725-1725: Avoid specifying long messages outside the exception class

(TRY003)


1735-1737: Avoid specifying long messages outside the exception class

(TRY003)

tests/integrations/tests/integrations/test_google.py

234-234: Unused method argument: test_config

(ARG002)


248-248: Unused method argument: test_config

(ARG002)


269-269: Unused method argument: test_config

(ARG002)


292-292: Unused method argument: test_config

(ARG002)


317-317: Unused method argument: test_config

(ARG002)


352-352: Unused method argument: test_config

(ARG002)


372-372: Unused method argument: test_config

(ARG002)


384-384: Unused method argument: test_config

(ARG002)


395-395: Unused method argument: test_config

(ARG002)


504-504: Unused method argument: test_config

(ARG002)


552-552: Unused method argument: test_config

(ARG002)


565-565: Unused method argument: test_config

(ARG002)


572-572: Unused method argument: test_config

(ARG002)


596-596: Unused method argument: test_config

(ARG002)


622-622: Unused method argument: test_config

(ARG002)


646-646: Unused method argument: test_config

(ARG002)


669-669: Unused method argument: test_config

(ARG002)


697-697: Unused method argument: test_config

(ARG002)


720-720: Unused method argument: test_config

(ARG002)


753-753: Unused method argument: test_config

(ARG002)


807-807: Unused method argument: test_config

(ARG002)


851-851: Unused method argument: test_config

(ARG002)


888-888: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_anthropic.py

186-186: Unused method argument: test_config

(ARG002)


200-200: Unused method argument: test_config

(ARG002)


217-217: Unused method argument: test_config

(ARG002)


235-235: Unused method argument: test_config

(ARG002)


262-262: Unused method argument: test_config

(ARG002)


323-323: Unused method argument: test_config

(ARG002)


341-341: Unused method argument: test_config

(ARG002)


366-366: Unused method argument: test_config

(ARG002)


392-392: Unused method argument: test_config

(ARG002)


574-574: Unused method argument: test_config

(ARG002)


606-606: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


615-615: Unused method argument: test_config

(ARG002)


651-651: Unused method argument: test_config

(ARG002)


726-726: Unused method argument: test_config

(ARG002)


762-762: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/integrations/test_openai.py

225-225: Unused method argument: test_config

(ARG002)


238-238: Unused method argument: test_config

(ARG002)


255-255: Unused method argument: test_config

(ARG002)


270-270: Unused method argument: test_config

(ARG002)


289-289: Unused method argument: test_config

(ARG002)


331-331: Unused method argument: test_config

(ARG002)


347-347: Unused method argument: test_config

(ARG002)


358-358: Unused method argument: test_config

(ARG002)


369-369: Unused method argument: test_config

(ARG002)


499-499: Unused method argument: test_config

(ARG002)


531-531: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1100-1100: Unused method argument: test_config

(ARG002)


1111-1111: Unused method argument: test_config

(ARG002)


1144-1144: Unused method argument: test_config

(ARG002)


1174-1174: Unused method argument: test_config

(ARG002)


1216-1216: Unused method argument: test_config

(ARG002)


1266-1266: Unused method argument: test_config

(ARG002)


1318-1318: Unused method argument: test_config

(ARG002)


1336-1336: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1356-1356: Unused method argument: test_config

(ARG002)


1460-1460: String contains ambiguous `ℹ` (INFORMATION SOURCE). Did you mean `i` (LATIN SMALL LETTER I)?

(RUF001)


1478-1478: String contains ambiguous `ℹ` (INFORMATION SOURCE). Did you mean `i` (LATIN SMALL LETTER I)?

(RUF001)


1499-1499: Unused method argument: test_config

(ARG002)


1520-1520: Unused method argument: test_config

(ARG002)

🔇 Additional comments (24)
core/schemas/providers/cohere/responses.go (2)

211-211: LGTM! Good refactoring to use enum constant.

Replacing the string literal with schemas.ResponsesMessageTypeFunctionCall improves type safety and maintainability, and aligns with the pattern used for other message types in this switch statement.


224-226: LGTM! Correct field for function call identification.

Using msg.CallID instead of msg.ID is the correct approach for function call identifiers, consistent with how CallID is used elsewhere in this file (lines 248, 275, 393) and in the streaming conversion logic.

core/schemas/providers/gemini/utils.go (1)

63-63: Gemini API requirement verified: lowercase types are correct.

The Gemini API function-calling documentation shows OpenAPI-style lowercase types (e.g. "string", "integer", "boolean", "array", "object"), and the code correctly implements this requirement. The implementation at line 63 and the recursive convertTypeToLowerCase function (lines 106-137) properly normalize all type fields to lowercase throughout nested schemas.

The implementation is sound: error handling safely returns empty maps, and the recursive function properly handles maps, slices, and primitives without unbounded recursion.

core/schemas/providers/anthropic/responses.go (2)

140-153: LGTM!

The type assertion for *AnthropicThinking provides a more efficient path when the thinking parameter is already the correct type, while maintaining backward compatibility with map-based parsing.


263-265: LGTM! Good defensive coding.

Explicitly initializing Content to an empty slice instead of leaving it nil ensures consistent behavior for downstream consumers and prevents potential nil pointer dereferences.

tests/integrations/tests/integrations/test_litellm.py (1)

411-413: LGTM! Timeout increase addresses integration test stability.

The timeout increase from 30s to 120s aligns with similar changes across the codebase and addresses the timeout issues mentioned in the PR objectives.

Also applies to: 430-432

core/schemas/providers/gemini/types.go (1)

923-971: LGTM! Base64 handling is correctly implemented.

The custom JSON marshaling/unmarshaling for Blob properly handles base64 encoding/decoding with appropriate error handling. Using RawURLEncoding (URL-safe base64 without padding) aligns with Google GenAI SDK requirements.
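The same URL-safe, unpadded encoding can be mirrored in Python for cross-checking payloads produced by the Go side. This is an illustrative sketch, not part of the codebase; the helper names are invented:

```python
import base64

def encode_blob(data: bytes) -> str:
    # URL-safe base64 with padding stripped, mirroring Go's base64.RawURLEncoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def decode_blob(encoded: str) -> bytes:
    # Restore the padding that RawURLEncoding omits before decoding
    return base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4))
```

Round-tripping through these helpers is a quick way to confirm that a payload was produced with the unpadded variant.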

tests/integrations/config.yml (1)

54-54: LGTM! Model configuration updates align with cross-provider testing.

The updates to Anthropic models and the addition of Gemini 2.5 speech/transcription models are consistent with the PR objectives and broader cross-provider test infrastructure changes.

Also applies to: 68-78, 263-316

core/schemas/providers/gemini/transcription.go (1)

9-105: LGTM! Transcription request conversion is well-implemented.

The ToBifrostTranscriptionRequest method properly handles:

  • Text prompt extraction with appropriate spacing
  • Audio data extraction from both inline and file data
  • Correct MIME type checking with case-insensitive comparison
  • Safety settings, cached content, and labels propagation to ExtraParams

The implementation aligns well with the existing speech conversion pattern in core/schemas/providers/gemini/speech.go.

tests/integrations/tests/integrations/test_anthropic.py (5)

97-97: Verify the timeout increase is necessary.

The default timeout has been increased from 30s to 120s (4x increase). While the PR objectives mention addressing timeout issues, such a significant increase should be validated:

  • Is this necessary for all providers or just specific ones?
  • Could this mask underlying performance issues?
  • Have you tested that 120s is sufficient without being excessive?

Consider making timeouts provider-specific if only certain providers require extended timeouts.


507-512: LGTM! Good defensive null check.

The added null check before accessing final_response.content properly handles the case where Anthropic returns an empty response when the tool result is sufficient. This prevents potential errors and the fallback message is informative.
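The guard pattern can be shown in isolation. The shape below is a stand-in for the SDK response object, and the fallback string is illustrative, not the exact message used in the test:

```python
from types import SimpleNamespace

def final_text(response) -> str:
    # Anthropic can return an empty content list when the tool result alone
    # satisfies the request, so guard before indexing into response.content.
    if response.content:
        return response.content[0].text
    return "[no final text: tool result was sufficient]"
```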


614-648: LGTM! Comprehensive pagination testing.

The test thoroughly validates the models.list() API including:

  • Basic listing with limit
  • Forward pagination with after_id
  • Backward pagination with before_id
  • Response structure validation

The conditional logic properly handles cases where pagination isn't available.


650-723: LGTM! Thorough extended thinking validation.

The test properly validates the extended thinking feature with:

  • Structural checks for thinking blocks
  • Content quality validation via keyword matching
  • Separate validation of thinking vs regular content
  • Helpful debug output

The reasoning keyword list and threshold (≥2 keywords) provides good validation without being too strict.
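The keyword-threshold approach can be sketched as a small predicate. The keyword list and threshold here are illustrative, not the ones used in the suite:

```python
REASONING_KEYWORDS = ["because", "therefore", "first", "then", "step"]

def looks_like_reasoning(text: str, threshold: int = 2) -> bool:
    # Count how many distinct reasoning markers appear; require at least `threshold`
    lowered = text.lower()
    return sum(kw in lowered for kw in REASONING_KEYWORDS) >= threshold
```

Substring matching keeps the check loose enough for varied model output while still rejecting content with no reasoning structure at all.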


725-823: LGTM! Comprehensive streaming thinking validation.

The test thoroughly validates extended thinking in streaming mode:

  • Properly distinguishes thinking_delta from text_delta events
  • Collects and validates both thinking and text content
  • Uses appropriate safety checks (1000 chunk limit)
  • Validates content quality with keyword matching

Note: The Ruff warning about line 762's f-string is a false positive—the message is intentionally simple for logging.

tests/integrations/tests/integrations/test_google.py (7)

185-191: LGTM! Good defensive improvements for image loading.

The updates properly handle common issues with external image URLs:

  • User-Agent header prevents 403 errors from servers that block bots
  • Timeout prevents hanging on slow connections
  • raise_for_status() ensures errors are caught early

These are essential for robust image URL handling.


218-227: LGTM! Necessary helper for PCM-to-WAV conversion.

This helper properly wraps raw PCM audio data from Google's speech API in WAV format for validation. The parameters (mono, 24kHz, 16-bit) are appropriate defaults for speech audio.
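A self-contained version of such a wrapper, using only the stdlib `wave` module with the same defaults (mono, 24 kHz, 16-bit); the function name is illustrative:

```python
import io
import wave

def pcm_to_wav(pcm: bytes, channels: int = 1, sample_width: int = 2, rate: int = 24000) -> bytes:
    # Wrap raw 16-bit mono 24 kHz PCM in a WAV (RIFF) container so audio
    # validators that expect a proper header can inspect the data.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```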


519-529: LGTM! More defensive streaming content extraction.

The updated logic handles Google GenAI's nested response structure more safely by checking for attributes at each level. While verbose, this prevents AttributeError when the response structure varies.

The fallback to direct chunk.text (lines 528-529) ensures compatibility with different response formats.


564-569: LGTM! Simple and sufficient list models test.

The test appropriately validates Google's models.list() with a page size limit. Unlike Anthropic's test, this doesn't test pagination, which is fine if Google's API doesn't support after_id/before_id style pagination.


571-717: LGTM! Comprehensive audio transcription test coverage.

The six transcription tests provide excellent coverage:

  • test_16: Basic transcription
  • test_17: With language and temperature parameters
  • test_18: With timestamp references
  • test_19: Inline audio data handling
  • test_20: Token counting validation
  • test_21: Different audio formats

The use of min_text_length=0 is appropriate since synthetic sine wave audio may not produce meaningful transcriptions.


719-920: LGTM! Thorough speech generation test coverage.

The five speech generation tests comprehensively validate Google's TTS API:

  • test_22: Single speaker basic TTS
  • test_23: Multi-speaker conversations
  • test_24: Different voice options
  • test_25: Language support
  • test_26: Style/tone control

The tests properly convert PCM to WAV for validation and include good quality checks (audio size, format validation). The voice and language loops are efficient while staying focused.


149-164: Lowercase type conversion aligns with JSON Schema requirements — no action needed.

The code correctly converts parameter types to lowercase. JSON Schema uses lowercase type values, and Google GenAI uses JSON Schema / OpenAPI-style declarations. The existing test fixtures (WEATHER_TOOL and CALCULATOR_TOOL) already define types in lowercase ("object", "string"), so the .lower() calls at lines 154 and 157 are defensive normalization measures that ensure compliance regardless of input format. The change is correct and necessary.
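The recursive normalization described here (and in the Go convertTypeToLowerCase helper) can be expressed compactly in Python. This is an illustrative sketch, not the project's implementation:

```python
def normalize_schema_types(node):
    # Recursively lowercase every "type" value in a JSON-Schema-like structure,
    # leaving all other keys and scalar values untouched.
    if isinstance(node, dict):
        return {
            k: (v.lower() if k == "type" and isinstance(v, str) else normalize_schema_types(v))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [normalize_schema_types(item) for item in node]
    return node
```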

tests/integrations/tests/integrations/test_openai.py (3)

1110-1263: LGTM! Well-structured Responses API tests.

Tests 32-35 provide excellent coverage of the Responses API:

  • Proper response structure validation
  • Content quality checks with relevant keyword lists
  • Appropriate handling of different input types (text, system message, image, tools)
  • Function call validation includes proper argument checking

The keyword-based validation is a pragmatic approach for testing AI responses without being overly prescriptive.


1498-1562: LGTM! Good text completions coverage.

Tests 39-40 appropriately validate the text completions API:

  • test_39: Basic completion with quality checks
  • test_40: Streaming with haiku-appropriate validation

The use of the legacy gpt-3.5-turbo-instruct model is correct for text completions. The flexible content validation (tech keywords OR line breaks OR length) appropriately handles the creative nature of haiku generation.


202-202: Verify that the 300-second timeout is justified and necessary for production use.

The facts you raised are accurate: OpenAI client defaults to 300s (verified in test_openai.py:202) while Anthropic uses 120s (verified in test_anthropic.py:97). However, the codebase lacks documentation explaining why this more aggressive timeout is needed.

Key findings:

  • Streaming operations already explicitly pass timeout=300 to helper functions (e.g., lines 510, 532, 614 in test_openai.py)
  • The client-level 300s default may be redundant for streaming but intended for other operations
  • README configuration template shows the expected default as 30 seconds, but actual code uses 300s for OpenAI
  • No comments, docstrings, or issue references explain the rationale

Your concerns about hidden performance issues and cross-provider complexity are valid. Before merging, verify:

  1. Whether 300s is truly necessary for non-streaming operations
  2. If streaming already handles its own timeouts, whether client default matters
  3. Whether lower values (60-120s) were actually tested
  4. Add a comment explaining why OpenAI requires 5-minute timeouts while others use lower values

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from b027c49 to 7150dd3 Compare November 3, 2025 06:49

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
tests/integrations/config.yml (2)

27-29: Align test API timeout with PR intent (30s → 120s).

PR summary states timeouts increased to 120s; config still sets 30s, overriding higher defaults in clients. Bump here to reduce flakiness.

-  timeout: 30 # seconds
+  timeout: 120 # seconds (raised for stability in streaming/tests)

362-364: Update Anthropic API version to enable thinking blocks support.

The version "2023-06-01" is too old for thinking blocks. Thinking blocks require post-May-14-2025 Messages API behavior. The tests at lines 651 and 726 in test_anthropic.py explicitly enable thinking, which will fail with the current version.

Update tests/integrations/config.yml line 363 to:

  anthropic:
    version: "2025-05-14"

This aligns with when thinking support became available and matches the model date (claude-sonnet-4-20250514) already configured in the same file.

tests/integrations/tests/utils/common.py (1)

856-860: Raise default streaming timeouts to 120s (match PR intent).

Default 30/60s are tight for cross‑provider streams. Bump to 120s; callers can still override.

-def collect_streaming_content(
-    stream, integration: str, timeout: int = 30
+def collect_streaming_content(
+    stream, integration: str, timeout: int = 120
@@
-def collect_streaming_speech_content(
-    stream, integration: str, timeout: int = 60
+def collect_streaming_speech_content(
+    stream, integration: str, timeout: int = 120
@@
-def collect_streaming_transcription_content(
-    stream, integration: str, timeout: int = 60
+def collect_streaming_transcription_content(
+    stream, integration: str, timeout: int = 120
@@
-def collect_responses_streaming_content(
-    stream, timeout: int = 30
+def collect_responses_streaming_content(
+    stream, timeout: int = 120
@@
-def collect_text_completion_streaming_content(
-    stream, timeout: int = 30
+def collect_text_completion_streaming_content(
+    stream, timeout: int = 120

Also applies to: 1356-1360, 1406-1410, 1598-1603, 1702-1706

tests/integrations/tests/integrations/test_anthropic.py (1)

134-145: Fix base64 data URL media_type.

media_type currently becomes "data:image/png;base64". Anthropic expects just "image/png". Parse header correctly.

-                        # Base64 image
-                        media_type, data = url.split(",", 1)
+                        # Base64 image
+                        header, data = url.split(",", 1)  # e.g., "data:image/png;base64,..."
+                        media_type = header.split(":", 1)[1].split(";", 1)[0]  # "image/png"
                         content.append(
                             {
                                 "type": "image",
                                 "source": {
                                     "type": "base64",
                                     "media_type": media_type,
                                     "data": data,
                                 },
                             }
                         )
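The header parsing in the suggested fix can be exercised on its own; a minimal sketch with a hypothetical helper name:

```python
def split_data_url(url: str):
    # "data:image/png;base64,<payload>" -> ("image/png", "<payload>")
    header, payload = url.split(",", 1)
    media_type = header.split(":", 1)[1].split(";", 1)[0]
    return media_type, payload
```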
♻️ Duplicate comments (5)
core/schemas/providers/gemini/chat.go (1)

428-471: Handle JSON unmarshaling errors in streaming tool calls.

Line 451 calls json.Unmarshal without checking the error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty, potentially causing incorrect tool call arguments.

Apply this diff to add error handling:

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Log and continue with empty args to avoid blocking the stream
+								continue
+							}
 						}
 						if toolCall.Function.Name != nil {
 							fc := &FunctionCall{
 								Name: *toolCall.Function.Name,
 								Args: argsMap,
 							}
 							if toolCall.ID != nil {
 								fc.ID = *toolCall.ID
 							}
 							parts = append(parts, &Part{FunctionCall: fc})
 						}
 					}
 				}
core/schemas/providers/anthropic/responses.go (1)

357-375: Add missing ContentIndex for tool_use OutputItemAdded.

The OutputItemAdded response for tool_use blocks does not set ContentIndex, which means downstream code at lines 534-536 will not populate streamResp.Index for function call events during Bifrost→Anthropic conversion.

Apply this diff to add the missing ContentIndex:

 return &schemas.BifrostResponsesStreamResponse{
     Type:           schemas.ResponsesStreamResponseTypeOutputItemAdded,
     SequenceNumber: sequenceNumber,
     OutputIndex:    schemas.Ptr(0),
+    ContentIndex:   chunk.Index,
     Item:           item,
 }, nil, false

Note: This issue was previously flagged and marked as addressed in commits 4b86a8b to d1ad6c4, but the fix does not appear in the current code.

transports/bifrost-http/handlers/server.go (1)

203-203: Fix: pass logger to maxim.Init (compile-time mismatch).

plugins/maxim/main.go still defines Init(config *Config, logger schemas.Logger) (schemas.Plugin, error). Calling it with a single argument will not compile. Align with other plugins and pass logger.

-        plugin, err := maxim.Init(maximConfig)
+        plugin, err := maxim.Init(maximConfig, logger)

Reference: plugins/maxim/main.go Lines 61-91 (Init requires logger).

tests/integrations/tests/utils/common.py (1)

1744-1751: Harden get_content_string for SDK objects.

List items can be SDK objects with .text; current code calls .get unconditionally and can crash.

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""
tests/integrations/tests/integrations/test_openai.py (1)

1460-1460: Replace ambiguous Unicode characters (previously flagged).

These lines still contain the ambiguous Unicode character `ℹ` (INFORMATION SOURCE) that was flagged in a previous review.

Apply these fixes:

Line 1460:

-                print("ℹ Reasoning may be integrated in regular message content")
+                print("Info: Reasoning may be integrated in regular message content")

Line 1478:

-                print(f"ℹ Model {model_to_use} may not fully support reasoning parameters")
+                print(f"Info: Model {model_to_use} may not fully support reasoning parameters")

Also applies to: 1478-1478

🧹 Nitpick comments (8)
transports/bifrost-http/integrations/genai.go (1)

164-168: Consider optimizing redundant speech detection call.

The current implementation calls isSpeechRequest twice: once at line 167 to set the flag, and again inside isTranscriptionRequest at line 199. Since IsSpeech is already set before isTranscriptionRequest is called, you could optimize line 199 to check req.IsSpeech instead of re-evaluating isSpeechRequest(req).

Tradeoff: The current approach makes isTranscriptionRequest self-contained and order-independent, which is more robust. The optimization would create coupling between the two functions and require careful ordering. Given the negligible performance impact, the current design may be preferable for maintainability.

Apply this diff if you prefer the optimization:

 func isTranscriptionRequest(req *gemini.GeminiGenerationRequest) bool {
-	// If this is already detected as a speech request, it's not transcription
-	// This handles the edge case of bidirectional audio (input + output)
-	if isSpeechRequest(req) {
-		return false
-	}
+	// If this is already detected as a speech request, it's not transcription
+	// This handles the edge case of bidirectional audio (input + output)
+	if req.IsSpeech {
+		return false
+	}

Note: This optimization requires that IsSpeech is set before isTranscriptionRequest is called, which is currently the case at line 167.

Also applies to: 194-219

tests/integrations/config.yml (1)

346-350: Test timeouts: consider raising simple/complex to 60/120s.

Streaming/tests now take longer; increase these to match the raised API timeout to avoid intermittent failures.

-    simple: 30 # seconds
-    complex: 60 # seconds
+    simple: 60 # seconds
+    complex: 120 # seconds

Also applies to: 381-387, 393-396

tests/integrations/tests/utils/common.py (1)

1656-1670: Remove unused variable to satisfy linter.

valid_event_types is defined but unused. Delete it or use it for validation.

-    # Validate common streaming event types
-    valid_event_types = [
-        "response.created",
-        "response.output_item.added",
-        "response.content_part.added",
-        "response.output_text.delta",
-        "response.function_call_arguments.delta",
-        "response.completed",
-        "response.error",
-    ]
-
-    # Log the event type for debugging
+    # Log the event type for debugging
tests/integrations/tests/utils/parametrize.py (1)

18-19: PEP 604 union for optional params.

Modernize type hints per Ruff RUF013.

-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
tests/integrations/tests/integrations/test_anthropic.py (2)

606-608: Unused variable: prefix with underscore.

content_tools isn't used; prefix it with an underscore to silence the warning.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
                 )

97-99: Timeout source still 30s via config.

Client uses api_config.get("timeout", 120), but config.yml sets 30s. Either bump the config (preferred) or enforce a minimum here.

Example:

-        "timeout": api_config.get("timeout", 120),
+        "timeout": max(120, int(api_config.get("timeout", 0) or 0)),

Or update tests/integrations/config.yml as suggested.

tests/integrations/tests/integrations/test_openai.py (2)

1099-1104: Still missing deterministic assertions for consistency.

While the @skip_if_no_api_key("openai") decorator has been added (addressing part of the previous review), the test still differs from the Anthropic and Google equivalents in terms of determinism:

  • Anthropic (uses limit=5 → asserts len == 5)
  • Google (uses page_size: 5 → asserts len == 5)
  • OpenAI (no limit → asserts len > 0)

Per the previous review's web search, the OpenAI Python SDK does support the limit parameter for models.list() (valid range 1–100, default 20).

Apply this diff for consistency:

 @skip_if_no_api_key("openai")
 def test_31_list_models(self, openai_client, test_config):
     """Test Case 31: List models"""
-    response = openai_client.models.list()
+    response = openai_client.models.list(limit=5)
     assert response.data is not None
-    assert len(response.data) > 0
+    assert len(response.data) == 5

531-533: Prefix unused unpacked variables with underscore.

Two unpacked variables are never used. Following Python convention, prefix them with _ to indicate intentional discard.

Line 531:

-        content_tools, chunk_count_tools, tool_calls_detected_tools = (
+        _content_tools, chunk_count_tools, tool_calls_detected_tools = (
             collect_streaming_content(stream_with_tools, "openai", timeout=300)
         )

Line 1336:

-        content, chunk_count, tool_calls_detected, event_types = (
+        _content, chunk_count, tool_calls_detected, event_types = (
             collect_responses_streaming_content(stream, timeout=300)
         )

Also applies to: 1336-1338

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b027c49 and 7150dd3.

📒 Files selected for processing (19)
  • core/schemas/providers/anthropic/responses.go (3 hunks)
  • core/schemas/providers/cohere/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (4 hunks)
  • core/schemas/providers/gemini/speech.go (1 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/types.go (3 hunks)
  • core/schemas/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (3 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (2 hunks)
  • tests/integrations/tests/integrations/test_openai.py (14 hunks)
  • tests/integrations/tests/utils/common.py (10 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • tests/integrations/tests/utils/provider_capabilities.py (1 hunks)
  • tests/integrations/tests/utils/provider_test_scenarios.py (1 hunks)
  • transports/bifrost-http/handlers/server.go (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/utils.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • core/schemas/providers/cohere/responses.go
  • transports/bifrost-http/integrations/utils.go
  • core/schemas/providers/gemini/speech.go
  • tests/integrations/tests/integrations/test_litellm.py
🧰 Additional context used
🧬 Code graph analysis (11)
transports/bifrost-http/handlers/server.go (1)
plugins/maxim/main.go (1)
  • Init (62-92)
core/schemas/providers/gemini/transcription.go (5)
core/schemas/providers/gemini/types.go (2)
  • GeminiGenerationRequest (54-69)
  • FileData (1031-1039)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-9)
  • TranscriptionParameters (27-36)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/models.go (1)
  • Model (47-66)
core/schemas/bifrost.go (2)
  • Gemini (47-47)
  • Vertex (40-40)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (6)
  • AnthropicThinking (52-55)
  • AnthropicContentBlock (134-145)
  • AnthropicStreamEventTypeContentBlockStart (297-297)
  • AnthropicContentBlockTypeToolUse (124-124)
  • AnthropicStreamEventTypeMessageStart (295-295)
  • AnthropicMessageResponse (259-268)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (272-272)
  • ResponsesToolMessage (435-455)
  • ResponsesInputMessageRoleAssistant (308-308)
core/schemas/providers/gemini/utils.go (1)
core/schemas/providers/gemini/types.go (1)
  • Type (774-774)
tests/integrations/tests/integrations/test_anthropic.py (4)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/common.py (4)
  • assert_valid_chat_response (505-531)
  • assert_valid_image_response (551-600)
  • collect_streaming_content (856-941)
  • skip_if_no_api_key (1491-1502)
transports/bifrost-http/integrations/genai.go (8)
core/schemas/bifrost.go (3)
  • BifrostRequest (130-140)
  • SpeechRequest (90-90)
  • TranscriptionRequest (92-92)
transports/bifrost-http/handlers/inference.go (2)
  • SpeechRequest (208-212)
  • TranscriptionRequest (214-218)
transports/bifrost-http/integrations/utils.go (2)
  • SpeechResponseConverter (128-128)
  • TranscriptionResponseConverter (130-130)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (17-21)
core/schemas/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (139-162)
core/schemas/transcriptions.go (1)
  • BifrostTranscriptionResponse (11-21)
core/schemas/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/schemas/providers/gemini/types.go (5)
  • GeminiGenerationRequest (54-69)
  • GenerationConfig (627-693)
  • ModalityAudio (712-712)
  • SpeechConfig (843-852)
  • FileData (1031-1039)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (3)
  • ChatAssistantMessageToolCall (473-477)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (873-881)
  • Part (887-911)
  • FunctionCall (1042-1052)
tests/integrations/tests/integrations/test_google.py (4)
tests/integrations/tests/utils/common.py (6)
  • assert_valid_transcription_response (1075-1104)
  • assert_valid_streaming_transcription_response (1310-1353)
  • assert_valid_speech_response (1023-1072)
  • collect_streaming_content (856-941)
  • collect_streaming_transcription_content (1406-1467)
  • generate_test_audio (970-1000)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
tests/integrations/tests/utils/provider_capabilities.py (3)
  • get_model_for_provider (66-81)
  • is_provider_available (84-99)
  • get_available_providers (102-114)
tests/integrations/tests/integrations/test_openai.py (4)
tests/integrations/tests/utils/common.py (11)
  • get_content_string (1744-1751)
  • convert_to_responses_tools (1506-1517)
  • assert_valid_responses_response (1520-1572)
  • collect_responses_streaming_content (1598-1652)
  • assert_valid_responses_streaming_chunk (1655-1676)
  • assert_valid_text_completion_response (1680-1699)
  • collect_text_completion_streaming_content (1702-1742)
  • assert_valid_chat_response (505-531)
  • assert_valid_image_response (551-600)
  • collect_streaming_content (856-941)
  • skip_if_no_api_key (1491-1502)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/provider_test_scenarios.py (2)
  • get_providers_for_scenario (258-279)
  • get_model_for_scenario (282-300)
🪛 Ruff (0.14.2)
tests/integrations/tests/integrations/test_anthropic.py
  • ARG002 Unused method argument test_config: lines 186, 200, 217, 235, 262, 323, 341, 366, 392, 574, 615, 651, 726
  • RUF059 Unpacked variable content_tools is never used (prefix it with an underscore): line 606
  • F541 f-string without any placeholders (remove the extraneous f prefix): line 762

tests/integrations/tests/utils/common.py
  • TRY003 Avoid specifying long messages outside the exception class: lines 1624, 1646-1648, 1726, 1736-1738
  • F841 Local variable valid_event_types is assigned to but never used: line 1661

tests/integrations/tests/integrations/test_google.py
  • ARG002 Unused method argument test_config: lines 234, 248, 269, 292, 317, 352, 372, 385, 396, 414, 506, 554, 567, 574, 598, 624, 648, 671, 699, 722, 755, 809, 853, 890

tests/integrations/tests/integrations/test_openai.py
  • ARG002 Unused method argument test_config: lines 225, 238, 255, 270, 289, 331, 347, 358, 369, 499, 1100, 1111, 1144, 1174, 1216, 1266, 1318, 1356, 1499, 1520
  • ARG002 Unused method arguments provider and model: line 1266
  • RUF059 Unpacked variables never used (prefix with an underscore): content_tools at line 531, content at line 1336

tests/integrations/tests/utils/parametrize.py
  • RUF013 PEP 484 prohibits implicit Optional (convert to T | None): lines 18, 19

🔇 Additional comments (20)
transports/bifrost-http/integrations/genai.go (3)

39-46: LGTM! Request routing logic is consistent.

The branching logic for speech and transcription requests follows the same pattern as the existing embedding handling. The structure is clear and maintainable.


61-66: LGTM! Response converters follow established patterns.

The speech and transcription response converters are consistent with existing converter implementations.


176-192: LGTM! Speech detection logic is clear and robust.

The function correctly identifies speech requests by checking for audio response modalities or speech configuration. The implementation is straightforward with no nil pointer risks.
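The detection rule described here can be sketched in Python (the actual implementation is Go in genai.go; the field names follow the GenAI request shape mentioned elsewhere in this review):

```python
def is_speech_request(request: dict) -> bool:
    """A request is treated as speech when it asks for AUDIO output
    or carries an explicit speech configuration."""
    config = request.get("generationConfig", {})
    modalities = config.get("responseModalities", [])
    return "AUDIO" in modalities or "speechConfig" in config
```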

core/schemas/providers/gemini/utils.go (3)

61-83: LGTM! Type field lowercasing is correctly implemented.

The Type field is now consistently lowercased using strings.ToLower, which ensures compatibility with APIs that expect lowercase type values.


85-104: LGTM! Safe type assertion with proper fallback.

The function safely handles the type assertion failure by returning an empty map, preventing potential panics.


106-137: LGTM! Recursive type lowercasing is well-implemented.

The function safely handles nested structures (maps and slices) and correctly lowercases all "type" fields while preserving the rest of the schema structure.
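The same idea, sketched in Python for brevity (the repo's helper is Go in utils.go): descend through maps and slices, lowercasing only string values stored under a "type" key.

```python
def lowercase_type_fields(value):
    """Recursively lowercase every string stored under a "type" key,
    descending into nested dicts and lists and leaving other values intact."""
    if isinstance(value, dict):
        return {
            key: val.lower() if key == "type" and isinstance(val, str)
            else lowercase_type_fields(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [lowercase_type_fields(item) for item in value]
    return value
```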

core/schemas/providers/gemini/types.go (1)

67-68: LGTM! Request routing flags appropriately added.

The IsTranscription and IsSpeech flags enable proper routing of requests to transcription and speech pathways.

core/schemas/providers/gemini/chat.go (1)

31-32: LGTM! Tool call correlation logic correctly improved.

The rename to previousToolCalls (line 32) and the tracking of accumulated tool calls (line 191) properly enable function-response correlation across multiple assistant messages. The fallback logic (lines 103-111) ensures responses can be matched even when IDs are missing.

Also applies to: 103-111, 190-191
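The ID-first, name-second matching can be sketched as follows (the dict shape of a tool call is assumed for illustration; the repo's version is Go in chat.go):

```python
def match_tool_call(previous_tool_calls, response_name, response_id=None):
    """Prefer an exact ID match; fall back to the most recent call
    with the same function name when the ID is missing."""
    if response_id:
        for call in previous_tool_calls:
            if call["id"] == response_id:
                return call
    for call in reversed(previous_tool_calls):
        if call["name"] == response_name:
            return call
    return None
```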

core/schemas/providers/anthropic/responses.go (2)

165-167: LGTM: Type assertion optimization.

The code correctly handles both *AnthropicThinking pointer and map-based thinking parameters, avoiding unnecessary re-parsing when the value is already the correct type.


300-301: LGTM: Explicit empty slice initialization.

Initializing Content to an empty slice rather than leaving it nil is good practice, ensuring consistent JSON serialization and distinguishing between "no content" and "uninitialized."
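The difference is visible in the serialized JSON: Go marshals a nil slice to null and an empty slice to [], shown here with the equivalent JSON values:

```python
import json

# A nil Go slice marshals to JSON null; an empty slice marshals to [].
uninitialized = json.dumps({"content": None})
empty = json.dumps({"content": []})
```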

tests/integrations/tests/utils/provider_capabilities.py (1)

24-29: Model names: please verify Anthropic identifiers.

Entries like "claude-3-7-sonnet-20250219" may no longer match current model naming. Ensure they map to actual deployable models in your environment to avoid test skips.

tests/integrations/tests/utils/common.py (3)

794-797: Good: ignore empty deltas in OpenAI streaming.

Prevents false negatives with benign empty chunks.


911-913: Good: detect Anthropic input_json_delta as tool signal.

Marks tool activity without polluting user-visible content.


1505-1516: Confirm Responses tool shape is correct (flat function tool).

The flat shape {"type":"function","name":...,"parameters":...} matches this repo’s Responses schema and provider converters. No change needed.

If you later switch to Chat Completions tools, that API uses the nested {"type":"function","function":{...}} form.
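For contrast, the two shapes side by side (get_weather is a hypothetical tool used only for illustration):

```python
# Flat shape used by the Responses API.
responses_tool = {
    "type": "function",
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Nested shape used by the Chat Completions tools API.
chat_completions_tool = {
    "type": "function",
    "function": {
        "name": responses_tool["name"],
        "parameters": responses_tool["parameters"],
    },
}
```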

tests/integrations/tests/utils/provider_test_scenarios.py (1)

120-134: Verify: Gemini image_url set to False intentionally?

Google/Gemini generally accepts URL-sourced images (or can fetch after client loads). If this False is a routing constraint in Bifrost, keep it; otherwise consider enabling to widen coverage.

tests/integrations/tests/integrations/test_anthropic.py (1)

731-739: Model id for thinking streaming: please verify.

"anthropic/claude-sonnet-4-5" may not exist in your catalog. Consider sourcing from config (e.g., get_model(...)) or confirm the exact thinking-capable identifier.

tests/integrations/tests/integrations/test_google.py (1)

186-191: Good: fetch image with UA + timeout to avoid 403s.

Reduces intermittent Wikipedia/host rejections.

tests/integrations/tests/integrations/test_openai.py (3)

1-142: LGTM! Well-structured cross-provider test suite.

The updated documentation and imports appropriately reflect the cross-provider testing framework. The import organization is clean and the new test cases are well-documented in the file header.


224-540: Excellent cross-provider parametrization implementation.

The refactoring to use @pytest.mark.parametrize with get_cross_provider_params_for_scenario is well-executed:

  • Automatic provider/model selection based on capabilities
  • Proper use of format_provider_model() for model names
  • Defensive programming in test_13_streaming (lines 519-539) to conditionally test tool calls based on provider support

The static analysis warnings about unused test_config parameters are false positives—this is the standard pytest fixture pattern.


202-202: Review comment is based on unsupported assumptions and is incorrect.

The code consistently shows timeout=300 across all mentioned lines (202, 510, 532, 614, 1277, 1337, 1532), and the commit message makes no mention of changing timeout values from 30s to 120s. The review comment's foundational claim about what "the PR description states" cannot be verified and contradicts the actual code implementation. The 300-second timeout values appear to be stable test configuration, not a recent change inconsistent with PR intent.

Likely an incorrect or invalid review comment.

@TejasGhatte TejasGhatte force-pushed the 10-23-fix_integration_test_cases branch from 7150dd3 to d2172e0 Compare November 3, 2025 10:10

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (5)
core/schemas/providers/gemini/chat.go (1)

447-452: Handle JSON parsing errors for streaming tool calls

We’re still ignoring the error from json.Unmarshal, so malformed arguments just disappear silently and the tool call is sent upstream with an empty args map. Let’s surface the failure (or at least retain the raw payload) instead of swallowing it.

-           if toolCall.Function.Arguments != "" {
-               json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
-           }
+           if toolCall.Function.Arguments != "" {
+               if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+                   argsMap["_raw"] = toolCall.Function.Arguments
+               }
+           }
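The same keep-the-raw-payload fallback, as a self-contained Python sketch:

```python
import json

def parse_tool_arguments(raw: str) -> dict:
    """Parse tool-call arguments, retaining the raw payload under
    "_raw" when the JSON is malformed instead of dropping it silently."""
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else {"_raw": raw}
    except json.JSONDecodeError:
        return {"_raw": raw}
```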
core/schemas/providers/gemini/transcription.go (1)

50-67: FileData audio references remain unused in transcription requests.

This is the same issue flagged in the previous review: when only FileData is provided (lines 50-61), the file URI is stored in ExtraParams["file_uri"] but Input.File remains an empty byte slice (line 67). The downstream ToGeminiTranscriptionRequest() function (line 159) only adds audio to the Gemini API request when Input.File contains bytes, so FileData-only transcription requests would send no audio data to Gemini.

Either:

  • Add logic to fetch FileData URIs and populate Input.File before calling ToGeminiTranscriptionRequest(), or
  • Update ToGeminiTranscriptionRequest() to retrieve and handle file_uri from ExtraParams, constructing a Gemini FileData part instead of InlineData.

Note: The Gemini API does support FileData parts in requests, so the second approach is viable.
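A minimal sketch of the second option, assuming dict-shaped parts and hard-coding an audio/wav MIME type purely for illustration: prefer inline bytes, and fall back to a file_uri reference so FileData-only requests still carry audio.

```python
def build_audio_part(input_file: bytes, extra_params: dict) -> dict:
    """Prefer inline audio bytes; otherwise fall back to a file_uri
    reference stored in extra_params."""
    if input_file:
        return {"inlineData": {"mimeType": "audio/wav", "data": input_file}}
    file_uri = extra_params.get("file_uri")
    if file_uri:
        return {"fileData": {"fileUri": file_uri}}
    raise ValueError("transcription request has no audio input")
```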

tests/integrations/tests/utils/provider_capabilities.py (1)

57-63: Incorrect environment variable detection for Gemini and Bedrock.

The past review identified that:

  1. Gemini (line 61): The code expects GEMINI_API_KEY, but the rest of the test suite uses GOOGLE_API_KEY. This mismatch will cause Gemini tests to be incorrectly skipped even when the API key is configured.

  2. Bedrock (line 62): The code expects a single BEDROCK_API_KEY, but Bedrock authentication uses AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) and region configuration (AWS_REGION or AWS_DEFAULT_REGION).

These issues will cause providers to appear unavailable when they are actually configured, leading to skipped tests.

Update the environment variable mapping and detection logic:

 PROVIDER_API_KEY_ENV_VARS: Dict[str, str] = {
     "openai": "OPENAI_API_KEY",
     "anthropic": "ANTHROPIC_API_KEY",
     "cohere": "COHERE_API_KEY",
-    "gemini": "GEMINI_API_KEY",
-    "bedrock": "BEDROCK_API_KEY",
+    "gemini": "GOOGLE_API_KEY",  # Primary key; fallback checked in is_provider_available
+    "bedrock": "",  # Uses AWS credentials; handled specially in is_provider_available
 }
 def is_provider_available(provider: str) -> bool:
     """
     Check if a provider is available (has API key in environment).
     
     Args:
         provider: Provider name
     
     Returns:
         True if provider's API key is set in environment
     """
+    # Special cases
+    if provider == "gemini":
+        # Check both GOOGLE_API_KEY (primary) and GEMINI_API_KEY (fallback)
+        return any((os.getenv("GOOGLE_API_KEY"), os.getenv("GEMINI_API_KEY")))
+    
+    if provider == "bedrock":
+        # Bedrock requires region and either AWS profile or access keys
+        has_region = any((os.getenv("AWS_REGION"), os.getenv("AWS_DEFAULT_REGION")))
+        has_auth = any((os.getenv("AWS_PROFILE"), os.getenv("AWS_ACCESS_KEY_ID")))
+        return has_region and has_auth
+    
+    # Default single-var check
     env_var = PROVIDER_API_KEY_ENV_VARS.get(provider)
     if not env_var:
         return False
     
     api_key = os.getenv(env_var)
     return api_key is not None and api_key.strip() != ""

Based on learnings

Also applies to: 94-100

tests/integrations/tests/integrations/test_openai.py (1)

1099-1104: Add decorator and use limit for consistency.

Based on the past review discussion, this test is missing the @skip_if_no_api_key("openai") decorator and should use a limit parameter for consistency with the Anthropic (line 618) and Google (line 571) equivalents.

Apply this diff to align with other providers:

+    @skip_if_no_api_key("openai")
     def test_31_list_models(self, openai_client, test_config):
         """Test Case 31: List models"""
-        response = openai_client.models.list()
+        response = openai_client.models.list(limit=5)
         assert response.data is not None
-        assert len(response.data) > 0
+        assert len(response.data) <= 5  # May return fewer models
+        assert len(response.data) > 0  # But at least one

Note: The OpenAI Python SDK does support the limit parameter (range 1-100, default 20) for models.list().

tests/integrations/tests/utils/common.py (1)

1807-1814: Handle SDK objects in get_content_string.

The list handling in get_content_string assumes all items are dictionaries and calls .get("text", ""), which raises AttributeError when content items are SDK objects (e.g., OpenAI's ChatCompletionMessageContentPartText). This mirrors the issue already resolved in lines 575-577.

Apply this diff to handle both dicts and SDK objects:

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""
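The patched helper then tolerates both shapes; a quick self-contained check, where FakePart stands in for an SDK object such as ChatCompletionMessageContentPartText:

```python
from typing import Any, List

def get_content_string(content: Any) -> str:
    """String representation of content that tolerates both dict parts
    and SDK objects exposing a .text attribute."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: List[str] = []
        for c in content:
            if isinstance(c, dict):
                parts.append(c.get("text", ""))
            elif hasattr(c, "text"):
                parts.append(getattr(c, "text") or "")
        return " ".join(filter(None, parts))
    return ""

class FakePart:
    """Stand-in for an SDK content-part object."""
    def __init__(self, text: str):
        self.text = text
```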
🧹 Nitpick comments (4)
tests/integrations/tests/utils/parametrize.py (1)

16-20: Consider making Optional types explicit.

Lines 18-19 use implicit Optional (parameters that default to None without explicit T | None type hints). While the code works correctly, explicit typing improves clarity and aligns with PEP 484 recommendations.

Apply this diff to make the types explicit:

 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:

Based on static analysis

tests/integrations/tests/integrations/test_anthropic.py (1)

606-612: Optional: Prefix unused variable with underscore.

The content_tools variable is unpacked but never used. Consider prefixing with _ to indicate it's intentionally unused, following Python conventions.

Apply this diff:

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
                 )
tests/integrations/tests/integrations/test_openai.py (1)

533-541: Optional: Prefix unused variable with underscore.

The content_tools variable is unpacked but never used. Consider prefixing with _ to indicate it's intentionally unused.

Apply this diff:

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "openai", timeout=300)
                 )
tests/integrations/tests/utils/common.py (1)

1718-1740: Remove unused valid_event_types variable.

The valid_event_types list is defined but never used in validation. Either use it to validate event types or remove it to reduce clutter.

Apply this diff to remove the unused variable:

 def assert_valid_responses_streaming_chunk(chunk: Any):
     """Assert that a responses streaming chunk is valid"""
     assert chunk is not None, "Streaming chunk should not be None"
     assert hasattr(chunk, "type"), "Chunk should have a 'type' attribute"
 
-    # Validate common streaming event types
-    valid_event_types = [
-        "response.created",
-        "response.output_item.added",
-        "response.content_part.added",
-        "response.output_text.delta",
-        "response.function_call_arguments.delta",
-        "response.completed",
-        "response.error",
-    ]
-
     # Log the event type for debugging
     if hasattr(chunk, "type"):
         event_type = chunk.type
         # Don't fail on unknown event types, just warn
         if not any(evt in event_type for evt in ["response.", "error"]):
             print(f"Warning: Unexpected event type: {event_type}")
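The lenient check kept by the diff can be factored out and exercised on its own:

```python
def looks_like_responses_event(event_type: str) -> bool:
    """Accept any "response.*" or error event instead of maintaining
    a fixed whitelist of event types."""
    return any(marker in event_type for marker in ("response.", "error"))
```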
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7150dd3 and d2172e0.

📒 Files selected for processing (19)
  • core/schemas/providers/anthropic/responses.go (5 hunks)
  • core/schemas/providers/cohere/responses.go (2 hunks)
  • core/schemas/providers/gemini/chat.go (4 hunks)
  • core/schemas/providers/gemini/speech.go (2 hunks)
  • core/schemas/providers/gemini/transcription.go (1 hunks)
  • core/schemas/providers/gemini/types.go (3 hunks)
  • core/schemas/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (3 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (15 hunks)
  • tests/integrations/tests/integrations/test_openai.py (21 hunks)
  • tests/integrations/tests/utils/common.py (11 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • tests/integrations/tests/utils/provider_capabilities.py (1 hunks)
  • tests/integrations/tests/utils/provider_test_scenarios.py (1 hunks)
  • transports/bifrost-http/handlers/server.go (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/utils.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • transports/bifrost-http/integrations/utils.go
🧰 Additional context used
🧬 Code graph analysis (14)
core/schemas/providers/anthropic/responses.go (2)
core/schemas/providers/anthropic/types.go (6)
  • AnthropicThinking (52-55)
  • AnthropicContentBlock (134-145)
  • AnthropicStreamEventTypeContentBlockStart (297-297)
  • AnthropicContentBlockTypeToolUse (124-124)
  • AnthropicStreamEventTypeMessageStart (295-295)
  • AnthropicMessageResponse (259-268)
core/schemas/responses.go (3)
  • ResponsesMessageTypeFunctionCall (272-272)
  • ResponsesToolMessage (435-455)
  • ResponsesInputMessageRoleAssistant (308-308)
core/schemas/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (3)
  • ChatAssistantMessageToolCall (473-477)
  • ChatStreamResponseChoice (518-520)
  • ChatNonStreamResponseChoice (512-515)
core/schemas/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (873-881)
  • Part (887-911)
  • FunctionCall (1042-1052)
transports/bifrost-http/handlers/server.go (1)
plugins/maxim/main.go (1)
  • Init (62-92)
transports/bifrost-http/integrations/genai.go (8)
core/schemas/bifrost.go (3)
  • BifrostRequest (130-140)
  • SpeechRequest (90-90)
  • TranscriptionRequest (92-92)
transports/bifrost-http/handlers/inference.go (2)
  • SpeechRequest (208-212)
  • TranscriptionRequest (214-218)
transports/bifrost-http/integrations/utils.go (2)
  • SpeechResponseConverter (128-128)
  • TranscriptionResponseConverter (130-130)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (17-21)
core/schemas/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (163-186)
core/schemas/transcriptions.go (1)
  • BifrostTranscriptionResponse (11-21)
core/schemas/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/schemas/providers/gemini/types.go (5)
  • GeminiGenerationRequest (54-69)
  • GenerationConfig (627-693)
  • ModalityAudio (712-712)
  • SpeechConfig (843-852)
  • FileData (1031-1039)
core/schemas/providers/gemini/speech.go (4)
core/schemas/providers/gemini/types.go (6)
  • GeminiGenerationRequest (54-69)
  • GenerationConfig (627-693)
  • SpeechConfig (843-852)
  • VoiceConfig (822-825)
  • PrebuiltVoiceConfig (816-819)
  • MultiSpeakerVoiceConfig (837-840)
core/schemas/speech.go (3)
  • BifrostSpeechRequest (9-15)
  • SpeechParameters (28-37)
  • SpeechVoiceInput (39-42)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (47-47)
  • Vertex (40-40)
core/schemas/providers/cohere/responses.go (1)
core/schemas/responses.go (1)
  • ResponsesMessageTypeFunctionCall (272-272)
tests/integrations/tests/integrations/test_anthropic.py (4)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/common.py (4)
  • assert_valid_chat_response (505-531)
  • assert_valid_image_response (551-600)
  • collect_streaming_content (856-941)
  • skip_if_no_api_key (1554-1565)
core/schemas/providers/gemini/utils.go (1)
core/schemas/providers/gemini/types.go (1)
  • Type (774-774)
tests/integrations/tests/integrations/test_litellm.py (2)
tests/integrations/tests/utils/common.py (3)
  • get_provider_voice (969-1005)
  • get_provider_voices (1008-1029)
  • collect_streaming_content (856-941)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/integrations/test_openai.py (4)
tests/integrations/tests/utils/common.py (8)
  • get_content_string (1807-1814)
  • get_provider_voice (969-1005)
  • get_provider_voices (1008-1029)
  • convert_to_responses_tools (1569-1580)
  • assert_valid_responses_response (1583-1635)
  • collect_responses_streaming_content (1661-1715)
  • collect_streaming_content (856-941)
  • skip_if_no_api_key (1554-1565)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
tests/integrations/tests/utils/provider_capabilities.py (3)
  • get_model_for_provider (66-81)
  • is_provider_available (84-99)
  • get_available_providers (102-114)
tests/integrations/tests/integrations/test_google.py (4)
tests/integrations/tests/utils/common.py (12)
  • assert_valid_transcription_response (1138-1167)
  • assert_valid_streaming_transcription_response (1373-1416)
  • assert_valid_speech_response (1086-1135)
  • collect_streaming_content (856-941)
  • collect_streaming_transcription_content (1469-1530)
  • generate_test_audio (1033-1063)
  • get_provider_voice (969-1005)
  • get_provider_voices (1008-1029)
  • get_api_key (1534-1551)
  • skip_if_no_api_key (1554-1565)
  • assert_valid_chat_response (505-531)
  • assert_valid_image_response (551-600)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (16-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/provider_capabilities.py (1)
  • get_model_for_provider (66-81)
tests/integrations/tests/utils/provider_test_scenarios.py (1)
  • provider_supports_scenario (240-255)
core/schemas/providers/gemini/transcription.go (4)
core/schemas/providers/gemini/types.go (2)
  • GeminiGenerationRequest (54-69)
  • FileData (1031-1039)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-9)
  • TranscriptionParameters (27-36)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (47-47)
  • Vertex (40-40)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/provider_test_scenarios.py (2)
  • get_providers_for_scenario (258-279)
  • get_model_for_scenario (282-300)
🪛 Ruff (0.14.2)
tests/integrations/tests/integrations/test_anthropic.py
  • ARG002 Unused method argument test_config: lines 186, 200, 217, 235, 262, 323, 341, 366, 392, 574, 615, 651, 726
  • RUF059 Unpacked variable content_tools is never used (prefix it with an underscore): line 606
  • F541 f-string without any placeholders (remove the extraneous f prefix): line 762

tests/integrations/tests/integrations/test_litellm.py
  • ARG002 Unused method arguments test_config and provider: lines 146, 160, 177, 194, 212, 254, 272, 283, 294, 310, 422

tests/integrations/tests/integrations/test_openai.py
  • ARG002 Unused method argument test_config: lines 227, 240, 257, 272, 291, 333, 349, 360, 371, 501, 544, 1100, 1111, 1144, 1174, 1216, 1266, 1318, 1356, 1499, 1520
  • RUF059 Unpacked variables never used (prefix with an underscore): content_tools at line 533, content at line 1336

tests/integrations/tests/utils/common.py
  • TRY003 Avoid specifying long messages outside the exception class: lines 1687, 1709-1711, 1789, 1799-1801
  • F841 Local variable valid_event_types is assigned to but never used: line 1724

tests/integrations/tests/integrations/test_google.py
  • ARG002 Unused method argument test_config: lines 236, 250, 271, 294, 319, 354, 374, 387, 398, 416, 508, 556, 569, 576, 600, 626, 650, 674, 697, 729, 783, 827

tests/integrations/tests/utils/parametrize.py
  • RUF013 PEP 484 prohibits implicit Optional (convert to T | None): lines 18, 19

🔇 Additional comments (23)
core/schemas/providers/anthropic/responses.go (2)

374-374: LGTM! ContentIndex properly set for tool_use blocks.

The addition of ContentIndex: chunk.Index at lines 374 and 400 correctly addresses the inconsistency flagged in the previous review. Tool_use and MCP tool_use blocks now track content index consistently with text content blocks, ensuring downstream consumers can correlate tool_use events across the response stream.

Also applies to: 400-400


165-179: Good defensive handling for thinking parameter type variants.

The code now safely handles both *AnthropicThinking pointer types (line 165-166) and map[string]interface{} representations (line 167-178), improving robustness when the thinking parameter comes from different sources or serialization paths.

tests/integrations/config.yml (1)

54-54: LGTM! Configuration updated for Gemini 2.5 models.

The model updates correctly introduce Gemini 2.5 transcription (gemini-2.5-flash, gemini-2.5-pro) and TTS (gemini-2.5-flash-preview-tts, gemini-2.5-pro-preview-tts) models with appropriate capability flags, context windows, and audio specifications. The Anthropic model upgrade to Claude Sonnet 4 is also properly reflected.

Also applies to: 68-78, 263-316

transports/bifrost-http/integrations/genai.go (2)

39-46: LGTM! Speech and transcription request detection properly implemented.

The new branches correctly route speech (lines 39-42) and transcription (lines 43-46) requests through their respective converters (lines 61-66). The detection logic (lines 164-169, 176-219) appropriately identifies speech requests via responseModalities containing AUDIO or presence of speechConfig, and transcription requests via audio input in InlineData or FileData, with proper precedence (speech detection takes priority to avoid misclassifying bidirectional audio scenarios).

Also applies to: 61-66, 164-169, 176-219


221-242: Bare MIME type checks have been successfully removed and the implementation is correct.

The verification confirms that the previous review concern has been fully addressed. The isAudioMimeType function (lines 223-242) now uses only the proper "audio/" prefix check, with no bare string comparisons like "wav" or "mp3". The implementation correctly:

  • Handles empty strings and parameters
  • Performs case-insensitive comparison
  • Strips MIME type parameters before validation
  • Aligns with Gemini API specifications
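The prefix check described above is easy to sketch. The actual isAudioMimeType is Go code in genai.go; this Python equivalent only mirrors the documented behavior (strip MIME parameters, lowercase, prefix-match) and is not the real implementation.

```python
def is_audio_mime_type(mime: str) -> bool:
    """Return True only for proper 'audio/...' MIME types."""
    if not mime:
        return False
    # Drop MIME parameters such as "; codecs=opus" before validating
    base = mime.split(";", 1)[0].strip().lower()
    return base.startswith("audio/")
```

Note how bare subtype strings like "wav" or "mp3" are rejected, which is exactly the concern the review raised.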
core/schemas/providers/gemini/speech.go (1)

9-88: LGTM! Speech request conversion properly handles multi-voice configurations.

The new ToBifrostSpeechRequest() method correctly converts Gemini speech requests to Bifrost format, including proper handling of both single-speaker (lines 46-52) and multi-speaker (lines 53-71) voice configurations. The broadened condition at lines 122-127 ensures speech config is populated for both single-voice and multi-voice scenarios, and response modalities are appropriately preserved in ExtraParams (lines 75-84).

Also applies to: 122-127

tests/integrations/tests/utils/parametrize.py (1)

16-47: LGTM! Cross-provider parametrization utilities properly implemented.

The functions correctly enable scenario-based, cross-provider test parametrization by mapping scenarios to providers and their appropriate models. The dummy tuple fallback (lines 44-45) prevents pytest errors when no providers are available, and deterministic sorting (line 35) ensures consistent test ordering.

Also applies to: 50-65
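The sentinel-fallback and deterministic-ordering behavior praised above can be sketched as follows. This is a hypothetical reconstruction with a hard-coded provider registry; the real helper in tests/integrations/tests/utils/parametrize.py reads from config and may differ in detail.

```python
from typing import List, Optional, Tuple

# Assumed stand-in for the config-driven provider/model registry
AVAILABLE_PROVIDERS = {
    "chat": [("openai", "gpt-4o-mini"), ("anthropic", "claude-3-5-haiku")],
}

def get_cross_provider_params_for_scenario(
    scenario: str,
    include_providers: Optional[List[str]] = None,
    exclude_providers: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    pairs = AVAILABLE_PROVIDERS.get(scenario, [])
    if include_providers is not None:
        pairs = [p for p in pairs if p[0] in include_providers]
    if exclude_providers is not None:
        pairs = [p for p in pairs if p[0] not in exclude_providers]
    # Deterministic ordering keeps pytest test IDs stable across runs
    pairs = sorted(pairs)
    # Dummy tuple prevents pytest from erroring on an empty parametrize list
    return pairs or [("_no_providers_", "_no_model_")]
```

Tests consuming this should skip when they receive the sentinel pair, as noted in the duplicate comment for test_google.py below.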

tests/integrations/tests/integrations/test_litellm.py (2)

433-433: Timeout increases align with PR objectives.

The streaming test timeouts increased from 30s to 120s correctly address the timeout issues mentioned in the PR summary. This change is consistent with similar timeout adjustments across other provider tests and reflects the PR's goal to fix integration test timing issues.

Also applies to: 452-452


145-157: Note: Static analysis warnings about unused parameters are false positives.

The linter flags test_config and provider as unused in parametrized test methods. These are not issues:

  • test_config is a pytest fixture that may be used for future assertions or kept for consistency across test signatures
  • provider is part of the parametrization tuple used in test IDs and filtering but doesn't need to be referenced in the function body when the model parameter is sufficient

No action needed.

Also applies to: 159-175, 176-192

tests/integrations/tests/integrations/test_openai.py (1)

1266-1278: Use parametrized model instead of hardcoded selection.

This test is parametrized with provider and model but doesn't use them—line 1269 hardcodes get_model("openai", "chat"). This was flagged in a previous review but hasn't been addressed yet.

Apply this diff to use the parametrized model:

-            model=get_model("openai", "chat"),
+            model=format_provider_model(provider, model),

Or if this test must remain OpenAI-specific, remove the parametrization:

-    @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("responses_api"))
-    def test_36_responses_streaming(self, openai_client, test_config, provider, model):
+    @skip_if_no_api_key("openai")
+    def test_36_responses_streaming(self, openai_client, test_config):
         """Test Case 36: Responses API streaming"""
         stream = openai_client.responses.create(
-            model=get_model("openai", "chat"),
+            model="openai/" + get_model("openai", "chat"),

Likely an incorrect or invalid review comment.

tests/integrations/tests/utils/common.py (13)

120-120: LGTM: Improved test specificity.

Adding "in fahrenheit" makes the tool call test more explicit about the expected temperature unit, which aligns well with the WEATHER_TOOL's unit parameter enum.


142-218: LGTM: Comprehensive Responses API test data.

The new test data constants cover a good range of scenarios (simple text, images, tool calls, streaming, reasoning) and follow consistent naming conventions. These will provide solid coverage for Responses API testing.


221-221: LGTM: Secondary image URL added.


570-582: LGTM: Robust content extraction for image responses.

The enhanced logic correctly handles both dictionary-based content blocks (using .get("text", "")) and SDK object-based blocks (using .text attribute), making the validation more resilient across different response formats.


794-796: LGTM: Handles empty streaming deltas gracefully.

The early return for completely empty deltas (no content, tool_calls, or role) prevents false failures for provider-specific streaming behaviors like Cohere's content-start events.
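The early-return pattern looks roughly like this. A minimal sketch assuming a dict-shaped delta rather than the real SDK object; the actual validator in common.py performs more checks.

```python
def assert_valid_streaming_delta(delta: dict) -> None:
    """Validate one streaming delta; empty deltas are treated as valid no-ops."""
    # Completely empty deltas (e.g. Cohere content-start events) carry no
    # content, tool_calls, or role and should not fail validation
    if not delta.get("content") and not delta.get("tool_calls") and not delta.get("role"):
        return
    if delta.get("content") is not None:
        assert isinstance(delta["content"], str), "content delta must be a string"
```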


910-912: LGTM: Proper handling of Anthropic tool call streaming.

Correctly handles input_json_delta events by collecting the partial JSON and marking that tool calls were detected, which is appropriate for Anthropic's streaming function call format.


969-1029: LGTM: Clean provider voice abstraction.

The helper functions provide a clean interface for selecting voices across providers, with appropriate defaults and fallback behavior. The distinction between OpenAI's lowercase voice names and Google's capitalized names is correctly handled.
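The shape of that abstraction can be sketched as below. The default voice names here are assumptions for illustration; the real get_provider_voice in tests/utils/common.py may choose different defaults.

```python
from typing import Optional

DEFAULT_VOICES = {
    "openai": "alloy",  # OpenAI voice names are lowercase
    "google": "Kore",   # Google/Gemini voice names are capitalized
}

def get_provider_voice(provider: str, requested: Optional[str] = None) -> str:
    """Return an explicit voice if given, else a provider-appropriate default."""
    if requested:
        return requested
    # Fall back to the provider default, then a generic one
    return DEFAULT_VOICES.get(provider, "alloy")
```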


1400-1416: LGTM: Comprehensive Google/Gemini transcription validation.

Properly handles Google's GenerateContentResponse structure by navigating the candidates → content → parts hierarchy to extract text, with appropriate handling for empty or metadata-only chunks.
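The candidates → content → parts traversal can be sketched with plain dicts standing in for the GenerateContentResponse SDK types; the real code operates on SDK objects.

```python
def extract_google_text(response: dict) -> str:
    """Collect text from a Gemini-style response, skipping metadata-only parts."""
    texts = []
    for candidate in response.get("candidates", []):
        content = candidate.get("content") or {}
        for part in content.get("parts", []):
            text = part.get("text")
            if text:  # empty or metadata-only parts contribute nothing
                texts.append(text)
    return "".join(texts)
```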


1509-1520: LGTM: Consistent Google/Gemini streaming collection.

The text collection logic mirrors the validation function and properly accumulates text chunks from Google's nested response structure.


1583-1630: LGTM: Thorough Responses API response validation.

The function comprehensively validates Responses API output, handling both regular content and summary fields (for reasoning models), with proper support for string and list content formats. The accumulated content length check provides helpful debugging information.


1638-1659: LGTM: Clean tool call validation.

The function properly identifies and validates function calls in Responses API output with appropriate structure checks.


1661-1715: LGTM: Comprehensive Responses streaming collection.

The function thoroughly tracks streaming events, content deltas, and tool calls with appropriate timeout handling and event type categorization for debugging.


1743-1805: LGTM: Well-structured text completion helpers.

Both the validation and streaming collection functions for text completions are properly implemented with appropriate structure checks and timeout handling.

@akshaydeo akshaydeo force-pushed the 11-22-moves_all_tests_related_to_core_to_corresponding_files branch from 91d3339 to 132a1f5 on November 24, 2025 16:09

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integrations/tests/integrations/test_openai.py (1)

2-51: Fix numbering for final test cases in the header docstring.

The summary list still labels the last item as "42. Text Completions - streaming" even though it corresponds to Test Case 40; keeping the numbering consistent with the actual test names will avoid confusion.

 39. Text Completions - simple prompt
-42. Text Completions - streaming
+40. Text Completions - streaming
♻️ Duplicate comments (7)
tests/integrations/tests/utils/parametrize.py (1)

12-16: Use explicit union syntax for optional parameters.

Lines 14-15 use deprecated implicit Optional style (List[str] = None). Modern Python typing requires explicit union syntax with None.

Based on learnings, apply this diff:

 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:
core/providers/gemini/chat.go (1)

457-474: Add error handling for JSON unmarshaling of tool call arguments.

Line 461 calls json.Unmarshal without checking the error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty, potentially causing incorrect tool calls. Note that the non-streaming path at line 501 correctly handles this error.

Apply this diff to add consistent error handling:

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Skip this malformed tool call rather than emitting it with empty args
+								continue
+							}
 						}
 						if toolCall.Function.Name != nil {
 							fc := &FunctionCall{
 								Name: *toolCall.Function.Name,
 								Args: argsMap,
 							}
 							if toolCall.ID != nil {
 								fc.ID = *toolCall.ID
 							}
 							parts = append(parts, &Part{FunctionCall: fc})
 						}
 					}
 				}
core/providers/gemini/speech.go (1)

25-33: Add separator between concatenated text parts.

Line 30 concatenates part.Text directly without separators, which will merge words from separate parts (e.g., "Hello" + "world" becomes "Helloworld" instead of "Hello world"). Past review comments indicated this was addressed, but the current code still lacks separators.

Apply this diff:

 	// Extract text input from contents
 	var textInput string
 	for _, content := range request.Contents {
 		for _, part := range content.Parts {
 			if part.Text != "" {
+				if textInput != "" {
+					textInput += " "
+				}
 				textInput += part.Text
 			}
 		}
 	}
tests/integrations/tests/utils/common.py (1)

504-527: get_content_string still breaks on SDK objects in choice.message.content; use dict+attribute handling.

assert_valid_chat_response now delegates OpenAI message content extraction to get_content_string, but get_content_string assumes list elements are dicts and calls .get("text", ""). With real OpenAI SDK responses, choice.message.content is often a list of SDK objects (e.g., ChatCompletionMessageContentPartText), so this will raise AttributeError and fail tests.

Adapt get_content_string to handle both dicts and objects with a .text attribute, e.g.:

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""

This matches how the SDK structures content parts and avoids AttributeError crashes.

Also applies to: 1806-1813

tests/integrations/tests/integrations/test_google.py (1)

229-237: Still need sentinel guard for _no_providers_ / _no_model_ in parametrized tests.

get_cross_provider_params_for_scenario(...) can return the sentinel pair ("_no_providers_", "_no_model_") when no providers are available for a scenario, but the parametrized tests here pass format_provider_model(provider, model) straight into the Google GenAI client without short‑circuiting. In a misconfigured or key‑less environment that yields only the sentinel, these tests will attempt to call google_client.models.generate_content (or friends) with an invalid "provider/model" and fail hard instead of cleanly skipping.

Add a simple guard at the top of every parametrized test in this module, e.g.:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

to align with the earlier guidance and keep the suite resilient when provider configuration changes.

Also applies to: 243-263, 264-311, 312-366, 367-408, 501-548, 549-561, 690-719, 777-819

tests/integrations/tests/integrations/test_anthropic.py (1)

721-813: Avoid hard‑coding "anthropic/claude-sonnet-4-5" in streaming thinking test; use config‑backed model.

test_16_extended_thinking_streaming currently does:

stream = anthropic_client.messages.create(
    model="anthropic/claude-sonnet-4-5",
    max_tokens=16000,
    thinking={...},
    messages=messages,
    stream=True,
)

This literal model name isn’t tied to the provider configuration in tests/integrations/config.yml (which defines Anthropic models under providers.anthropic.*) and will easily drift or be invalid, especially as model IDs change. The non‑streaming thinking test (test_15) already uses get_model("anthropic", "chat") for a thinking‑capable model.

Align the streaming test with the config‑backed model, e.g.:

-        stream = anthropic_client.messages.create(
-            model="anthropic/claude-sonnet-4-5",
+        stream = anthropic_client.messages.create(
+            model=get_model("anthropic", "chat"),
             max_tokens=16000,
             thinking={
                 "type": "enabled",
                 "budget_tokens": 10000,
             },
             messages=messages,
             stream=True,
         )

so both tests stay in sync with the configured Anthropic thinking model.

tests/integrations/tests/integrations/test_openai.py (1)

1353-1490: Hard-coded openai/gpt-5 and narrow error-string checks make the reasoning test brittle.

test_38_responses_reasoning is thorough, but coupling it to "openai/gpt-5" and only treating errors containing "reasoning" or "not supported" as recoverable means model-availability issues (e.g., “model not found/invalid”) will re-raise instead of falling back.

Consider two small robustness tweaks:

  • Drive model_to_use from configuration (e.g., a “reasoning” capability or scenario) instead of a literal string, so you can swap models without editing the test.
  • Broaden the error check so common model-related errors also trigger the fallback, while still re-raising unexpected failures. For example:
-        except Exception as e:
-            # If reasoning parameters are not supported by the model, that's okay
-            # Just verify basic response works
-            error_str = str(e).lower()
-            if "reasoning" in error_str or "not supported" in error_str:
+        except Exception as e:
+            # If reasoning parameters or the chosen model are not supported, fall back
+            error_str = str(e).lower()
+            recoverable_terms = ["reasoning", "not supported", "model", "not found", "invalid"]
+            if any(term in error_str for term in recoverable_terms):
                 print(f"Info: Model {model_to_use} may not fully support reasoning parameters")

You could also swap model_to_use to a config-driven model before retrying, if desired.

🧹 Nitpick comments (8)
transports/bifrost-http/integrations/router.go (1)

521-521: Fix Go formatting: add space before else.

Line 521 has }else { which violates Go formatting conventions. Go requires a space between } and else.

Apply this diff:

-		}else {
+		} else {
tests/integrations/tests/integrations/test_litellm.py (1)

81-92: Cross‑provider parametrization and timeouts look solid; drop unused import for cleanliness.

The LiteLLM tests now correctly use get_cross_provider_params_for_scenario(..., exclude_providers=LITELLM_EXCLUDED_PROVIDERS) and pass the injected model through to all litellm.completion calls, and streaming uses a 120‑second timeout via collect_streaming_content, which should help with flakiness. The speech test’s use of get_provider_voice("openai", ...) is also consistent with the shared voice helpers.

The only nit is that format_provider_model is imported but never used in this file; consider removing that import to avoid confusion.

Also applies to: 145-152, 159-167, 176-187, 193-203, 211-223, 253-270, 271-307, 309-353, 421-460, 584-624

tests/integrations/config.yml (2)

25-30: api.timeout remains 30s while clients now default to 120s via get_api_config().

Anthropic and LiteLLM clients read api_config.get("timeout", 120), but this config sets api.timeout: 30, so the effective per‑request timeout is still 30 seconds under normal test runs. If the goal of this PR is to actually operate at 120s by default, you’ll need to either bump api.timeout here or have the fixtures consult an environment‑specific override (e.g., from environments.development.api.timeout) when constructing the SDK clients.

Also applies to: 477-485, 508-527
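The precedence issue is easy to demonstrate: the 120s value is only a fallback default, so an explicit api.timeout: 30 in config wins. A one-line illustration of the fixtures' lookup:

```python
def effective_timeout(api_config: dict) -> int:
    """Mirrors the fixtures' api_config.get("timeout", 120) call."""
    # The 120s default only applies when the config omits "timeout" entirely
    return api_config.get("timeout", 120)
```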


92-100: Confirm environment setup for Google/Gemini: GEMINI_API_KEY vs GOOGLE_API_KEY.

Provider availability is driven by provider_api_keys (e.g., gemini: GEMINI_API_KEY), while the Google client fixture uses get_api_key("google") and expects GOOGLE_API_KEY. That means cross‑provider parametrization will only see Gemini as “available” if GEMINI_API_KEY is set, even if GOOGLE_API_KEY alone is configured for the SDK.

If intentional, it’s worth documenting in test setup instructions that both GOOGLE_API_KEY and GEMINI_API_KEY should be present; otherwise, consider aligning the key mapping so a single env var controls both availability and client initialization.

Also applies to: 101-212, 369-418
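One possible alignment, sketched under assumptions (the function name and fallback order here are illustrative, not the repo's API): let either environment variable satisfy both the availability check and client construction.

```python
import os
from typing import Optional

def resolve_google_api_key(env: Optional[dict] = None) -> Optional[str]:
    """Prefer GEMINI_API_KEY (availability mapping), fall back to GOOGLE_API_KEY."""
    env = os.environ if env is None else env
    return env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
```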

tests/integrations/tests/integrations/test_anthropic.py (1)

588-609: Minor: unused content_tools could be dropped or asserted.

In the streaming‑with‑tools branch you unpack content_tools, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(...) but only ever reference chunk_count_tools and tool_calls_detected_tools. Either prefix content_tools with _ or add an assertion about it (e.g., minimum length) to avoid the unused‑variable warning and make intent clearer.

tests/integrations/tests/integrations/test_openai.py (3)

480-494: Consider removing debug print(error) from the invalid-roles test.

The explicit print(error) in test_12_error_handling_invalid_roles can add noise to successful test runs; the assertions already validate the error.

-        error = exc_info.value
-        print(error)
-        assert_valid_error_response(error, "tester")
+        error = exc_info.value
+        assert_valid_error_response(error, "tester")

495-538: Tighten streaming-with-tools helper usage by dropping unused content_tools.

The extended streaming test (with tools) looks good and uses provider_supports_scenario/get_provider_model appropriately, but content_tools from collect_streaming_content is never used, which triggers Ruff’s RUF059.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
-                    collect_streaming_content(stream_with_tools, "openai", timeout=300)
-                )
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                    collect_streaming_content(stream_with_tools, "openai", timeout=300)
+                )

1106-1351: Responses API tests provide strong coverage; clean up unused content in streaming-with-tools.

The new Responses tests (simple text, system message, image, tools, streaming, streaming+tools) are well-structured and use the shared utilities and cross-provider parametrization correctly. In test_37_responses_streaming_with_tools, the unpacked content from collect_responses_streaming_content is never used, which Ruff flags.

-        content, chunk_count, tool_calls_detected, event_types = (
-            collect_responses_streaming_content(stream, timeout=300)
-        )
+        _content, chunk_count, tool_calls_detected, event_types = (
+            collect_responses_streaming_content(stream, timeout=300)
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c2f889 and f994102.

📒 Files selected for processing (17)
  • core/providers/anthropic/responses.go (4 hunks)
  • core/providers/cohere/responses.go (2 hunks)
  • core/providers/gemini/chat.go (4 hunks)
  • core/providers/gemini/speech.go (2 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (15 hunks)
  • tests/integrations/tests/integrations/test_openai.py (21 hunks)
  • tests/integrations/tests/utils/common.py (12 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/providers/cohere/responses.go
  • core/providers/anthropic/responses.go
🧰 Additional context used
🧬 Code graph analysis (8)
core/providers/gemini/chat.go (3)
core/schemas/chatcompletions.go (3)
  • ChatAssistantMessageToolCall (483-489)
  • ChatStreamResponseChoice (530-532)
  • ChatNonStreamResponseChoice (524-527)
core/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (876-884)
  • Part (890-914)
  • FunctionCall (1045-1055)
ui/lib/types/logs.ts (1)
  • Function (141-146)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
transports/bifrost-http/integrations/genai.go (7)
core/schemas/bifrost.go (3)
  • BifrostRequest (143-153)
  • SpeechRequest (94-94)
  • TranscriptionRequest (96-96)
transports/bifrost-http/integrations/router.go (2)
  • SpeechResponseConverter (104-104)
  • TranscriptionResponseConverter (108-108)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (157-180)
core/schemas/transcriptions.go (1)
  • BifrostTranscriptionResponse (16-26)
core/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/providers/gemini/types.go (2)
  • GeminiGenerationRequest (54-72)
  • GenerationConfig (630-696)
core/providers/gemini/transcription.go (5)
core/providers/gemini/types.go (1)
  • GeminiGenerationRequest (54-72)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionParameters (32-45)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
core/providers/gemini/utils.go (1)
core/providers/gemini/types.go (1)
  • Type (777-777)
tests/integrations/tests/integrations/test_anthropic.py (6)
tests/integrations/tests/utils/config_loader.py (7)
  • get_model (138-159)
  • get_model (441-443)
  • get_config (429-434)
  • provider_supports_scenario (373-390)
  • provider_supports_scenario (471-473)
  • get_provider_model (306-325)
  • get_provider_model (456-458)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/integrations/test_google.py (6)
  • test_01_simple_chat (230-240)
  • test_02_multi_turn_conversation (244-262)
  • test_03_single_tool_call (265-285)
  • test_04_multiple_tool_calls (288-310)
  • test_05_end2end_tool_calling (313-345)
  • test_13_streaming (502-547)
tests/integrations/tests/integrations/test_litellm.py (6)
  • test_01_simple_chat (146-156)
  • test_02_multi_turn_conversation (160-174)
  • test_03_single_tool_call (177-191)
  • test_04_multiple_tool_calls (194-209)
  • test_05_end2end_tool_calling (212-251)
  • test_13_streaming (422-460)
tests/integrations/tests/integrations/test_openai.py (6)
  • test_01_simple_chat (222-232)
  • test_02_multi_turn_conversation (235-249)
  • test_03_single_tool_call (252-264)
  • test_04_multiple_tool_calls (267-283)
  • test_05_end2end_tool_calling (286-325)
  • test_13_streaming (496-537)
tests/integrations/tests/utils/common.py (2)
  • assert_valid_chat_response (504-530)
  • assert_valid_image_response (550-599)
tests/integrations/tests/integrations/test_litellm.py (3)
tests/integrations/tests/utils/common.py (2)
  • get_provider_voice (968-1004)
  • get_provider_voices (1007-1028)
tests/integrations/tests/utils/config_loader.py (2)
  • get_model (138-159)
  • get_model (441-443)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/integrations/test_openai.py (3)
tests/integrations/tests/utils/common.py (4)
  • get_content_string (1806-1813)
  • get_provider_voice (968-1004)
  • get_provider_voices (1007-1028)
  • skip_if_no_api_key (1553-1564)
tests/integrations/tests/utils/config_loader.py (7)
  • get_model (138-159)
  • get_model (441-443)
  • get_config (429-434)
  • provider_supports_scenario (373-390)
  • provider_supports_scenario (471-473)
  • get_provider_model (306-325)
  • get_provider_model (456-458)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 Ruff (0.14.5)
tests/integrations/tests/integrations/test_google.py

230-230: Unused method argument: test_config

(ARG002)


244-244: Unused method argument: test_config

(ARG002)


265-265: Unused method argument: test_config

(ARG002)


288-288: Unused method argument: test_config

(ARG002)


313-313: Unused method argument: test_config

(ARG002)


348-348: Unused method argument: test_config

(ARG002)


368-368: Unused method argument: test_config

(ARG002)


381-381: Unused method argument: test_config

(ARG002)


392-392: Unused method argument: test_config

(ARG002)


410-410: Unused method argument: test_config

(ARG002)


502-502: Unused method argument: test_config

(ARG002)


550-550: Unused method argument: test_config

(ARG002)


563-563: Unused method argument: test_config

(ARG002)


570-570: Unused method argument: test_config

(ARG002)


594-594: Unused method argument: test_config

(ARG002)


620-620: Unused method argument: test_config

(ARG002)


644-644: Unused method argument: test_config

(ARG002)


668-668: Unused method argument: test_config

(ARG002)


691-691: Unused method argument: test_config

(ARG002)


723-723: Unused method argument: test_config

(ARG002)


777-777: Unused method argument: test_config

(ARG002)


821-821: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/common.py

1686-1686: Avoid specifying long messages outside the exception class

(TRY003)


1708-1710: Avoid specifying long messages outside the exception class

(TRY003)


1723-1723: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1788-1788: Avoid specifying long messages outside the exception class

(TRY003)


1798-1800: Avoid specifying long messages outside the exception class

(TRY003)

tests/integrations/tests/integrations/test_anthropic.py

181-181: Unused method argument: test_config

(ARG002)


195-195: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


230-230: Unused method argument: test_config

(ARG002)


257-257: Unused method argument: test_config

(ARG002)


318-318: Unused method argument: test_config

(ARG002)


336-336: Unused method argument: test_config

(ARG002)


361-361: Unused method argument: test_config

(ARG002)


387-387: Unused method argument: test_config

(ARG002)


569-569: Unused method argument: test_config

(ARG002)


602-602: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


611-611: Unused method argument: test_config

(ARG002)


647-647: Unused method argument: test_config

(ARG002)


722-722: Unused method argument: test_config

(ARG002)


758-758: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/integrations/test_litellm.py

146-146: Unused method argument: test_config

(ARG002)


146-146: Unused method argument: provider

(ARG002)


160-160: Unused method argument: test_config

(ARG002)


160-160: Unused method argument: provider

(ARG002)


177-177: Unused method argument: test_config

(ARG002)


177-177: Unused method argument: provider

(ARG002)


194-194: Unused method argument: test_config

(ARG002)


194-194: Unused method argument: provider

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: provider

(ARG002)


254-254: Unused method argument: test_config

(ARG002)


254-254: Unused method argument: provider

(ARG002)


272-272: Unused method argument: test_config

(ARG002)


272-272: Unused method argument: provider

(ARG002)


283-283: Unused method argument: test_config

(ARG002)


283-283: Unused method argument: provider

(ARG002)


294-294: Unused method argument: test_config

(ARG002)


294-294: Unused method argument: provider

(ARG002)


310-310: Unused method argument: test_config

(ARG002)


310-310: Unused method argument: provider

(ARG002)


422-422: Unused method argument: test_config

(ARG002)


422-422: Unused method argument: provider

(ARG002)

tests/integrations/tests/utils/config_loader.py

153-156: Avoid specifying long messages outside the exception class

(TRY003)


227-227: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


233-233: Avoid specifying long messages outside the exception class

(TRY003)


236-236: Avoid specifying long messages outside the exception class

(TRY003)


286-286: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/integrations/test_openai.py

222-222: Unused method argument: test_config

(ARG002)


235-235: Unused method argument: test_config

(ARG002)


252-252: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


286-286: Unused method argument: test_config

(ARG002)


328-328: Unused method argument: test_config

(ARG002)


344-344: Unused method argument: test_config

(ARG002)


355-355: Unused method argument: test_config

(ARG002)


366-366: Unused method argument: test_config

(ARG002)


496-496: Unused method argument: test_config

(ARG002)


529-529: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


540-540: Unused method argument: test_config

(ARG002)


1096-1096: Unused method argument: test_config

(ARG002)


1107-1107: Unused method argument: test_config

(ARG002)


1140-1140: Unused method argument: test_config

(ARG002)


1170-1170: Unused method argument: test_config

(ARG002)


1212-1212: Unused method argument: test_config

(ARG002)


1262-1262: Unused method argument: test_config

(ARG002)


1316-1316: Unused method argument: test_config

(ARG002)


1334-1334: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1354-1354: Unused method argument: test_config

(ARG002)


1497-1497: Unused method argument: test_config

(ARG002)


1518-1518: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/parametrize.py

14-14: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (21)
transports/bifrost-http/integrations/genai.go (3)

183-199: LGTM!

The speech request detection logic correctly prioritizes AUDIO modality and SpeechConfig presence. The implementation properly identifies speech generation requests.


201-226: LGTM!

The transcription detection properly excludes speech requests first (line 206) to handle bidirectional audio scenarios correctly, then checks for audio input in InlineData and FileData.


228-249: LGTM!

The MIME type validation correctly uses the audio/ prefix check as required by Gemini API specifications. The parameter trimming and lowercase normalization ensure robust matching.

core/providers/gemini/utils.go (2)

79-101: LGTM!

The change to use strings.ToLower(string(schema.Type)) at line 81 ensures consistent lowercase type values in function parameters, aligning with JSON Schema standards.


103-155: LGTM!

The new convertTypeToLowerCase helper comprehensively handles nested schema structures (maps, arrays, and primitives) and correctly normalizes all "type" fields to lowercase while preserving other values unchanged.
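The recursive normalization described above can be sketched in Python (a hypothetical `lower_schema_types`; the actual implementation is the Go `convertTypeToLowerCase` in core/providers/gemini/utils.go):

```python
def lower_schema_types(node):
    """Recursively lowercase every "type" value in a JSON-schema-like
    structure, leaving all other keys and values untouched."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "type" and isinstance(value, str):
                out[key] = value.lower()
            else:
                out[key] = lower_schema_types(value)
        return out
    if isinstance(node, list):
        return [lower_schema_types(item) for item in node]
    return node

schema = {
    "type": "OBJECT",
    "properties": {
        "city": {"type": "STRING"},
        "tags": {"type": "ARRAY", "items": {"type": "STRING"}},
    },
}
normalized = lower_schema_types(schema)
```

Only the values of `"type"` keys change; property names and all other values pass through untouched.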

core/providers/gemini/chat.go (2)

32-33: LGTM!

The previousToolCalls tracking enables proper correlation between function calls and responses when IDs are not explicitly provided.


113-121: LGTM!

The fallback logic properly searches through previousToolCalls to find matching function calls by name when an explicit ID is not provided, ensuring robust correlation.

tests/integrations/tests/utils/parametrize.py (1)

17-47: LGTM!

The parametrization logic correctly filters providers by scenario support, applies include/exclude filters, maps scenarios to capabilities and models, and provides a sensible dummy fallback when no providers are available.
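That filtering-and-fallback flow can be sketched with simplified inputs (hypothetical `cross_provider_params`; the real version pulls provider and model data through the config_loader helpers):

```python
def cross_provider_params(scenario, providers_for_scenario, models,
                          include=None, exclude=None):
    """Build (provider, model) pairs for a scenario, honoring
    include/exclude filters, with a sentinel fallback when empty."""
    providers = providers_for_scenario.get(scenario, [])
    if include is not None:
        providers = [p for p in providers if p in include]
    if exclude is not None:
        providers = [p for p in providers if p not in exclude]
    params = [(p, models[p]) for p in providers if p in models]
    # Sentinel pair lets parametrized tests skip cleanly instead of erroring.
    return params or [("_no_providers_", "_no_model_")]

pairs = cross_provider_params(
    "simple_chat",
    {"simple_chat": ["openai", "anthropic"]},
    {"openai": "gpt-4o-mini", "anthropic": "claude-3-5-haiku"},
    exclude=["anthropic"],
)
```

The sentinel return value is why each parametrized test should guard against `("_no_providers_", "_no_model_")`, as noted elsewhere in this review.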

core/providers/gemini/speech.go (1)

116-121: LGTM!

The updated condition correctly handles both single-voice and multi-voice configurations by checking for either Voice or MultiVoiceConfig before adding speech config.

tests/integrations/tests/integrations/test_litellm.py (1)

692-726: extract_litellm_tool_calls helper is robust and consistent with OpenAI‑style tool calls.

The helper defensively checks the response shape, handles both JSON‑string and object arguments, and tolerates parse failures without breaking the tests, which is appropriate for integration test code.
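The defensive pattern can be sketched as follows (hypothetical `extract_tool_calls`; the actual helper is `extract_litellm_tool_calls` in the test module):

```python
import json
from types import SimpleNamespace as NS

def extract_tool_calls(response):
    """Defensively pull (name, arguments-dict) pairs out of an
    OpenAI-style response, tolerating missing fields and bad JSON."""
    calls = []
    for choice in getattr(response, "choices", None) or []:
        message = getattr(choice, "message", None)
        for call in getattr(message, "tool_calls", None) or []:
            fn = getattr(call, "function", None)
            if fn is None:
                continue
            args = getattr(fn, "arguments", None)
            if isinstance(args, str):
                try:
                    args = json.loads(args)
                except json.JSONDecodeError:
                    args = {}  # tolerate parse failures without raising
            calls.append((fn.name, args or {}))
    return calls

# Fake SDK response shape for illustration:
fake = NS(choices=[NS(message=NS(tool_calls=[
    NS(function=NS(name="get_weather", arguments='{"city": "Paris"}'))
]))])
calls = extract_tool_calls(fake)
```

Every attribute access is guarded, so a response with no choices, no message, or no tool calls yields an empty list rather than an exception.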

tests/integrations/tests/integrations/test_google.py (1)

169-187: New Google helpers (tool conversion, image loading, PCM→WAV, TTS tests) look consistent.

  • convert_to_google_tools now builds a lowercase JSON‑schema‑style function declaration and wraps it in types.Tool, which matches how you later pass tools= into GenerateContentConfig.
  • load_image_from_url’s explicit User‑Agent, 30‑second timeout, and raise_for_status() are good choices for avoiding flaky 403s/5xx.
  • convert_pcm_to_wav plus the speech tests’ inline_data.data extraction integrate cleanly with assert_valid_speech_response.

No functional issues stand out here.

Also applies to: 214-223, 690-722, 777-819
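The PCM-to-WAV step can be done with the stdlib wave module. This sketch assumes 16-bit mono PCM at 24 kHz, which is typical for Gemini TTS output but should be confirmed against the actual response; the helper name is hypothetical:

```python
import io
import wave

def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM bytes in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buffer.getvalue()

wav_bytes = pcm_to_wav(b"\x00\x00" * 2400)  # 0.1 s of silence at 24 kHz
```

The output starts with the standard RIFF/WAVE header, so it can be fed directly to assertions like `assert_valid_speech_response`.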

tests/integrations/tests/utils/common.py (2)

550-582: Image and streaming helpers are now multi‑provider‑robust.

  • assert_valid_image_response correctly normalizes content across Google, Anthropic, and OpenAI (handling both list‑of‑blocks and plain string content, including SDK objects with .text).
  • assert_valid_streaming_response and collect_streaming_content now tolerate empty OpenAI deltas and Anthropic input_json_delta tool‑call chunks while still tracking tool calls.
  • New transcription streaming helpers cover both OpenAI and Google/Gemini shapes and plug cleanly into collect_streaming_transcription_content.

These changes should substantially reduce flakiness across providers.

Also applies to: 774-802, 903-921, 1372-1416, 1468-1529
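The tolerance for empty deltas can be sketched like this (hypothetical `collect_stream_text`; the real `collect_streaming_content` additionally tracks tool calls and chunk counts):

```python
from types import SimpleNamespace as NS

def collect_stream_text(chunks):
    """Accumulate text from OpenAI-style streaming chunks, tolerating
    chunks whose delta carries no content (e.g. role-only or tool deltas)."""
    parts = []
    for chunk in chunks:
        for choice in getattr(chunk, "choices", None) or []:
            delta = getattr(choice, "delta", None)
            text = getattr(delta, "content", None)
            if text:  # skip None and empty-string deltas
                parts.append(text)
    return "".join(parts)

# Fake stream for illustration: a role-only chunk followed by text deltas.
chunks = [
    NS(choices=[NS(delta=NS(content=None))]),
    NS(choices=[NS(delta=NS(content="Hel"))]),
    NS(choices=[NS(delta=NS(content="lo"))]),
]
text = collect_stream_text(chunks)
```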


1568-1579: Responses API tool conversion correctly uses the flat function shape for this repo.

convert_to_responses_tools produces {"type": "function", "name": ..., "description": ..., "parameters": ...}, which matches the flat tool schema used by Bifrost’s Responses path and the Go schemas in this repository (as per previous discussion). Leaving out the nested "function" wrapper here is intentional and correct.
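For contrast, the two tool shapes look like this (illustrative values, not taken from the test suite):

```python
# Flat shape used by the Responses API path in this repo:
responses_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
    },
}

# Nested shape used by Chat Completions, for comparison:
chat_tool = {
    "type": "function",
    "function": {
        "name": responses_tool["name"],
        "description": responses_tool["description"],
        "parameters": responses_tool["parameters"],
    },
}
```

The flat shape carries `name`/`parameters` at the top level, whereas Chat Completions wraps them under a `"function"` key; mixing the two up is a common source of 400 errors.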

tests/integrations/tests/integrations/test_anthropic.py (1)

64-72: Anthropic cross‑provider setup and extended‑thinking (non‑streaming) test look consistent.

  • The anthropic_client fixture wiring (base_url, integration settings, retries) and use of api_config.get("timeout", 120) align with the shared config loader.
  • Parametrized tests 01–06 and 07–09 correctly use format_provider_model(provider, model) plus convert_to_anthropic_messages/convert_to_anthropic_tools.
  • test_15_extended_thinking sensibly reuses get_model("anthropic", "chat"), inspects both thinking and regular text blocks, and asserts on reasoning‑related keywords, which should give good signal without being overly brittle.

No functional issues stand out in these portions.

Also applies to: 76-103, 180-223, 255-334, 335-423, 510-552, 647-717

tests/integrations/tests/utils/config_loader.py (1)

15-24: Provider‑centric config loader refactor is coherent and matches the new YAML layout.

  • INTEGRATION_TO_PROVIDER_MAP plus the updated get_model/get_model_alternatives, list_integrations, and list_models correctly route integrations through the providers section.
  • Provider‑level helpers (get_provider_model, is_provider_available, provider_supports_scenario, get_providers_for_scenario, get_scenario_capability) are a good fit for get_cross_provider_params_for_scenario and centralize capability logic.
  • validate_config’s check that every mapped provider exists in providers will catch config drift early.

This structure should make it much easier to evolve provider support and keep tests in sync with configuration.

Also applies to: 138-176, 223-247, 248-269, 285-296, 306-409, 456-479

tests/integrations/tests/integrations/test_openai.py (6)

81-138: New imports for content/voice/responses/completions utilities and param helpers look correct.

The added imports for get_content_string, provider voice helpers, Responses API utilities, text completion helpers, and get_cross_provider_params_for_scenario/format_provider_model align with how they’re used later in the tests; no issues here.


183-209: Config-driven timeout and retries in openai_client fixture are appropriate.

Wiring timeout and max_retries through api_config with a sensible default (300s) and honoring integration-specific organization/project settings matches the PR goal of more robust integration tests.


218-380: Cross‑provider parametrization for chat/tool/image tests is well integrated.

Using get_cross_provider_params_for_scenario(..., "simple_chat"/"multi_turn_conversation"/tool/image scenarios) plus format_provider_model(provider, model) and get_content_string gives good multi‑provider coverage while remaining OpenAI‑SDK‑native.


540-794: Config-based voice selection for speech tests is a solid improvement.

Switching to get_provider_voice("openai", ...) and get_provider_voices("openai", count=3) ties the speech/transcription tests to the central provider config and reduces hard-coded voice names; the assertions and error‑handling logic remain coherent.


1095-1100: List‑models test now properly guarded by API‑key skip.

Adding @skip_if_no_api_key("openai") makes test_31_list_models consistent with other OpenAI-only tests; asserting len(response.data) > 0 is a reasonable invariant for this integration test.


1496-1562: Text completion and streaming completion tests look consistent and focused.

The new completion tests exercise both standard and streaming completions with reasonable content/quality checks and reuse of shared helpers; they’re properly gated by skip_if_no_api_key("openai").

Comment thread core/providers/gemini/types.go
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/integrations/config.yml (1)

25-30: API timeout in config still 30s, so client timeout bump will not take effect.

openai_client and anthropic_client both read api_config["timeout"] from this section; because timeout is present and set to 30, their new fallback defaults of 120/300 are never used. If the aim of this PR is to reduce timeout-related flakiness, you likely also want to raise:

api:
  timeout: 120  # or 300, to match the fixtures

so the clients actually run with the increased timeout.

tests/integrations/tests/integrations/test_openai.py (1)

221-233: Handle ("_no_providers_", "_no_model_") tuples in parametrized tests.

get_cross_provider_params_for_scenario can emit the sentinel ("_no_providers_", "_no_model_") when no providers are available for a scenario. The parametrized tests here use provider/model directly, so in that case they will call OpenAI endpoints with _no_providers_/_no_model_ as the model and hard‑fail instead of skipping.

Add a small guard at the top of each parametrized test in this module, e.g.:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

(or abstract it into a helper) so the cross‑provider matrix degrades to clean skips when provider API keys are absent.

Also applies to: 234-250, 251-265, 266-284, 285-326, 327-342, 343-380, 354-380, 365-380, 495-538, 1106-1138, 1139-1167, 1170-1210, 1212-1260, 1262-1314, 1316-1351

♻️ Duplicate comments (5)
tests/integrations/tests/utils/parametrize.py (1)

14-15: Use explicit union syntax for optional parameters.

The implicit Optional style is deprecated in modern Python typing (PEP 484). Static analysis tools flag this as non-compliant.

Apply this diff:

 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:

Based on past review comments and static analysis hints (RUF013).

core/providers/gemini/transcription.go (1)

48-61: FileData URIs are stored but never used in transcription requests.

When only FileData (no InlineData) is provided, the audio URI is stored in ExtraParams["file_uri"] but Input.File remains empty. The downstream ToGeminiTranscriptionRequest() at line 159 only adds audio when len(bifrostReq.Input.File) > 0, so FileData-only transcription requests will send no audio to the Gemini API.

Clarify the intended behavior:

  • If FileData references should be supported, add logic to fetch the file content or pass FileData directly to Gemini (the API does support FileData parts)
  • If FileData-only transcription is not supported, add validation to reject such requests with a clear error message

Based on past review comments indicating this was previously flagged.

core/providers/gemini/speech.go (1)

25-33: Add separator between concatenated text parts for speech synthesis.

Multiple text parts are concatenated directly without separators (line 30: textInput += part.Text), so adjacent words merge (e.g., "Hello" + "world" becomes "Helloworld" instead of "Hello world"). The Gemini API supports multiple text parts within Contents, which should be read sequentially with proper spacing.

Apply this diff to add proper spacing:

 	for _, content := range request.Contents {
 		for _, part := range content.Parts {
 			if part.Text != "" {
+				if textInput != "" {
+					textInput += " "
+				}
 				textInput += part.Text
 			}
 		}
 	}

Based on past review comments, this was previously flagged but the fix appears not to be present in the current code.

tests/integrations/tests/integrations/test_google.py (1)

229-236: Add guard for sentinel provider/model pairs in all parametrized tests.

get_cross_provider_params_for_scenario can return ("_no_providers_", "_no_model_") when no providers are available for a scenario; the parametrized tests here use provider/model directly without checking for this sentinel. That will drive _no_providers_/_no_model_ into google_client.models.generate_content / streaming calls and produce avoidable 4xxs instead of clean skips.

Add a small guard at the top of each parametrized test in this module, e.g.:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

(or factor a shared helper) so these tests skip gracefully when no providers are configured.

Also applies to: 264-300, 287-311, 312-366, 367-403, 501-532, 549-561, 569-689, 690-856

tests/integrations/tests/utils/common.py (1)

1806-1813: Handle SDK objects in get_content_string.

Line 1811 calls .get() on list elements without checking whether they're dictionaries or SDK objects. This will raise AttributeError when content contains SDK objects (e.g., ChatCompletionMessageContentPartText from OpenAI SDK).

Apply the same fix used in lines 574-576:

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""
🧹 Nitpick comments (3)
tests/integrations/tests/utils/config_loader.py (1)

286-296: Drop unnecessary f-string in print_config_summary.

print(f"\n🤖 MODEL CONFIGURATIONS (via providers):") is an f-string without placeholders and trips Ruff’s F541. You can safely change it to a plain string:

print("\n🤖 MODEL CONFIGURATIONS (via providers):")

No behavior change, but keeps lint clean.

tests/integrations/tests/integrations/test_openai.py (1)

529-531: Rename unused destructured variables in streaming tests.

In both streaming tests you unpack values you never use:

  • test_13_streaming: content_tools is unused.
  • test_37_responses_streaming_with_tools: content is unused.

For cleaner code and to address RUF059, either don’t unpack those elements or prefix them with an underscore, e.g.:

_content_tools, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(...)

_content, chunk_count, tool_calls_detected, event_types = collect_responses_streaming_content(...)

Also applies to: 1334-1336

tests/integrations/tests/utils/common.py (1)

1723-1723: Remove unused variable.

The valid_event_types list is defined but never used in the function. You can safely remove it or use it to validate event_type if stricter validation is desired.

Apply this diff to remove the unused variable:

-    # Validate common streaming event types
-    valid_event_types = [
-        "response.created",
-        "response.output_item.added",
-        "response.content_part.added",
-        "response.output_text.delta",
-        "response.function_call_arguments.delta",
-        "response.completed",
-        "response.error",
-    ]
-
     # Log the event type for debugging

Or, if you want stricter validation, use it to check event types:

     # Validate common streaming event types
     valid_event_types = [
         "response.created",
         "response.output_item.added",
         "response.content_part.added",
         "response.output_text.delta",
         "response.function_call_arguments.delta",
         "response.completed",
         "response.error",
     ]

     # Log the event type for debugging
     if hasattr(chunk, "type"):
         event_type = chunk.type
-        # Don't fail on unknown event types, just warn
-        if not any(evt in event_type for evt in ["response.", "error"]):
+        # Validate against known types
+        if not any(evt in event_type for evt in valid_event_types):
             print(f"Warning: Unexpected event type: {event_type}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c2f889 and f994102.

📒 Files selected for processing (17)
  • core/providers/anthropic/responses.go (4 hunks)
  • core/providers/cohere/responses.go (2 hunks)
  • core/providers/gemini/chat.go (4 hunks)
  • core/providers/gemini/speech.go (2 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (15 hunks)
  • tests/integrations/tests/integrations/test_openai.py (21 hunks)
  • tests/integrations/tests/utils/common.py (12 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (10)
core/providers/cohere/responses.go (1)
core/schemas/responses.go (1)
  • ResponsesMessageTypeFunctionCall (291-291)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/transcription.go (4)
core/providers/gemini/types.go (2)
  • GeminiGenerationRequest (54-72)
  • FileData (1034-1042)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionParameters (32-45)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
tests/integrations/tests/integrations/test_litellm.py (3)
tests/integrations/tests/utils/common.py (3)
  • get_provider_voice (968-1004)
  • get_provider_voices (1007-1028)
  • collect_streaming_content (855-940)
tests/integrations/tests/utils/config_loader.py (2)
  • get_model (138-159)
  • get_model (441-443)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
transports/bifrost-http/integrations/genai.go (6)
core/schemas/bifrost.go (3)
  • BifrostRequest (143-153)
  • SpeechRequest (94-94)
  • TranscriptionRequest (96-96)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (157-180)
core/schemas/transcriptions.go (1)
  • BifrostTranscriptionResponse (16-26)
core/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/providers/gemini/types.go (4)
  • GeminiGenerationRequest (54-72)
  • GenerationConfig (630-696)
  • ModalityAudio (715-715)
  • SpeechConfig (846-855)
tests/integrations/tests/integrations/test_openai.py (4)
tests/integrations/tests/utils/common.py (2)
  • get_content_string (1806-1813)
  • assert_valid_chat_response (504-530)
tests/integrations/tests/utils/config_loader.py (5)
  • get_model (138-159)
  • get_model (441-443)
  • get_config (429-434)
  • get_provider_model (306-325)
  • get_provider_model (456-458)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/integrations/test_google.py (2)
  • test_01_simple_chat (230-240)
  • test_config (116-118)
core/providers/anthropic/responses.go (1)
core/providers/anthropic/types.go (2)
  • AnthropicThinking (64-67)
  • AnthropicContentBlock (146-158)
core/providers/gemini/utils.go (1)
core/providers/gemini/types.go (1)
  • Type (777-777)
tests/integrations/tests/integrations/test_google.py (3)
tests/integrations/tests/utils/common.py (9)
  • assert_valid_transcription_response (1137-1166)
  • assert_valid_streaming_transcription_response (1372-1415)
  • collect_streaming_transcription_content (1468-1529)
  • generate_test_audio (1032-1062)
  • get_provider_voice (968-1004)
  • get_provider_voices (1007-1028)
  • get_api_key (1533-1550)
  • skip_if_no_api_key (1553-1564)
  • assert_valid_embedding_response (1169-1230)
tests/integrations/tests/utils/config_loader.py (2)
  • get_model (138-159)
  • get_model (441-443)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/config_loader.py (6)
  • get_config (429-434)
  • get_providers_for_scenario (392-408)
  • get_providers_for_scenario (476-478)
  • get_scenario_capability (410-422)
  • get_provider_model (306-325)
  • get_provider_model (456-458)
🪛 Ruff (0.14.5)
tests/integrations/tests/integrations/test_google.py

230-230: Unused method argument: test_config

(ARG002)


244-244: Unused method argument: test_config

(ARG002)


265-265: Unused method argument: test_config

(ARG002)


288-288: Unused method argument: test_config

(ARG002)


313-313: Unused method argument: test_config

(ARG002)


348-348: Unused method argument: test_config

(ARG002)


368-368: Unused method argument: test_config

(ARG002)


381-381: Unused method argument: test_config

(ARG002)


392-392: Unused method argument: test_config

(ARG002)


410-410: Unused method argument: test_config

(ARG002)


502-502: Unused method argument: test_config

(ARG002)


550-550: Unused method argument: test_config

(ARG002)


563-563: Unused method argument: test_config

(ARG002)


570-570: Unused method argument: test_config

(ARG002)


594-594: Unused method argument: test_config

(ARG002)


620-620: Unused method argument: test_config

(ARG002)


644-644: Unused method argument: test_config

(ARG002)


668-668: Unused method argument: test_config

(ARG002)


691-691: Unused method argument: test_config

(ARG002)


723-723: Unused method argument: test_config

(ARG002)


777-777: Unused method argument: test_config

(ARG002)


821-821: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/common.py

1686-1686: Avoid specifying long messages outside the exception class

(TRY003)


1708-1710: Avoid specifying long messages outside the exception class

(TRY003)


1723-1723: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1788-1788: Avoid specifying long messages outside the exception class

(TRY003)


1798-1800: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (19)
core/providers/anthropic/responses.go (4)

318-331: LGTM! Flexible thinking parameter handling.

The dual-path logic correctly handles the thinking parameter whether it arrives as a typed pointer or as a map from JSON deserialization. The use of SafeExtractInt properly handles numeric type variations (float64 vs int) that can occur with JSON unmarshaling.


462-463: LGTM! Consistent JSON serialization with empty slice.

Initializing Content to an empty slice ensures consistent JSON serialization (empty array vs null), which prevents potential API contract issues with Anthropic's response format.


649-649: LGTM! ContentIndex now propagated for tool_use blocks.

This change addresses the past review concern by ensuring ContentIndex is consistently populated for tool_use OutputItemAdded responses, maintaining parity with text content block handling (line 602) and enabling proper index tracking across the response stream.


684-684: LGTM! Consistent ContentIndex propagation for MCP tools.

This mirrors the fix at line 649, ensuring all tool types (regular tools, MCP tools, computer tools) consistently propagate ContentIndex in their OutputItemAdded events.

core/providers/cohere/responses.go (1)

342-344: The CallID field change is correct and should be approved.

The git history confirms this is a legitimate bug fix: the previous code used msg.ID (the generic common ID field), but it should use msg.CallID (the dedicated tool call identifier from the embedded ResponsesToolMessage). This aligns with how the function_call_output case at line 366 correctly uses msg.ResponsesToolMessage.CallID for the same purpose. The change improves both correctness and consistency without breaking existing integrations.

core/providers/gemini/types.go (1)

927-984: The implementation is correct—no changes needed.

Google Gemini API expects standard Base64 (RFC 4648 §4, with +/ and = padding) for blob data in JSON requests, and MarshalJSON correctly uses base64.StdEncoding.EncodeToString() to produce this format. The asymmetry in UnmarshalJSON (which accepts URL-safe base64) is appropriate defensive programming—it handles flexible input while always sending the format the API expects.
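The accept-URL-safe, emit-standard asymmetry amounts to a small normalization step; a Python sketch of the equivalent conversion (illustrative only, not the Go code under review):

```python
import base64

def to_standard_base64(data: str) -> str:
    """Normalize URL-safe base64 (- and _) to standard base64 (+ and /) with padding."""
    s = data.replace("-", "+").replace("_", "/")
    s = s + "=" * (-len(s) % 4)  # restore any stripped padding
    return s

# URL-safe input (as a permissive decoder might receive it) round-trips cleanly
url_safe = base64.urlsafe_b64encode(b"\xfb\xff\xfe").decode().rstrip("=")
standard = to_standard_base64(url_safe)
assert base64.b64decode(standard) == b"\xfb\xff\xfe"
```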

tests/integrations/tests/integrations/test_google.py (1)

563-568: List-models test behavior looks robust now.

Using page_size=5 and asserting len(response) <= 5 avoids flakiness when the API returns fewer than requested models while still asserting that listing works.
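The non-flaky assertion pattern can be shown with a stub (the `list_models` function here is a stand-in, not the real SDK call): assert only the upper bound, since the API is free to return fewer models than requested.

```python
# Hypothetical stub standing in for a paginated models.list() call
def list_models(page_size: int):
    catalog = ["gemini-2.0-flash", "gemini-1.5-pro", "gemini-1.5-flash"]
    return catalog[:page_size]

response = list_models(page_size=5)
# The API may return fewer models than requested, so only assert the upper bound
assert 0 < len(response) <= 5
```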

tests/integrations/tests/integrations/test_openai.py (1)

1095-1100: test_31_list_models behavior is sensible and non‑flaky.

Using models.list() and only asserting len(response.data) > 0 (with @skip_if_no_api_key("openai")) gives a robust smoke test for model listing without depending on a particular page size or count.

tests/integrations/tests/utils/common.py (11)

25-30: LGTM! Well-structured test data constants.

The image test data constants (URLs and base64-encoded 1x1 PNG) are appropriate for integration testing.


148-223: LGTM! Comprehensive test data coverage.

The new RESPONSES_, TEXT_COMPLETION_, and ANTHROPIC_THINKING_* test data constants provide excellent coverage across different input types (text, images, tools, streaming, reasoning).


569-581: LGTM! Robust handling of mixed content formats.

The updated logic correctly handles both dictionary-based content blocks and SDK objects by checking types before accessing attributes. This prevents AttributeError when content is returned as SDK objects.


793-795: LGTM! Proper handling of empty deltas.

Ignoring completely empty deltas (no content, tool_calls, or role) is appropriate for providers that send empty start markers.
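A minimal Python sketch of the skip logic (delta shapes simplified to dicts for illustration):

```python
def is_empty_delta(delta: dict) -> bool:
    """True when a streaming delta carries no content, tool_calls, or role."""
    return not (delta.get("content") or delta.get("tool_calls") or delta.get("role"))

chunks = [
    {"role": "assistant"},  # start marker carrying a role: keep
    {},                     # completely empty start marker: skip
    {"content": "Hel"},
    {"content": ""},        # empty text delta: skip
    {"content": "lo"},
]
text = "".join(c.get("content") or "" for c in chunks if not is_empty_delta(c))
assert text == "Hello"
```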


909-911: LGTM! Correct tool call streaming support.

Adding input_json_delta handling with partial_json extraction properly supports streaming tool calls in Anthropic format.
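The accumulation pattern looks roughly like this in Python (event dicts are simplified stand-ins for Anthropic's streaming event objects): `partial_json` fragments are buffered until they form a complete JSON document.

```python
import json

# Simplified Anthropic-style streaming events; partial_json arrives in fragments
events = [
    {"type": "content_block_start", "content_block": {"type": "tool_use", "name": "get_weather"}},
    {"type": "content_block_delta", "delta": {"type": "input_json_delta", "partial_json": '{"city": '}},
    {"type": "content_block_delta", "delta": {"type": "input_json_delta", "partial_json": '"Paris"}'}},
]

buffer = ""
tool_call_detected = False
for event in events:
    delta = event.get("delta") or {}
    if delta.get("type") == "input_json_delta":
        tool_call_detected = True
        buffer += delta.get("partial_json", "")

args = json.loads(buffer)  # only valid once the stream has delivered all fragments
assert tool_call_detected and args == {"city": "Paris"}
```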


968-1028: LGTM! Clean provider abstraction for voice selection.

The get_provider_voice and get_provider_voices functions provide a clean abstraction for obtaining provider-specific voice names, with proper normalization and fallback to OpenAI voices as defaults.


1399-1415: LGTM! Proper Google/Gemini transcription support.

The additions correctly handle Google GenAI SDK's transcription streaming structure by extracting text from the candidates[].content.parts[].text hierarchy.

Also applies to: 1476-1476, 1508-1519
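The traversal can be sketched as follows (dicts stand in for the SDK's response objects):

```python
def extract_text(response: dict) -> str:
    """Collect text from a GenerateContentResponse-shaped dict:
    candidates[].content.parts[].text"""
    pieces = []
    for candidate in response.get("candidates", []):
        content = candidate.get("content") or {}
        for part in content.get("parts", []):
            text = part.get("text")
            if text:
                pieces.append(text)
    return "".join(pieces)

response = {"candidates": [{"content": {"parts": [{"text": "Hello, "}, {"text": "world"}]}}]}
assert extract_text(response) == "Hello, world"
```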


1568-1658: LGTM! Well-structured Responses API helpers.

The Responses API helper functions properly validate the API's output structure, handle both string and list content formats, and check for function calls with appropriate type filtering.


1660-1714: LGTM! Robust streaming collection with event tracking.

The collect_responses_streaming_content function properly collects content from streaming responses, tracks event types for debugging, and detects tool calls.


1742-1761: LGTM! Proper text completion validation.

The assert_valid_text_completion_response function correctly validates the text completion response structure and content length.


1764-1804: LGTM! Clean streaming collection for text completions.

The collect_text_completion_streaming_content function properly collects text from streaming chunks with appropriate timeout and safety checks.

Comment thread core/providers/cohere/responses.go
Comment thread tests/integrations/tests/test_anthropic.py
Comment thread tests/integrations/tests/test_anthropic.py
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from f994102 to dcf426f Compare November 25, 2025 10:53
@akshaydeo akshaydeo force-pushed the 11-22-moves_all_tests_related_to_core_to_corresponding_files branch 2 times, most recently from a8d18b1 to f946845 Compare November 25, 2025 10:58
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from dcf426f to 2470798 Compare November 25, 2025 10:58
@akshaydeo akshaydeo force-pushed the 11-22-moves_all_tests_related_to_core_to_corresponding_files branch from f946845 to 9800538 Compare November 25, 2025 11:02
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from 2470798 to 0e1e90a Compare November 25, 2025 11:02
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integrations/tests/integrations/test_litellm.py (1)

145-157: Add sentinel provider/model guards to parametrized tests.

All parametrized tests using get_cross_provider_params_for_scenario can receive sentinel values ("_no_providers_", "_no_model_") when no providers are configured. The exclude_providers filter helps but doesn't guarantee providers are available. Without guards, tests will call litellm.completion with model="_no_model_", causing failures.

Add a guard at the start of each parametrized test:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

Apply to: test_01_simple_chat, test_02_multi_turn_conversation, test_03_single_tool_call, test_04_multiple_tool_calls, test_05_end2end_tool_calling, test_06_automatic_function_calling, test_07_image_url, test_08_image_base64, test_09_multiple_images, test_10_complex_end2end, and test_13_streaming.

Also applies to: 159-175, 176-192, 193-210, 211-252, 253-270, 271-281, 282-292, 293-308, 309-353, 421-461

♻️ Duplicate comments (8)
tests/integrations/tests/utils/common.py (1)

1822-1829: Handle SDK objects in get_content_string.

content list elements may be OpenAI SDK objects (e.g., ChatCompletionMessageContentPartText) rather than dicts. Calling .get() on these raises AttributeError.

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""
tests/integrations/tests/utils/parametrize.py (1)

14-15: Use explicit union syntax for optional parameters.

The implicit Optional style is deprecated in modern Python typing. Use explicit union syntax with None.

Apply this diff:

 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:
core/providers/gemini/chat.go (1)

457-474: Handle JSON unmarshaling errors for streaming tool call arguments.

Line 461 calls json.Unmarshal without checking the error, which means malformed JSON in toolCall.Function.Arguments will be silently ignored and argsMap will remain empty, potentially causing incorrect tool call data in the streaming response.

Apply this diff to add error handling:

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Log the error and continue with empty argsMap
+								continue
+							}
 						}
 						if toolCall.Function.Name != nil {
 							fc := &FunctionCall{
 								Name: *toolCall.Function.Name,
 								Args: argsMap,
 							}
 							if toolCall.ID != nil {
 								fc.ID = *toolCall.ID
 							}
 							parts = append(parts, &Part{FunctionCall: fc})
 						}
 					}
 				}
tests/integrations/tests/integrations/test_anthropic.py (2)

180-201: Add sentinel provider/model guards to parametrized tests.

All parametrized tests using get_cross_provider_params_for_scenario can receive sentinel values ("_no_providers_", "_no_model_") when no providers are configured. Without guards, tests will call anthropic_client.messages.create with invalid model names, causing failures instead of graceful skips.

Add a guard at the top of each parametrized test:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

Also applies to: 211-241, 255-303, 316-334, 335-417, 568-587


721-787: Replace hardcoded invalid model with config-driven approach.

Line 729 uses the invalid hardcoded model "anthropic/claude-sonnet-4-5" which doesn't match current Anthropic model IDs. The non-streaming thinking test (test_15) correctly uses get_model("anthropic", "chat").

Apply this diff:

         stream = anthropic_client.messages.create(
-            model="anthropic/claude-sonnet-4-5",
+            model=get_model("anthropic", "chat"),
             max_tokens=16000,
             thinking={
                 "type": "enabled",
                 "budget_tokens": 10000,
             },
             messages=messages,
             stream=True,
         )
tests/integrations/tests/integrations/test_openai.py (3)

1095-1100: Previously reviewed: Decorator added as requested.

The @skip_if_no_api_key("openai") decorator has been added as requested in the previous review. The test now properly skips when no API key is available.


1471-1490: Previously reviewed: Error handling fragility still present.

This concern was raised in a previous review. The error check at line 1475 only catches "reasoning" or "not supported" errors. If the model "openai/gpt-5" becomes unavailable or returns a different error (e.g., "model not found", "invalid model"), the exception will be re-raised causing unexpected test failure.

The previous suggestion to broaden error handling or use config-driven model selection remains valid.


51-51: Minor inconsistency: Test case numbering in docstring.

The docstring lists "42. Text Completions - streaming" but the corresponding test is test_40_text_completion_streaming. Should be "40." for consistency.

-42. Text Completions - streaming
+40. Text Completions - streaming
🧹 Nitpick comments (9)
transports/bifrost-http/integrations/router.go (1)

210-210: Comment inconsistency: the code handles nil but comment says "SHOULD NOT BE NIL".

The comment states the converter should not be nil, but lines 516-530 explicitly handle the nil case with a fallback to raw audio output. Either update the comment to clarify it's optional (with fallback behavior), or document that nil triggers default audio/mpeg output.

-	SpeechResponseConverter        SpeechResponseConverter        // Function to convert BifrostSpeechResponse to integration format (SHOULD NOT BE NIL)
+	SpeechResponseConverter        SpeechResponseConverter        // Function to convert BifrostSpeechResponse to integration format (optional: if nil, defaults to raw audio/mpeg output)
transports/bifrost-http/integrations/genai.go (1)

171-176: Minor inefficiency: isSpeechRequest is called twice.

isSpeechRequest(geminiReq) is called once here at line 174, and again inside isTranscriptionRequest at line 206. Consider passing the already-computed IsSpeech flag to avoid redundant checks.

 		// Detect if this is a speech or transcription request by examining the request body
 		// Speech detection takes priority over transcription
 		geminiReq.IsSpeech = isSpeechRequest(geminiReq)
-		geminiReq.IsTranscription = isTranscriptionRequest(geminiReq)
+		geminiReq.IsTranscription = !geminiReq.IsSpeech && hasAudioInput(geminiReq)

Alternatively, modify isTranscriptionRequest to accept the pre-computed speech flag.

tests/integrations/tests/utils/config_loader.py (2)

227-227: Use explicit Optional or PEP 604 syntax.

The type hint integration: str = None implicitly allows None but doesn't declare it in the type. Per PEP 484/604, use explicit typing.

-    def list_models(self, integration: str = None) -> Dict[str, Any]:
+    def list_models(self, integration: Optional[str] = None) -> Dict[str, Any]:

286-286: Remove extraneous f-prefix.

This f-string has no placeholders, making the f prefix unnecessary.

-        print(f"\n🤖 MODEL CONFIGURATIONS (via providers):")
+        print("\n🤖 MODEL CONFIGURATIONS (via providers):")
tests/integrations/tests/utils/common.py (1)

1739-1754: Remove unused valid_event_types variable.

The variable valid_event_types is defined but never used. The validation logic on line 1753 checks for "response." or "error" in the event type string instead.

 def assert_valid_responses_streaming_chunk(chunk: Any):
     """Assert that a responses streaming chunk is valid"""
     assert chunk is not None, "Streaming chunk should not be None"
     assert hasattr(chunk, "type"), "Chunk should have a 'type' attribute"
 
-    # Validate common streaming event types
-    valid_event_types = [
-        "response.created",
-        "response.output_item.added",
-        "response.content_part.added",
-        "response.output_text.delta",
-        "response.function_call_arguments.delta",
-        "response.completed",
-        "response.error",
-    ]
-
     # Log the event type for debugging
     if hasattr(chunk, "type"):
         event_type = chunk.type
         # Don't fail on unknown event types, just warn
         if not any(evt in event_type for evt in ["response.", "error"]):
             print(f"Warning: Unexpected event type: {event_type}")
tests/integrations/tests/integrations/test_openai.py (4)

529-531: Prefix unused variable with underscore.

The content_tools variable is unpacked but never used. Prefix with underscore to indicate intentional non-use.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "openai", timeout=300)
                 )

1334-1336: Prefix unused variable with underscore.

The content variable from the streaming collection is not used in test_37_responses_streaming_with_tools. Use underscore prefix.

-        content, chunk_count, tool_calls_detected, event_types = (
+        _content, chunk_count, tool_calls_detected, event_types = (
             collect_responses_streaming_content(stream, timeout=300)
         )

1119-1133: Consider extracting repeated content extraction to helper function.

The same content extraction pattern appears in tests 32, 33, 34, and 38. Consider extracting to a reusable helper:

def extract_responses_content(response) -> str:
    """Extract text content from Responses API output."""
    content = ""
    for message in response.output:
        if hasattr(message, "content") and message.content:
            if isinstance(message.content, str):
                content += message.content
            elif isinstance(message.content, list):
                for block in message.content:
                    if hasattr(block, "text") and block.text:
                        content += block.text
    return content

This would reduce duplication and centralize the extraction logic.


1283-1286: Verify event type assertions for non-OpenAI providers.

The OpenAI-specific event type check (if provider == "openai") at line 1283 is good, but the comment mentions this is a "known issue" being worked on. Consider adding a TODO or tracking issue reference so this doesn't get forgotten.

         # Check that we got expected event types, some providers do not send in this order
         # this is a known issue and we are working on it
+        # TODO: Track issue for cross-provider event consistency
         if provider == "openai":
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f994102 and 0e1e90a.

📒 Files selected for processing (17)
  • core/providers/anthropic/responses.go (4 hunks)
  • core/providers/cohere/responses.go (2 hunks)
  • core/providers/gemini/chat.go (4 hunks)
  • core/providers/gemini/speech.go (2 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (15 hunks)
  • tests/integrations/tests/integrations/test_openai.py (21 hunks)
  • tests/integrations/tests/utils/common.py (10 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/providers/cohere/responses.go
  • core/providers/anthropic/responses.go
🧰 Additional context used
🧬 Code graph analysis (6)
transports/bifrost-http/integrations/genai.go (8)
core/schemas/bifrost.go (3)
  • BifrostRequest (143-153)
  • SpeechRequest (94-94)
  • TranscriptionRequest (96-96)
transports/bifrost-http/handlers/inference.go (2)
  • SpeechRequest (206-210)
  • TranscriptionRequest (212-216)
transports/bifrost-http/integrations/router.go (2)
  • SpeechResponseConverter (106-106)
  • TranscriptionResponseConverter (110-110)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (157-180)
core/schemas/transcriptions.go (1)
  • BifrostTranscriptionResponse (16-26)
core/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/providers/gemini/types.go (3)
  • GeminiGenerationRequest (54-72)
  • GenerationConfig (630-696)
  • ModalityAudio (715-715)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/transcription.go (4)
core/providers/gemini/types.go (2)
  • GeminiGenerationRequest (54-72)
  • FileData (1034-1042)
core/schemas/transcriptions.go (2)
  • BifrostTranscriptionRequest (3-10)
  • TranscriptionParameters (32-45)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
core/providers/gemini/speech.go (4)
core/schemas/speech.go (3)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
core/providers/gemini/utils.go (1)
core/providers/gemini/types.go (1)
  • Type (777-777)
tests/integrations/tests/integrations/test_litellm.py (3)
tests/integrations/tests/utils/common.py (2)
  • get_provider_voice (983-1019)
  • get_provider_voices (1022-1043)
tests/integrations/tests/utils/config_loader.py (2)
  • get_model (138-159)
  • get_model (441-443)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
🪛 Ruff (0.14.5)
tests/integrations/tests/integrations/test_anthropic.py

181-181: Unused method argument: test_config

(ARG002)


195-195: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


230-230: Unused method argument: test_config

(ARG002)


257-257: Unused method argument: test_config

(ARG002)


318-318: Unused method argument: test_config

(ARG002)


336-336: Unused method argument: test_config

(ARG002)


361-361: Unused method argument: test_config

(ARG002)


387-387: Unused method argument: test_config

(ARG002)


569-569: Unused method argument: test_config

(ARG002)


602-602: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


611-611: Unused method argument: test_config

(ARG002)


647-647: Unused method argument: test_config

(ARG002)


722-722: Unused method argument: test_config

(ARG002)


758-758: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/utils/config_loader.py

153-156: Avoid specifying long messages outside the exception class

(TRY003)


227-227: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


233-233: Avoid specifying long messages outside the exception class

(TRY003)


236-236: Avoid specifying long messages outside the exception class

(TRY003)


286-286: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/integrations/test_google.py

230-230: Unused method argument: test_config

(ARG002)


244-244: Unused method argument: test_config

(ARG002)


265-265: Unused method argument: test_config

(ARG002)


288-288: Unused method argument: test_config

(ARG002)


313-313: Unused method argument: test_config

(ARG002)


348-348: Unused method argument: test_config

(ARG002)


368-368: Unused method argument: test_config

(ARG002)


381-381: Unused method argument: test_config

(ARG002)


392-392: Unused method argument: test_config

(ARG002)


410-410: Unused method argument: test_config

(ARG002)


502-502: Unused method argument: test_config

(ARG002)


550-550: Unused method argument: test_config

(ARG002)


563-563: Unused method argument: test_config

(ARG002)


570-570: Unused method argument: test_config

(ARG002)


594-594: Unused method argument: test_config

(ARG002)


620-620: Unused method argument: test_config

(ARG002)


644-644: Unused method argument: test_config

(ARG002)


668-668: Unused method argument: test_config

(ARG002)


691-691: Unused method argument: test_config

(ARG002)


723-723: Unused method argument: test_config

(ARG002)


777-777: Unused method argument: test_config

(ARG002)


821-821: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_openai.py

222-222: Unused method argument: test_config

(ARG002)


235-235: Unused method argument: test_config

(ARG002)


252-252: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


286-286: Unused method argument: test_config

(ARG002)


328-328: Unused method argument: test_config

(ARG002)


344-344: Unused method argument: test_config

(ARG002)


355-355: Unused method argument: test_config

(ARG002)


366-366: Unused method argument: test_config

(ARG002)


496-496: Unused method argument: test_config

(ARG002)


529-529: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


540-540: Unused method argument: test_config

(ARG002)


1096-1096: Unused method argument: test_config

(ARG002)


1107-1107: Unused method argument: test_config

(ARG002)


1140-1140: Unused method argument: test_config

(ARG002)


1170-1170: Unused method argument: test_config

(ARG002)


1212-1212: Unused method argument: test_config

(ARG002)


1262-1262: Unused method argument: test_config

(ARG002)


1316-1316: Unused method argument: test_config

(ARG002)


1334-1334: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1354-1354: Unused method argument: test_config

(ARG002)


1497-1497: Unused method argument: test_config

(ARG002)


1518-1518: Unused method argument: test_config

(ARG002)

tests/integrations/tests/integrations/test_litellm.py

146-146: Unused method argument: test_config

(ARG002)


146-146: Unused method argument: provider

(ARG002)


160-160: Unused method argument: test_config

(ARG002)


160-160: Unused method argument: provider

(ARG002)


177-177: Unused method argument: test_config

(ARG002)


177-177: Unused method argument: provider

(ARG002)


194-194: Unused method argument: test_config

(ARG002)


194-194: Unused method argument: provider

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: provider

(ARG002)


254-254: Unused method argument: test_config

(ARG002)


254-254: Unused method argument: provider

(ARG002)


272-272: Unused method argument: test_config

(ARG002)


272-272: Unused method argument: provider

(ARG002)


283-283: Unused method argument: test_config

(ARG002)


283-283: Unused method argument: provider

(ARG002)


294-294: Unused method argument: test_config

(ARG002)


294-294: Unused method argument: provider

(ARG002)


310-310: Unused method argument: test_config

(ARG002)


310-310: Unused method argument: provider

(ARG002)


422-422: Unused method argument: test_config

(ARG002)


422-422: Unused method argument: provider

(ARG002)

tests/integrations/tests/utils/common.py

1702-1702: Avoid specifying long messages outside the exception class

(TRY003)


1724-1726: Avoid specifying long messages outside the exception class

(TRY003)


1739-1739: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1804-1804: Avoid specifying long messages outside the exception class

(TRY003)


1814-1816: Avoid specifying long messages outside the exception class

(TRY003)

tests/integrations/tests/utils/parametrize.py

14-14: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (30)
transports/bifrost-http/integrations/router.go (1)

104-107: LGTM!

The SpeechResponseConverter type definition follows the established pattern of other response converters in this file (e.g., ChatResponseConverter, EmbeddingResponseConverter), accepting the context and response, returning an interface and error.

transports/bifrost-http/integrations/genai.go (5)

68-73: LGTM!

The speech and transcription response converters follow the established pattern. The underlying ToGeminiSpeechResponse and ToGeminiTranscriptionResponse functions handle nil inputs gracefully.


201-226: LGTM!

The transcription detection logic correctly identifies audio input while excluding speech (TTS) requests. The nil checks for InlineData and FileData are properly performed before accessing their MIMEType fields. Go's range over nil slices is safe.


228-249: LGTM!

The isAudioMimeType function correctly validates audio MIME types using only the audio/ prefix check, as recommended in the previous review. It properly handles case-insensitivity and strips MIME parameters.
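The same prefix-only check, sketched in Python for illustration (case folded, MIME parameters stripped before comparison):

```python
def is_audio_mime_type(mime: str) -> bool:
    """True for audio/* MIME types; case-insensitive, with parameters stripped."""
    base = mime.split(";", 1)[0].strip().lower()
    return base.startswith("audio/")

assert is_audio_mime_type("audio/wav")
assert is_audio_mime_type("Audio/MPEG; rate=44100")
assert not is_audio_mime_type("video/mp4")
```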


41-48: LGTM!

The request converter correctly routes to speech and transcription request types based on the flags set during URL parsing. The priority order (embedding → speech → transcription → chat) is logical. Both ToBifrostSpeechRequest() and ToBifrostTranscriptionRequest() methods are properly implemented in their respective files.


183-199: The review comment is incorrect.

GenerationConfig is a value type (not a pointer), so it cannot be nil. The code safely accesses req.GenerationConfig.ResponseModalities (a slice, safe to iterate over) and req.GenerationConfig.SpeechConfig (a pointer, properly nil-checked). There is no dereference risk.

Likely an incorrect or invalid review comment.

core/providers/gemini/utils.go (2)

115-122: LGTM!

The type assertion fallback to an empty map on line 118-121 is a safe defensive pattern that prevents panics when the conversion returns an unexpected type.


124-155: Recursive type normalization is well-implemented.

The convertTypeToLowerCase helper correctly:

  • Handles nested maps and arrays recursively
  • Only lowercases string values under the "type" key
  • Preserves all other values unchanged

This ensures Gemini's uppercase type values (e.g., "STRING", "OBJECT") are normalized to lowercase JSON Schema types throughout nested schema structures.
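A Python sketch of the same recursive normalization (the Go helper is convertTypeToLowerCase; this analog is illustrative):

```python
def lowercase_types(node):
    """Recursively lowercase string values stored under the "type" key,
    leaving every other value untouched."""
    if isinstance(node, dict):
        return {
            k: (v.lower() if k == "type" and isinstance(v, str) else lowercase_types(v))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [lowercase_types(item) for item in node]
    return node

schema = {
    "type": "OBJECT",
    "properties": {
        "name": {"type": "STRING"},
        "tags": {"type": "ARRAY", "items": {"type": "STRING"}},
    },
}
out = lowercase_types(schema)
assert out["type"] == "object"
assert out["properties"]["tags"]["items"]["type"] == "string"
```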

tests/integrations/tests/utils/config_loader.py (2)

15-23: Provider mapping is well-structured.

The INTEGRATION_TO_PROVIDER_MAP provides a clean abstraction that decouples integration names from underlying provider configurations. The mapping of google to gemini correctly reflects that the Google integration uses the Gemini provider.


306-422: New provider-centric API methods look good.

The new methods (get_provider_model, is_provider_available, get_available_providers, provider_supports_scenario, get_providers_for_scenario, get_scenario_capability) provide a clean interface for cross-provider testing and follow consistent patterns with appropriate null checks and fallbacks.

tests/integrations/tests/utils/common.py (4)

808-810: Good addition for handling empty streaming deltas.

This early return correctly handles providers like Cohere that emit content-start events with empty text, preventing spurious assertion failures.


924-926: Proper tool call detection for Anthropic streaming.

Adding input_json_delta handling ensures tool calls are correctly detected when Anthropic streams function argument data.


1414-1430: Good extension for Google/Gemini transcription streaming.

The handlers correctly extract text from Google GenAI's GenerateContentResponse structure by navigating through candidates → content → parts → text.

Also applies to: 1523-1534


983-1043: Provider voice utilities are well-structured.

The get_provider_voice and get_provider_voices functions provide clean abstractions for provider-specific voice configurations, with sensible defaults for unknown providers.

core/providers/gemini/types.go (3)

67-68: New internal flags for request routing.

The IsTranscription and IsSpeech flags provide clean internal markers for routing Gemini requests to appropriate handlers without affecting JSON serialization.


964-984: MarshalJSON implementation is correct.

The encoding uses base64.StdEncoding with a matching comment, which is consistent with Gemini API expectations for standard base64 output.


927-962: Remove this review comment — the URL-safe base64 conversion is necessary.

The code is correct and justified. Web search findings confirm that the Python GenAI SDK schema shows the Blob.data field annotated as format "base64url" (URL-safe). If the SDK specifies base64url, you should handle URL-safe base64 conversion. The existing comment at line 928 already documents this: "data field which can be sent as a base64-encoded string from the Google GenAI SDK." The conversion from URL-safe to standard base64 (lines 945–953) is not defensive code—it correctly handles the actual format the SDK sends. The review's assertion that "the Gemini API expects standard base64" contradicts the SDK schema specification.

Likely an incorrect or invalid review comment.
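The URL-safe to standard base64 conversion discussed above amounts to an alphabet swap plus padding restoration; a self-contained Python sketch (not the Go implementation):

```python
import base64

def urlsafe_to_standard_b64(data: str) -> str:
    """Convert URL-safe base64 (base64url) to standard base64.

    Swaps the URL-safe alphabet characters ('-', '_') for their standard
    counterparts ('+', '/') and restores any stripped '=' padding.
    """
    standard = data.replace("-", "+").replace("_", "/")
    padding = -len(standard) % 4
    return standard + "=" * padding
```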

core/providers/gemini/transcription.go (3)

10-23: Provider detection and model prefix logic is correct.

The code properly uses ParseModelString to extract provider and model, then correctly adds the google/ prefix for Vertex provider when not already present. This aligns with the ParseModelString utility shown in relevant snippets.


30-63: Content extraction logic handles multiple parts correctly.

The loop properly:

  • Concatenates multiple text parts with spaces
  • Appends audio data from multiple InlineData parts
  • Captures the first audio MIME type encountered

One minor consideration: appending audio from multiple parts (line 42) assumes they're all the same format. If parts contain different audio formats, concatenation may produce invalid audio.
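A defensive variant of that extraction loop would reject mixed formats rather than silently concatenating them; a Python sketch (the dict part shape is illustrative, not the Go Part type):

```python
def collect_audio_parts(parts):
    """Concatenate inline audio data, rejecting mixed MIME types.

    Raises ValueError if parts carry different audio formats, instead of
    producing a byte stream that may be invalid audio.
    """
    audio = b""
    mime = None
    for part in parts:
        data = part.get("data")
        if not data:
            continue
        part_mime = part.get("mime_type")
        if mime is None:
            mime = part_mime
        elif part_mime != mime:
            raise ValueError(f"mixed audio formats: {mime} vs {part_mime}")
        audio += data
    return audio, mime
```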


48-61: File-URI storage without outbound usage confirmed; unclear if intentional.

The verification confirms the original observation: file_uri is stored in ExtraParams["file_uri"] at line 57, but ToGeminiTranscriptionRequest() (lines 121–137) only extracts safety_settings, cached_content, and labels from ExtraParams; file_uri is never retrieved or converted back to a FileData part for the outbound request.

The code comment at lines 48–49 indicates awareness that FileData handling is incomplete ("would need to be fetched separately"). However, since a past review marked this as addressed, it's unclear whether:

  • This is an accepted limitation (FileData-only scenarios intentionally unsupported)
  • The fix was incomplete or reverted
  • FileData URIs require special handling before inclusion

Clarify the intended behavior for FileData-only transcription requests and whether this gap requires resolution.

core/providers/gemini/speech.go (1)

9-88: LGTM! Robust speech request conversion with comprehensive voice config support.

The implementation correctly handles:

  • Vertex normalization with google/ prefix
  • Text extraction from multiple content parts
  • Both single-speaker and multi-speaker voice configurations
  • Optional response modalities in ExtraParams

The bidirectional conversion maintains proper mapping between Gemini and Bifrost speech representations.

tests/integrations/config.yml (1)

32-236: LGTM! Well-structured provider-centric configuration.

The configuration effectively supports cross-provider testing with:

  • Clear provider-to-model mappings for each capability
  • Comprehensive scenario-to-capability mapping
  • Provider API key environment variable declarations
  • Detailed model capabilities matrix

This structure enables the parametrized testing approach used in the test files.

tests/integrations/tests/integrations/test_google.py (2)

181-187: LGTM! Defensive improvements to image loading.

The addition of User-Agent headers, timeout, and explicit status checking prevents common issues when fetching images from external servers.


214-223: LGTM! Clean PCM to WAV conversion helper.

The helper function properly uses the wave module to convert raw PCM data to WAV format for validation in speech tests.

tests/integrations/tests/integrations/test_litellm.py (1)

90-92: LGTM! Clear provider exclusion for LiteLLM compatibility.

The LITELLM_EXCLUDED_PROVIDERS constant explicitly documents that Bedrock and Cohere don't work well through LiteLLM proxy, preventing test failures.

tests/integrations/tests/integrations/test_openai.py (5)

199-199: LGTM: Increased timeout aligns with PR objectives.

The timeout increase from 30s to 300s is appropriate for cross-provider integration tests that may have variable latency.


221-232: LGTM: Cross-provider parametrization implemented correctly.

The use of get_cross_provider_params_for_scenario and format_provider_model enables clean cross-provider testing. The test_config fixture is a standard pytest pattern and the static analysis warning is a false positive.


1496-1515: LGTM: Text completion tests appropriately OpenAI-specific.

Text completions use legacy models (gpt-3.5-turbo-instruct) that are provider-specific, so using @skip_if_no_api_key("openai") instead of cross-provider parametrization is correct.


1517-1561: LGTM: Streaming text completion test is well-structured.

The test properly validates streaming behavior with chunk count assertions and reasonable content checks for the haiku prompt.


545-546: LGTM: Voice abstraction improves maintainability.

Using get_provider_voice("openai", "primary") and get_provider_voices() instead of hardcoded voice names enables easier updates if provider voice options change.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/providers/gemini/speech.go (1)

103-121: Guard against nil Input in ToGeminiSpeechRequest and acknowledge improved multi‑voice handling.

bifrostReq.Input is a pointer; if bifrostReq.Input.Input != "" will panic if Input is nil (e.g., malformed or partial upstream request). It’s safer to nil‑check before dereferencing.

At the same time, the updated condition on VoiceConfig correctly enables both single‑voice and multi‑voice paths only when there’s actual data.

-	// Convert speech input to Gemini format
-	if bifrostReq.Input.Input != "" {
+	// Convert speech input to Gemini format
+	if bifrostReq.Input != nil && bifrostReq.Input.Input != "" {
 		geminiReq.Contents = []Content{
 			{
 				Parts: []*Part{
 					{
 						Text: bifrostReq.Input.Input,
 					},
 				},
 			},
 		}
 
-		// Add speech config to generation config if voice config is provided
-		if bifrostReq.Params != nil && bifrostReq.Params.VoiceConfig != nil {
-			// Handle both single voice and multi-voice configurations
-			if bifrostReq.Params.VoiceConfig.Voice != nil || len(bifrostReq.Params.VoiceConfig.MultiVoiceConfig) > 0 {
-				addSpeechConfigToGenerationConfig(&geminiReq.GenerationConfig, bifrostReq.Params.VoiceConfig)
-			}
-		}
+		// Add speech config to generation config if voice config is provided
+		if bifrostReq.Params != nil && bifrostReq.Params.VoiceConfig != nil {
+			// Handle both single voice and multi-voice configurations
+			if bifrostReq.Params.VoiceConfig.Voice != nil || len(bifrostReq.Params.VoiceConfig.MultiVoiceConfig) > 0 {
+				addSpeechConfigToGenerationConfig(&geminiReq.GenerationConfig, bifrostReq.Params.VoiceConfig)
+			}
+		}
 	}
core/providers/gemini/transcription.go (1)

159-166: Add nil check for bifrostReq.Input to prevent potential panic.

Line 159 accesses bifrostReq.Input.File without verifying that Input is non-nil. If this function is called with a manually constructed request where Input is nil, it will panic.

Apply this diff to add a nil check:

 // Add audio file if present
-if len(bifrostReq.Input.File) > 0 {
+if bifrostReq.Input != nil && len(bifrostReq.Input.File) > 0 {
   parts = append(parts, &Part{
     InlineData: &Blob{
       MIMEType: detectAudioMimeType(bifrostReq.Input.File),
       Data:     bifrostReq.Input.File,
     },
   })
 }
♻️ Duplicate comments (7)
core/providers/gemini/chat.go (1)

457-474: Handle unmarshaling errors for streaming tool call arguments.

Line 461 calls json.Unmarshal without checking the error, which was flagged in previous reviews. If unmarshaling fails, argsMap remains empty and the error is silently ignored, potentially causing malformed tool calls. Note that the non-streaming path (line 501) correctly handles this error.

Apply this diff to add error handling consistent with the non-streaming path:

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Log or handle error; fallback to empty args
+								argsMap = map[string]interface{}{}
+							}
 						}
 						if toolCall.Function.Name != nil {
 							fc := &FunctionCall{
 								Name: *toolCall.Function.Name,
 								Args: argsMap,
 							}
 							if toolCall.ID != nil {
 								fc.ID = *toolCall.ID
 							}
 							parts = append(parts, &Part{FunctionCall: fc})
 						}
 					}
 				}
core/providers/gemini/speech.go (1)

25-37: Avoid merging multiple text parts without separators in ToBifrostSpeechRequest.

This reintroduces the earlier issue: multiple part.Text values get concatenated as Helloworld instead of Hello world. For speech, that’s undesirable and breaks spacing between segments.

Consider adding a separator when appending subsequent parts:

-	// Extract text input from contents
-	var textInput string
-	for _, content := range request.Contents {
-		for _, part := range content.Parts {
-			if part.Text != "" {
-				textInput += part.Text
-			}
-		}
-	}
+	// Extract text input from contents, preserving spacing between parts
+	var textInput string
+	for _, content := range request.Contents {
+		for _, part := range content.Parts {
+			if part.Text == "" {
+				continue
+			}
+			if textInput != "" {
+				textInput += " "
+			}
+			textInput += part.Text
+		}
+	}
tests/integrations/tests/utils/common.py (1)

1822-1829: Fix get_content_string to handle SDK content objects, not just dicts.

This helper still assumes list elements are dicts and calls .get, which will raise AttributeError for SDK objects like ChatCompletionMessageContentPartText. It was previously flagged and remains a latent test crash.

You can safely support both dicts and SDK objects like this:

 def get_content_string(content: Any) -> str:
     """Get a string representation of content"""
     if isinstance(content, str):
         return content
     elif isinstance(content, list):
-        return " ".join([c.get("text", "") for c in content])
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(getattr(c, "text") or "")
+        return " ".join(filter(None, parts))
     else:
         return ""
tests/integrations/tests/integrations/test_openai.py (2)

1095-1100: @skip_if_no_api_key decorator added; consider adding limit for consistency.

The decorator was added as requested. However, for consistency with Anthropic (line 614: limit=5) and Google tests, consider adding a limit parameter to make the test more deterministic. Note: Per developer feedback, the limit parameter may work differently for OpenAI's models endpoint, so this is optional.


1471-1490: Broaden error handling to catch model availability errors.

The error handling still only checks for "reasoning" or "not supported" in the error string. If the model "openai/gpt-5" becomes unavailable or returns a different error format (e.g., "model not found", "invalid model"), the exception bypasses your fallback and re-raises unexpectedly.

Consider expanding the error check:

-            if "reasoning" in error_str or "not supported" in error_str:
+            if any(term in error_str for term in ["reasoning", "not supported", "model", "not found", "invalid"]):
tests/integrations/tests/integrations/test_anthropic.py (2)

180-192: Guard against sentinel provider/model tuples in cross-provider tests.

Same issue as flagged in past review: get_cross_provider_params_for_scenario can return ("_no_providers_", "_no_model_") when no providers are available. Add a guard at the start of each parametrized test to skip gracefully:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

This applies to all parametrized tests in this file.


727-737: Replace hardcoded model with config-driven approach.

The streaming thinking test still uses the hardcoded "anthropic/claude-sonnet-4-5" which doesn't match official Anthropic model IDs. The non-streaming thinking test (test_15) correctly uses get_model("anthropic", "chat"). Align both tests:

         stream = anthropic_client.messages.create(
-            model="anthropic/claude-sonnet-4-5",
+            model=get_model("anthropic", "chat"),
             max_tokens=16000,
             thinking={
                 "type": "enabled",
                 "budget_tokens": 10000,
             },
             messages=messages,
             stream=True,
         )
🧹 Nitpick comments (14)
core/providers/gemini/transcription.go (1)

81-102: Consider initializing ExtraParams once to reduce repetition.

The code checks and initializes ExtraParams three times (lines 82-84, 90-92, 98-100). While safe, this is repetitive.

Consider initializing once after line 73:

 if bifrostReq.Params == nil {
   bifrostReq.Params = &schemas.TranscriptionParameters{}
 }
+if bifrostReq.Params.ExtraParams == nil {
+  bifrostReq.Params.ExtraParams = make(map[string]interface{})
+}

 // Set prompt if provided
 if promptText != "" {
   bifrostReq.Params.Prompt = &promptText
 }

 // Handle safety settings from request
 if len(request.SafetySettings) > 0 {
-  if bifrostReq.Params.ExtraParams == nil {
-    bifrostReq.Params.ExtraParams = make(map[string]interface{})
-  }
   bifrostReq.Params.ExtraParams["safety_settings"] = request.SafetySettings
 }

 // Handle cached content
 if request.CachedContent != "" {
-  if bifrostReq.Params.ExtraParams == nil {
-    bifrostReq.Params.ExtraParams = make(map[string]interface{})
-  }
   bifrostReq.Params.ExtraParams["cached_content"] = request.CachedContent
 }

 // Handle labels
 if len(request.Labels) > 0 {
-  if bifrostReq.Params.ExtraParams == nil {
-    bifrostReq.Params.ExtraParams = make(map[string]interface{})
-  }
   bifrostReq.Params.ExtraParams["labels"] = request.Labels
 }
tests/integrations/tests/utils/common.py (1)

1739-1747: Remove unused valid_event_types variable.

valid_event_types is computed but never used in assert_valid_responses_streaming_chunk; it can be dropped to silence the Ruff F841 warning and reduce noise.

tests/integrations/tests/utils/parametrize.py (1)

12-16: Modernize optional parameter typing to use | None.

include_providers and exclude_providers are annotated as List[str] = None, which Ruff flags (RUF013). Prefer explicit unions:

-from typing import List, Tuple
+from typing import List, Tuple
@@
 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:
transports/bifrost-http/integrations/genai.go (1)

35-56: Speech vs transcription routing for GenAI looks correct and MIME checks are sane.

Routing GeminiGenerationRequest into embedding vs speech vs transcription vs chat based on IsEmbedding, IsSpeech, and IsTranscription plus the new helpers (isSpeechRequest, isTranscriptionRequest, isAudioMimeType) cleanly separates text‑to‑speech from speech‑to‑text. The audio detection is now strictly based on proper "audio/*" MIME types, which matches Gemini expectations and avoids the previous bare‑string pitfalls.

If you ever need to micro‑optimize, isTranscriptionRequest could read req.IsSpeech instead of recomputing, but that’s purely cosmetic.

Also applies to: 166-181, 183-226, 228-249

tests/integrations/tests/integrations/test_litellm.py (1)

90-93: Confirm LiteLLM always yields OpenAI‑shaped streaming chunks across providers.

test_13_streaming parametrizes over multiple providers/models but still calls:

content, chunk_count, tool_calls_detected = collect_streaming_content(
    stream, "openai", timeout=120  # LiteLLM uses OpenAI format
)

This assumes LiteLLM surfaces streaming chunks with OpenAI choices[0].delta... semantics for all underlying providers you exercise here. If any backend ever returns a non‑OpenAI‑shaped stream, collect_streaming_content(..., "openai", ...) will misinterpret or fail on those chunks.

If you expect such providers now or in the future, consider either:

  • passing a provider‑specific integration tag into collect_streaming_content, or
  • documenting that LiteLLM normalizes all streams to OpenAI’s shape in this test suite.

Also applies to: 421-453

tests/integrations/tests/integrations/test_google.py (1)

214-224: Verify PCM format assumptions for Gemini TTS → WAV conversion.

convert_pcm_to_wav assumes the inline audio from Google/Gemini TTS is 16‑bit mono PCM at 24kHz:

def convert_pcm_to_wav(pcm_data: bytes, channels: int = 1, sample_rate: int = 24000, sample_width: int = 2) -> bytes:
    ...

and the speech tests read inline_data.data directly into this helper before running assert_valid_speech_response. This is likely correct for current Gemini TTS models, but if Google changes the default audio format (e.g., different sample rate, bit depth, or channels), these tests will start failing in non‑obvious ways.

It’s worth double‑checking the current Gemini TTS audio format docs and, if needed, either:

  • derive sample_rate/channels from model metadata, or
  • document in a comment that these constants are tied to the specific "audio_format": "pcm" contract.

Also applies to: 690-856
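The helper under discussion is a thin wrapper around the stdlib wave module; a self-contained sketch with the same 16-bit/mono/24kHz defaults (the signature mirrors the one quoted above, but this is a reconstruction, not the test file's exact code):

```python
import io
import wave

def convert_pcm_to_wav(pcm_data: bytes, channels: int = 1,
                       sample_rate: int = 24000, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Defaults assume 16-bit mono PCM at 24kHz, the format current Gemini
    TTS output is expected to use.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setframerate(sample_rate)
        wav.setsampwidth(sample_width)
        wav.writeframes(pcm_data)
    return buf.getvalue()
```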

tests/integrations/config.yml (1)

32-92: Provider-centric config is a good upgrade, but clean up the stray bedrock block and clarify Bedrock key usage.

The new providers, provider_api_keys, provider_scenarios, and scenario_capabilities sections give a much clearer, capability‑driven view of what each provider can do and back the cross‑provider parametrization nicely.

Two small follow‑ups:

  1. Stray bedrock mapping after scenario_capabilities.
    The bedrock: block at lines 238‑244 (with chat, vision, text_completion, alternatives) sits at the top level, separate from the new providers map and effectively duplicates older structure. It doesn’t appear to be read by the new config loader and can be confusing; consider removing it or moving any still‑needed values into the canonical providers.bedrock entry.

  2. Bedrock API key naming.
    provider_api_keys.bedrock points to BEDROCK_API_KEY, while get_api_key("bedrock") in common.py still expects AWS_ACCESS_KEY_ID for Bedrock. If you intend provider_api_keys to gate provider availability, this mismatch means Bedrock may be treated as “disabled” from the provider‑side checks even when AWS creds are set. Either align these names or document the split so future changes don’t accidentally break Bedrock detection.

Also applies to: 93-237, 238-244

tests/integrations/tests/integrations/test_openai.py (3)

529-531: Prefix unused variable with underscore.

The unpacked variable content_tools is never used. Prefix it with an underscore to indicate it's intentionally unused.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "openai", timeout=300)
                 )

1334-1336: Prefix unused variable with underscore.

The unpacked variable content is never used in this test. Prefix it with an underscore.

-        content, chunk_count, tool_calls_detected, event_types = (
+        _content, chunk_count, tool_calls_detected, event_types = (
             collect_responses_streaming_content(stream, timeout=300)
         )

221-232: Guard against sentinel provider/model tuples in cross-provider tests.

The get_cross_provider_params_for_scenario utility returns ("_no_providers_", "_no_model_") when no providers are available. Without a guard, these tests will attempt to call the API with invalid model names and fail instead of gracefully skipping.

Consider adding a guard at the start of each parametrized test:

if provider == "_no_providers_" or model == "_no_model_":
    pytest.skip("No providers configured for this scenario")

This applies to all parametrized tests in the file (test_01 through test_09, test_13, test_32 through test_37).

tests/integrations/tests/integrations/test_anthropic.py (2)

602-604: Prefix unused variable with underscore.

The unpacked variable content_tools is never used. Prefix it with an underscore to indicate it's intentionally unused.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
                 )

758-759: Remove extraneous f-string prefix.

This f-string has no placeholders. Remove the f prefix.

-                            print(f"Thinking block started")
+                            print("Thinking block started")
tests/integrations/tests/utils/config_loader.py (2)

227-228: Use explicit Optional type hint.

PEP 484 prohibits implicit Optional. Update the type hint:

-    def list_models(self, integration: str = None) -> Dict[str, Any]:
+    def list_models(self, integration: Optional[str] = None) -> Dict[str, Any]:

286-286: Remove extraneous f-string prefix.

This f-string has no placeholders:

-        print(f"\n🤖 MODEL CONFIGURATIONS (via providers):")
+        print("\n🤖 MODEL CONFIGURATIONS (via providers):")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f994102 and 0e1e90a.

📒 Files selected for processing (17)
  • core/providers/anthropic/responses.go (4 hunks)
  • core/providers/cohere/responses.go (2 hunks)
  • core/providers/gemini/chat.go (4 hunks)
  • core/providers/gemini/speech.go (2 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
  • core/providers/gemini/types.go (3 hunks)
  • core/providers/gemini/utils.go (2 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/tests/integrations/test_anthropic.py (16 hunks)
  • tests/integrations/tests/integrations/test_google.py (16 hunks)
  • tests/integrations/tests/integrations/test_litellm.py (15 hunks)
  • tests/integrations/tests/integrations/test_openai.py (21 hunks)
  • tests/integrations/tests/utils/common.py (10 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/providers/cohere/responses.go
  • core/providers/gemini/types.go
🧰 Additional context used
🧬 Code graph analysis (7)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (3)
  • ChatAssistantMessageToolCall (483-489)
  • ChatStreamResponseChoice (530-532)
  • ChatNonStreamResponseChoice (524-527)
core/providers/gemini/types.go (4)
  • Role (12-12)
  • Content (876-884)
  • Part (890-914)
  • FunctionCall (1045-1055)
core/providers/anthropic/responses.go (1)
core/providers/anthropic/types.go (2)
  • AnthropicThinking (64-67)
  • AnthropicContentBlock (146-158)
tests/integrations/tests/integrations/test_litellm.py (3)
tests/integrations/tests/utils/common.py (2)
  • get_provider_voice (983-1019)
  • get_provider_voices (1022-1043)
tests/integrations/tests/utils/config_loader.py (2)
  • get_model (138-159)
  • get_model (441-443)
tests/integrations/tests/utils/parametrize.py (1)
  • get_cross_provider_params_for_scenario (12-47)
core/providers/gemini/speech.go (4)
core/providers/gemini/types.go (6)
  • GeminiGenerationRequest (54-72)
  • GenerationConfig (630-696)
  • SpeechConfig (846-855)
  • VoiceConfig (825-828)
  • PrebuiltVoiceConfig (819-822)
  • MultiSpeakerVoiceConfig (840-843)
core/schemas/speech.go (3)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
core/schemas/utils.go (1)
  • ParseModelString (21-34)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
core/providers/gemini/utils.go (1)
core/providers/gemini/types.go (1)
  • Type (777-777)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/config_loader.py (6)
  • get_config (429-434)
  • get_providers_for_scenario (392-408)
  • get_providers_for_scenario (476-478)
  • get_scenario_capability (410-422)
  • get_provider_model (306-325)
  • get_provider_model (456-458)
🪛 Ruff (0.14.5)
tests/integrations/tests/integrations/test_litellm.py

146-146: Unused method argument: test_config

(ARG002)


146-146: Unused method argument: provider

(ARG002)


160-160: Unused method argument: test_config

(ARG002)


160-160: Unused method argument: provider

(ARG002)


177-177: Unused method argument: test_config

(ARG002)


177-177: Unused method argument: provider

(ARG002)


194-194: Unused method argument: test_config

(ARG002)


194-194: Unused method argument: provider

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: provider

(ARG002)


254-254: Unused method argument: test_config

(ARG002)


254-254: Unused method argument: provider

(ARG002)


272-272: Unused method argument: test_config

(ARG002)


272-272: Unused method argument: provider

(ARG002)


283-283: Unused method argument: test_config

(ARG002)


283-283: Unused method argument: provider

(ARG002)


294-294: Unused method argument: test_config

(ARG002)


294-294: Unused method argument: provider

(ARG002)


310-310: Unused method argument: test_config

(ARG002)


310-310: Unused method argument: provider

(ARG002)


422-422: Unused method argument: test_config

(ARG002)


422-422: Unused method argument: provider

(ARG002)

tests/integrations/tests/integrations/test_openai.py

222-222: Unused method argument: test_config

(ARG002)


235-235: Unused method argument: test_config

(ARG002)


252-252: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


286-286: Unused method argument: test_config

(ARG002)


328-328: Unused method argument: test_config

(ARG002)


344-344: Unused method argument: test_config

(ARG002)


355-355: Unused method argument: test_config

(ARG002)


366-366: Unused method argument: test_config

(ARG002)


496-496: Unused method argument: test_config

(ARG002)


529-529: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


540-540: Unused method argument: test_config

(ARG002)


1096-1096: Unused method argument: test_config

(ARG002)


1107-1107: Unused method argument: test_config

(ARG002)


1140-1140: Unused method argument: test_config

(ARG002)


1170-1170: Unused method argument: test_config

(ARG002)


1212-1212: Unused method argument: test_config

(ARG002)


1262-1262: Unused method argument: test_config

(ARG002)


1316-1316: Unused method argument: test_config

(ARG002)


1334-1334: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1354-1354: Unused method argument: test_config

(ARG002)


1497-1497: Unused method argument: test_config

(ARG002)


1518-1518: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/config_loader.py

153-156: Avoid specifying long messages outside the exception class

(TRY003)


227-227: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


233-233: Avoid specifying long messages outside the exception class

(TRY003)


236-236: Avoid specifying long messages outside the exception class

(TRY003)


286-286: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/utils/common.py

1702-1702: Avoid specifying long messages outside the exception class

(TRY003)


1724-1726: Avoid specifying long messages outside the exception class

(TRY003)


1739-1739: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1804-1804: Avoid specifying long messages outside the exception class

(TRY003)


1814-1816: Avoid specifying long messages outside the exception class

(TRY003)

tests/integrations/tests/integrations/test_anthropic.py

181-181: Unused method argument: test_config

(ARG002)


195-195: Unused method argument: test_config

(ARG002)


212-212: Unused method argument: test_config

(ARG002)


230-230: Unused method argument: test_config

(ARG002)


257-257: Unused method argument: test_config

(ARG002)


318-318: Unused method argument: test_config

(ARG002)


336-336: Unused method argument: test_config

(ARG002)


361-361: Unused method argument: test_config

(ARG002)


387-387: Unused method argument: test_config

(ARG002)


569-569: Unused method argument: test_config

(ARG002)


602-602: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


611-611: Unused method argument: test_config

(ARG002)


647-647: Unused method argument: test_config

(ARG002)


722-722: Unused method argument: test_config

(ARG002)


758-758: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/integrations/test_google.py

230-230: Unused method argument: test_config

(ARG002)


244-244: Unused method argument: test_config

(ARG002)


265-265: Unused method argument: test_config

(ARG002)


288-288: Unused method argument: test_config

(ARG002)


313-313: Unused method argument: test_config

(ARG002)


348-348: Unused method argument: test_config

(ARG002)


368-368: Unused method argument: test_config

(ARG002)


381-381: Unused method argument: test_config

(ARG002)


392-392: Unused method argument: test_config

(ARG002)


410-410: Unused method argument: test_config

(ARG002)


502-502: Unused method argument: test_config

(ARG002)


550-550: Unused method argument: test_config

(ARG002)


563-563: Unused method argument: test_config

(ARG002)


570-570: Unused method argument: test_config

(ARG002)


594-594: Unused method argument: test_config

(ARG002)


620-620: Unused method argument: test_config

(ARG002)


644-644: Unused method argument: test_config

(ARG002)


668-668: Unused method argument: test_config

(ARG002)


691-691: Unused method argument: test_config

(ARG002)


723-723: Unused method argument: test_config

(ARG002)


777-777: Unused method argument: test_config

(ARG002)


821-821: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/parametrize.py

14-14: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (22)
core/providers/gemini/chat.go (1)

32-33: LGTM! Tool call correlation logic is well-implemented.

The addition of previousToolCalls tracking elegantly solves the function response correlation problem. The fallback logic (lines 113-120) that searches by function name when an explicit ID is absent ensures robust matching, and updating the tracker after emitting tool calls (lines 200-201) maintains the necessary state for future correlation.

Also applies to: 113-120, 200-201

core/providers/anthropic/responses.go (4)

318-320: Good defensive type handling for thinking parameter.

The code now correctly handles both cases where thinking might already be an *AnthropicThinking pointer or needs to be parsed from a map. The safe type assertions prevent potential panics.


649-649: ContentIndex now correctly set for tool_use blocks.

This addresses the previous review concern about inconsistent ContentIndex handling. Tool use blocks now properly track their index, maintaining consistency with text content blocks.


684-684: ContentIndex correctly set for MCP tool use blocks.

Consistent with the fix at line 649, MCP tool use blocks now also properly track their content index. This ensures uniform index tracking across all content block types.


462-463: Ignore this review comment — the change is correct per Anthropic's API specification.

The Content field in AnthropicMessageResponse is typed as []AnthropicContentBlock (a non-nullable array), not as a pointer or wrapper. Anthropic's official API defines content as an array of content-block objects — not a nullable field. Initializing it to an empty slice []AnthropicContentBlock{} when there are no content blocks is the correct behavior and matches the API contract. You should expect an array (possibly empty) for "content" in responses — not null.

This is not a breaking change; it's a fix that ensures the response adheres to the actual Anthropic API specification.

Likely an incorrect or invalid review comment.

core/providers/gemini/speech.go (1)

39-85: Voice and multi‑speaker mapping logic looks consistent.

The conversion from SpeechConfig → schemas.SpeechVoiceInput correctly handles:

  • Single‑speaker VoiceConfig via PrebuiltVoiceConfig.VoiceName.
  • Multi‑speaker MultiSpeakerVoiceConfig.SpeakerVoiceConfigs → []schemas.VoiceConfig, skipping incomplete entries and only allocating when there are speaker configs.

The storage of ResponseModalities into Params.ExtraParams["response_modalities"] is also a reasonable Gemini‑specific escape hatch.

core/providers/gemini/transcription.go (2)

10-23: LGTM!

The model string parsing and Vertex provider normalization with the "google/" prefix is correctly implemented.


178-256: LGTM!

The response conversion functions correctly handle:

  • Nil checks for response structures
  • Safe extraction of usage metadata with pointer checks
  • Proper token type conversions (int to int32)
  • Text concatenation from multiple parts
core/providers/gemini/utils.go (1)

79-101: Type normalization for Gemini schemas looks correct and consistent.

Lowercasing schema.Type and recursively normalizing all "type" fields in convertSchemaToMap should make the tool parameter schemas much more JSON‑Schema/consumer friendly without changing meaning; the helper is side‑effect free and safely falls back to {} on marshal/unmarshal failures.

Also applies to: 103-122, 124-155
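
The recursive "type" normalization can be sketched as follows (in Python for brevity rather than the Go of utils.go; `normalize_types` is a hypothetical name, not the repo's helper):

```python
def normalize_types(schema):
    """Recursively lowercase every "type" field in a JSON-Schema-like value.

    Sketch mirroring the described behavior of convertSchemaToMap's
    normalization pass; dicts and lists are walked, everything else is
    returned unchanged.
    """
    if isinstance(schema, dict):
        return {
            key: value.lower() if key == "type" and isinstance(value, str)
            else normalize_types(value)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [normalize_types(item) for item in schema]
    return schema
```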

tests/integrations/tests/utils/common.py (1)

148-195: Responses API helpers and flat tool shape align with the repo’s conventions.

The new RESPONSES_* fixtures and helpers (convert_to_responses_tools, assert_valid_responses_response, streaming collectors/assertions) look consistent and give good coverage for Responses flows. In particular, convert_to_responses_tools correctly emits the flat {"type": "function", "name": ..., "description": ..., "parameters": ...} format this repo models, rather than the Chat Completions function wrapper. Based on learnings, this matches the intended Responses tool schema here.

Also applies to: 1583-1731
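
For reference, the difference between the two tool shapes can be sketched like this (a minimal illustration; `convert_chat_tool_to_responses` is a hypothetical name, and the input is assumed to use the Chat Completions `{"type": "function", "function": {...}}` wrapper):

```python
def convert_chat_tool_to_responses(chat_tool):
    """Flatten a Chat Completions-style tool into the flat Responses shape.

    The Responses format hoists name/description/parameters to the top
    level instead of nesting them under a "function" key.
    """
    fn = chat_tool["function"]
    return {
        "type": "function",
        "name": fn["name"],
        "description": fn.get("description", ""),
        "parameters": fn.get("parameters", {}),
    }
```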

transports/bifrost-http/integrations/router.go (1)

104-107: Speech response converter wiring is correct and fixes the missing write path.

The new SpeechResponseConverter type and RouteConfig field, plus the updated speech branch in handleNonStreamingRequest, now (a) surface converter errors via sendError and (b) actually send the converted payload, while preserving the raw‑audio fallback when no converter is set. This is the expected behavior for integrations that opt into custom speech formats.

Also applies to: 199-216, 510-530

tests/integrations/tests/integrations/test_google.py (1)

138-167: Tool and image helpers for Google GenAI look solid for the current test data.

Lowercasing parameter type fields in convert_to_google_tools and wrapping them in types.Tool(function_declarations=[...]) is a reasonable, minimal schema for the simple WEATHER/CALCULATOR tools used here. Likewise, load_image_from_url’s updated User‑Agent, timeout, and raise_for_status() should make the image tests far less flaky on sources like Wikipedia while still normalizing images into a compact JPEG Part.

Also applies to: 169-212

tests/integrations/tests/integrations/test_openai.py (3)

199-201: Timeout increased to 300s - good for handling longer API calls.

The default timeout of 300 seconds aligns with the PR objective to fix timeout issues in integration tests. This is reasonable for cross-provider testing where response times can vary significantly.


1102-1134: Responses API tests are well-structured with comprehensive validation.

The new Responses API tests (test_32 through test_37) provide good coverage of:

  • Simple text input with content keyword validation
  • System message handling
  • Image input processing
  • Tool calls with proper function call validation
  • Streaming with event type checking
  • Streaming with tools

The provider-specific event checking at lines 1283-1286 is appropriately guarded.


1496-1561: Text Completions tests are appropriate for legacy API.

The text completion tests correctly use the legacy gpt-3.5-turbo-instruct model and validate both synchronous and streaming responses. Good coverage for backward compatibility.

tests/integrations/tests/integrations/test_anthropic.py (3)

92-94: Timeout increased to 120s for Anthropic client.

The timeout increase from 30s to 120s aligns with the PR objectives to fix timeout issues in API calls.


502-507: Good defensive null check for response content.

The added check if final_response.content and len(final_response.content) > 0 properly handles cases where Anthropic returns empty content when the tool result is sufficient. This is a valid fix per the PR objectives.
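
As a minimal sketch, the guarded access pattern looks like this (`first_text_block` is a hypothetical helper; the `final_response` shape mirrors the Anthropic SDK's Message object):

```python
def first_text_block(final_response):
    """Return the first text block's text, or None when content is empty.

    Guards against both None and empty-list content, which Anthropic can
    return when a tool result fully answers the request.
    """
    if final_response.content and len(final_response.content) > 0:
        return final_response.content[0].text
    return None
```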


610-644: Comprehensive pagination test for models listing.

The test properly validates:

  • Basic listing with limit
  • Response structure (first_id, last_id, has_more)
  • Forward pagination with after_id
  • Backward pagination with before_id

This is thorough coverage for the list models API.

tests/integrations/tests/utils/config_loader.py (4)

15-23: Good abstraction for integration-to-provider mapping.

The INTEGRATION_TO_PROVIDER_MAP provides a clean mapping between integration names (used by SDK clients) and underlying provider configurations. This enables the cross-provider testing framework.
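
The shape of such a mapping is essentially a plain dict keyed by integration name (the entries below are assumptions for illustration, not the repo's actual map):

```python
# Hypothetical shape: each SDK-facing integration name points at the
# provider key used in the configuration file.
INTEGRATION_TO_PROVIDER_MAP = {
    "anthropic": "anthropic",
    "google": "gemini",
    "openai": "openai",
}

def provider_for_integration(integration):
    """Resolve an integration name to its provider, failing loudly on unknowns."""
    try:
        return INTEGRATION_TO_PROVIDER_MAP[integration]
    except KeyError:
        raise ValueError(f"unknown integration: {integration}") from None
```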


306-422: Provider-centric methods are well-designed.

The new methods provide a clean API for:

  • get_provider_model(): Retrieving models by provider and capability
  • get_provider_api_key_env(): Getting API key environment variable names
  • is_provider_available(): Checking if a provider has API keys configured
  • get_available_providers(): Listing available providers
  • provider_supports_scenario(): Checking scenario support
  • get_providers_for_scenario(): Getting providers that support a scenario
  • get_scenario_capability(): Mapping scenarios to capabilities

These enable the cross-provider testing framework effectively.


248-269: Validation updated for provider-centric configuration.

The validation now:

  • Requires a providers section
  • Verifies all integrations map to configured providers

This ensures configuration consistency with the new provider abstraction.


456-478: Module-level convenience functions provide clean API.

The convenience functions wrap the ConfigLoader class methods, providing a simple functional interface for common operations. This is consistent with the existing pattern in the file.

Comment thread core/providers/gemini/transcription.go
Comment thread core/providers/gemini/transcription.go
@akshaydeo akshaydeo changed the base branch from 11-22-moves_all_tests_related_to_core_to_corresponding_files to graphite-base/671 November 25, 2025 11:49
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from 0e1e90a to e74c019 Compare November 25, 2025 11:49
@graphite-app graphite-app Bot changed the base branch from graphite-base/671 to main November 25, 2025 11:50
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from e74c019 to fea0509 Compare November 25, 2025 11:50
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from fea0509 to 4d9311d Compare November 25, 2025 21:07
Contributor

@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
ui/app/_fallbacks/enterprise/components/login/loginView.tsx (1)

51-67: Critical: Missing dependencies in useEffect causes stale closures.

The useEffect dependency array only includes isLoadingIsAuthEnabled, but the effect body references isAuthEnabledError (line 56), isAuthEnabled (line 60), hasValidToken (line 60), and router (line 61). This violates React's exhaustive-deps rule and causes stale closures.

Impact: The effect won't re-run when isAuthEnabled, hasValidToken, or isAuthEnabledError change, preventing the component from responding to auth state changes. This breaks the authentication flow.

Apply this diff to restore the required dependencies:

-	}, [isLoadingIsAuthEnabled]);
+	}, [isLoadingIsAuthEnabled, isAuthEnabledError, isAuthEnabled, hasValidToken, router]);

Note: React 19 with Strict Mode should help catch this during development. Ensure ESLint with react-hooks/exhaustive-deps rule is enabled.

core/providers/gemini/speech.go (1)

94-128: Potential nil pointer dereference on bifrostReq.Input.

Line 100 accesses bifrostReq.Params.ResponseFormat without checking if Params is nil, and line 110 accesses bifrostReq.Input.Input without checking if Input is nil. Both could cause panics.

 func ToGeminiSpeechRequest(bifrostReq *schemas.BifrostSpeechRequest) (*GeminiGenerationRequest, error) {
 	if bifrostReq == nil {
 		return nil, fmt.Errorf("bifrostReq is nil")
 	}
 	// Here we confirm if the response_format is wav or empty string
 	// If its anything else, we will return an error
-	if bifrostReq.Params.ResponseFormat != "" && bifrostReq.Params.ResponseFormat != "wav" {
+	if bifrostReq.Params != nil && bifrostReq.Params.ResponseFormat != "" && bifrostReq.Params.ResponseFormat != "wav" {
 		return nil, fmt.Errorf("gemini does not support response_format: %s. Only wav or empty string is supported which defaults to wav", bifrostReq.Params.ResponseFormat)
 	}
 	// Create the base Gemini generation request
 	geminiReq := &GeminiGenerationRequest{
 		Model: bifrostReq.Model,
 	}
 	// Convert parameters to generation config
 	geminiReq.GenerationConfig.ResponseModalities = []Modality{ModalityAudio}
 	// Convert speech input to Gemini format
-	if bifrostReq.Input.Input != "" {
+	if bifrostReq.Input != nil && bifrostReq.Input.Input != "" {
tests/integrations/tests/test_litellm.py (1)

174-190: Critical: Test method incorrectly named as class name - test will never execute.

The method on line 180 is named TestLiteLLMIntegration which is the class name, not test_01_simple_chat. Pytest won't recognize this as a test method because it doesn't follow the test_* naming convention.

Apply this diff to fix the method name:

     @pytest.mark.parametrize(
         "provider, model",
         get_cross_provider_params_for_scenario(
             "simple_chat", exclude_providers=LITELLM_EXCLUDED_PROVIDERS
         ),
     )
-    def TestLiteLLMIntegration(self, test_config, provider, model):
+    def test_01_simple_chat(self, test_config, provider, model):
         """Test Case 1: Simple chat interaction"""
tests/integrations/tests/utils/common.py (1)

455-515: Bedrock tool calls are missing id while assert_has_tool_calls now requires it.

extract_tool_calls now attaches "id" for OpenAI (choice.message.tool_calls) and Anthropic (tool_use) entries, but the Bedrock branch still builds tool_calls without id. Since assert_has_tool_calls now asserts "id" in tool_call, any Bedrock tool-call tests will fail even when the structure is otherwise correct.

You can synthesize an ID from Bedrock’s toolUse (e.g., toolUseId) to keep tests consistent:

     elif isinstance(response, dict) and "output" in response and "message" in response["output"]:
         message = response["output"]["message"]
         if "content" in message:
             for content in message["content"]:
                 if "toolUse" in content:
                     tool_use = content["toolUse"]
-                    tool_calls.append({
-                        "name": tool_use["name"],
-                        "arguments": tool_use["input"]
-                    })
+                    tool_calls.append(
+                        {
+                            # Bedrock toolUse usually exposes a toolUseId; fall back to empty string if absent.
+                            "id": tool_use.get("toolUseId") or tool_use.get("id", ""),
+                            "name": tool_use["name"],
+                            "arguments": tool_use["input"],
+                        }
+                    )

This keeps assert_has_tool_calls consistent across providers without tightening Bedrock semantics beyond what the service already returns.

Also applies to: 566-571

♻️ Duplicate comments (6)
transports/bifrost-http/integrations/router.go (1)

516-542: PostCallback and nil check added correctly.

The speech response handling now properly:

  1. Executes PostCallback if configured
  2. Checks for nil speechResponse before processing

Minor formatting issue persists: Line 536 has }else { instead of } else {.

-		}else {
+		} else {
tests/integrations/tests/utils/parametrize.py (1)

12-47: Good utility for cross-provider test parametrization.

The function correctly:

  1. Filters providers by scenario support
  2. Applies include/exclude filters
  3. Maps scenarios to capabilities and models
  4. Returns deterministic ordering via sorted()
  5. Provides fallback dummy tuple to avoid pytest collection errors

Type hint improvement needed: Lines 14-15 use implicit Optional which violates PEP 484.

 def get_cross_provider_params_for_scenario(
     scenario: str,
-    include_providers: List[str] = None,
-    exclude_providers: List[str] = None,
+    include_providers: List[str] | None = None,
+    exclude_providers: List[str] | None = None,
 ) -> List[Tuple[str, str]]:
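
The filtering and sentinel-fallback behavior summarized in points 1-5 can be sketched as (names and the sentinel tuple are assumptions, not the repo's exact implementation):

```python
def cross_provider_params(available, supported, include=None, exclude=None):
    """Return sorted (provider, model) tuples, or a sentinel when empty.

    `available` lists configured providers; `supported` maps provider ->
    model for the scenario under test.
    """
    providers = [p for p in available if p in supported]
    if include is not None:
        providers = [p for p in providers if p in include]
    if exclude is not None:
        providers = [p for p in providers if p not in exclude]
    params = sorted((p, supported[p]) for p in providers)
    # Pytest errors out on an empty parametrize list, so fall back to a
    # sentinel tuple that tests can detect and skip.
    return params or [("no_providers", "no_model")]
```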
core/providers/gemini/chat.go (1)

456-474: Missing error handling for JSON unmarshaling in streaming path.

Line 461 calls json.Unmarshal without checking the returned error. If toolCall.Function.Arguments contains malformed JSON, the error is silently ignored and argsMap remains empty.

 				// Handle tool calls in streaming
 				if delta.ToolCalls != nil {
 					for _, toolCall := range delta.ToolCalls {
 						argsMap := make(map[string]interface{})
 						if toolCall.Function.Arguments != "" {
-							json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap)
+							if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &argsMap); err != nil {
+								// Log or skip - malformed args shouldn't block the entire tool call
+								continue
+							}
 						}
tests/integrations/tests/test_anthropic.py (1)

736-746: Hardcoded model name persists in streaming thinking test.

This issue was flagged in previous reviews. The hardcoded "anthropic/claude-sonnet-4-5" should be replaced with get_model("anthropic", "chat") to match test_15_extended_thinking and use config-driven model selection.

core/providers/gemini/speech.go (1)

28-36: Text concatenation still missing separators between parts.

This issue was flagged in a previous review. When multiple text parts exist, they are concatenated directly without spaces, which could produce incorrect speech input (e.g., "Helloworld" instead of "Hello world").

tests/integrations/tests/test_google.py (1)

244-268: Missing sentinel provider/model guards in most parametrized tests.

While test_01_simple_chat has the proper sentinel guard (lines 231-232), most other parametrized tests are missing this check. This was flagged in previous reviews. Tests without guards will fail with API errors instead of gracefully skipping when no providers are configured.

Affected tests include: test_02, test_03, test_04, test_05, test_06, test_07, test_08, test_09, test_13, test_14, test_22, test_24, test_25.

🧹 Nitpick comments (32)
.gitignore (1)

43-52: Consolidate duplicate .gitignore patterns.

The newly added Python-specific section contains patterns that already exist earlier in the file:

  • **/__pycache__/** (line 6 and 47)
  • **/venv/ (line 5 and 48)
  • **/.pytest_cache/ (line 50 and 52 — also duplicated within the new section)

Additionally, .venv (line 8, top-level only) and **/.venv/ (line 49, all levels) use slightly different patterns for the same purpose.

Consider removing the duplicates and consolidating patterns for clarity:

- # Python specific
- **/__pycache__/**
- **/venv/
- **/.venv/
- **/.pytest_cache/
- **/.coverage/
- **/.pytest_cache/
+ # Python specific
+ **/.coverage/

This keeps the new .coverage/ pattern while relying on the existing __pycache__, venv, and .pytest_cache entries.

plugins/semanticcache/plugin_edge_cases_test.go (1)

245-279: Image URL host change is fine; consider centralizing test image URLs.

Switching these two image URLs to the new R2 host is behavior‑neutral and looks correct. If this same URL is (or may be) used in other tests, consider extracting it into a shared test constant/fixture so future host or path changes don’t require touching multiple call sites.

tests/integrations/tests/conftest.py (1)

7-19: Logging setup in pytest_configure is reasonable; consider configurability later.

Global logging.basicConfig at ERROR for integration tests is fine and should cut noise. If you ever need more verbose logs for debugging, you might later make the level configurable via an env var (e.g., INTEGRATION_LOG_LEVEL) but it’s not required for this PR.

transports/bifrost-http/handlers/config.go (1)

107-137: Consider also defaulting auth_config when the store exists but no auth row is found.

The new else branch correctly provides a default auth_config when ConfigStore is nil, so callers always see an auth block in that mode. However, when ConfigStore is non‑nil and GetAuthConfig returns nil (no row but no error), auth_config is still omitted from mapConfig.

If your goal is for /api/config to always include auth_config, you may want to add a second else after the if authConfig != nil { ... } to set the same disabled defaults in that case as well.

.github/workflows/test-coverage.yml (1)

10-29: Consider adding explicit permissions.

The workflow uses the default GITHUB_TOKEN permissions. While not critical for this workflow, explicitly declaring minimal permissions follows security best practices.

Add this block after line 9:

+permissions:
+  contents: read
+
 jobs:
   # Check if pipeline should be skipped based on first line of commit message
   check-skip:
core/internal/testutil/speech_synthesis.go (1)

36-59: Test case names are inconsistent with dynamic format selection.

The test case names (e.g., BasicText_Primary_MP3, MediumText_Secondary_MP3) hardcode "MP3" in the name, but the actual format is now dynamically determined by GetProviderDefaultFormat(testConfig.Provider). For Gemini, this returns "wav", making the test names misleading in test output.

Consider using a more generic naming convention:

 {
-    name:           "BasicText_Primary_MP3",
+    name:           "BasicText_Primary",
     text:           TTSTestTextBasic,
     voiceType:      "primary",
     format:         GetProviderDefaultFormat(testConfig.Provider),

Or dynamically include the format in the test name if format-specific naming is desired.

core/internal/testutil/speech_synthesis_stream.go (1)

225-242: Solid audio validation with provider-specific PCM handling.

The validation block properly:

  1. Checks for accumulated audio data
  2. Converts PCM to WAV specifically for Gemini
  3. Validates the audio codec
  4. Provides clear error messages on failure

Note: This pattern is repeated across multiple test functions. Consider extracting to a helper function for maintainability, though acceptable as-is for test code.

transports/bifrost-http/integrations/router.go (2)

210-210: Comment is misleading: SpeechResponseConverter is optional.

The comment says "SHOULD NOT BE NIL" but the implementation (lines 528-542) properly handles the nil case with fallback behavior (returning raw audio with appropriate headers). Consider updating the comment to reflect that it's optional.

-	SpeechResponseConverter        SpeechResponseConverter        // Function to convert BifrostSpeechResponse to integration format (SHOULD NOT BE NIL)
+	SpeechResponseConverter        SpeechResponseConverter        // Function to convert BifrostSpeechResponse to integration format (optional: falls back to raw audio/mpeg)

304-304: Minor: Trailing whitespace.

Lines 304 and 328 have trailing whitespace that could be cleaned up.

tests/integrations/tests/test_langchain.py (2)

702-703: Unconditional skip renders test ineffective.

@pytest.mark.skipif(True, ...) always skips the test, meaning it never runs. If the test is permanently broken, consider:

  1. Using @pytest.mark.xfail(reason="...") to track expected failures while still running the test
  2. Adding a condition for when it should be skipped
  3. Removing the test if it's no longer relevant
-    @pytest.mark.skipif(True, reason="Known flaky test")
+    @pytest.mark.xfail(reason="Gemini streaming via LangChain is unstable", strict=False)

814-830: Silent exception handling makes debugging difficult.

The try-except-pass pattern (lines 829-830) silently swallows all exceptions when testing Gemini, making it impossible to know if the provider failed or why.

Consider logging failures even if not failing the test:

         except Exception:
-            pass
+            import logging
+            logging.getLogger(__name__).debug("Gemini provider test failed", exc_info=True)
tests/integrations/tests/test_anthropic.py (2)

611-613: Unused variable: prefix content_tools with underscore.

The unpacked variable content_tools is never used. Per static analysis, prefix it with an underscore to indicate it's intentionally unused.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
                 )

767-767: Remove extraneous f-string prefix.

The f-string on line 767 has no placeholders. Remove the f prefix.

-                            print(f"Thinking block started")
+                            print("Thinking block started")
core/internal/testutil/audio_validation.go (2)

108-115: Cleanup code is commented out - temp files will accumulate.

The cleanup logic for removing temporary audio files is commented out, which will cause test artifacts to accumulate in the temp directory over time. Consider either:

  1. Re-enabling the cleanup, or
  2. Adding a comment explaining why cleanup is intentionally disabled (e.g., for debugging purposes)

If cleanup should be disabled for debugging, add a comment:

 	// Register cleanup to delete file regardless of test outcome
+	// NOTE: Cleanup disabled to preserve audio files for manual inspection during debugging.
+	// Re-enable for production test runs to prevent temp file accumulation.
 	// t.Cleanup(func() {

354-388: WAV chunk parsing loop may have edge case issues.

The loop condition offset < len(data)-8 could behave unexpectedly if len(data) is less than 8 (though the earlier check requires at least 44 bytes). However, reading chunkSize from potentially malformed data and using it directly could cause issues:

  1. If chunkSize is very large (malformed/malicious data), offset += 8 + chunkSize could overflow or skip beyond the buffer
  2. The bounds check on line 364 offset+8+chunkSize > len(data) only applies when chunkID == "fmt "

Consider adding a sanity check on chunkSize:

 	for offset < len(data)-8 {
 		chunkID := string(data[offset : offset+4])
 		chunkSize := int(binary.LittleEndian.Uint32(data[offset+4 : offset+8]))
+
+		// Sanity check to prevent overflow or reading beyond buffer
+		if chunkSize < 0 || offset+8+chunkSize > len(data) {
+			break
+		}

 		if chunkID == "fmt " {
 			foundFmt = true
-			if offset+8+chunkSize > len(data) {
-				return fmt.Errorf("fmt chunk extends beyond file")
-			}
tests/integrations/tests/test_google.py (2)

169-178: Redundant import and unused variable in load_image_from_url.

Static analysis flagged:

  1. Line 173: base64 is already imported at line 40, redefinition is unnecessary
  2. Line 177: header is unpacked but never used
 def load_image_from_url(url: str):
     """Load image from URL for Google GenAI"""
     from google.genai import types
     import io
-    import base64

     if url.startswith("data:image"):
         # Base64 image - extract the base64 data part
-        header, data = url.split(",", 1)
+        _, data = url.split(",", 1)
         img_data = base64.b64decode(data)

725-776: Multi-speaker test uses hardcoded provider "google" - consider parametrization.

The test_23_speech_generation_multi_speaker test uses hardcoded get_provider_voice("google", ...) calls instead of the provider parameter. This is acceptable since multi-speaker may be Google-specific, but consider adding a comment to clarify this is intentionally Google-only.

tests/integrations/tests/test_litellm.py (2)

130-144: Minor: Use timezone-aware datetime and prefix unused parameter.

Two small issues in this mock function:

  1. datetime.datetime.utcnow() is deprecated in Python 3.12+ in favor of datetime.datetime.now(datetime.timezone.utc)
  2. The request parameter is unused (static analysis hint ARG001)
     def mock_refresh(self, request):
         """Mock refresh that sets a dummy token - Bifrost handles real auth"""
         import datetime
 
         self.token = "dummy-access-token-bifrost-handles-auth"
-        self.expiry = datetime.datetime.utcnow() + datetime.timedelta(hours=1)
+        self.expiry = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1)

For the unused request parameter, it's part of the signature being mocked, so it should remain but could be prefixed with _ to indicate intentional non-use: def mock_refresh(self, _request):.


506-532: Good timeout increase; prefix unused variable.

The timeout increase from 30s to 120s aligns with the PR objective to fix integration test timeout issues. The content_tools variable on line 526 is unpacked but never used.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(
-                    stream_with_tools, "openai", timeout=120  # LiteLLM uses OpenAI format
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(
+                    stream_with_tools, "openai", timeout=120  # LiteLLM uses OpenAI format
                 )
tests/integrations/tests/utils/config_loader.py (2)

228-247: Fix type hint for optional parameter.

The integration parameter defaults to None but isn't typed as Optional[str], which violates PEP 484.

-    def list_models(self, integration: str = None) -> Dict[str, Any]:
+    def list_models(self, integration: Optional[str] = None) -> Dict[str, Any]:
         """List all models for an integration or all integrations"""

287-296: Remove unnecessary f-string prefix.

Line 287 uses an f-string without any placeholders. As flagged by static analysis (F541).

         # Model configurations
-        print(f"\n🤖 MODEL CONFIGURATIONS (via providers):")
+        print("\n🤖 MODEL CONFIGURATIONS (via providers):")
tests/integrations/tests/test_openai.py (2)

526-537: Prefix unused variable from tuple unpacking.

The content_tools variable is unpacked but never used (RUF059).

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "openai", timeout=300)
                 )

1331-1351: Prefix unused variable from tuple unpacking.

The content variable on line 1334 is unpacked but never used (RUF059).

         # Collect streaming content
-        content, chunk_count, tool_calls_detected, event_types = (
+        _content, chunk_count, tool_calls_detected, event_types = (
             collect_responses_streaming_content(stream, timeout=300)
         )
tests/integrations/tests/utils/common.py (9)

612-638: Broadened image keyword list risks false positives in image assertions.

Adding very generic tokens like "has", "with", "this is", "there is", "there are", and "it is" means assert_valid_image_response can pass even if the model completely ignores the image (e.g., “There is a cat…” for a purely textual answer). Consider tightening to more image-specific terms or requiring at least one strong visual keyword (e.g., image/picture/photo/see/show/depicts) in addition to the generic ones.

Example tweak:

-        "has",
-        "with",
-        "this is",
-        "there is",
-        "there are",
-        "here is",
-        "it is",
+        # Keep keywords that are clearly tied to describing visuals
+        # (avoid extremely generic tokens that appear in most text)

Or change the check to require at least one “core” visual keyword alongside the extended list.
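A stricter check along those lines could look like the following minimal sketch; the keyword set and helper name are illustrative, not taken from the test suite:

```python
# Illustrative sketch of the stricter check suggested above; the keyword set
# and helper name are hypothetical, not code from the repository.
CORE_VISUAL_KEYWORDS = {"image", "picture", "photo", "see", "show", "depicts"}

def mentions_visual_content(text: str) -> bool:
    """Require at least one strong visual keyword before the assertion can pass."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in CORE_VISUAL_KEYWORDS)
```

With this gate, a response like "There is a cat" would fail the image assertion unless it also contains a genuinely visual term.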


837-840: Empty OpenAI deltas now fully bypass final-chunk validation.

The early return for completely empty deltas means you no longer verify finish_reason on final chunks that carry only termination metadata. If you still care about asserting finish_reason on the last chunk, consider only short‑circuiting non‑final empty deltas:

-        # Ignore completely empty deltas (like Cohere content-start with empty text)
-        if not (has_content or has_tool_calls or has_role):
-            return
-
-        # Allow empty deltas for final chunks (they just signal completion)
-        if not is_final:
+        # Ignore completely empty non-final deltas (like some provider content-start events)
+        if not (has_content or has_tool_calls or has_role) and not is_final:
+            return
+
+        # Allow empty deltas for final chunks (they just signal completion)
+        if not is_final:
             assert (
                 has_content or has_tool_calls or has_role
             ), "OpenAI delta should have content, tool_calls, or role (except for final chunks)"

This keeps robustness for final chunks while remaining tolerant of provider-specific empty deltas.


953-955: Consider excluding input_json_delta.partial_json from user-visible content aggregation.

In collect_streaming_content, you append chunk.delta.partial_json into content_parts while noting it is “not user-visible content”. That can pollute the aggregated content string with raw JSON fragments and make downstream textual assertions harder to reason about.

You can still detect tool calls without mixing partial JSON into user content:

-                elif hasattr(chunk.delta, "type") and chunk.delta.type == "input_json_delta":
-                    content_parts.append(chunk.delta.partial_json)
-                    tool_calls_detected = True
+                elif hasattr(chunk.delta, "type") and chunk.delta.type == "input_json_delta":
+                    # Treat input_json deltas as tool-related, but avoid mixing raw JSON into user-visible text.
+                    tool_calls_detected = True

If you need to assert on the JSON itself, a separate accumulator keyed to tool-call data would be clearer.


1012-1072: Provider voice helpers are sensible; you may want a genai alias.

The get_provider_voice / get_provider_voices helpers correctly normalize provider names and return realistic OpenAI and Gemini voice sets, falling back to OpenAI voices for unknown providers.

Given other helpers in this module already group "google", "gemini", and "genai" together, consider treating "genai" as a Gemini alias here too:

-    elif provider_lower in ["google", "gemini"]:
+    elif provider_lower in ["google", "gemini", "genai"]:
         return {
             "primary": "Kore",
             "secondary": "Puck",
             "tertiary": "Aoede",
         }.get(voice_type, "Kore")

and similarly in get_provider_voices.

Not required for current behavior, but it will avoid surprises if future tests pass "genai" as the provider name.


1444-1459: Google/Gemini streaming transcription validation is tolerant but minimal.

The new branch in assert_valid_streaming_transcription_response safely walks chunk.text and candidate parts for Google/Gemini without raising, but it never asserts the type or non‑emptiness of text_chunk like the OpenAI branch does. If you want symmetric strength, you could add a simple type check:

        if hasattr(chunk, "text"):
            text_chunk = chunk.text
        elif hasattr(chunk, "candidates") and chunk.candidates:
            ...
-        
-        # Note: Google streaming chunks can be empty or contain only metadata
+        # Note: Google streaming chunks can be empty or contain only metadata
+        if text_chunk:
+            assert isinstance(
+                text_chunk, str
+            ), f"Text chunk should be string, got {type(text_chunk)}"

But as-is it’s acceptable if you expect occasional metadata‑only chunks and lean on the collector for final assertions.


1520-1564: Google/Gemini streaming transcription collection matches the new validator.

collect_streaming_transcription_content now correctly accumulates text for Google/Gemini from chunk.text and candidate parts, mirroring the shapes handled in the validator. If you later introduce "genai" as an integration name in tests, consider including it in the integration.lower() in ["google", "gemini"] checks for consistency, but there’s no immediate functional issue.


1612-1624: Responses tool conversion uses the correct flat shape for this repo.

convert_to_responses_tools emits the flat function-tool objects expected by your Responses schema ({"type": "function", "name": ..., "description": ..., "parameters": ...}), which is what the Go schemas and converters consume.

You might slightly expand the docstring to call out that this differs from the Chat Completions "function": { ... } wrapper to prevent future confusion, but no behavioral changes are needed.

Based on learnings, ...
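For the docstring, the contrast could be illustrated with the two shapes side by side; field values here are made up for illustration, with field names following the public OpenAI API docs:

```python
# Responses API uses a flat function tool (name/description/parameters at top level).
flat_responses_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the weather for a location",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}

# Chat Completions wraps the same fields under a nested "function" key.
wrapped_chat_completions_tool = {
    "type": "function",
    "function": {
        "name": flat_responses_tool["name"],
        "description": flat_responses_tool["description"],
        "parameters": flat_responses_tool["parameters"],
    },
}
```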


1762-1783: valid_event_types is unused; either use it in the check or drop it.

Ruff correctly notes valid_event_types as unused. You can either:

  • Use it to tighten the warning condition, or
  • Remove it and rely on the existing "response." / "error" heuristic.

Example using it:

-    valid_event_types = [
-        "response.created",
-        "response.output_item.added",
-        "response.content_part.added",
-        "response.output_text.delta",
-        "response.function_call_arguments.delta",
-        "response.completed",
-        "response.error",
-    ]
+    valid_event_types = {
+        "response.created",
+        "response.output_item.added",
+        "response.content_part.added",
+        "response.output_text.delta",
+        "response.function_call_arguments.delta",
+        "response.completed",
+        "response.error",
+    }
@@
-        event_type = chunk.type
-        # Don't fail on unknown event types, just warn
-        if not any(evt in event_type for evt in ["response.", "error"]):
+        event_type = chunk.type
+        # Don't fail on unknown event types, just warn on ones outside the known set
+        if event_type not in valid_event_types:
             print(f"Warning: Unexpected event type: {event_type}")

This both satisfies Ruff and makes the warning more intentional.


1851-1863: Simplify get_content_string attribute access (and satisfy Ruff B009).

The list branch already gates on hasattr(c, "text"), so using getattr(c, "text") is unnecessary and triggers Ruff B009. You can switch to direct attribute access for clarity:

     elif isinstance(content, list):
         parts: List[str] = []
         for c in content:
             if isinstance(c, dict):
                 parts.append(c.get("text", ""))
             elif hasattr(c, "text"):
-                parts.append(getattr(c, "text") or "")
+                parts.append(c.text or "")
         return " ".join(filter(None, parts))

Behavior stays the same while avoiding the lint warning.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e1e90a and 2ddc75a.

⛔ Files ignored due to path filters (10)
  • core/go.sum is excluded by !**/*.sum
  • docs/favicon.ico is excluded by !**/*.ico
  • docs/favicon.png is excluded by !**/*.png
  • docs/media/bifrost-logo-dark.png is excluded by !**/*.png
  • docs/media/bifrost-logo.png is excluded by !**/*.png
  • tests/integrations/uv.lock is excluded by !**/*.lock
  • ui/app/favicon.ico is excluded by !**/*.ico
  • ui/app/favicon.png is excluded by !**/*.png
  • ui/public/bifrost-logo-dark.png is excluded by !**/*.png
  • ui/public/bifrost-logo.png is excluded by !**/*.png
📒 Files selected for processing (59)
  • .github/workflows/npx-publish.yml (1 hunks)
  • .github/workflows/pr-test-notifier.yml (1 hunks)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/snyk.yml (2 hunks)
  • .github/workflows/test-coverage.yml (1 hunks)
  • .gitignore (1 hunks)
  • Makefile (5 hunks)
  • core/go.mod (1 hunks)
  • core/internal/testutil/audio_validation.go (1 hunks)
  • core/internal/testutil/speech_synthesis.go (3 hunks)
  • core/internal/testutil/speech_synthesis_stream.go (14 hunks)
  • core/internal/testutil/utils.go (1 hunks)
  • core/providers/anthropic/responses.go (4 hunks)
  • core/providers/bedrock/responses.go (1 hunks)
  • core/providers/cohere/responses.go (2 hunks)
  • core/providers/gemini/chat.go (4 hunks)
  • core/providers/gemini/gemini.go (4 hunks)
  • core/providers/gemini/speech.go (3 hunks)
  • core/providers/gemini/transcription.go (1 hunks)
  • core/providers/gemini/types.go (55 hunks)
  • core/providers/gemini/utils.go (5 hunks)
  • core/providers/openai/speech.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/vertex/types.go (1 hunks)
  • core/schemas/utils.go (2 hunks)
  • docs/quickstart/gateway/multimodal.mdx (1 hunks)
  • docs/style.css (1 hunks)
  • plugins/semanticcache/plugin_edge_cases_test.go (2 hunks)
  • tests/integrations/.python-version (1 hunks)
  • tests/integrations/Makefile (0 hunks)
  • tests/integrations/README.md (10 hunks)
  • tests/integrations/config.json (1 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/dummy-gcp-credentials.json (1 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/pytest.ini (0 hunks)
  • tests/integrations/requirements.txt (0 hunks)
  • tests/integrations/test_audio.py (0 hunks)
  • tests/integrations/tests/conftest.py (1 hunks)
  • tests/integrations/tests/integrations/__init__.py (0 hunks)
  • tests/integrations/tests/integrations/test_google.py (0 hunks)
  • tests/integrations/tests/test_anthropic.py (20 hunks)
  • tests/integrations/tests/test_bedrock.py (2 hunks)
  • tests/integrations/tests/test_google.py (1 hunks)
  • tests/integrations/tests/test_langchain.py (29 hunks)
  • tests/integrations/tests/test_litellm.py (22 hunks)
  • tests/integrations/tests/test_openai.py (22 hunks)
  • tests/integrations/tests/utils/common.py (16 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • tests/integrations/tests/utils/parametrize.py (1 hunks)
  • transports/bifrost-http/.air.debug.toml (1 hunks)
  • transports/bifrost-http/handlers/config.go (1 hunks)
  • transports/bifrost-http/handlers/session.go (0 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (5 hunks)
  • transports/go.mod (1 hunks)
  • ui/app/_fallbacks/enterprise/components/login/loginView.tsx (3 hunks)
  • ui/components/sidebar.tsx (1 hunks)
  • ui/lib/store/apis/sessionApi.ts (1 hunks)
💤 Files with no reviewable changes (7)
  • tests/integrations/tests/integrations/init.py
  • transports/bifrost-http/handlers/session.go
  • tests/integrations/Makefile
  • tests/integrations/pytest.ini
  • tests/integrations/requirements.txt
  • tests/integrations/test_audio.py
  • tests/integrations/tests/integrations/test_google.py
✅ Files skipped from review due to trivial changes (4)
  • core/providers/openai/speech.go
  • docs/quickstart/gateway/multimodal.mdx
  • docs/style.css
  • ui/lib/store/apis/sessionApi.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • core/providers/anthropic/responses.go
  • core/providers/cohere/responses.go
  • transports/bifrost-http/integrations/genai.go
  • core/providers/gemini/utils.go
  • core/providers/gemini/transcription.go
🧰 Additional context used
🧬 Code graph analysis (12)
core/internal/testutil/utils.go (1)
core/schemas/bifrost.go (1)
  • Gemini (48-48)
core/providers/bedrock/responses.go (3)
core/providers/gemini/types.go (1)
  • Role (13-13)
core/providers/bedrock/types.go (2)
  • BedrockMessageRoleUser (65-65)
  • BedrockMessageRoleAssistant (66-66)
core/schemas/responses.go (1)
  • ResponsesInputMessageRoleUser (328-328)
core/internal/testutil/speech_synthesis.go (1)
core/internal/testutil/utils.go (1)
  • GetProviderDefaultFormat (28-35)
core/providers/gemini/chat.go (2)
core/schemas/chatcompletions.go (3)
  • ChatAssistantMessageToolCall (483-489)
  • ChatStreamResponseChoice (530-532)
  • ChatNonStreamResponseChoice (524-527)
core/providers/gemini/types.go (4)
  • Role (13-13)
  • Content (876-884)
  • Part (890-914)
  • FunctionCall (1045-1055)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/providers/gemini/gemini.go (4)
core/schemas/bifrost.go (1)
  • BifrostContextKey (101-101)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechRequest (94-129)
core/providers/utils/utils.go (1)
  • NewBifrostOperationError (449-460)
core/schemas/provider.go (1)
  • ErrProviderResponseDecode (28-28)
core/providers/gemini/speech.go (7)
core/providers/gemini/types.go (3)
  • GeminiGenerationRequest (55-73)
  • GenerationConfig (631-697)
  • VoiceConfig (826-829)
core/schemas/speech.go (3)
  • BifrostSpeechRequest (9-16)
  • SpeechParameters (43-58)
  • SpeechVoiceInput (65-68)
core/schemas/utils.go (1)
  • ParseModelString (23-36)
core/schemas/models.go (1)
  • Model (109-129)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • Vertex (40-40)
core/providers/gemini/gemini.go (1)
  • BifrostContextKeyResponseFormat (21-21)
core/providers/utils/audio.go (2)
  • ConvertPCMToWAV (32-62)
  • DefaultGeminiPCMConfig (22-28)
tests/integrations/tests/test_bedrock.py (1)
tests/integrations/tests/utils/config_loader.py (5)
  • get_model (139-160)
  • get_model (442-444)
  • get_config (430-435)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
core/internal/testutil/speech_synthesis_stream.go (6)
core/internal/testutil/utils.go (1)
  • GetProviderDefaultFormat (28-35)
core/schemas/speech.go (1)
  • BifrostSpeechStreamResponse (133-138)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • BifrostStream (318-325)
core/providers/utils/audio.go (2)
  • ConvertPCMToWAV (32-62)
  • DefaultGeminiPCMConfig (22-28)
core/internal/testutil/audio_validation.go (1)
  • SaveAndValidateAudio (70-125)
core/internal/testutil/test_retry_framework.go (1)
  • WithSpeechStreamValidationRetry (1988-2126)
tests/integrations/tests/test_langchain.py (1)
tests/integrations/tests/utils/config_loader.py (5)
  • get_model (139-160)
  • get_model (442-444)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
  • get_config (430-435)
tests/integrations/tests/test_google.py (3)
tests/integrations/tests/utils/common.py (1)
  • Config (18-23)
tests/integrations/tests/utils/config_loader.py (4)
  • get_model (139-160)
  • get_model (442-444)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/parametrize.py (1)
tests/integrations/tests/utils/config_loader.py (6)
  • get_config (430-435)
  • get_providers_for_scenario (393-409)
  • get_providers_for_scenario (477-479)
  • get_scenario_capability (411-423)
  • get_provider_model (307-326)
  • get_provider_model (457-459)
🪛 checkmake (0.2.2)
Makefile

[warning] 95-95: Target body for "dev" exceeds allowed length of 5 (37).

(maxbodylength)

🪛 Checkov (3.2.334)
tests/integrations/dummy-gcp-credentials.json

[medium] 5-6: Private Key

(CKV_SECRET_13)

🪛 GitHub Check: CodeQL
.github/workflows/pr-tests.yml

[warning] 20-39: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}

.github/workflows/npx-publish.yml

[warning] 19-37: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}

.github/workflows/test-coverage.yml

[warning] 12-30: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}

🪛 Gitleaks (8.29.0)
tests/integrations/dummy-gcp-credentials.json

[high] 5-5: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

🪛 Ruff (0.14.5)
tests/integrations/tests/test_anthropic.py

183-183: Unused method argument: test_config

(ARG002)


197-197: Unused method argument: test_config

(ARG002)


216-216: Unused method argument: test_config

(ARG002)


234-234: Unused method argument: test_config

(ARG002)


261-261: Unused method argument: test_config

(ARG002)


326-326: Unused method argument: test_config

(ARG002)


344-344: Unused method argument: test_config

(ARG002)


369-369: Unused method argument: test_config

(ARG002)


395-395: Unused method argument: test_config

(ARG002)


578-578: Unused method argument: test_config

(ARG002)


611-611: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


620-620: Unused method argument: test_config

(ARG002)


656-656: Unused method argument: test_config

(ARG002)


731-731: Unused method argument: test_config

(ARG002)


767-767: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/test_langchain.py

577-577: Do not catch blind exception: Exception

(BLE001)


580-580: Unused method argument: test_config

(ARG002)


636-636: Do not catch blind exception: Exception

(BLE001)


636-636: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


637-637: f-string without any placeholders

Remove extraneous f prefix

(F541)


639-639: Unused method argument: test_config

(ARG002)


699-699: Do not catch blind exception: Exception

(BLE001)


732-732: Do not catch blind exception: Exception

(BLE001)


829-830: try-except-pass detected, consider logging the exception

(S110)


829-829: Do not catch blind exception: Exception

(BLE001)

tests/integrations/tests/test_google.py

173-173: Redefinition of unused base64 from line 40

Remove definition: base64

(F811)


177-177: Unpacked variable header is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


230-230: Unused method argument: test_config

(ARG002)


246-246: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


290-290: Unused method argument: test_config

(ARG002)


315-315: Unused method argument: test_config

(ARG002)


350-350: Unused method argument: test_config

(ARG002)


370-370: Unused method argument: test_config

(ARG002)


383-383: Unused method argument: test_config

(ARG002)


394-394: Unused method argument: test_config

(ARG002)


412-412: Unused method argument: test_config

(ARG002)


448-448: Unused method argument: test_config

(ARG002)


491-491: Unused method argument: test_config

(ARG002)


504-504: Unused method argument: test_config

(ARG002)


552-552: Unused method argument: test_config

(ARG002)


565-565: Unused method argument: test_config

(ARG002)


572-572: Unused method argument: test_config

(ARG002)


596-596: Unused method argument: test_config

(ARG002)


622-622: Unused method argument: test_config

(ARG002)


646-646: Unused method argument: test_config

(ARG002)


670-670: Unused method argument: test_config

(ARG002)


693-693: Unused method argument: test_config

(ARG002)


725-725: Unused method argument: test_config

(ARG002)


779-779: Unused method argument: test_config

(ARG002)


823-823: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/config_loader.py

154-157: Avoid specifying long messages outside the exception class

(TRY003)


228-228: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


234-234: Avoid specifying long messages outside the exception class

(TRY003)


237-237: Avoid specifying long messages outside the exception class

(TRY003)


287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/test_litellm.py

132-132: Unused function argument: request

(ARG001)


136-136: Possible hardcoded password assigned to: "token"

(S105)


180-180: Unused method argument: test_config

(ARG002)


180-180: Unused method argument: provider

(ARG002)


198-198: Unused method argument: test_config

(ARG002)


198-198: Unused method argument: provider

(ARG002)


217-217: Unused method argument: test_config

(ARG002)


217-217: Unused method argument: provider

(ARG002)


239-239: Unused method argument: test_config

(ARG002)


239-239: Unused method argument: provider

(ARG002)


262-262: Unused method argument: test_config

(ARG002)


262-262: Unused method argument: provider

(ARG002)


307-307: Unused method argument: test_config

(ARG002)


307-307: Unused method argument: provider

(ARG002)


330-330: Unused method argument: test_config

(ARG002)


330-330: Unused method argument: provider

(ARG002)


346-346: Unused method argument: test_config

(ARG002)


346-346: Unused method argument: provider

(ARG002)


362-362: Unused method argument: test_config

(ARG002)


362-362: Unused method argument: provider

(ARG002)


384-384: Unused method argument: test_config

(ARG002)


384-384: Unused method argument: provider

(ARG002)


427-427: Unused method argument: test_config

(ARG002)


498-498: Unused method argument: test_config

(ARG002)


498-498: Unused method argument: provider

(ARG002)


526-526: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


535-535: Unused method argument: test_config

(ARG002)


573-573: Do not catch blind exception: Exception

(BLE001)

tests/integrations/tests/utils/common.py

1731-1731: Avoid specifying long messages outside the exception class

(TRY003)


1753-1755: Avoid specifying long messages outside the exception class

(TRY003)


1768-1768: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1833-1833: Avoid specifying long messages outside the exception class

(TRY003)


1843-1845: Avoid specifying long messages outside the exception class

(TRY003)


1861-1861: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

tests/integrations/tests/test_openai.py

222-222: Unused method argument: test_config

(ARG002)


235-235: Unused method argument: test_config

(ARG002)


252-252: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


286-286: Unused method argument: test_config

(ARG002)


328-328: Unused method argument: test_config

(ARG002)


344-344: Unused method argument: test_config

(ARG002)


355-355: Unused method argument: test_config

(ARG002)


366-366: Unused method argument: test_config

(ARG002)


496-496: Unused method argument: test_config

(ARG002)


529-529: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


540-540: Unused method argument: test_config

(ARG002)


1096-1096: Unused method argument: test_config

(ARG002)


1107-1107: Unused method argument: test_config

(ARG002)


1140-1140: Unused method argument: test_config

(ARG002)


1170-1170: Unused method argument: test_config

(ARG002)


1212-1212: Unused method argument: test_config

(ARG002)


1262-1262: Unused method argument: test_config

(ARG002)


1316-1316: Unused method argument: test_config

(ARG002)


1334-1334: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1354-1354: Unused method argument: test_config

(ARG002)


1497-1497: Unused method argument: test_config

(ARG002)


1518-1518: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/parametrize.py

14-14: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

Comment thread .github/workflows/npx-publish.yml
Comment thread .github/workflows/pr-tests.yml
Comment thread core/providers/bedrock/responses.go
Comment thread core/providers/gemini/speech.go
Comment thread tests/integrations/dummy-gcp-credentials.json
Comment thread tests/integrations/tests/test_anthropic.py
Comment thread tests/integrations/tests/test_langchain.py
Comment thread tests/integrations/tests/test_langchain.py
Comment thread tests/integrations/tests/utils/common.py
Comment thread ui/app/_fallbacks/enterprise/components/login/loginView.tsx Outdated
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch 2 times, most recently from 7c9ee84 to 7c937b2 on November 25, 2025 21:46
@akshaydeo akshaydeo force-pushed the 10-23-fix_integration_test_cases branch from 7c937b2 to 0d4d2cc on November 25, 2025 21:49
Contributor

akshaydeo commented Nov 25, 2025

Merge activity

  • Nov 25, 9:57 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Nov 25, 9:57 PM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo merged commit 56b3e25 into main Nov 25, 2025
9 of 10 checks passed
@akshaydeo akshaydeo deleted the 10-23-fix_integration_test_cases branch November 25, 2025 21:57
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
ui/app/_fallbacks/enterprise/components/login/loginView.tsx (1)

50-65: Critical: Missing dependencies in useEffect causes stale closure.

The effect uses isAuthEnabledError, isAuthEnabled, hasValidToken, and router but only includes isLoadingIsAuthEnabled in the dependency array. This violates React's Rules of Hooks and prevents the effect from re-running when these values change. For example, if hasValidToken becomes true after the initial render, the user won't be redirected to /workspace.

Apply this diff to include all dependencies:

-	}, [isLoadingIsAuthEnabled]);
+	}, [isLoadingIsAuthEnabled, isAuthEnabledError, isAuthEnabled, hasValidToken, router]);
core/providers/gemini/gemini.go (1)

384-411: Guard request.Params before reading ResponseFormat to avoid nil dereference.

In Speech, Line 406 assumes request.Params is non‑nil:

ctx = context.WithValue(ctx, BifrostContextKeyResponseFormat, request.Params.ResponseFormat)

If callers omit params in the BifrostSpeechRequest, this will panic. Add a nil‑check:

-	ctx = context.WithValue(ctx, BifrostContextKeyResponseFormat, request.Params.ResponseFormat)
+	if request.Params != nil {
+		ctx = context.WithValue(ctx, BifrostContextKeyResponseFormat, request.Params.ResponseFormat)
+	}

This preserves the new context‑driven formatting without introducing a hard dependency on Params being set.

core/providers/gemini/speech.go (1)

110-125: Guard against nil bifrostReq.Input to avoid panics.

if bifrostReq.Input.Input != "" { ... } will panic when bifrostReq.Input == nil. Nothing in this function guarantees Input is non-nil.

Add a nil check:

-	// Convert speech input to Gemini format
-
-	if bifrostReq.Input.Input != "" {
+	// Convert speech input to Gemini format
+	if bifrostReq.Input != nil && bifrostReq.Input.Input != "" {
 		geminiReq.Contents = []Content{
♻️ Duplicate comments (9)
tests/integrations/dummy-gcp-credentials.json (1)

1-12: Replace embedded PEM with a non-key placeholder

This fixture still contains a realistic PEM-formatted "private_key" value. Secret scanners will flag it as a real key, and committing key-shaped material normalizes the practice even when the key is a dummy.

Consider replacing the value with an obviously invalid placeholder that does not look like a PEM block (and adjusting tests if they currently require PEM structure):

-  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQCY+aj4fvYTj4l9\n...snip...\n-----END PRIVATE KEY-----\n",
+  "private_key": "PRIVATE_KEY_PLACEHOLDER_DO_NOT_USE",

Alternatively, load credentials from an ignored fixture or environment variables instead of committing key-shaped data.
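One possible shape for the environment-variable approach is sketched below; the variable names and defaults are assumptions for illustration, not existing test configuration:

```python
import os

# Build the dummy credentials dict at test time from environment variables,
# falling back to obviously harmless placeholders. Variable names are hypothetical.
def load_test_gcp_credentials() -> dict:
    return {
        "type": "service_account",
        "project_id": os.environ.get("TEST_GCP_PROJECT_ID", "dummy-project"),
        "private_key": os.environ.get(
            "TEST_GCP_PRIVATE_KEY", "PRIVATE_KEY_PLACEHOLDER_DO_NOT_USE"
        ),
    }
```

This keeps key-shaped strings out of the repository while still letting CI inject a structurally valid key when a test genuinely needs one.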

tests/integrations/tests/test_langchain.py (2)

33-35: Remove unused google.cloud.aiplatform_v1beta1.types imports.

endpoint and endpoint_service are not used anywhere in this file; keeping them will trigger lint warnings. Drop them from the import list.


580-637: Tighten Gemini chat test exception handling and fix lint issues.

In test_13_gemini_chat_integration:

except Exception as e:
    pytest.skip(f"Known flaky test")
  • e is unused and the f-string has no interpolation, which Ruff will flag.
  • Blindly catching Exception also hides real regressions.

Prefer at least logging the exception in the skip reason, or narrowing the exception type:

-        except Exception as e:
-            pytest.skip(f"Known flaky test")
+        except Exception as e:
+            pytest.skip(f"Known flaky test: {e}")

This keeps the “known flaky” behavior while making failures more diagnosable and clearing the lint warnings.

tests/integrations/tests/test_anthropic.py (1)

749-840: Replace hard‑coded streaming thinking model with config‑backed model.

test_16_extended_thinking_streaming still hard‑codes:

stream = anthropic_client.messages.create(
    model="anthropic/claude-sonnet-4-5",
    ...
)

This is both:

  • Inconsistent with the non‑streaming thinking test (which uses get_model("anthropic", "chat")), and
  • Likely invalid / out of sync with the Anthropic model IDs defined in tests/integrations/config.yml.

Use the same config‑driven model selection you use elsewhere, e.g.:

-        stream = anthropic_client.messages.create(
-            model="anthropic/claude-sonnet-4-5",
+        stream = anthropic_client.messages.create(
+            model=get_model("anthropic", "chat"),
             max_tokens=16000,
             thinking={
                 "type": "enabled",
                 "budget_tokens": 10000,
             },
             messages=messages,
             stream=True,
         )

(or get_model("anthropic", "thinking") if you add a dedicated capability).

tests/integrations/tests/utils/common.py (1)

1613-1624: Responses tools helper matches the repo’s flat tool shape.

convert_to_responses_tools emits {"type": "function", "name": ..., "description": ..., "parameters": ...}, which aligns with this repo’s Responses schema and provider converters rather than the Chat Completions-style nested function wrapper. That’s the correct shape here; keeping it flat avoids incompatibilities with the existing Go schemas and converters.

Based on learnings, this flat tool format is intentional for the Responses API in this codebase.

core/providers/gemini/speech.go (2)

28-36: Preserve whitespace when concatenating text parts.

textInput += part.Text will merge multiple text parts without any separator (e.g., "Hello" + "world" → "Helloworld"), which degrades TTS output.

Consider inserting a space (or other separator) between non-empty parts:

-	var textInput string
-	for _, content := range request.Contents {
-		for _, part := range content.Parts {
-			if part.Text != "" {
-				textInput += part.Text
-			}
-		}
-	}
+	var textInput string
+	for _, content := range request.Contents {
+		for _, part := range content.Parts {
+			if part.Text == "" {
+				continue
+			}
+			if textInput != "" {
+				textInput += " "
+			}
+			textInput += part.Text
+		}
+	}
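The same join-with-separator logic, sketched in Python for clarity (the Part dataclass here is a hypothetical stand-in for the Gemini content part; in Python the skip-empty-then-join pattern collapses to a single join over filtered parts):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Part:
    """Hypothetical stand-in for a Gemini content part."""
    text: str = ""

def join_text_parts(parts: List[Part]) -> str:
    # Drop empty parts and insert exactly one space between the rest,
    # mirroring the separator-preserving Go fix above.
    return " ".join(p.text for p in parts if p.text)
```

With this, ["Hello", "", "world"] concatenates to "Hello world" rather than "Helloworld".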

149-161: Use safe context value lookup to avoid panics.

ctx.Value(...).(string) will panic if:

  • the key is missing (Value returns nil), or
  • the stored value is not a string.

It also assumes ctx is always non-nil.

Switch to the ok-idiom and treat missing/invalid values as “no format override”:

-			if len(audioData) > 0 {
-				responseFormat := ctx.Value(BifrostContextKeyResponseFormat).(string)
+			if len(audioData) > 0 {
+				var responseFormat string
+				if ctx != nil {
+					if v, ok := ctx.Value(BifrostContextKeyResponseFormat).(string); ok {
+						responseFormat = v
+					}
+				}
 				// Gemini returns PCM audio (s16le, 24000 Hz, mono)
 				// Convert to WAV for standard playable output format
 				if responseFormat == "wav" {
tests/integrations/tests/test_openai.py (2)

1115-1120: LGTM - decorator added as requested.

The @skip_if_no_api_key("openai") decorator has been properly added. The test correctly validates that models can be listed.

Note: for consistency with Anthropic's test_14_list_models (limit=5) and Google's test_15_list_models (page_size=5), consider passing limit=5 here as well; this is optional, since the current assertion len(response.data) > 0 is valid.


1503-1522: Error handling could miss model availability errors.

The error check on line 1507 only catches errors containing "reasoning" or "not supported". If the model "openai/gpt-5" becomes unavailable or returns a different error format (e.g., "model not found", "invalid model"), the exception will be re-raised unexpectedly.

This was flagged in a previous review. Consider broadening the check:

if any(term in error_str for term in ["reasoning", "not supported", "model", "not found", "invalid"]):
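As a sketch of how the broadened check behaves (a minimal helper, assuming the test lowercases the error string before matching, as the surrounding tests do):

```python
def is_expected_model_error(exc: Exception) -> bool:
    """Return True if the error looks like a model-availability or
    capability error that the test should tolerate rather than re-raise."""
    error_str = str(exc).lower()
    return any(
        term in error_str
        for term in ["reasoning", "not supported", "model", "not found", "invalid"]
    )
```

This tolerates "model not found" and "invalid model" responses while still re-raising genuinely unexpected failures such as rate-limit errors.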
🧹 Nitpick comments (19)
ui/components/sidebar.tsx (1)

585-594: Branding/header change looks good; margin value is a minor visual nit

The logo + “by Maxim AI” attribution are wired correctly (alt text, routing, external link with noopener noreferrer), and the structure fits the existing header layout well.

Only nit: mt-[11px] on the <span> is a hard-coded tweak that might drift if the header height or font settings change. If you keep tweaking this area later, consider relying on flex alignment or padding (e.g., items-end, pb-*) instead of a magic 11px offset.

.gitignore (1)

43-52: Deduplicate overlapping Python ignore patterns

The new Python block reintroduces patterns already present above (e.g., **/__pycache__/**, **/venv/, **/.pytest_cache/). Functionally fine, but you can drop the duplicates to keep .gitignore leaner.

Makefile (1)

95-132: dev target debug behavior is solid; consider avoiding duplicate workspace setup

The DEBUG-aware dev flow (delve + .env loading + separate debug/non-debug air configs) looks good and is nicely opt-in. Note that setup-workspace runs both as a prerequisite and again inside the recipe:

dev: install-ui install-air setup-workspace $(if $(DEBUG),install-delve)
    ...
    @$(MAKE) setup-workspace >/dev/null

You can likely drop the inner $(MAKE) setup-workspace to avoid redundant work unless there’s a specific reason to re-run it.

tests/integrations/tests/utils/config_loader.py (3)

139-161: Make missing provider/capability failures explicit rather than returning empty strings

get_model() now delegates to get_provider_model(), but get_provider_model() silently returns "" when the provider is missing or a capability isn’t configured:

provider = INTEGRATION_TO_PROVIDER_MAP.get(integration)
...
if "providers" not in self._config:
    return ""
...
if provider not in providers:
    return ""
...
return provider_models.get(capability, "")

That can lead to integration tests calling providers with an empty model name instead of failing fast with a clear configuration error.

Consider tightening this up so configuration issues surface explicitly, e.g.:

     def get_provider_model(self, provider: str, capability: str = "chat") -> str:
-        if "providers" not in self._config:
-            # Fallback to old behavior if providers section doesn't exist
-            return ""
-        
-        providers = self._config["providers"]
-        if provider not in providers:
-            return ""
-        
-        provider_models = providers[provider]
-        return provider_models.get(capability, "")
+        providers = self._config.get("providers") or {}
+        if provider not in providers:
+            raise ValueError(f"Unknown provider: {provider}")
+
+        provider_models = providers[provider]
+        try:
+            return provider_models[capability]
+        except KeyError as exc:
+            raise ValueError(
+                f"Capability '{capability}' not configured for provider '{provider}'"
+            ) from exc

get_model() will then fail with a clear message instead of propagating an empty string into downstream API calls.

Also applies to: 307-327


228-237: Minor type-hint and style cleanups (Ruff: RUF013, F541)

A few small tweaks will keep static analysis quiet and make the types clearer:

  • Explicitly allow None in parameters that default to None:
-    def get_environment_config(self, environment: str = None) -> Dict[str, Any]:
+    def get_environment_config(self, environment: str | None = None) -> Dict[str, Any]:
...
-    def list_models(self, integration: str = None) -> Dict[str, Any]:
+    def list_models(self, integration: str | None = None) -> Dict[str, Any]:
  • After that, you can likely drop the unused Optional import.

  • print(f"\n🤖 MODEL CONFIGURATIONS (via providers):") doesn’t need an f-string; simplify to:

-        print(f"\n🤖 MODEL CONFIGURATIONS (via providers):")
+        print("\n🤖 MODEL CONFIGURATIONS (via providers):")

These are non-functional, but keep Ruff happy and the code a bit cleaner.

Also applies to: 208-219, 287-287


374-424: Expose get_scenario_capability at module level for API symmetry

You added several provider/scenario helpers with module-level wrappers (get_provider_model, is_provider_available, get_providers_for_scenario, etc.), but get_scenario_capability() is only available as a method on ConfigLoader.

For consistency with the rest of this module’s API and to simplify imports in tests, you may want to add a thin wrapper:

 def get_providers_for_scenario(scenario: str) -> List[str]:
     """Convenience function to get providers for scenario"""
     return get_config().get_providers_for_scenario(scenario)
 
+
+def get_scenario_capability(scenario: str) -> str:
+    """Convenience function to get capability type for a scenario"""
+    return get_config().get_scenario_capability(scenario)

Optional, but it keeps the public surface uniform.

Also applies to: 457-479

transports/bifrost-http/integrations/router.go (1)

516-542: Speech non‑streaming path now consistent; consider minor cleanup of style and doc.

The speech branch now correctly mirrors other handlers (PostCallback, nil‑response guard, converter path vs raw audio), which fixes the earlier omission. The only nits:

  • The }else { on Line 536 should be } else { for consistency.
  • The SpeechResponseConverter field comment says “SHOULD NOT BE NIL”, but the code now explicitly supports nil to mean “send raw audio”; aligning the comment with behavior would avoid confusion.
tests/integrations/tests/test_anthropic.py (1)

629-635: Drop or underscore content_tools to satisfy Ruff and clarify intent.

Here:

content_tools, chunk_count_tools, tool_calls_detected_tools = (
    collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
)

content_tools is never used. Either don’t unpack it or assign to _ to make this explicit:

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "anthropic", timeout=300)
                 )
core/internal/testutil/audio_validation.go (1)

17-21: Consider aligning AllowedAudioFormats with supported validators (AAC).

You’ve implemented AAC magic‑byte validation in validateAACMagicBytes and wired it into validateAudioBytesInternal under the "aac" case, but AllowedAudioFormats does not include "aac". As a result, SaveAndValidateAudio will reject AAC blobs even though they can be validated.

If AAC output is expected from any provider, consider adding it:

 var AllowedAudioFormats = map[string]bool{
-	"flac": true, "mp3": true, "mp4": true, "mpeg": true,
-	"mpga": true, "m4a": true, "ogg": true, "wav": true, "webm": true,
+	"flac": true, "mp3": true, "mp4": true, "mpeg": true,
+	"mpga": true, "m4a": true, "ogg": true, "wav": true, "webm": true, "aac": true,
 }

If AAC isn’t actually a supported provider format, it may be clearer to drop the AAC branch from validateAudioBytesInternal instead.

Also applies to: 149-210
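For reference, the magic-byte check for ADTS-framed AAC is a two-byte syncword test: ADTS frames start with twelve 1-bits (0xFFF), so the first byte is 0xFF and the high nibble of the second byte is 0xF. A minimal Python sketch of the kind of check validateAACMagicBytes would perform (note this only detects ADTS framing; AAC inside MP4/M4A containers will not match):

```python
def looks_like_adts_aac(data: bytes) -> bool:
    """Heuristic magic-byte check for ADTS-framed AAC audio.

    The 12-bit ADTS syncword is all ones, so byte 0 must be 0xFF and
    the top nibble of byte 1 must be 0xF.
    """
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xF0) == 0xF0
```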

tests/integrations/tests/test_google.py (3)

169-188: Tidy imports and unused variables in load_image_from_url.

Two small cleanups:

  • base64 is imported at the module level and re-imported inside this function.
  • header from header, data = url.split(",", 1) is never used.

You can simplify and silence Ruff warnings:

-    from google.genai import types
-    import io
-    import base64
+    from google.genai import types
+    import io
@@
-        # Base64 image - extract the base64 data part
-        header, data = url.split(",", 1)
+        # Base64 image - extract the base64 data part
+        _, data = url.split(",", 1)

585-589: Strengthen list-models assertion to catch misconfiguration.

Right now the test only checks len(response) <= 5, so an empty list would still pass.

If the API is expected to return at least one model when configured, consider:

-        assert response is not None
-        assert len(response) <= 5
+        assert response is not None
+        assert 1 <= len(response) <= 5

(or at least assert len(response) > 0) to detect misconfigured integrations.


229-247: Consider marking test_config as intentionally unused.

test_config is injected but not used here (and in several other tests), which triggers ARG002 from Ruff.

If you want to keep the fixture for future configuration, consider renaming the argument to _test_config:

-    def test_01_simple_chat(self, google_client, test_config, provider, model):
+    def test_01_simple_chat(self, google_client, _test_config, provider, model):

(and similarly for the other methods) to clarify intent and silence the linter.

tests/integrations/tests/test_litellm.py (2)

130-144: Minor: mark request unused in mock_refresh to satisfy Ruff.

mock_refresh(self, request) doesn’t use request, which Ruff flags as ARG001. Since this is a test-only dummy implementation, just mark the parameter as intentionally unused:

-    def mock_refresh(self, request):
+    def mock_refresh(self, _request):

The rest of the mock (dummy token and expiry) is fine for integration tests.


548-550: Avoid binding unused content_tools from collect_streaming_content.

content_tools is never used, which Ruff flags (RUF059). You can discard it explicitly:

-        content_tools, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(
+        _, chunk_count_tools, tool_calls_detected_tools = collect_streaming_content(
             stream_with_tools, "openai", timeout=120  # LiteLLM uses OpenAI format
         )
tests/integrations/tests/utils/common.py (3)

1577-1595: Confirm GEMINI_API_KEY mapping is consistent across test harness.

get_api_key("google") now maps to GEMINI_API_KEY. That’s fine, but only if the rest of the integration test harness (e.g., conftest.py, run_integration_tests.py, docs) also refers to GEMINI_API_KEY as the canonical env var for Google/Gemini.

Please double‑check those call sites and configuration docs to avoid confusion between GOOGLE_API_KEY and GEMINI_API_KEY.


1768-1783: Remove or use valid_event_types in assert_valid_responses_streaming_chunk.

valid_event_types is defined but never used, which Ruff flags (F841). Either:

  • remove the variable altogether, or
  • use it to enforce/validate known event types, e.g.:
-    valid_event_types = [
+    valid_event_types = {
         "response.created",
         "response.output_item.added",
         "response.content_part.added",
         "response.output_text.delta",
         "response.function_call_arguments.delta",
         "response.completed",
         "response.error",
-    ]
+    }
@@
-    if hasattr(chunk, "type"):
-        event_type = chunk.type
-        # Don't fail on unknown event types, just warn
-        if not any(evt in event_type for evt in ["response.", "error"]):
-            print(f"Warning: Unexpected event type: {event_type}")
+    if hasattr(chunk, "type"):
+        event_type = chunk.type
+        if event_type not in valid_event_types:
+            print(f"Warning: Unexpected event type: {event_type}")

1851-1864: Simplify get_content_string attribute access.

You already guard with hasattr(c, "text"), so getattr(c, "text") isn’t adding safety and triggers Ruff B009. You can simplify:

-    elif isinstance(content, list):
-        parts: List[str] = []
-        for c in content:
-            if isinstance(c, dict):
-                parts.append(c.get("text", ""))
-            elif hasattr(c, "text"):
-                parts.append(getattr(c, "text") or "")
-        return " ".join(filter(None, parts))
+    elif isinstance(content, list):
+        parts: List[str] = []
+        for c in content:
+            if isinstance(c, dict):
+                parts.append(c.get("text", ""))
+            elif hasattr(c, "text"):
+                parts.append(c.text or "")
+        return " ".join(filter(None, parts))
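Pulled together as a self-contained sketch, the simplified helper reads as follows (an illustrative version of the function discussed above, not the repo's exact code; direct attribute access is safe because hasattr already guards it):

```python
from types import SimpleNamespace
from typing import Any, List

def get_content_string(content: Any) -> str:
    """Flatten a string-or-list-of-parts content value into one string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: List[str] = []
        for c in content:
            if isinstance(c, dict):
                parts.append(c.get("text", ""))
            elif hasattr(c, "text"):
                # hasattr() guards this, so plain attribute access suffices.
                parts.append(c.text or "")
        return " ".join(filter(None, parts))
    return ""
```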
tests/integrations/tests/test_openai.py (2)

549-551: Prefix unused unpacked variable with underscore.

content_tools is unpacked but never referenced. Prefix with _ to indicate intentional discard.

-                content_tools, chunk_count_tools, tool_calls_detected_tools = (
+                _content_tools, chunk_count_tools, tool_calls_detected_tools = (
                     collect_streaming_content(stream_with_tools, "openai", timeout=300)
                 )

1366-1368: Prefix unused unpacked variable with underscore.

content is unpacked but never used in this test. Prefix with _ to indicate intentional discard.

         # Collect streaming content
-        content, chunk_count, tool_calls_detected, event_types = (
+        _content, chunk_count, tool_calls_detected, event_types = (
             collect_responses_streaming_content(stream, timeout=300)
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ddc75a and 0d4d2cc.

⛔ Files ignored due to path filters (10)
  • core/go.sum is excluded by !**/*.sum
  • docs/favicon.ico is excluded by !**/*.ico
  • docs/favicon.png is excluded by !**/*.png
  • docs/media/bifrost-logo-dark.png is excluded by !**/*.png
  • docs/media/bifrost-logo.png is excluded by !**/*.png
  • tests/integrations/uv.lock is excluded by !**/*.lock
  • ui/app/favicon.ico is excluded by !**/*.ico
  • ui/app/favicon.png is excluded by !**/*.png
  • ui/public/bifrost-logo-dark.png is excluded by !**/*.png
  • ui/public/bifrost-logo.png is excluded by !**/*.png
📒 Files selected for processing (54)
  • .gitattributes (1 hunks)
  • .github/workflows/npx-publish.yml (1 hunks)
  • .github/workflows/pr-test-notifier.yml (1 hunks)
  • .github/workflows/pr-tests.yml (1 hunks)
  • .github/workflows/snyk.yml (2 hunks)
  • .github/workflows/test-coverage.yml (1 hunks)
  • .gitignore (1 hunks)
  • Makefile (5 hunks)
  • core/go.mod (1 hunks)
  • core/internal/testutil/audio_validation.go (1 hunks)
  • core/internal/testutil/speech_synthesis.go (3 hunks)
  • core/internal/testutil/speech_synthesis_stream.go (14 hunks)
  • core/internal/testutil/utils.go (1 hunks)
  • core/providers/bedrock/responses.go (1 hunks)
  • core/providers/gemini/gemini.go (4 hunks)
  • core/providers/gemini/speech.go (3 hunks)
  • core/providers/gemini/types.go (55 hunks)
  • core/providers/gemini/utils.go (5 hunks)
  • core/providers/openai/speech.go (1 hunks)
  • core/providers/utils/audio.go (1 hunks)
  • core/providers/vertex/types.go (1 hunks)
  • core/schemas/utils.go (2 hunks)
  • docs/quickstart/gateway/multimodal.mdx (1 hunks)
  • docs/style.css (1 hunks)
  • plugins/semanticcache/plugin_edge_cases_test.go (2 hunks)
  • tests/integrations/.python-version (1 hunks)
  • tests/integrations/Makefile (0 hunks)
  • tests/integrations/README.md (10 hunks)
  • tests/integrations/config.json (1 hunks)
  • tests/integrations/config.yml (2 hunks)
  • tests/integrations/dummy-gcp-credentials.json (1 hunks)
  • tests/integrations/pyproject.toml (1 hunks)
  • tests/integrations/pytest.ini (0 hunks)
  • tests/integrations/requirements.txt (0 hunks)
  • tests/integrations/test_audio.py (0 hunks)
  • tests/integrations/tests/conftest.py (1 hunks)
  • tests/integrations/tests/integrations/__init__.py (0 hunks)
  • tests/integrations/tests/test_anthropic.py (20 hunks)
  • tests/integrations/tests/test_bedrock.py (2 hunks)
  • tests/integrations/tests/test_google.py (1 hunks)
  • tests/integrations/tests/test_langchain.py (29 hunks)
  • tests/integrations/tests/test_litellm.py (22 hunks)
  • tests/integrations/tests/test_openai.py (22 hunks)
  • tests/integrations/tests/utils/common.py (16 hunks)
  • tests/integrations/tests/utils/config_loader.py (7 hunks)
  • transports/bifrost-http/.air.debug.toml (1 hunks)
  • transports/bifrost-http/handlers/config.go (1 hunks)
  • transports/bifrost-http/handlers/session.go (0 hunks)
  • transports/bifrost-http/integrations/genai.go (3 hunks)
  • transports/bifrost-http/integrations/router.go (5 hunks)
  • transports/go.mod (1 hunks)
  • ui/app/_fallbacks/enterprise/components/login/loginView.tsx (2 hunks)
  • ui/components/sidebar.tsx (1 hunks)
  • ui/lib/store/apis/sessionApi.ts (1 hunks)
💤 Files with no reviewable changes (6)
  • transports/bifrost-http/handlers/session.go
  • tests/integrations/Makefile
  • tests/integrations/pytest.ini
  • tests/integrations/tests/integrations/__init__.py
  • tests/integrations/test_audio.py
  • tests/integrations/requirements.txt
✅ Files skipped from review due to trivial changes (1)
  • .gitattributes
🚧 Files skipped from review as they are similar to previous changes (13)
  • core/providers/bedrock/responses.go
  • transports/go.mod
  • tests/integrations/tests/conftest.py
  • docs/style.css
  • core/internal/testutil/speech_synthesis.go
  • plugins/semanticcache/plugin_edge_cases_test.go
  • docs/quickstart/gateway/multimodal.mdx
  • ui/lib/store/apis/sessionApi.ts
  • core/providers/openai/speech.go
  • tests/integrations/pyproject.toml
  • core/schemas/utils.go
  • core/go.mod
  • core/providers/vertex/types.go
🧰 Additional context used
🧬 Code graph analysis (9)
core/internal/testutil/utils.go (1)
core/schemas/bifrost.go (1)
  • Gemini (48-48)
tests/integrations/tests/test_bedrock.py (1)
tests/integrations/tests/utils/config_loader.py (5)
  • get_model (139-160)
  • get_model (442-444)
  • get_config (430-435)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
transports/bifrost-http/integrations/router.go (1)
core/schemas/speech.go (1)
  • BifrostSpeechResponse (22-29)
core/internal/testutil/speech_synthesis_stream.go (6)
core/internal/testutil/utils.go (1)
  • GetProviderDefaultFormat (28-35)
core/schemas/speech.go (1)
  • BifrostSpeechStreamResponse (133-138)
core/schemas/bifrost.go (2)
  • Gemini (48-48)
  • BifrostStream (318-325)
core/providers/utils/audio.go (2)
  • ConvertPCMToWAV (32-62)
  • DefaultGeminiPCMConfig (22-28)
core/internal/testutil/audio_validation.go (1)
  • SaveAndValidateAudio (70-125)
core/internal/testutil/test_retry_framework.go (1)
  • WithSpeechStreamValidationRetry (1988-2126)
core/providers/gemini/utils.go (1)
core/providers/gemini/types.go (1)
  • Type (778-778)
tests/integrations/tests/test_langchain.py (1)
tests/integrations/tests/utils/config_loader.py (5)
  • get_model (139-160)
  • get_model (442-444)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
  • get_config (430-435)
core/providers/gemini/gemini.go (4)
core/schemas/bifrost.go (1)
  • BifrostContextKey (101-101)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechRequest (94-129)
core/providers/utils/utils.go (1)
  • NewBifrostOperationError (449-460)
core/schemas/provider.go (1)
  • ErrProviderResponseDecode (28-28)
transports/bifrost-http/integrations/genai.go (6)
core/schemas/bifrost.go (3)
  • BifrostRequest (143-153)
  • SpeechRequest (94-94)
  • TranscriptionRequest (96-96)
transports/bifrost-http/integrations/router.go (4)
  • EmbeddingResponseConverter (102-102)
  • ChatResponseConverter (94-94)
  • SpeechResponseConverter (106-106)
  • TranscriptionResponseConverter (110-110)
core/providers/gemini/embedding.go (1)
  • ToGeminiEmbeddingResponse (57-100)
core/providers/gemini/speech.go (1)
  • ToGeminiSpeechResponse (169-192)
core/providers/gemini/transcription.go (1)
  • ToGeminiTranscriptionResponse (216-256)
core/providers/gemini/types.go (2)
  • GeminiGenerationRequest (55-73)
  • GenerationConfig (631-697)
tests/integrations/tests/test_anthropic.py (3)
tests/integrations/tests/utils/config_loader.py (7)
  • get_model (139-160)
  • get_model (442-444)
  • get_config (430-435)
  • get_integration_url (111-130)
  • get_integration_url (438-439)
  • get_provider_model (307-326)
  • get_provider_model (457-459)
tests/integrations/tests/utils/parametrize.py (2)
  • get_cross_provider_params_for_scenario (12-47)
  • format_provider_model (50-65)
tests/integrations/tests/utils/common.py (2)
  • assert_valid_chat_response (518-552)
  • assert_valid_image_response (573-643)
🪛 checkmake (0.2.2)
Makefile

[warning] 95-95: Target body for "dev" exceeds allowed length of 5 (37).

(maxbodylength)

🪛 Checkov (3.2.334)
tests/integrations/dummy-gcp-credentials.json

[medium] 5-6: Private Key

(CKV_SECRET_13)

🪛 GitHub Check: CodeQL
.github/workflows/test-coverage.yml

[warning] 12-30: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}

🪛 Gitleaks (8.29.0)
tests/integrations/dummy-gcp-credentials.json

[high] 5-5: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

🪛 Ruff (0.14.5)
tests/integrations/tests/test_langchain.py

577-577: Do not catch blind exception: Exception

(BLE001)


580-580: Unused method argument: test_config

(ARG002)


636-636: Do not catch blind exception: Exception

(BLE001)


636-636: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


637-637: f-string without any placeholders

Remove extraneous f prefix

(F541)


639-639: Unused method argument: test_config

(ARG002)


699-699: Do not catch blind exception: Exception

(BLE001)


732-732: Do not catch blind exception: Exception

(BLE001)


829-830: try-except-pass detected, consider logging the exception

(S110)


829-829: Do not catch blind exception: Exception

(BLE001)

tests/integrations/tests/test_anthropic.py

183-183: Unused method argument: test_config

(ARG002)


199-199: Unused method argument: test_config

(ARG002)


218-218: Unused method argument: test_config

(ARG002)


238-238: Unused method argument: test_config

(ARG002)


267-267: Unused method argument: test_config

(ARG002)


334-334: Unused method argument: test_config

(ARG002)


354-354: Unused method argument: test_config

(ARG002)


381-381: Unused method argument: test_config

(ARG002)


409-409: Unused method argument: test_config

(ARG002)


594-594: Unused method argument: test_config

(ARG002)


629-629: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


638-638: Unused method argument: test_config

(ARG002)


674-674: Unused method argument: test_config

(ARG002)


749-749: Unused method argument: test_config

(ARG002)


785-785: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/test_google.py

173-173: Redefinition of unused base64 from line 40

Remove definition: base64

(F811)


177-177: Unpacked variable header is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


230-230: Unused method argument: test_config

(ARG002)


246-246: Unused method argument: test_config

(ARG002)


269-269: Unused method argument: test_config

(ARG002)


294-294: Unused method argument: test_config

(ARG002)


321-321: Unused method argument: test_config

(ARG002)


358-358: Unused method argument: test_config

(ARG002)


380-380: Unused method argument: test_config

(ARG002)


395-395: Unused method argument: test_config

(ARG002)


408-408: Unused method argument: test_config

(ARG002)


428-428: Unused method argument: test_config

(ARG002)


464-464: Unused method argument: test_config

(ARG002)


507-507: Unused method argument: test_config

(ARG002)


520-520: Unused method argument: test_config

(ARG002)


570-570: Unused method argument: test_config

(ARG002)


585-585: Unused method argument: test_config

(ARG002)


592-592: Unused method argument: test_config

(ARG002)


616-616: Unused method argument: test_config

(ARG002)


642-642: Unused method argument: test_config

(ARG002)


666-666: Unused method argument: test_config

(ARG002)


690-690: Unused method argument: test_config

(ARG002)


713-713: Unused method argument: test_config

(ARG002)


747-747: Unused method argument: test_config

(ARG002)


801-801: Unused method argument: test_config

(ARG002)


847-847: Unused method argument: test_config

(ARG002)

tests/integrations/tests/utils/common.py

1731-1731: Avoid specifying long messages outside the exception class

(TRY003)


1753-1755: Avoid specifying long messages outside the exception class

(TRY003)


1768-1768: Local variable valid_event_types is assigned to but never used

Remove assignment to unused variable valid_event_types

(F841)


1833-1833: Avoid specifying long messages outside the exception class

(TRY003)


1843-1845: Avoid specifying long messages outside the exception class

(TRY003)


1861-1861: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)

tests/integrations/tests/utils/config_loader.py

154-157: Avoid specifying long messages outside the exception class

(TRY003)


228-228: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


234-234: Avoid specifying long messages outside the exception class

(TRY003)


237-237: Avoid specifying long messages outside the exception class

(TRY003)


287-287: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/integrations/tests/test_litellm.py

132-132: Unused function argument: request

(ARG001)


136-136: Possible hardcoded password assigned to: "token"

(S105)


180-180: Unused method argument: test_config

(ARG002)


200-200: Unused method argument: test_config

(ARG002)


221-221: Unused method argument: test_config

(ARG002)


245-245: Unused method argument: test_config

(ARG002)


270-270: Unused method argument: test_config

(ARG002)


317-317: Unused method argument: test_config

(ARG002)


342-342: Unused method argument: test_config

(ARG002)


360-360: Unused method argument: test_config

(ARG002)


378-378: Unused method argument: test_config

(ARG002)


402-402: Unused method argument: test_config

(ARG002)


447-447: Unused method argument: test_config

(ARG002)


518-518: Unused method argument: test_config

(ARG002)


548-548: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


557-557: Unused method argument: test_config

(ARG002)


595-595: Do not catch blind exception: Exception

(BLE001)

tests/integrations/tests/test_openai.py

222-222: Unused method argument: test_config

(ARG002)


237-237: Unused method argument: test_config

(ARG002)


256-256: Unused method argument: test_config

(ARG002)


273-273: Unused method argument: test_config

(ARG002)


294-294: Unused method argument: test_config

(ARG002)


338-338: Unused method argument: test_config

(ARG002)


356-356: Unused method argument: test_config

(ARG002)


369-369: Unused method argument: test_config

(ARG002)


382-382: Unused method argument: test_config

(ARG002)


514-514: Unused method argument: test_config

(ARG002)


549-549: Unpacked variable content_tools is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


560-560: Unused method argument: test_config

(ARG002)


1116-1116: Unused method argument: test_config

(ARG002)


1127-1127: Unused method argument: test_config

(ARG002)


1162-1162: Unused method argument: test_config

(ARG002)


1194-1194: Unused method argument: test_config

(ARG002)


1238-1238: Unused method argument: test_config

(ARG002)


1290-1290: Unused method argument: test_config

(ARG002)


1346-1346: Unused method argument: test_config

(ARG002)


1366-1366: Unpacked variable content is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1386-1386: Unused method argument: test_config

(ARG002)


1529-1529: Unused method argument: test_config

(ARG002)


1550-1550: Unused method argument: test_config

(ARG002)

🔇 Additional comments (20)
ui/app/_fallbacks/enterprise/components/login/loginView.tsx (1)

52-64: Past review addressed: Debug console.log statements removed.

The debug console.log statements flagged in the previous review have been successfully removed from lines 52 and 64.

transports/bifrost-http/.air.debug.toml (1)

1-67: The precedent of .air.toml being tracked suggests this debug configuration belongs in version control.

This file follows the project's existing pattern—.air.toml is already versioned in the same directory. Adding .air.debug.toml as a companion debug configuration is consistent with this approach and allows the team to maintain reproducible local development setups. The configuration itself is valid and includes reasonable exclusions for build and test artifacts.

Minor clarification suggested:

Consider adding a brief comment to transports/bifrost-http/.air.toml or a section in your contributor guide documenting when to use each configuration (standard vs. debug mode), if one doesn't already exist. This helps new contributors understand the purpose of having both configurations.

tests/integrations/.python-version (1)

1-1: Python version pin looks fine

Pinning integrations to Python 3.12 is reasonable; just keep CI/runtime aligned with this version to avoid surprises.

.github/workflows/npx-publish.yml (1)

17-37: Check-skip gating and permissions look good

The check-skip job’s logic (first-line --skip-pipeline flag) and minimal contents: read permissions, plus the needs/if gating on publish, are sound and keep the workflow least-privileged.

Also applies to: 39-45

tests/integrations/config.json (1)

1-167: Provider config JSON is consistent with the new provider-centric design

The providers block, env-style key references (env.*), and uniform default_request_timeout_in_seconds settings look consistent and align with the provider names used in INTEGRATION_TO_PROVIDER_MAP (e.g., openai, anthropic, gemini, bedrock). I don’t see mismatches here.

tests/integrations/README.md (1)

5-28: uv-centric docs and examples look coherent

The updated Quick Start, uv installation/sync instructions, and uv run pytest examples are consistent with the tests/integrations layout and new config tooling. Nothing blocking here; the docs are thorough and actionable.

Also applies to: 129-166, 235-265, 269-346

core/internal/testutil/utils.go (1)

28-35: LGTM!

The provider-specific default format logic is clean and correctly returns "wav" for Gemini and "mp3" for other providers.
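The helper under review can be sketched roughly as follows (an illustrative sketch only: the string-typed `provider` parameter and exact comparison are assumptions, not the actual code in core/internal/testutil/utils.go):

```go
package main

import "fmt"

// GetProviderDefaultFormat returns the default audio output format for a
// provider: Gemini streams raw PCM that gets wrapped as WAV, so it
// defaults to "wav"; every other provider defaults to "mp3".
func GetProviderDefaultFormat(provider string) string {
	if provider == "gemini" {
		return "wav"
	}
	return "mp3"
}

func main() {
	fmt.Println(GetProviderDefaultFormat("gemini")) // wav
	fmt.Println(GetProviderDefaultFormat("openai")) // mp3
}
```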

tests/integrations/tests/test_bedrock.py (1)

24-24: LGTM!

The import path changes from ..utils to .utils correctly align with the updated package structure.

Also applies to: 35-35

.github/workflows/pr-tests.yml (1)

18-38: LGTM!

The check-skip job correctly includes explicit permissions (contents: read) at lines 21-22, addressing security best practices.

core/providers/utils/audio.go (1)

32-62: LGTM!

The PCM to WAV conversion logic is correct. The RIFF/WAVE header structure, byte calculations, and little-endian encoding all follow the standard WAV format specification.
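The header layout being verified can be sketched like this (a minimal sketch of the standard 44-byte RIFF/WAVE header for 16-bit PCM; the function name and signature are assumptions, not the exact code in core/providers/utils/audio.go):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// pcmToWAV wraps raw 16-bit little-endian PCM samples in a minimal
// 44-byte RIFF/WAVE header so players can decode them.
func pcmToWAV(pcm []byte, sampleRate, channels int) []byte {
	const bitsPerSample = 16
	byteRate := sampleRate * channels * bitsPerSample / 8
	blockAlign := channels * bitsPerSample / 8

	var buf bytes.Buffer
	buf.WriteString("RIFF")
	binary.Write(&buf, binary.LittleEndian, uint32(36+len(pcm))) // RIFF chunk size
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(&buf, binary.LittleEndian, uint32(16)) // fmt chunk size
	binary.Write(&buf, binary.LittleEndian, uint16(1))  // audio format: PCM
	binary.Write(&buf, binary.LittleEndian, uint16(channels))
	binary.Write(&buf, binary.LittleEndian, uint32(sampleRate))
	binary.Write(&buf, binary.LittleEndian, uint32(byteRate))
	binary.Write(&buf, binary.LittleEndian, uint16(blockAlign))
	binary.Write(&buf, binary.LittleEndian, uint16(bitsPerSample))
	buf.WriteString("data")
	binary.Write(&buf, binary.LittleEndian, uint32(len(pcm))) // data chunk size
	buf.Write(pcm)
	return buf.Bytes()
}

func main() {
	wav := pcmToWAV(make([]byte, 4), 24000, 1)
	fmt.Println(len(wav), string(wav[0:4]), string(wav[8:12])) // 48 RIFF WAVE
}
```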

core/internal/testutil/speech_synthesis_stream.go (3)

42-42: LGTM!

The changes correctly integrate provider-specific audio format handling and audio buffer accumulation for codec validation. The pattern of accumulating audio chunks and performing PCM-to-WAV conversion for Gemini is sound and consistently applied.

Also applies to: 54-54, 137-137, 176-177, 226-242


286-286: LGTM!

The HD streaming test correctly uses GetProviderDefaultFormat and accumulates audio for validation. The PCM-to-WAV conversion logic for Gemini is properly applied.

Also applies to: 321-321, 344-345, 372-390


437-437: LGTM!

The multi-voice streaming test correctly resets the audio buffer on retry (line 464) and accumulates audio for validation. This ensures fresh validation data for each attempt.

Also applies to: 458-464, 491-492, 526-543

core/providers/gemini/utils.go (1)

51-76: Schema/type normalization and ResponseJSONSchema wiring look solid.

Using ResponseJSONSchema consistently and lowercasing all "type" fields (both at the top level and nested) gives more robust interop with OpenAI‑style structured outputs. The convertTypeToLowerCase + convertSchemaToMap combination is a reasonable, contained way to enforce this.

Also applies to: 78-122
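The recursive type-lowercasing described above can be sketched as follows (an assumption-laden sketch: the function name differs from the real convertTypeToLowerCase, and the actual implementation in core/providers/gemini/utils.go may walk the schema differently):

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseSchemaTypes recursively lowercases every "type" value in a
// JSON-schema-shaped map, covering nested property maps and array items.
func lowercaseSchemaTypes(schema map[string]any) {
	for key, value := range schema {
		switch v := value.(type) {
		case string:
			if key == "type" {
				schema[key] = strings.ToLower(v)
			}
		case map[string]any:
			lowercaseSchemaTypes(v)
		case []any:
			for _, item := range v {
				if m, ok := item.(map[string]any); ok {
					lowercaseSchemaTypes(m)
				}
			}
		}
	}
}

func main() {
	schema := map[string]any{
		"type": "OBJECT",
		"properties": map[string]any{
			"name": map[string]any{"type": "STRING"},
		},
	}
	lowercaseSchemaTypes(schema)
	fmt.Println(schema["type"]) // object
}
```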

transports/bifrost-http/integrations/genai.go (1)

35-49: GenAI speech/transcription routing and MIME detection are well‑structured.

The new RequestConverter routing for IsSpeech / IsTranscription and the helpers isSpeechRequest / isTranscriptionRequest / isAudioMimeType correctly:

  • Prioritize TTS when both input audio and audio output config are present.
  • Restrict transcription detection to actual audio/* MIME types (case‑insensitive, stripping parameters).
  • Keep non‑speech requests on the chat path.

This should make speech vs transcription dispatch much more reliable without over‑matching.

Also applies to: 161-171, 178-244
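The case-insensitive, parameter-stripping MIME check described above can be sketched like this (a sketch of the described behavior, not the exact helper in genai.go):

```go
package main

import (
	"fmt"
	"strings"
)

// isAudioMimeType reports whether a MIME type denotes audio, matching
// case-insensitively and ignoring parameters such as "; rate=24000".
func isAudioMimeType(mimeType string) bool {
	base := mimeType
	// Strip any MIME parameters after the first semicolon.
	if i := strings.Index(base, ";"); i >= 0 {
		base = base[:i]
	}
	base = strings.ToLower(strings.TrimSpace(base))
	return strings.HasPrefix(base, "audio/")
}

func main() {
	fmt.Println(isAudioMimeType("AUDIO/L16; rate=24000")) // true
	fmt.Println(isAudioMimeType("image/png"))             // false
}
```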

core/providers/gemini/types.go (1)

819-902: Voice JSON marshaling/unmarshaling looks consistent with SDK expectations.

The custom PrebuiltVoiceConfig/VoiceConfig marshal/unmarshal logic correctly bridges:

  • incoming snake_case (voice_name, prebuilt_voice_config, voice_config) from the SDK, and
  • outgoing camelCase (voiceName, prebuiltVoiceConfig, voiceConfig) used when sending requests.

Field aliasing via local Alias structs keeps the exported struct tags meaningful while matching the wire format expected by Gemini.

tests/integrations/tests/test_openai.py (4)

1528-1593: LGTM - Text completion tests are well-structured.

The new text completion tests properly validate:

  • Response structure via assert_valid_text_completion_response
  • Streaming behavior with chunk counting
  • Content quality checks with appropriate flexibility

The use of gpt-3.5-turbo-instruct is appropriate for the legacy completions API.


196-201: Timeout increase aligns with PR objectives.

The timeout increase from 30s to 300s addresses the timeout issues mentioned in the PR objectives. This is reasonable for integration tests that may experience variable API latency.


535-558: Good capability-based conditional testing.

The test correctly checks config.provider_supports_scenario(provider, "tool_calls") before attempting tool call streaming, and dynamically retrieves the appropriate tools-capable model. This prevents test failures for providers that don't support tools.


1237-1288: Well-structured tool call validation.

The test properly validates:

  • Function call presence in output
  • Function name matches expected "get_weather"
  • Argument parsing handles both string JSON and dict formats
  • Location argument contains expected value
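The dual string/dict argument handling noted above can be sketched in Go (the actual test is Python and does the equivalent with isinstance checks; this function name and shape are illustrative assumptions):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeToolArgs accepts tool-call arguments either as a JSON object or
// as a JSON string that itself encodes an object.
func decodeToolArgs(raw string) (map[string]any, error) {
	payload := strings.TrimSpace(raw)
	// If the payload is a JSON string wrapper, unwrap it first.
	if strings.HasPrefix(payload, `"`) {
		var inner string
		if err := json.Unmarshal([]byte(payload), &inner); err != nil {
			return nil, err
		}
		payload = inner
	}
	var args map[string]any
	if err := json.Unmarshal([]byte(payload), &args); err != nil {
		return nil, err
	}
	return args, nil
}

func main() {
	args, _ := decodeToolArgs(`{"location": "Boston, MA"}`)
	fmt.Println(args["location"]) // Boston, MA
}
```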

Comment on lines +14 to +31
```yaml
check-skip:
  runs-on: ubuntu-latest
  outputs:
    should-skip: ${{ steps.check.outputs.should-skip }}
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Check if pipeline should be skipped
      id: check
      run: |
        COMMIT_MESSAGE=$(git log -1 --pretty=%B)
        FIRST_LINE=$(echo "$COMMIT_MESSAGE" | head -n 1)
        if [[ "$FIRST_LINE" == *"--skip-pipeline"* ]]; then
          echo "should-skip=true" >> $GITHUB_OUTPUT
        else
          echo "should-skip=false" >> $GITHUB_OUTPUT
        fi
```

⚠️ Potential issue | 🟡 Minor

Add explicit permissions to the check-skip job.

For consistency with pr-tests.yml and to follow security best practices, the check-skip job should declare explicit permissions.

Apply this diff:

```diff
   check-skip:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     outputs:
       should-skip: ${{ steps.check.outputs.should-skip }}
```
🤖 Prompt for AI Agents
In .github/workflows/pr-test-notifier.yml around lines 14 to 31, the check-skip
job lacks explicit permissions; add a permissions block under the check-skip job
(same location as other jobs) declaring at minimum "contents: read" to align
with pr-tests.yml and follow least-privilege practice so the job can read the
repository commit message safely.

Comment on lines +16 to +34
```yaml
check-skip:
  runs-on: ubuntu-latest
  outputs:
    should-skip: ${{ steps.check.outputs.should-skip }}
  steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Check if pipeline should be skipped
      id: check
      run: |
        COMMIT_MESSAGE=$(git log -1 --pretty=%B)
        FIRST_LINE=$(echo "$COMMIT_MESSAGE" | head -n 1)
        if [[ "$FIRST_LINE" == *"--skip-pipeline"* ]]; then
          echo "should-skip=true" >> $GITHUB_OUTPUT
        else
          echo "should-skip=false" >> $GITHUB_OUTPUT
        fi
```


⚠️ Potential issue | 🟡 Minor

Add explicit permissions to the check-skip job.

The check-skip job should declare explicit permissions to follow security best practices and satisfy CodeQL requirements. The job needs contents: read to checkout the repository and read commit messages.

Apply this diff:

```diff
   check-skip:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     outputs:
       should-skip: ${{ steps.check.outputs.should-skip }}
```
🤖 Prompt for AI Agents
.github/workflows/snyk.yml around lines 16 to 34: the check-skip job lacks
explicit permissions; add a permissions block to the job with "contents: read"
so the workflow can safely checkout and read commit messages (e.g., insert a
permissions section under the job definition specifying contents: read).

Comment on lines +11 to +28
```yaml
check-skip:
  runs-on: ubuntu-latest
  outputs:
    should-skip: ${{ steps.check.outputs.should-skip }}
  steps:
    - name: Checkout
      uses: actions/checkout@v4

    - name: Check if pipeline should be skipped
      id: check
      run: |
        COMMIT_MESSAGE=$(git log -1 --pretty=%B)
        FIRST_LINE=$(echo "$COMMIT_MESSAGE" | head -n 1)
        if [[ "$FIRST_LINE" == *"--skip-pipeline"* ]]; then
          echo "should-skip=true" >> $GITHUB_OUTPUT
        else
          echo "should-skip=false" >> $GITHUB_OUTPUT
        fi
```

⚠️ Potential issue | 🟡 Minor

Add explicit permissions to the check-skip job.

The check-skip job lacks explicit permissions, which triggers the CodeQL warning. Add a minimal permissions block to declare only the rights needed to read the repository.

Apply this diff:

```diff
   check-skip:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
     outputs:
       should-skip: ${{ steps.check.outputs.should-skip }}
```
🤖 Prompt for AI Agents
.github/workflows/test-coverage.yml around lines 11 to 28: the check-skip job
has no explicit permissions which triggers CodeQL warnings; add a minimal
permissions block for repository read access by declaring permissions: contents:
read at the job level (directly under the job name) so the job can read commit
data while limiting other rights.

Comment on lines +73 to +81
```go
	if len(audioData) == 0 {
		return "", fmt.Errorf("audio data is empty")
	}

	// Detect audio format from magic bytes
	detectedFormat := DetectAudioFormat(audioData)
	if detectedFormat == "" {
		return "", fmt.Errorf("unable to detect audio format from data (first 16 bytes: %x)", audioData[:min(16, len(audioData))])
	}
```

⚠️ Potential issue | 🔴 Critical

Fix undefined min helper in SaveAndValidateAudio (compile error).

This line fails to compile when no `min` helper is in scope:

```go
return "", fmt.Errorf("unable to detect audio format from data (first 16 bytes: %x)", audioData[:min(16, len(audioData))])
```

No `min` function is defined or imported in this package, and the built-in `min` is only available on Go 1.21+, so older toolchains reject it. Replace with an explicit bound:

```diff
-	if detectedFormat == "" {
-		return "", fmt.Errorf("unable to detect audio format from data (first 16 bytes: %x)", audioData[:min(16, len(audioData))])
-	}
+	if detectedFormat == "" {
+		max := len(audioData)
+		if max > 16 {
+			max = 16
+		}
+		return "", fmt.Errorf(
+			"unable to detect audio format from data (first 16 bytes: %x)",
+			audioData[:max],
+		)
+	}
```

This preserves the diagnostic while compiling correctly even for very short buffers.

🤖 Prompt for AI Agents
In core/internal/testutil/audio_validation.go around lines 73-81, the code uses
an undefined min helper in the error message slice expression causing a compile
error; replace that expression with an explicit bound: compute n :=
len(audioData); if n > 16 { n = 16 } and use audioData[:n] in the fmt.Errorf
call so the slice never exceeds the buffer length and the diagnostic still
prints up to the first 16 bytes.

Makefile
Comment on lines +571 to +682
test-integrations: ## Run Python integration tests (Usage: make test-integrations [INTEGRATION=openai] [TESTCASE=test_name] [VERBOSE=1])
@echo "$(GREEN)Running Python integration tests...$(NC)"
@if [ ! -d "tests/integrations" ]; then \
echo "$(RED)Error: tests/integrations directory not found$(NC)"; \
exit 1; \
fi; \
if [ -n "$(TESTCASE)" ] && [ -z "$(INTEGRATION)" ]; then \
echo "$(RED)Error: TESTCASE requires INTEGRATION to be specified$(NC)"; \
echo "$(YELLOW)Usage: make test-integrations INTEGRATION=anthropic TESTCASE=test_05_end2end_tool_calling$(NC)"; \
exit 1; \
fi; \
if [ -f .env ]; then \
echo "$(YELLOW)Loading environment variables from .env...$(NC)"; \
set -a; . ./.env; set +a; \
fi; \
BIFROST_STARTED=0; \
BIFROST_PID=""; \
TAIL_PID=""; \
TEST_PORT=$${PORT:-8080}; \
TEST_HOST=$${HOST:-localhost}; \
echo "$(CYAN)Checking if Bifrost is running on $$TEST_HOST:$$TEST_PORT...$(NC)"; \
if curl -s -o /dev/null -w "%{http_code}" http://$$TEST_HOST:$$TEST_PORT/health 2>/dev/null | grep -q "200\|404"; then \
echo "$(GREEN)✓ Bifrost is already running$(NC)"; \
else \
echo "$(YELLOW)Bifrost not running, starting it...$(NC)"; \
./tmp/bifrost-http -host "$$TEST_HOST" -port "$$TEST_PORT" -log-style "$(LOG_STYLE)" -log-level "$(LOG_LEVEL)" -app-dir tests/integrations > /tmp/bifrost-test.log 2>&1 & \
BIFROST_PID=$$!; \
BIFROST_STARTED=1; \
echo "$(YELLOW)Waiting for Bifrost to be ready...$(NC)"; \
echo "$(CYAN)Bifrost logs: /tmp/bifrost-test.log$(NC)"; \
(tail -f /tmp/bifrost-test.log 2>/dev/null | grep -E "error|panic|Error|ERRO|fatal|Fatal|FATAL" --line-buffered &) & \
TAIL_PID=$$!; \
for i in 1 2 3 4 5 6 7 8 9 10; do \
if curl -s -o /dev/null http://$$TEST_HOST:$$TEST_PORT/health 2>/dev/null; then \
echo "$(GREEN)✓ Bifrost is ready (PID: $$BIFROST_PID)$(NC)"; \
break; \
fi; \
if [ $$i -eq 10 ]; then \
echo "$(RED)Failed to start Bifrost$(NC)"; \
echo "$(YELLOW)Bifrost logs:$(NC)"; \
cat /tmp/bifrost-test.log 2>/dev/null || echo "No log file found"; \
[ -n "$$BIFROST_PID" ] && kill $$BIFROST_PID 2>/dev/null; \
[ -n "$$TAIL_PID" ] && kill $$TAIL_PID 2>/dev/null; \
exit 1; \
fi; \
sleep 1; \
done; \
fi; \
TEST_FAILED=0; \
if ! which uv > /dev/null 2>&1; then \
echo "$(YELLOW)uv not found, checking for pytest...$(NC)"; \
if ! which pytest > /dev/null 2>&1; then \
echo "$(RED)Error: Neither uv nor pytest found$(NC)"; \
echo "$(YELLOW)Install uv: curl -LsSf https://astral.sh/uv/install.sh | sh$(NC)"; \
echo "$(YELLOW)Or install pytest: pip install pytest$(NC)"; \
[ $$BIFROST_STARTED -eq 1 ] && [ -n "$$BIFROST_PID" ] && kill $$BIFROST_PID 2>/dev/null; \
[ -n "$$TAIL_PID" ] && kill $$TAIL_PID 2>/dev/null; \
exit 1; \
fi; \
echo "$(CYAN)Using pytest directly$(NC)"; \
if [ -n "$(INTEGRATION)" ]; then \
if [ -n "$(TESTCASE)" ]; then \
echo "$(CYAN)Running $(INTEGRATION) integration test: $(TESTCASE)...$(NC)"; \
cd tests/integrations && pytest tests/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
else \
echo "$(CYAN)Running $(INTEGRATION) integration tests...$(NC)"; \
cd tests/integrations && pytest tests/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
fi; \
else \
echo "$(CYAN)Running all integration tests...$(NC)"; \
cd tests/integrations && pytest $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
fi; \
else \
echo "$(CYAN)Using uv (fast mode)$(NC)"; \
cd tests/integrations && \
if [ ! -f .venv/bin/python ]; then \
echo "$(YELLOW)Installing dependencies with uv...$(NC)"; \
uv venv && uv pip install -r requirements.txt; \
fi; \
if [ -n "$(INTEGRATION)" ]; then \
if [ -n "$(TESTCASE)" ]; then \
echo "$(CYAN)Running $(INTEGRATION) integration test: $(TESTCASE)...$(NC)"; \
uv run pytest tests/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
else \
echo "$(CYAN)Running $(INTEGRATION) integration tests...$(NC)"; \
uv run pytest tests/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
fi; \
else \
echo "$(CYAN)Running all integration tests...$(NC)"; \
uv run pytest $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
fi; \
fi; \
if [ $$BIFROST_STARTED -eq 1 ] && [ -n "$$BIFROST_PID" ]; then \
echo "$(YELLOW)Stopping Bifrost (PID: $$BIFROST_PID)...$(NC)"; \
kill $$BIFROST_PID 2>/dev/null || true; \
[ -n "$$TAIL_PID" ] && kill $$TAIL_PID 2>/dev/null || true; \
wait $$BIFROST_PID 2>/dev/null || true; \
echo "$(GREEN)✓ Bifrost stopped$(NC)"; \
if [ $$TEST_FAILED -eq 1 ]; then \
echo ""; \
echo "$(YELLOW)Last 50 lines of Bifrost logs:$(NC)"; \
tail -50 /tmp/bifrost-test.log 2>/dev/null || echo "No log file found"; \
fi; \
fi; \
echo ""; \
if [ $$TEST_FAILED -eq 1 ]; then \
echo "$(RED)✗ Integration tests failed$(NC)"; \
echo "$(CYAN)Full Bifrost logs: /tmp/bifrost-test.log$(NC)"; \
exit 1; \
else \
echo "$(GREEN)✓ Integration tests complete$(NC)"; \
fi

⚠️ Potential issue | 🟠 Major

Fix pytest paths in test-integrations to match the actual test layout

Inside test-integrations, you cd tests/integrations and then run pytest against tests/test_$(INTEGRATION).py[...]. Given the repo structure (tests/integrations/tests/integrations/test_openai.py, etc.), these paths will not resolve and the target will fail with “file not found”.

From within tests/integrations, the correct paths are tests/integrations/test_$(INTEGRATION).py[...]. Suggested change:

```diff
-            cd tests/integrations && pytest tests/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
+            cd tests/integrations && pytest tests/integrations/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
 ...
-            cd tests/integrations && pytest tests/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
+            cd tests/integrations && pytest tests/integrations/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
 ...
-                uv run pytest tests/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
+                uv run pytest tests/integrations/test_$(INTEGRATION).py::$(TESTCASE) $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
 ...
-                uv run pytest tests/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
+                uv run pytest tests/integrations/test_$(INTEGRATION).py $(if $(VERBOSE),-v,-q) || TEST_FAILED=1; \
```

This should make make test-integrations INTEGRATION=openai ... actually hit the intended test files.

🤖 Prompt for AI Agents
In Makefile around lines 571 to 682, the recipe cds into tests/integrations but
then calls pytest with paths like tests/test_$(INTEGRATION).py, which do not
exist at that level (the test files live under tests/integrations/ inside that
directory) and cause "file not found"; change those pytest invocations to use
tests/integrations/test_$(INTEGRATION).py (or
tests/integrations/test_$(INTEGRATION).py::$(TESTCASE) when TESTCASE is set) so
the test file paths resolve correctly; apply the same fix for all branches that
run pytest or uv run pytest in this block.

Comment on lines +222 to +225
```python
def test_01_simple_chat(self, openai_client, test_config, provider, model):
    if provider == "_no_providers_" or model == "_no_model_":
        pytest.skip("No providers configured for this scenario")
    """Test Case 1: Simple chat interaction - runs across all available providers"""
```

⚠️ Potential issue | 🟡 Minor

Docstring placed after code will not be recognized as a docstring.

In Python, a docstring must be the first statement in a function body. Here it appears after the pytest.skip() check, making it an orphan string literal rather than a proper docstring. This pattern is repeated across all parametrized tests in this file (test_01 through test_09, test_13, test_32 through test_37).

Apply this diff to fix the pattern (and similarly for all affected tests):

```diff
     @pytest.mark.parametrize("provider,model", get_cross_provider_params_for_scenario("simple_chat"))
     def test_01_simple_chat(self, openai_client, test_config, provider, model):
+        """Test Case 1: Simple chat interaction - runs across all available providers"""
         if provider == "_no_providers_" or model == "_no_model_":
             pytest.skip("No providers configured for this scenario")
-        """Test Case 1: Simple chat interaction - runs across all available providers"""
         response = openai_client.chat.completions.create(
```
🧰 Tools
🪛 Ruff (0.14.5)

222-222: Unused method argument: test_config

(ARG002)

Comment on lines +130 to 137
```go
	}else{
		mapConfig["auth_config"] = map[string]any{
			"admin_username":            "",
			"admin_password":            "",
			"is_enabled":                false,
			"disable_auth_on_inference": false,
		}
	}
```

⚠️ Potential issue | 🟠 Major

Logic gap: auth_config not guaranteed when ConfigStore exists but has no auth config.

The else block only handles the case when ConfigStore == nil, but there's a missing scenario: when ConfigStore != nil AND authConfig == nil (lines 117-129), auth_config won't be set in the response. This contradicts the stated goal of ensuring auth_config is always included in the config map.

Additionally, line 130 has a formatting issue: }else{ should be } else { per Go conventions.

Apply this diff to ensure auth_config is always present:

```diff
 		if authConfig != nil {
 			password := ""
 			if authConfig.AdminPassword != "" {
 				password = "<redacted>"
 			}
 			// Password we will hash it
 			mapConfig["auth_config"] = map[string]any{
 				"admin_username":            authConfig.AdminUserName,
 				"admin_password":            password,
 				"is_enabled":                authConfig.IsEnabled,
 				"disable_auth_on_inference": authConfig.DisableAuthOnInference,
 			}
+		} else {
+			// No auth config in store, provide defaults
+			mapConfig["auth_config"] = map[string]any{
+				"admin_username":            "",
+				"admin_password":            "",
+				"is_enabled":                false,
+				"disable_auth_on_inference": false,
+			}
 		}
-	}else{
+	} else {
+		// ConfigStore is nil, provide defaults
 		mapConfig["auth_config"] = map[string]any{
 			"admin_username":            "",
 			"admin_password":            "",
 			"is_enabled":                false,
 			"disable_auth_on_inference": false,
 		}
 	}
```

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In transports/bifrost-http/handlers/config.go around lines 130 to 137, the
current else only sets a default auth_config when ConfigStore == nil, leaving a
gap when ConfigStore != nil but authConfig == nil; update the logic so that
after reading authConfig you always set mapConfig["auth_config"] — if authConfig
is nil populate it with the default map (admin_username:"", admin_password:"",
is_enabled:false, disable_auth_on_inference:false), otherwise populate from
authConfig; also fix formatting to use a space in the else brace (`} else {`) to
follow Go conventions.
