feat: core tests retry logic enhancements #921
Caution: Review failed — the pull request is closed.

📝 Walkthrough

Adds nil guards for message Content, refines streaming end/latency handling, and introduces an extensive test retry/validation framework. Many tests now use retry/validation wrappers and relax content-length constraints; various changelogs and minor config ordering fixes were applied.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Test as Test Scenario
    participant Retry as Retry Wrapper
    participant Client as Provider Client
    participant Validator as Validator
    participant Accum as Accumulator
    Note over Test,Retry: New common flow for unary & streaming operations
    Test->>Retry: Invoke operation with RetryConfig + Context
    Retry->>Client: Attempt request / open stream
    Client-->>Retry: response or stream channel / error
    alt streaming
        Client->>Validator: stream chunks
        Validator->>Accum: accumulate content / tool-calls
        Validator-->>Retry: StreamingValidationResult (per attempt)
        Retry-->>Test: aggregated StreamingValidationResult after retries
    else unary
        Retry-->>Test: response or final error after retries
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Areas needing extra attention:
Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)

📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (53)
Force-pushed: ebe3476 to 8661443
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/core-providers/scenarios/utils.go (1)
588-612: Fix potential nil-pointer dereference in `GetErrorMessage`

`GetErrorMessage` assumes `err.Error` is non-nil; any `*schemas.BifrostError` without an `Error` field will panic when you access `err.Error.Type`, `err.Error.Code`, or `err.Error.Message`. This can crash tests any time a provider returns a structurally different error.

Recommend guarding `err.Error` and falling back to the core helper:

```diff
 func GetErrorMessage(err *schemas.BifrostError) string {
 	if err == nil {
 		return ""
 	}
-	errorType := ""
-	if err.Type != nil && *err.Type != "" {
-		errorType = *err.Type
-	}
-
-	if errorType == "" && err.Error.Type != nil && *err.Error.Type != "" {
-		errorType = *err.Error.Type
-	}
-
-	errorCode := ""
-	if err.Error.Code != nil && *err.Error.Code != "" {
-		errorCode = *err.Error.Code
-	}
-
-	errorMessage := err.Error.Message
-
-	errorString := fmt.Sprintf("%s %s: %s", errorType, errorCode, errorMessage)
-
-	return errorString
+	// If structured error details are missing, defer to core helper.
+	if err.Error == nil {
+		return bifrost.GetErrorMessage(err)
+	}
+
+	errorType := ""
+	if err.Type != nil && *err.Type != "" {
+		errorType = *err.Type
+	}
+	if errorType == "" && err.Error.Type != nil && *err.Error.Type != "" {
+		errorType = *err.Error.Type
+	}
+
+	errorCode := ""
+	if err.Error.Code != nil && *err.Error.Code != "" {
+		errorCode = *err.Error.Code
+	}
+
+	errorMessage := err.Error.Message
+	if errorMessage == "" {
+		return bifrost.GetErrorMessage(err)
+	}
+
+	return fmt.Sprintf("%s %s: %s", errorType, errorCode, errorMessage)
 }
```

This prevents panics and keeps error formatting consistent with core behavior when detailed fields are absent.
tests/core-providers/scenarios/speech_synthesis.go (1)
254-275: Fix loop variable capture in `AllVoiceOptions` voice-type subtests

In `AllVoiceOptions`, the loop:

```go
for _, voiceType := range voiceTypes {
	t.Run("VoiceType_"+voiceType, func(t *testing.T) {
		if os.Getenv("SKIP_PARALLEL_TESTS") != "true" {
			t.Parallel()
		}
		voice := GetProviderVoice(testConfig.Provider, voiceType)
		...
		voiceRetryContext := TestRetryContext{
			ScenarioName: "SpeechSynthesis_VoiceType_" + voiceType,
			...
			TestMetadata: map[string]interface{}{
				"voice_type": voiceType,
				"voice":      voice,
			},
		}
```

captures the range variable `voiceType` inside a subtest that runs with `t.Parallel`. Depending on Go version/scheduling, all subtests can end up using the last `voiceType`, so you may not actually be exercising all three voice types.

Capture `voiceType` per iteration before calling `t.Run`:

```diff
- for _, voiceType := range voiceTypes {
-     t.Run("VoiceType_"+voiceType, func(t *testing.T) {
+ for _, voiceType := range voiceTypes {
+     voiceType := voiceType // capture per-iteration value
+     t.Run("VoiceType_"+voiceType, func(t *testing.T) {
          if os.Getenv("SKIP_PARALLEL_TESTS") != "true" {
              t.Parallel()
          }
          voice := GetProviderVoice(testConfig.Provider, voiceType)
          ...
```

This ensures each subtest actually runs with its own intended `voiceType`.

tests/core-providers/scenarios/tool_calls_streaming.go (1)
94-179: Prevent spurious "default" key entries in `ResponsesToolCalls` map

The issue is confirmed. When `AccumulateResponsesToolCall` is called with all four parameters (itemID, callID, name, arguments) nil or empty (theoretically possible during malformed streaming chunks), the code creates an entry under the `"default"` key. This entry is never cleaned up because the key migration logic at line 146 explicitly skips deletion when `key == "default"`. As a result, `GetFinalResponsesToolCalls()` can return an entry with empty ID, Name, and Arguments.

The suggested fix is sound: initialize `key := ""` instead of `"default"`, and return early if no identifying information is present.

tests/core-providers/scenarios/responses_stream.go (1)
774-813: Align sequence-number validation: remove fatal failure or consolidate duplicate checks

The function at lines 778-781 uses `t.Fatalf("⚠️ Warning:...")`, which contradicts the warning emoji; a fatal failure should use a clear failure marker. Additionally, `validateResponsesStreamingResponse` (lines 869-873) performs the identical sequence-number check but appends to an errors slice instead of failing fatally.

Change line 779 to use `t.Logf` with a failure emoji (❌) for consistency with other non-fatal checks (lines 793, 796, 799, 806), or consolidate both checks by removing the fatal one and relying on the soft error accumulation in `validateResponsesStreamingResponse`.

tests/core-providers/scenarios/embedding.go (1)
123-144: Fix `range` over int on older toolchains and handle 2D embedding responses safely

In `validateEmbeddingSemantics` (tests/core-providers/scenarios/embedding.go, lines 123-144):

- `for i := range responseDataLength` ranges over an integer, which is only supported on Go 1.22 and later; on earlier toolchains this fails to compile.
- The logic creates an out-of-bounds risk: when `responseDataLength` is reassigned from `len(Embedding2DArray)` (line 129), subsequent indexing into `response.Data[i]` can panic if `len(response.Data) < responseDataLength` (e.g., a single 2D embedding entry containing multiple vectors).

Replace the loop with explicit handling for both 1D and 2D embedding cases:

```diff
- responseDataLength := len(response.Data)
- if responseDataLength != len(testTexts) {
-     if responseDataLength > 0 && response.Data[0].Embedding.Embedding2DArray != nil {
-         responseDataLength = len(response.Data[0].Embedding.Embedding2DArray)
-     }
-     if responseDataLength != len(testTexts) {
-         t.Fatalf("Expected %d embedding results, got %d", len(testTexts), responseDataLength)
-     }
- }
-
- for i := range responseDataLength {
-     vec, extractErr := getEmbeddingVector(response.Data[i])
+ dataLen := len(response.Data)
+ if dataLen == len(testTexts) {
+     // Standard 1D embeddings: one entry per input
+     for i := 0; i < dataLen; i++ {
+         vec, extractErr := getEmbeddingVector(response.Data[i])
+         if extractErr != nil {
+             t.Fatalf("Failed to extract embedding vector for text '%s': %v", testTexts[i], extractErr)
+         }
+         if len(vec) == 0 {
+             t.Fatalf("Embedding vector is empty for text '%s'", testTexts[i])
+         }
+         embeddings[i] = vec
+     }
+ } else if dataLen > 0 && response.Data[0].Embedding.Embedding2DArray != nil {
+     // 2D embeddings: all vectors contained in a single entry
+     vectors := response.Data[0].Embedding.Embedding2DArray
+     if len(vectors) != len(testTexts) {
+         t.Fatalf("Expected %d embedding results, got %d", len(testTexts), len(vectors))
+     }
+     for i := 0; i < len(vectors); i++ {
+         vec := vectors[i]
+         if len(vec) == 0 {
+             t.Fatalf("Embedding vector is empty for text '%s'", testTexts[i])
+         }
+         embeddings[i] = vec
+     }
+ } else {
+     t.Fatalf("Expected %d embedding results, got %d", len(testTexts), dataLen)
+ }
```
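The 1D/2D branching can be exercised in isolation; a minimal sketch, with a hypothetical `EmbeddingEntry` type standing in for the entries of `response.Data`:

```go
package main

import "fmt"

// EmbeddingEntry is a hypothetical stand-in for an entry of response.Data:
// either one 1D vector per entry, or all vectors packed into a single entry.
type EmbeddingEntry struct {
	Vector  []float64   // 1D case
	Vectors [][]float64 // 2D case (Embedding2DArray analogue)
}

// extractVectors returns exactly `want` vectors or an explicit error,
// avoiding out-of-bounds indexing in the 2D case.
func extractVectors(data []EmbeddingEntry, want int) ([][]float64, error) {
	switch {
	case len(data) == want:
		out := make([][]float64, want)
		for i, e := range data {
			if len(e.Vector) == 0 {
				return nil, fmt.Errorf("empty vector at index %d", i)
			}
			out[i] = e.Vector
		}
		return out, nil
	case len(data) == 1 && data[0].Vectors != nil:
		if len(data[0].Vectors) != want {
			return nil, fmt.Errorf("expected %d vectors, got %d", want, len(data[0].Vectors))
		}
		return data[0].Vectors, nil
	default:
		return nil, fmt.Errorf("expected %d embedding results, got %d", want, len(data))
	}
}

func main() {
	// Single 2D entry carrying both vectors: handled without indexing data[1].
	vecs, err := extractVectors([]EmbeddingEntry{{Vectors: [][]float64{{0.1}, {0.2}}}}, 2)
	fmt.Println(len(vecs), err) // → 2 <nil>
}
```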
🧹 Nitpick comments (15)
tests/core-providers/scenarios/utils.go (1)
536-563: Reuse default speech retry conditions to avoid weakening retries

You construct `speechRetryConfig` from `DefaultSpeechRetryConfig()` but drop its `Conditions` by setting an empty slice, so only validation failures and timeouts will trigger retries. If `DefaultSpeechRetryConfig` includes useful conditions (e.g., empty audio, generic errors), consider propagating them:

```diff
- speechRetryConfig := SpeechRetryConfig{
-     MaxAttempts: retryConfig.MaxAttempts,
-     BaseDelay:   retryConfig.BaseDelay,
-     MaxDelay:    retryConfig.MaxDelay,
-     Conditions:  []SpeechRetryCondition{},
-     OnRetry:     retryConfig.OnRetry,
-     OnFinalFail: retryConfig.OnFinalFail,
- }
+ speechRetryConfig := SpeechRetryConfig{
+     MaxAttempts: retryConfig.MaxAttempts,
+     BaseDelay:   retryConfig.BaseDelay,
+     MaxDelay:    retryConfig.MaxDelay,
+     Conditions:  retryConfig.Conditions, // preserve default speech retry conditions
+     OnRetry:     retryConfig.OnRetry,
+     OnFinalFail: retryConfig.OnFinalFail,
+ }
```

This keeps the helper aligned with the main speech retry behavior.
Also applies to: 565-569
tests/core-providers/scenarios/speech_synthesis_stream.go (1)
375-379: Capture per-iteration voice value inside streaming voice tests

In `MultipleVoices_Streaming`, the loop:

```go
for _, voice := range voices {
	t.Run("StreamingVoice_"+voice, func(t *testing.T) {
		...
		validationResult := WithSpeechStreamValidationRetry(...
			func(responseChannel chan *schemas.BifrostStream) SpeechStreamValidationResult {
				...
				streamErrors = append(streamErrors, fmt.Sprintf("Received nil stream response for voice %s", voice))
				...
			})
```

relies on the range variable `voice` inside nested closures. With `t.Run` + `t.Parallel`, this can lead to all subtests logging the same `voice` for errors, depending on Go version and scheduling.

You already introduce `voiceCopy := voice` for the request; reusing that in logs avoids surprises:

```diff
- for _, voice := range voices {
-     t.Run("StreamingVoice_"+voice, func(t *testing.T) {
+ for _, voice := range voices {
+     voice := voice // capture per-iteration value
+     t.Run("StreamingVoice_"+voice, func(t *testing.T) {
          if os.Getenv("SKIP_PARALLEL_TESTS") != "true" {
              t.Parallel()
          }
          voiceCopy := voice
          ...
-         if response == nil {
-             streamErrors = append(streamErrors, fmt.Sprintf("Received nil stream response for voice %s", voice))
+         if response == nil {
+             streamErrors = append(streamErrors, fmt.Sprintf("Received nil stream response for voice %s", voiceCopy))
              continue
          }
          ...
-         if response.BifrostSpeechStreamResponse != nil && response.BifrostSpeechStreamResponse.Audio != nil && len(response.BifrostSpeechStreamResponse.Audio) > 0 {
-             receivedData = true
-             t.Logf("✅ Received data for voice %s: %d bytes", voice, len(response.BifrostSpeechStreamResponse.Audio))
+         if response.BifrostSpeechStreamResponse != nil && response.BifrostSpeechStreamResponse.Audio != nil && len(response.BifrostSpeechStreamResponse.Audio) > 0 {
+             receivedData = true
+             t.Logf("✅ Received data for voice %s: %d bytes", voiceCopy, len(response.BifrostSpeechStreamResponse.Audio))
          }
          ...
-         if !receivedData {
-             validationErrors = append(validationErrors, fmt.Sprintf("Should receive audio data for voice %s", voice))
+         if !receivedData {
+             validationErrors = append(validationErrors, fmt.Sprintf("Should receive audio data for voice %s", voiceCopy))
          }
```

This keeps logs and validation messages aligned with the actual voice used in the request.
Also applies to: 412-469
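The capture pitfall in both comments above is not specific to `t.Run`; any per-iteration closure hits it on Go versions before 1.22. A self-contained sketch of the fix pattern (sorted output keeps it deterministic):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runAll launches one goroutine per value, mimicking parallel subtests.
// Before Go 1.22 the range variable was shared across iterations, so all
// goroutines could observe the last value; the re-declaration fixes that
// (and is a harmless no-op on Go 1.22+, which scopes it per iteration).
func runAll(voiceTypes []string) []string {
	var (
		mu   sync.Mutex
		seen []string
		wg   sync.WaitGroup
	)
	for _, voiceType := range voiceTypes {
		voiceType := voiceType // capture per-iteration value
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			seen = append(seen, voiceType)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	fmt.Println(runAll([]string{"alloy", "echo", "nova"})) // → [alloy echo nova]
}
```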
tests/core-providers/scenarios/speech_synthesis.go (1)
85-113: Speech retry integration across scenarios looks solid

Using `GetTestRetryConfigForScenario` → `SpeechRetryConfig` → `WithSpeechTestRetry` in the basic, HD, and voice-type tests is consistent with the shared retry framework. Pairing this with `SpeechExpectations` + `ModifyExpectationsForProvider` and then doing speech-specific checks (`validateSpeechSynthesisSpecific`, size thresholds, `modelRequested` logging) gives good coverage of both transport and content.

One nuance: each `SpeechRetryConfig` you construct sets `Conditions: []SpeechRetryCondition{}`. If your default speech retry config defines useful conditions (e.g., empty audio, generic error patterns), consider copying them from the scenario-level `retryConfig`/`voiceRetryConfig` instead of zeroing them out, so behavior stays aligned with other speech scenarios.

Also applies to: 100-112, 116-122, 192-207, 210-218, 222-227, 276-301, 279-301, 304-307, 308-314
tests/core-providers/scenarios/text_completion.go (1)
55-56: TextCompletion retry wiring + relaxed content checks are appropriate

Switching to `TextCompletionRetryConfig` + `WithTextCompletionTestRetry` and using `GetTextCompletionContent` aligns text completion with the shared retry framework. Removing strict keyword assertions is the right call for non-deterministic LLMs and keeps the test focused on functional validity via `GetExpectationsForScenario`/`ModifyExpectationsForProvider`.

As with the speech configs, note that `Conditions` is set to an empty slice; if you later define text-completion-specific retry conditions, you can pass them here without changing the call sites.

Also applies to: 57-65, 67-69, 75-76
tests/core-providers/scenarios/tool_calls_streaming.go (1)
218-342: Add a timeout / context-based select around the chat streaming loop to avoid hangs

The new Chat Completions tool-call streaming test reads from `responseChannel` with a bare `for response := range responseChannel` and no timeout or `select`. If the provider leaves the stream open or never sends any chunks, the test can hang indefinitely despite `WithStreamRetry` succeeding in establishing the channel.

To align with the more robust pattern you already use in other streaming tests (e.g., `ResponsesStream` with `context.WithTimeout` and a `select` on `<-streamCtx.Done()`), consider wrapping the chat streaming loop similarly:

```diff
- for response := range responseChannel {
+ streamCtx, cancel := context.WithTimeout(ctx, 200*time.Second)
+ defer cancel()
+
+ for {
+     select {
+     case response, ok := <-responseChannel:
+         if !ok {
+             goto chatStreamComplete
+         }
+         // existing processing…
+     case <-streamCtx.Done():
+         t.Fatal("Timeout waiting for chat completions streaming response with tools")
+     }
+ }
+
+ chatStreamComplete:
```

This keeps the test from hanging while preserving the existing response processing and `responseCount` guard.

tests/core-providers/scenarios/validation_presets.go (1)
15-29: Expectation presets look consistent; be aware of boolean override semantics in `CombineExpectations`

The new defaults (e.g., `ExpectedChoiceCount: 1`, `ShouldHaveLatency: true`, and `ShouldHaveContent: false` for tool-only scenarios) line up with how the tests are written and the new streaming validations.

One nuance: `CombineExpectations` only overrides boolean fields when the later expectation's value is `true`, so you can't currently use it to force a flag back to `false` (e.g., turn `ShouldHaveContent` or `ShouldHaveLatency` off). That's fine if these are intended as global "must be true" invariants, but worth keeping in mind if you later try to combine presets that intentionally relax them.

Also applies to: 62-81, 147-163, 166-187, 387-446
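The OR-style merge semantics can be shown with a reduced model (hypothetical `Expectations`/`Combine` names; the real `CombineExpectations` covers more fields):

```go
package main

import "fmt"

// Expectations is a two-field model of the boolean flags discussed above.
type Expectations struct {
	ShouldHaveContent bool
	ShouldHaveLatency bool
}

// Combine overrides a boolean only when the later value is true,
// so a later preset can turn flags on but never back off.
func Combine(base, override Expectations) Expectations {
	out := base
	if override.ShouldHaveContent {
		out.ShouldHaveContent = true
	}
	if override.ShouldHaveLatency {
		out.ShouldHaveLatency = true
	}
	return out
}

func main() {
	base := Expectations{ShouldHaveContent: true, ShouldHaveLatency: true}
	relaxed := Expectations{} // attempts to relax both flags
	fmt.Println(Combine(base, relaxed)) // → {true true}: flags cannot be forced off
}
```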
tests/core-providers/scenarios/transcription.go (1)
80-112: Retry wiring for TTS and transcription looks solid; consider handling TTS generation errors explicitly

The new use of `GetTestRetryConfigForScenario` + `{Speech,Transcription}RetryConfig` + `With{Speech,Transcription}TestRetry` cleanly standardizes retry behavior across all transcription-related tests and keeps the validation logic localized.

One small robustness gap: calls to `GenerateTTSAudioForTest` (e.g., in `AdditionalAudioTests`, `AllResponseFormats`, `WithCustomParameters`, `MultipleLanguages`) ignore its error return and assume `audioData` is valid. If that helper can fail (e.g., provider outage), it would be clearer to:
- Check the error and fail fast with a descriptive message, or
- Surface it through the same retry framework if appropriate.
Otherwise, the new retry integration and enriched error messages look good.
Also applies to: 142-177, 231-269, 313-352, 383-421, 453-492
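The fail-fast option could look like this sketch; `generateAudio` is a hypothetical stand-in for `GenerateTTSAudioForTest`:

```go
package main

import (
	"errors"
	"fmt"
)

// generateAudio is a hypothetical stand-in for GenerateTTSAudioForTest;
// the flag simulates a provider outage.
func generateAudio(ok bool) ([]byte, error) {
	if !ok {
		return nil, errors.New("provider outage")
	}
	return []byte{0x52, 0x49, 0x46, 0x46}, nil // "RIFF" header bytes
}

// prepareFixture checks the error instead of assuming the audio is valid,
// failing fast with a descriptive message.
func prepareFixture(ok bool) (string, error) {
	audio, err := generateAudio(ok)
	if err != nil {
		return "", fmt.Errorf("failed to generate TTS fixture: %w", err)
	}
	return fmt.Sprintf("got %d bytes", len(audio)), nil
}

func main() {
	_, err := prepareFixture(false)
	fmt.Println(err) // → failed to generate TTS fixture: provider outage
}
```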
tests/core-providers/scenarios/responses_stream.go (1)
65-241: Validation callback is thorough; consider using expectations/latency more explicitly

The new `WithResponsesStreamValidationRetry` callback does a nice job tracking event types, sequence numbers, and assembled content, and it cleanly returns a `ResponsesStreamValidationResult` for the retry wrapper to act on.

A couple of small polish opportunities:

- You build `expectations := GetExpectationsForScenario("ResponsesStream", ...)` and tweak `ShouldContainKeywords`, but the resulting `expectations` aren't currently used in any validation. Either wire them into `validateResponsesStreamingResponse` (e.g., to assert on keywords) or drop this block to avoid confusion.
- `LastLatency` in the returned `ResponsesStreamValidationResult` is hard-coded to 0 even though `lastResponse` is tracked and its latency is logged. If callers ever want to assert on end-to-end latency, consider populating `LastLatency` from `lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency`.

These are non-blocking, but would make the new validation result more self-contained and intention-revealing.
tests/core-providers/scenarios/response_validation.go (1)
276-311: List-models validation is solid; consider aligning latency semantics with other validators

The new `ValidateListModelsResponse`, `validateListModelsFields`, and `collectListModelsResponseMetrics` cover key integrity points: non-empty `Data`, non-empty IDs, provider and request type checks, optional `expected_provider` and `min_model_count`, and basic latency and pagination metrics. This should work well with the new list-models retry flow.

One minor consistency point: here you treat `ExtraFields.Latency < 0` as invalid and accept `0` as OK, whereas other validators treat `<= 0` as an error when `ShouldHaveLatency` is true. If a zero latency is effectively "missing/invalid" for tests (as it seems in other paths), aligning the check to `<= 0` would make behavior consistent across response types.

Example tweak:

```diff
- if expectations.ShouldHaveLatency {
-     if response.ExtraFields.Latency < 0 {
-         result.Passed = false
-         result.Errors = append(result.Errors, fmt.Sprintf("Invalid latency: %d ms (should be non-negative)", response.ExtraFields.Latency))
-     } else {
-         result.MetricsCollected["latency_ms"] = response.ExtraFields.Latency
-     }
- }
+ if expectations.ShouldHaveLatency {
+     if response.ExtraFields.Latency <= 0 {
+         result.Passed = false
+         result.Errors = append(result.Errors, fmt.Sprintf("Expected latency information but not present or invalid (got %d ms)", response.ExtraFields.Latency))
+     } else {
+         result.MetricsCollected["latency_ms"] = response.ExtraFields.Latency
+     }
+ }
```

If `0` is intentionally allowed for list-models, then the current code is fine; just be aware it diverges from other validators.

Also applies to: 1030-1102
tests/core-providers/scenarios/test_retry_framework.go (6)
163-167: ListModels-specific types are good, but default config API is a bit inconsistent

Introducing `ListModelsRetryCondition` and `ListModelsRetryConfig` mirrors the other typed configs and makes list-models tests easier to reason about.

Two small inconsistencies worth addressing:

- `DefaultListModelsRetryConfig` returns `TestRetryConfig`, not `ListModelsRetryConfig`, despite the name/comment. That's fine for `GetTestRetryConfigForScenario`, but confusing for callers who might expect a `ListModelsRetryConfig` suitable for `WithListModelsTestRetry`.
- `ListModelsRetryConfig.Conditions` and `checkListModelsRetryConditions` are currently unused in `WithListModelsTestRetry`, so providing conditions has no effect.

If you want to keep the generic and specific paths, consider:

- Either renaming `DefaultListModelsRetryConfig` to clarify it returns `TestRetryConfig` for scenario-level wiring, or
- Adding a parallel helper such as `DefaultListModelsOperationRetryConfig() ListModelsRetryConfig`, and/or actually invoking `checkListModelsRetryConditions` inside `WithListModelsTestRetry` when building `retryReason`.

This is API polish rather than a correctness bug, but tightening it now will avoid confusion later.
Also applies to: 247-255, 940-957, 1144-1145
258-273: Standardizing default retry configs to 10 attempts with 2s base / 10s max backoff

You've normalized most of the default `TestRetryConfig` creators (general, tool-calls, multi-tool, image, streaming, conversation, speech, transcription, reasoning, embedding) to `MaxAttempts: 10`, `BaseDelay: 2s`, `MaxDelay: 10s`. That's consistent with the framework's goal of being resilient to flaky LLM behavior.

The only trade-off is potential test runtime increase in worst-case failure scenarios. If that ever becomes an issue in CI, these values might be good candidates to drive from shared config (e.g., `config.ComprehensiveTestConfig`) rather than hard-coded literals.

Also applies to: 770-783, 787-799, 803-818, 820-841, 845-857, 861-873, 875-894, 897-910, 913-922, 925-938
722-764: Timeout detection is robust but “timeout” substring may be a bit broad
`timeoutPhrases` plus `isTimeoutError` provide a pragmatic way to treat gateway/context timeouts as special, checking both the raw `Error.Message` and the higher-level `GetErrorMessage(err)`.

One thing to be aware of is that the bare `"timeout"` phrase could, in theory, produce false positives on unrelated messages that happen to include that word. If you ever see misclassified errors in logs, consider tightening that particular phrase (e.g., to `" timeout"` or more specific variants) while keeping the others as-is.
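A toy version of the matching makes the false-positive risk concrete (the phrase list here is hypothetical; the framework's actual `timeoutPhrases` may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// timeoutPhrases is a hypothetical phrase list for illustration only.
var timeoutPhrases = []string{
	"context deadline exceeded",
	"gateway timeout",
	"timeout", // broad: also matches unrelated messages containing the word
}

func isTimeoutError(msg string) bool {
	m := strings.ToLower(msg)
	for _, p := range timeoutPhrases {
		if strings.Contains(m, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isTimeoutError("upstream: context deadline exceeded")) // → true
	// False positive: a config validation error mentioning a timeout field.
	fmt.Println(isTimeoutError(`invalid value for field "read_timeout_ms"`)) // → true
}
```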
968-1100: Dual API retry: behavior is good; consider adding ❌ prefix to timeout retry reasons
`WithDualAPITestRetry` correctly:

- Treats timeouts from either API as a primary retry trigger with exponential backoff.
- Retries when either Chat or Responses validation fails, and logs per-API validation details with `❌` prefixes.
- On final failure, wraps accumulated errors with a prefixed `❌ dual API test failed …` message.

One minor consistency issue: the timeout-only `retryReason` strings (`"Chat API timeout error: ..."`, `"Responses API timeout error: ..."`) don't include an explicit `❌` prefix, unlike the other retry paths. To keep logs uniform you could do:

```diff
- retryReason = fmt.Sprintf("Chat API timeout error: %s", GetErrorMessage(chatErr))
+ retryReason = fmt.Sprintf("❌ Chat API timeout error: %s", GetErrorMessage(chatErr))
  ...
- retryReason = fmt.Sprintf("Responses API timeout error: %s", GetErrorMessage(responsesErr))
+ retryReason = fmt.Sprintf("❌ Responses API timeout error: %s", GetErrorMessage(responsesErr))
```

This is purely cosmetic but aligns with the rest of the framework's error messaging.
1174-1831: Per-operation retry wrappers (TextCompletion/Speech/Embedding/Transcription) are consistent but heavily duplicated

The new `WithTextCompletionTestRetry`, `WithSpeechTestRetry`, `WithEmbeddingTestRetry`, and `WithTranscriptionTestRetry`:

- Follow the same pattern as the Chat/Responses wrappers (validate-first, always retry on validation errors, special handling for timeouts).
- Ensure `OnFinalFail` messages include a `❌` prefix and surface the last validation errors clearly.
- Provide operation-specific condition hooks via `checkTextCompletionRetryConditions`, `checkSpeechRetryConditions`, `checkEmbeddingRetryConditions`, and `checkTranscriptionRetryConditions`.

Functionally this is solid and matches the documented strategy. The only downside is the amount of near-identical retry-loop code, which may be harder to maintain long-term if you tweak behavior (e.g., adding a new retry reason) and need to change it in several places.
If this framework keeps growing, consider extracting a generic “validate + retry” helper that takes typed adapters (validate func, condition checker, final error builder) to reduce duplication, but this is not urgent.
1843-1965: ListModels retry wrapper matches “always retry on any failure” but doesn’t use its Conditions or ❌ prefixes consistently
`WithListModelsTestRetry`:

- Retries on any error, nil response, validation failure, or empty `Data`, which matches the commentary about being maximally defensive for list-models tests.
- Uses `ValidateListModelsResponse` to centralize structural checks, and avoids exiting early until either validation passes with no error or attempts are exhausted.

Two follow-ups to consider:

1. Conditions are ignored. Even though `ListModelsRetryConfig` has a `Conditions []ListModelsRetryCondition` field and you've added `checkListModelsRetryConditions`, the wrapper never calls it. If you intend these conditions to be usable, you probably want to fold them into the `shouldRetry`/`retryReason` computation (even if validation failures remain the primary trigger).
2. Logging consistency. Some messages in this path lack the `❌` prefix that you've standardized elsewhere:
   - Timeout branch: `retryReason = fmt.Sprintf("timeout error detected: %s", ...)`
   - Final `OnFinalFail`: `fmt.Errorf("final error: %s", errorMsg)` with no prefix.

To keep logs coherent, you could add the prefix in these few places, similar to the other wrappers.
Neither affects correctness, but addressing both would make the list-models path behave more like the rest of the framework and make the new condition type actually useful.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- core/providers/gemini/utils.go (2 hunks)
- framework/streaming/accumulator.go (3 hunks)
- plugins/mocker/main.go (3 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (2 hunks)
- tests/core-providers/scenarios/complete_end_to_end.go (0 hunks)
- tests/core-providers/scenarios/embedding.go (1 hunks)
- tests/core-providers/scenarios/end_to_end_tool_calling.go (0 hunks)
- tests/core-providers/scenarios/error_parser.go (2 hunks)
- tests/core-providers/scenarios/image_base64.go (0 hunks)
- tests/core-providers/scenarios/image_url.go (0 hunks)
- tests/core-providers/scenarios/list_models.go (3 hunks)
- tests/core-providers/scenarios/multi_turn_conversation.go (0 hunks)
- tests/core-providers/scenarios/multiple_images.go (0 hunks)
- tests/core-providers/scenarios/reasoning.go (0 hunks)
- tests/core-providers/scenarios/response_validation.go (10 hunks)
- tests/core-providers/scenarios/responses_stream.go (6 hunks)
- tests/core-providers/scenarios/speech_synthesis.go (3 hunks)
- tests/core-providers/scenarios/speech_synthesis_stream.go (3 hunks)
- tests/core-providers/scenarios/test_retry_framework.go (28 hunks)
- tests/core-providers/scenarios/text_completion.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (4 hunks)
- tests/core-providers/scenarios/tool_calls_streaming.go (4 hunks)
- tests/core-providers/scenarios/transcription.go (7 hunks)
- tests/core-providers/scenarios/transcription_stream.go (4 hunks)
- tests/core-providers/scenarios/utils.go (1 hunks)
- tests/core-providers/scenarios/validation_presets.go (2 hunks)
- tests/core-providers/tests.go (1 hunks)
- transports/bifrost-http/lib/config.go (13 hunks)
💤 Files with no reviewable changes (7)
- tests/core-providers/scenarios/multi_turn_conversation.go
- tests/core-providers/scenarios/image_url.go
- tests/core-providers/scenarios/end_to_end_tool_calling.go
- tests/core-providers/scenarios/multiple_images.go
- tests/core-providers/scenarios/image_base64.go
- tests/core-providers/scenarios/reasoning.go
- tests/core-providers/scenarios/complete_end_to_end.go
🧰 Additional context used
🧬 Code graph analysis (9)
tests/core-providers/scenarios/speech_synthesis.go (4)
tests/core-providers/scenarios/test_retry_framework.go (5)
- GetTestRetryConfigForScenario (1118-1150), TestRetryContext (170-175), SpeechRetryConfig (218-225), SpeechRetryCondition (146-149), WithSpeechTestRetry (1328-1478)

core/schemas/speech.go (1)
- BifrostSpeechResponse (22-29)

transports/bifrost-http/handlers/inference.go (1)
- SpeechRequest (206-210)

core/utils.go (1)
- GetErrorMessage (239-276)
tests/core-providers/scenarios/chat_completion_stream.go (2)
tests/core-providers/scenarios/test_retry_framework.go (3)
- StreamingRetryConfig (821-841), TestRetryContext (170-175), WithStreamRetry (582-690)

core/schemas/bifrost.go (3)
- BifrostStream (318-325), BifrostError (353-362), ChatCompletionStreamRequest (90-90)
plugins/mocker/main.go (1)
core/providers/gemini/types.go (1)
- Content (871-879)
transports/bifrost-http/lib/config.go (2)
core/schemas/mcp.go (1)
- MCPConfig (11-13)

framework/configstore/clientconfig.go (1)
- GovernanceConfig (67-74)
tests/core-providers/scenarios/utils.go (5)
tests/core-providers/scenarios/test_retry_framework.go (5)
- DefaultSpeechRetryConfig (860-873), TestRetryContext (170-175), SpeechRetryConfig (218-225), SpeechRetryCondition (146-149), WithSpeechTestRetry (1328-1478)

tests/core-providers/scenarios/validation_presets.go (2)
- SpeechExpectations (148-164), ModifyExpectationsForProvider (301-348)

core/schemas/speech.go (1)
- BifrostSpeechResponse (22-29)

core/schemas/bifrost.go (2)
- BifrostError (353-362), SpeechRequest (94-94)

core/utils.go (1)
- GetErrorMessage (239-276)
tests/core-providers/scenarios/speech_synthesis_stream.go (1)
tests/core-providers/scenarios/test_retry_framework.go (2)
- WithSpeechStreamValidationRetry (1990-2128), SpeechStreamValidationResult (1980-1986)
tests/core-providers/scenarios/response_validation.go (4)
core/schemas/models.go (1)
- BifrostListModelsResponse (36-45)

core/schemas/bifrost.go (3)
- BifrostError (353-362), RequestType (83-83), ListModelsRequest (86-86)

tests/core-providers/scenarios/error_parser.go (3)
- ParseBifrostError (47-104), FormatErrorConcise (353-356), LogError (359-366)

plugins/mocker/main.go (1)
- Latency (126-130)
tests/core-providers/scenarios/test_retry_framework.go (11)
core/schemas/models.go (1)
- BifrostListModelsResponse (36-45)

core/schemas/bifrost.go (3)
- BifrostError (353-362), ErrorField (371-378), BifrostStream (318-325)

tests/core-providers/scenarios/response_validation.go (6)
- ResponseExpectations (18-43), ValidateChatResponse (68-111), ValidateTextCompletionResponse (114-154), ValidateSpeechResponse (203-237), ValidateEmbeddingResponse (314-348), ValidateTranscriptionResponse (240-274)

core/schemas/chatcompletions.go (1)
- BifrostChatResponse (25-40)

core/utils.go (1)
- GetErrorMessage (239-276)

tests/core-providers/scenarios/test_retry_conditions.go (3)
- StreamErrorCondition (537-537), IncompleteStreamCondition (582-582), EmptySpeechCondition (667-667)

core/schemas/textcompletions.go (1)
- BifrostTextCompletionResponse (69-77)

core/schemas/speech.go (1)
- BifrostSpeechResponse (22-29)

core/schemas/embedding.go (1)
- BifrostEmbeddingResponse (22-28)

core/schemas/transcriptions.go (1)
- BifrostTranscriptionResponse (16-26)

tests/core-providers/scenarios/error_parser.go (2)
- ParseBifrostError (47-104), FormatErrorConcise (353-356)
tests/core-providers/scenarios/responses_stream.go (5)
tests/core-providers/scenarios/test_retry_framework.go (3)
- StreamingRetryConfig (821-841), TestRetryContext (170-175), WithStreamRetry (582-690)

core/schemas/bifrost.go (2)
- BifrostStream (318-325), BifrostError (353-362)

ui/lib/types/logs.ts (1)
- BifrostError (223-229)

core/schemas/responses.go (6)
- ResponsesStreamResponseType (1351-1351), BifrostResponsesStreamResponse (1421-1459), ResponsesStreamResponseTypeOutputTextDelta (1369-1369), ResponsesStreamResponseTypeOutputItemAdded (1363-1363), ResponsesStreamResponseTypeContentPartAdded (1366-1366), ResponsesStreamResponseTypeError (1418-1418)

tests/core-providers/scenarios/validation_presets.go (2)
- GetExpectationsForScenario (209-294), ModifyExpectationsForProvider (301-348)
🔇 Additional comments (28)
transports/bifrost-http/lib/config.go (2)
1615-1632: LGTM! Persistence-first order improves consistency.

The reversal of operations (store update before in-memory update) ensures that the in-memory `Providers` map only reflects successfully persisted state. If `ConfigStore.AddProvider` fails, the early return prevents the in-memory update, maintaining consistency.

Note: If `UpdateEnvKeys` fails (line 1623), a warning is logged but execution continues to update the in-memory map (line 1631). This creates a minor inconsistency where EnvKeys in the store may be stale while Providers is updated, though this appears intentional given the warning-level handling.
1615-1632: Clarify relationship to PR objectives.The PR is titled "feat: core tests retry logic enhancements" and the AI summary mentions defensive nil checks and retry frameworks for tests. However, this file contains persistence order changes in
AddProviderwith no visible retry logic, nil checks, or test-related modifications.Please clarify whether this file's changes are:
- Part of a broader refactor unrelated to the PR's stated objectives, or
- Accidentally included in this PR, or
- Related in a way not evident from the diff
framework/streaming/accumulator.go (1)
268-296: LGTM! Robust nil guards prevent panics.

The added nil checks for `message.Content` before accessing `ContentStr` and `ContentBlocks` are defensive and prevent potential nil pointer dereferences during streaming accumulation. The initialization logic (lines 290-292) ensures `Content` is non-nil before assignment, maintaining safe content assembly.

core/providers/gemini/utils.go (2)
416-431: LGTM! Nil guards align with broader safety improvements.

The nil checks for `message.Content` before accessing `ContentStr` and `ContentBlocks` prevent potential nil dereferences during Bifrost-to-Gemini message conversion. This aligns with the nil-safety pattern applied across `framework/streaming/accumulator.go` and `plugins/mocker/main.go`.
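The guard pattern praised in these comments is easy to get right in one call site and miss in another; a minimal, self-contained sketch of it (the `Message`/`Content` shapes below are simplified stand-ins for the Bifrost schemas, not the real types) looks like:

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the schema types under review.
type ContentBlock struct{ Text *string }

type Content struct {
	ContentStr    *string
	ContentBlocks []ContentBlock
}

type Message struct{ Content *Content }

// extractText returns the message text, guarding every pointer on the path.
func extractText(msg *Message) string {
	if msg == nil || msg.Content == nil {
		return "" // nil guard: no content attached yet
	}
	if msg.Content.ContentStr != nil {
		return *msg.Content.ContentStr
	}
	var parts []string
	for _, b := range msg.Content.ContentBlocks {
		if b.Text != nil {
			parts = append(parts, *b.Text)
		}
	}
	return strings.Join(parts, "")
}

func main() {
	fmt.Printf("%q\n", extractText(nil)) // safe on a nil message
	s := "hello"
	fmt.Printf("%q\n", extractText(&Message{Content: &Content{ContentStr: &s}}))
}
```

The key point is that every pointer hop (`msg`, `msg.Content`, `ContentStr`, each block's `Text`) gets its own check before dereference, so a sparse streaming chunk can never panic the accumulator.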
477-493: LGTM! Safe content extraction for tool responses.

The nil guard for `message.Content` before extracting content for tool responses is consistent with the PR-wide nil-safety improvements. The fallback logic assembles text from `ContentBlocks` when present, maintaining safe access patterns.

plugins/mocker/main.go (3)
647-723: LGTM! Nil guards in content extraction prevent crashes.

The nil checks for `message.Content` before accessing `ContentStr` in both single-message (line 671) and multi-message (line 680) paths, as well as in `ResponsesRequest` flows (lines 698-716), are defensive and prevent potential nil dereferences during fast content extraction for mocking rules. This aligns with the nil-safety improvements in `framework/streaming/accumulator.go` and `core/providers/gemini/utils.go`.

tests/core-providers/tests.go (1)
66-118: LGTM! Consistent status formatting for test summaries.

The shift from emoji-based prefixes (✅/❌) to bracketed status indicators ([ENABLED]/[SKIPPED]/[TOTAL]) improves consistency and machine-readability in test output without altering functionality.

tests/core-providers/scenarios/error_parser.go (1)
378-399: LGTM! Consistent error prefix formatting.

The update ensures all `RequireNoError` failures include a ❌ prefix for visual consistency with retry-enabled test paths. The defensive check (line 394) prevents duplicate prefixes if already present.

tests/core-providers/scenarios/transcription_stream.go (3)
84-116: LGTM! Retry framework improves TTS generation reliability.

The integration of `WithSpeechTestRetry` with a structured retry config and context adds resilience to TTS generation for transcription tests. The expectations are dynamically adjusted per provider via `ModifyExpectationsForProvider`, which is appropriate for handling provider-specific behavior.

148-166: LGTM! Streaming retry wrapper enhances test stability.

The `WithStreamRetry` wrapper for `TranscriptionStreamRequest` provides structured retry logic with clear context metadata (scenario name, expected behavior, provider/model info), improving resilience against transient streaming failures.
264-266: Verify latency validation relaxation doesn't mask issues.

The shift from an implicit fatal assertion to a warning when `lastTokenLatency == 0` (also at lines 500, 595) reduces test strictness. While this may prevent flaky failures if zero latency is legitimate for some providers or inter-chunk responses, it could also hide scenarios where latency reporting is broken.

Confirm that zero latency is expected/acceptable in streaming transcription chunks for the providers under test, or consider logging provider context when this warning triggers to aid debugging:

```diff
 if lastTokenLatency == 0 {
-	t.Logf("⚠️ Warning: Last token latency is 0")
+	t.Logf("⚠️ Warning: Last token latency is 0 (provider: %s, model: %s)", testConfig.Provider, testConfig.TranscriptionModel)
 }
```

tests/core-providers/scenarios/chat_completion_stream.go (3)
42-60: LGTM! Retry wrapper improves streaming test reliability.

The integration of `WithStreamRetry` for chat completion streaming with a detailed retry context (scenario name, expected behavior, metadata) adds structured resilience to handle transient streaming failures.

186-193: Verify content-length validation removal doesn't reduce coverage.

The removal of explicit `MinContentLength` and `MaxContentLength` expectations relaxes validation strictness. While this may reduce flaky test failures caused by variable response lengths, it also eliminates bounds checking that could catch truncated or bloated responses.

Confirm this relaxation aligns with the test's intent. If length bounds are still important, consider adding them back as warnings rather than hard failures, or verify that other validations (e.g., `ShouldContainAnyOf` for "paris") provide sufficient coverage:

```go
// Option: Add length as a warning-level check if needed
if len(finalContent) < 50 || len(finalContent) > 1000 {
	t.Logf("⚠️ Content length outside expected range: %d chars", len(finalContent))
}
```
241-259: LGTM! Retry framework for streaming with tools.

The retry wrapper for streaming with tool calls includes appropriate expectations (`should_have_tool_calls`, `tool_name: "get_weather"`) and metadata, enhancing test resilience while maintaining validation rigor.

tests/core-providers/scenarios/text_completion_stream.go (4)
52-70: LGTM! Retry framework for text completion streaming.

The `WithStreamRetry` wrapper for `TextCompletionStreamRequest` with a structured retry config and context aligns with the retry framework applied across other streaming test scenarios (chat completion, transcription), improving test stability.

181-187: Verify latency validation relaxation aligns with expected behavior.

Similar to the transcription streaming tests, the shift from an implicit fatal assertion to a warning when `Latency <= 0` (lines 182-183) reduces strictness. Confirm this is intentional and that zero/negative latency is acceptable for inter-chunk responses or specific providers.

Consider logging provider and model context when this warning triggers to aid debugging:

```diff
 if lastResponse.BifrostTextCompletionResponse.ExtraFields.Latency <= 0 {
-	t.Logf("⚠️ Warning: Last streaming chunk missing latency information (got %d ms)", lastResponse.BifrostTextCompletionResponse.ExtraFields.Latency)
+	t.Logf("⚠️ Warning: Last streaming chunk missing latency information (got %d ms, provider: %s, model: %s)",
+		lastResponse.BifrostTextCompletionResponse.ExtraFields.Latency, testConfig.Provider, model)
 }
```
253-269: LGTM! Dynamic retry contexts for varied prompts.

The use of `fmt.Sprintf` to create scenario-specific retry context names (e.g., `TextCompletionStreamVariedPrompts_<name>`) provides clear traceability in retry logs. The retry wrapper integration is consistent and appropriate.

395-411: LGTM! Retry framework for parameterized streaming tests.

The retry wrapper for parameter tests with dynamic scenario naming maintains the structured retry pattern applied across the test suite, enhancing resilience while preserving test intent.
tests/core-providers/scenarios/list_models.go (2)
31-77: ListModels retry integration looks correct and robust

The new retryConfig/context, expectations, and `ListModelsRetryConfig` wiring into `WithListModelsTestRetry` are coherent, and the post-retry checks (error, nil response, empty data) align with the expectations (`min_model_count: 1`, latency presence). No correctness issues spotted here.

Also applies to: 81-81

131-167: Pagination + next-page retry flow is consistent with base ListModels

The pagination test's retry setup (including `min_model_count: 0` for sparse pages) and reuse of `listModelsRetryConfig` for the next-page fetch are consistent with the base ListModels path and should make this test resilient to transient issues while still enforcing page-size and structural checks.

Also applies to: 187-209
tests/core-providers/scenarios/speech_synthesis_stream.go (1)
218-220: Treating zero last-token latency as a warning is appropriate

Downgrading `lastTokenLatency == 0` to a warning log (instead of a hard failure) keeps tests from flaking on providers that omit or zero-fill per-chunk latency while still surfacing the anomaly.

Also applies to: 343-345
tests/core-providers/scenarios/response_validation.go (3)
46-53: ToolCallExpectation struct looks consistent with downstream validation

The fields here line up with how `validateSingleToolCall` inspects required/forbidden args, types, and values, and `ValidateArgsJSON` gates JSON parsing correctly. No issues from a correctness standpoint.

385-449: Converting content expectations from hard failures to warnings is appropriate for LLM nondeterminism

The shift from errors to warnings for keyword presence, forbidden words, and pattern matching in chat/text/responses content validation keeps structural failures as true test blockers while avoiding flakes from natural output variance. The added context (truncated content in messages) is helpful for debugging, and the behavior is consistent across all three validators.
Also applies to: 571-636, 715-781
1348-1369: Error logging now consistently prefixes failures with ❌

Ensuring each individual validation error line gets a ❌ prefix (without double-prefixing) makes logs much easier to scan, and matches the rest of the retry/error formatting in this PR. Implementation is straightforward and side‑effect free.

tests/core-providers/scenarios/test_retry_framework.go (4)
16-40: High-level retry framework docs are clear and match implementation

The top-of-file commentary (validation-first, retries on validation failures and timeouts, conditions as secondary) matches how the various `With...TestRetry` helpers now behave. This will be useful for future contributors navigating a fairly involved test harness.
275-425: Chat and Responses retry wrappers correctly prioritize validation failures and timeouts

`WithChatTestRetry` and `WithResponsesTestRetry`:

- Validate responses first and immediately return on success.
- Always retry when there are validation errors (regardless of conditions), matching the documented strategy.
- Give timeout errors precedence via `isTimeoutError`, and prefix timeout and validation retry reasons with ❌ for clear logs.
- On exhausted attempts, synthesize a `BifrostError` with status 400 and a clearly marked "❌ Validation failed …" message to force test failure, and ensure `OnFinalFail` messages also carry the prefix.

The loop bounds use `attempt <= config.MaxAttempts` and always break/return, so there's no risk of infinite retry. This looks correct and consistent with the framework goals.

Also applies to: 428-579
581-707: Streaming retry helper now has clear timeout handling and consistent error formatting

`WithStreamRetry`:

- Always retries on timeout errors (via `isTimeoutError`) irrespective of other conditions.
- Falls back to condition-based retries and, if none match, still retries on any streaming error, which is reasonable given the flakiness of streams.
- Adds ❌ prefixes to both first-attempt error logs and final-failure logs, keeping output consistent with non-stream paths.

Given `MaxAttempts` is bounded and backoff is capped, this is safe from a retry-storm perspective.
1980-2287: Streaming validation retry wrappers add useful end-to-end coverage for speech and responses

`WithSpeechStreamValidationRetry` and `WithResponsesStreamValidationRetry`:

- Treat stream acquisition errors similarly to `WithStreamRetry` (timeouts always retried, condition-based retries next, then a "retry on any error" fallback for streaming).
- Add another retry layer around content validation, using `validateStream` to consume the channel and decide whether to retry based on errors / stream errors.
- Aggregate all errors and ensure final `OnFinalFail` and fallback logs are prefixed with ❌, which keeps streaming diagnostics consistent with non-stream tests.

The control flow always terminates (bounded by `MaxAttempts`), and the separation of concerns (operation vs. `validateStream`) is clear. No functional issues here.
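That operation/validation split can be illustrated with a toy stream consumer; the `chunk` and `validationResult` shapes below are simplified assumptions, not the framework's real types:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical chunk shape: either a content delta or a terminal error.
type chunk struct {
	Delta string
	Err   error
}

// validationResult mirrors the aggregate result the wrappers return.
type validationResult struct {
	Content string
	Errors  []string
}

// validateStream drains the channel, accumulates content, and reports
// problems instead of failing the test immediately, so an outer retry
// layer can decide whether to re-run the whole stream.
func validateStream(ch <-chan chunk) validationResult {
	var sb strings.Builder
	var errs []string
	chunks := 0
	for c := range ch {
		chunks++
		if c.Err != nil {
			errs = append(errs, fmt.Sprintf("stream error at chunk %d: %v", chunks, c.Err))
			continue
		}
		sb.WriteString(c.Delta)
	}
	if chunks == 0 {
		errs = append(errs, "stream produced no chunks")
	}
	if sb.Len() == 0 {
		errs = append(errs, "accumulated content is empty")
	}
	return validationResult{Content: sb.String(), Errors: errs}
}

func main() {
	ch := make(chan chunk, 3)
	ch <- chunk{Delta: "Once upon "}
	ch <- chunk{Delta: "a time"}
	close(ch)
	res := validateStream(ch)
	fmt.Printf("%q %d\n", res.Content, len(res.Errors))
}
```

Returning a result object rather than calling `t.Fatal` inside the consumer is what lets the outer wrapper retry the whole stream on validation failure.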
Force-pushed from 1c9c6fe to 2b3d485
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/core-providers/scenarios/response_validation.go (1)
46-53: Re‑gating JSON parsing in `validateSingleToolCall` can break existing expectations if `ValidateArgsJSON` is left false

With `ValidateArgsJSON` added, `validateSingleToolCall` only populates `args` when `expected.ValidateArgsJSON` is true, but the required/forbidden/type/value checks run unconditionally against `args`:

- When `ValidateArgsJSON` is false (the zero value for existing presets), `args` stays nil.
- All `RequiredArgs` will then be treated as missing, and other checks will silently see no values, changing semantics from the prior behavior where arguments were always parsed when possible.

If older `ToolCallExpectation` values weren't updated to set `ValidateArgsJSON = true`, you'll see spurious "missing required argument" / type mismatch errors even though the underlying arguments are unchanged.

Consider one of the following to avoid surprises:

```diff
 func validateSingleToolCall(arguments interface{}, expected ToolCallExpectation, choiceIdx, callIdx int, result *ValidationResult) {
 	var args map[string]interface{}
 	if expected.ValidateArgsJSON {
 		// current JSON parsing logic...
 	}
+
+	// If we never parsed args and we're not enforcing JSON validity, skip argument-level checks.
+	if args == nil && !expected.ValidateArgsJSON {
+		return
+	}
```

or, alternatively, always attempt to coerce `arguments` into a `map[string]interface{}` (maps, JSON strings, etc.) but only treat JSON parse failures as hard errors when `ValidateArgsJSON` is true.

Also applies to: 1256-1345
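The "coerce, then gate hard errors on the flag" alternative can be sketched as follows (`coerceArgs` is a hypothetical helper written for illustration, not code from the PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// coerceArgs turns tool-call arguments into a map whether they arrive as a
// JSON string or an already-decoded map. strictJSON mirrors a
// ValidateArgsJSON-style flag: when true, a parse failure is a hard error;
// when false, unparseable arguments simply yield nil (checks are skipped).
func coerceArgs(raw interface{}, strictJSON bool) (map[string]interface{}, error) {
	switch v := raw.(type) {
	case map[string]interface{}:
		return v, nil
	case string:
		var m map[string]interface{}
		if err := json.Unmarshal([]byte(v), &m); err != nil {
			if strictJSON {
				return nil, fmt.Errorf("arguments are not valid JSON: %w", err)
			}
			return nil, nil // lenient: skip argument-level checks
		}
		return m, nil
	default:
		return nil, nil
	}
}

func main() {
	m, err := coerceArgs(`{"location":"Paris"}`, true)
	fmt.Println(m["location"], err)
	_, err = coerceArgs(`not json`, true)
	fmt.Println(err != nil)
}
```

This keeps the old "always parse when possible" semantics while still letting the flag decide whether a parse failure fails the test.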
tests/core-providers/scenarios/embedding.go (1)
135-144: `for i := range responseDataLength` needs Go 1.22+; use an index‑based loop

`responseDataLength` is an `int`, so:

```go
for i := range responseDataLength {
```

only compiles on Go 1.22 or newer (range-over-int was added in Go 1.22); on earlier toolchains it fails with `cannot range over responseDataLength (variable of type int)` and `embedding.go` won't build.

You already enforce `responseDataLength == len(testTexts)` above, so an index loop is sufficient and version-independent:

```diff
-	for i := range responseDataLength {
+	for i := 0; i < responseDataLength; i++ {
 		vec, extractErr := getEmbeddingVector(response.Data[i])
 		if extractErr != nil {
 			t.Fatalf("Failed to extract embedding vector for text '%s': %v", testTexts[i], extractErr)
 		}
 		if len(vec) == 0 {
 			t.Fatalf("Embedding vector is empty for text '%s'", testTexts[i])
 		}
 		embeddings[i] = vec
 	}
```
588-612: Add nil safety checks for the `err.Error` field before dereferencing

The function assumes `err.Error` is always non-nil after checking `err != nil`, but since `Error` is a pointer field (`*ErrorField`), it can be nil for sparse error payloads. This causes a panic on any dereference. Additionally, compare with the implementation in `core/utils.go` (line 239), which properly guards with `if err.Error != nil` checks. Apply similar defensive checks at lines 598, 601, and 604. Also trim leading spaces from the formatted string when both `errorType` and `errorCode` are empty to avoid output like `" : message"`.

```go
func GetErrorMessage(err *schemas.BifrostError) string {
	if err == nil {
		return ""
	}
	errorType := ""
	if err.Type != nil && *err.Type != "" {
		errorType = *err.Type
	}
	if errorType == "" && err.Error != nil && err.Error.Type != nil && *err.Error.Type != "" {
		errorType = *err.Error.Type
	}
	errorCode := ""
	if err.Error != nil && err.Error.Code != nil && *err.Error.Code != "" {
		errorCode = *err.Error.Code
	}
	errorMessage := ""
	if err.Error != nil {
		errorMessage = err.Error.Message
	}
	errorString := fmt.Sprintf("%s %s: %s", errorType, errorCode, errorMessage)
	return strings.TrimSpace(errorString)
}
```
🧹 Nitpick comments (17)
core/providers/bedrock/bedrock.go (1)
909-927: Optimize latency assignment to avoid redundant write.

The logic correctly assigns total streaming latency to the final response. However, for the last response, the `Latency` field is set twice: first at line 916 with `time.Since(lastChunkTime).Milliseconds()`, then overwritten at line 926 with `time.Since(startTime).Milliseconds()`.

Consider this optimization (note that `lastChunkTime` must be reset only after the per-chunk latency is computed, otherwise the elapsed time would always be ~0):

```diff
 for i, finalResponse := range finalResponses {
 	finalResponse.ExtraFields = schemas.BifrostResponseExtraFields{
 		RequestType:     schemas.ResponsesStreamRequest,
 		Provider:        providerName,
 		ModelRequested:  request.Model,
 		ModelDeployment: deployment,
 		ChunkIndex:      chunkIndex,
-		Latency:         time.Since(lastChunkTime).Milliseconds(),
 	}
 	chunkIndex++
-	lastChunkTime = time.Now()
 	if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
 		finalResponse.ExtraFields.RawResponse = "{}" // Final event has no payload
 	}
 	if i == len(finalResponses)-1 {
 		finalResponse.ExtraFields.Latency = time.Since(startTime).Milliseconds()
+	} else {
+		finalResponse.ExtraFields.Latency = time.Since(lastChunkTime).Milliseconds()
 	}
+	lastChunkTime = time.Now()
 	providerUtils.ProcessAndSendResponse(ctx, postHookRunner, providerUtils.GetBifrostResponseForStreamResponse(nil, nil, finalResponse, nil, nil), responseChan)
 }
```
940-957: List‑models retry config has an unused conditions hook; consider either wiring it or simplifying for clarity

You've introduced:

- `ListModelsRetryCondition` / `ListModelsRetryConfig`,
- `DefaultListModelsRetryConfig()` returning a `TestRetryConfig`,
- `WithListModelsTestRetry` that unconditionally retries on any failure (errors, nil response, empty data, validation failures), and never calls `checkListModelsRetryConditions`.

Given that `WithListModelsTestRetry` intentionally ignores conditions, the `Conditions []ListModelsRetryCondition` field and `checkListModelsRetryConditions` helper are currently unused. That's not incorrect, but it adds surface area without behavior.

Either:

- keep the current "retry on any failure" behavior, but drop the unused `Conditions` / `checkListModelsRetryConditions` bits for simplicity, or
- explicitly call `checkListModelsRetryConditions` inside `WithListModelsTestRetry` if you plan to introduce more nuanced list‑models retry rules later.

Also applies to: 1843-1977
tests/core-providers/scenarios/error_parser.go (1)
379-397: Clarify ❌ handling to truly enforce a prefix

The new comment says "ALWAYS includes ❌ prefix", but the check uses `strings.Contains`, so messages that contain ❌ in the middle won't be prefixed while still not actually starting with it. If you really want a prefix, consider `strings.HasPrefix(message, "❌")` (or trimming leading whitespace first), and optionally mirror this behavior in `AssertNoError` for consistency.

tests/core-providers/scenarios/transcription_stream.go (1)
84-114: Retry‑wrapped TTS generation looks good; consider reusing more config

The TTS retry wiring via `WithSpeechTestRetry` is sound and aligns with the new framework. Two small improvements you might consider:

- Tie `SpeechExpectations`' `min_audio_bytes` to the per‑case expectation (e.g., derive from `tc.expectMinBytes`) instead of hard‑coding `100`, to keep validation strength consistent with the main speech tests.
- The `TestRetryConfig → SpeechRetryConfig` mapping pattern (copying max/base/max delay, hooks, empty conditions) is repeated in several places; a small helper to construct `SpeechRetryConfig` from a `TestRetryConfig` would reduce duplication and keep future changes localized.

tests/core-providers/scenarios/utils.go (1)
536-569: Solid TTS retry integration; minor reuse opportunities

The helper now correctly delegates to `WithSpeechTestRetry` and enforces non‑empty audio before writing the temp file, which is a nice robustness boost. Given the same `TestRetryConfig → SpeechRetryConfig` conversion and `SpeechExpectations(100)` pattern appear in multiple places, consider extracting a small shared helper (e.g., `NewSpeechRetryConfigFromTestConfig`) and, if you ever want a stricter minimum size here, taking `minBytes` as a parameter instead of fixing it at 100.

tests/core-providers/scenarios/chat_completion_stream.go (2)
43-65: ChatCompletionStream retry integration looks good; watch "paris" sensitivity

Using `StreamingRetryConfig` + `WithStreamRetry` for the base stream path is a clean upgrade, and the consolidation into a synthetic `BifrostChatResponse` for validation is preserved. The only thing to double‑check is the new `expectations.ShouldContainAnyOf` entry of `"paris"`: if `ValidateChatResponse` does a case‑sensitive check, completions that say `"Paris"` (capital P) could spuriously fail. If that isn't already handled, consider normalizing case before comparison or adding `"Paris"` alongside `"paris"`.

Also applies to: 188-210

242-333: With‑tools streaming retry path is solid; optionally surface structured errors

The with‑tools streaming scenario now correctly wraps `ChatCompletionStreamRequest` in `WithStreamRetry` and gives clear "after retries" failures with a ❌ prefix and `GetErrorMessage`, which is a nice improvement. One small enhancement to consider: when `response == nil || response.BifrostChatResponse == nil`, you could log any attached `response.BifrostError` (if present) via `FormatErrorConcise(ParseBifrostError(...))` before `t.Fatalf`, which would make debugging provider‑side failures during streaming with tools much easier.
85-121: Base speech synthesis retry flow is correct; centralize SpeechRetryConfig constructionThe base
SpeechSynthesissubtests now correctly route throughWithSpeechTestRetrywith per‑scenarioTestRetryConfig, expectations, and informative “after retries” failures, whilevalidateSpeechSynthesisSpecificstill enforces audio size/model checks. Given the sameSpeechRetryConfigconstruction pattern appears here, in the HD test, voice tests, andGenerateTTSAudioForTest, it would be worth extracting a small helper (e.g.,NewSpeechRetryConfig(retryConfig TestRetryConfig)) to avoid repetition and keep retry behavior consistent across all speech scenarios.
192-306: Advanced HD and voice‑type speech tests: good retry usage; consider honoring caller contextThe HD long‑text and
AllVoiceOptionssubtests both useGetTestRetryConfigForScenario+WithSpeechTestRetrycorrectly and add targeted validations (min bytes, model consistency, per‑voice audio checks), which strengthens these paths. The only behavioral nit is that each block creates a freshcontext.Background()(requestCtx) instead of reusing thectxpassed intoRunSpeechSynthesisAdvancedTest, so timeouts/cancellations from the higher‑level test harness won’t propagate into the provider calls; if that context is meaningful in your setup, consider threading it through rather than dropping it.tests/core-providers/scenarios/text_completion.go (1)
52-76: Text completion retry wrapper is wired correctly

The `TextCompletion` test now cleanly delegates to `WithTextCompletionTestRetry` with a dedicated `TextCompletionRetryConfig`, contextual metadata, and scenario expectations, which matches the broader retry framework. Given the conditions slice is currently empty, you can keep it as‑is, or later introduce `TextCompletionRetryCondition`s (e.g., for empty/degenerate outputs) without changing call sites.
142-177: Unify repeated transcription retry + error-handling pattern

All these blocks follow the same structure (build `TranscriptionRetryConfig` + `TestRetryContext` + expectations, call `WithTranscriptionTestRetry`, then wrap the error with a "❌" prefix and do nil/empty checks). That's good for consistency but fairly repetitive.

Consider extracting a small helper (e.g. `runTranscriptionWithRetry(t, client, ctx, scenarioName, request, expectations, testConfig)`) that encapsulates:

- retry config/context construction,
- the `WithTranscriptionTestRetry` call, and
- the common error / nil / empty-text handling.

That would reduce boilerplate and make future tweaks to retry semantics or error formatting one-liner changes.
Also applies to: 231-269, 313-352, 383-421, 453-492
tests/core-providers/scenarios/tool_calls_streaming.go (3)
17-41: Accumulator design is solid; minor cleanup opportunity in API surface

The accumulator logic for both chat and responses streaming:

- sensibly prefers stable keys (ID, then itemID, then name/index),
- handles incremental vs. "done" full-JSON arguments,
- and keeps `ItemIDToKey` in sync when IDs show up later.

Two minor points:

- `AccumulateChatToolCall`'s `choiceIndex` parameter is never used; either remove it or incorporate it into the keying strategy if you expect multiple choices with overlapping `toolCall.Index` values.
- The JSON "complete vs. incremental" heuristic in `AccumulateResponsesToolCall` (replace when `argsStr` looks like `{...}` and we already have partials) is reasonable, but consider documenting that this assumes the "done" event sends a full object and that arguments are always object-shaped (not arrays), to make the intent explicit.

Also applies to: 43-92, 94-179, 181-209
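A stripped-down sketch of that keying and replace-vs-append behavior (illustrative names only; this is not the actual `AccumulateResponsesToolCall` implementation):

```go
package main

import "fmt"

// toolKey picks a stable accumulation key in preference order:
// tool-call ID, then item ID, then name plus stream index.
func toolKey(id, itemID, name string, index int) string {
	switch {
	case id != "":
		return "id:" + id
	case itemID != "":
		return "item:" + itemID
	default:
		return fmt.Sprintf("name:%s#%d", name, index)
	}
}

// accumulate merges an arguments fragment into the entry for its key.
// A fragment that looks like a complete JSON object replaces any
// partial deltas gathered so far ("done" events send the full payload).
func accumulate(acc map[string]string, key, fragment string) {
	existing := acc[key]
	if len(fragment) > 1 && fragment[0] == '{' && fragment[len(fragment)-1] == '}' && existing != "" {
		acc[key] = fragment // full-object "done" payload wins
		return
	}
	acc[key] = existing + fragment
}

func main() {
	acc := map[string]string{}
	k := toolKey("", "item_1", "get_weather", 0)
	accumulate(acc, k, `{"loc`)
	accumulate(acc, k, `ation":"Paris"}`)
	accumulate(acc, k, `{"location":"Paris"}`) // done event: full JSON
	fmt.Println(acc[k])
}
```

As the comment above notes, the `{...}` heuristic silently assumes object-shaped arguments; documenting that assumption next to the real implementation would prevent surprises if a provider ever streams array-shaped payloads.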
356-663: Responses tool-calls streaming handling is thorough; consider slight simplification

The Responses streaming test:

- covers all relevant event types (`FunctionCallArgumentsDelta`, `OutputItemAdded`, `FunctionCallArgumentsDone`),
- threads through `callID`, `name`, and `itemID` where available,
- and leverages the accumulator to reconcile partial deltas with final arguments.

A couple of non-blocking thoughts:

- In `FunctionCallArgumentsDelta`, the "arguments" source selection (Delta vs. Arguments) and verbose logging are great; you might eventually want to reuse a small helper to avoid repeating the callID/name/itemID probing logic across the three cases.
- Given that `GetFinalResponsesToolCalls` and `validateStreamingToolCalls` already enforce that ID/Name/Arguments are present, the explicit per-tool-call `if toolCall.ID == "" || ...` block is redundant and could be dropped to keep the flow slimmer.

Functionally this looks correct and aligns with the new accumulator.
Given that callers already ensure non-empty arguments before invoking this helper, you could rely solely on the
json.Unmarshal+TrimSpace != ""logic here and avoid some duplication with upstream checks, but it’s not strictly necessary.tests/core-providers/scenarios/validation_presets.go (1)
208-293:GetExpectationsForScenario("ResponsesStream", ...)currently falls back to default and is unused
208-293: `GetExpectationsForScenario("ResponsesStream", ...)` currently falls back to default and is unused

`GetExpectationsForScenario` doesn't define a `"ResponsesStream"` case, so the `"ResponsesStream"` call in `responses_stream.go` currently just returns `BasicChatExpectations`, later modified and then never actually used in that file.

If you intend to drive validations off scenario-specific expectations for responses streaming, consider either:

- adding a `"ResponsesStream"` case that returns `StreamingExpectations()`, or
- reusing the existing `"ChatCompletionStream"` / `StreamingExpectations` path and passing that scenario name from `responses_stream.go`.

If expectations won't be used for responses streaming anymore, you might instead remove the dead `expectations` variable there to avoid confusion.
48-252: Validation-driven retry wrapper is well-structured; a couple of small polish points

The move to `WithResponsesStreamValidationRetry` with a dedicated validation closure is a nice improvement:

- The closure covers core failure modes (nil chunk, explicit error events, timeout, excessive chunks) and returns structured errors instead of calling `t.Fatal` in most places.
- Structural validation is separated (`validateResponsesStreamingStructure` + `validateResponsesStreamingResponse`), and the outer test only inspects the final `ResponsesStreamValidationResult`, which aligns with the retry framework's design.

Two minor suggestions:

- The `expectations := GetExpectationsForScenario("ResponsesStream", ...)` block is currently unused; either wire those expectations into validation (e.g., content checks) or drop the block to avoid dead code.
- You already verify latency on the last chunk inside `validateResponsesStreamingResponse`; populating `validationResult.LastLatency` from `lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency` (instead of `0`) would make the returned result more informative for logging/metrics.

Functionally this looks sound.
815-819: `validateResponsesStreamingResponse` enforces strong but reasonable streaming guarantees

The new helper:

- enforces non-empty, "substantial" final content,
- requires that some streaming events and specifically content-related events were seen,
- checks sequence-number monotonicity and last-response structure/provider,
- warns (but doesn't fail) on missing `created`/`completed` or missing story-like tokens,
- and now treats missing latency on the last chunk as a hard error.

This is a good encapsulation of streaming correctness. Just be aware that the latency requirement is now a strict pass/fail gate for all providers using this path; if any provider legitimately omits latency, you'll need to either relax this based on provider or route those providers through a different expectation preset.

Would you like me to suggest a small adjustment (e.g., making the latency check conditional on provider capabilities or a flag in `testConfig`) if you find providers that don't currently emit latency?

Also applies to: 821-917
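One way to express that adjustment is to gate the hard failure on a per-provider flag (`RequireChunkLatency` below is a hypothetical field for illustration, not something `testConfig` currently has):

```go
package main

import "fmt"

// Hypothetical per-provider capability flag: only providers known to
// emit per-chunk latency get the hard check.
type testConfig struct {
	Provider            string
	RequireChunkLatency bool
}

// checkLastChunkLatency returns an error only when latency is required
// and absent; otherwise a missing value is reported as a warning string.
func checkLastChunkLatency(cfg testConfig, latencyMs int64) (warn string, err error) {
	if latencyMs > 0 {
		return "", nil
	}
	if cfg.RequireChunkLatency {
		return "", fmt.Errorf("provider %s: last chunk missing latency", cfg.Provider)
	}
	return fmt.Sprintf("provider %s: last chunk latency is 0 (tolerated)", cfg.Provider), nil
}

func main() {
	w, err := checkLastChunkLatency(testConfig{Provider: "openai", RequireChunkLatency: true}, 0)
	fmt.Println(w, err)
	w, err = checkLastChunkLatency(testConfig{Provider: "bedrock"}, 0)
	fmt.Println(w, err)
}
```

This keeps the strict gate for providers that are known to report latency while letting others degrade to the warning-level behavior used elsewhere in the suite.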
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- core/providers/bedrock/bedrock.go (2 hunks)
- core/providers/gemini/utils.go (2 hunks)
- core/providers/openai/openai.go (1 hunks)
- framework/streaming/accumulator.go (3 hunks)
- framework/streaming/responses.go (0 hunks)
- plugins/mocker/main.go (3 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (5 hunks)
- tests/core-providers/scenarios/complete_end_to_end.go (0 hunks)
- tests/core-providers/scenarios/embedding.go (1 hunks)
- tests/core-providers/scenarios/end_to_end_tool_calling.go (0 hunks)
- tests/core-providers/scenarios/error_parser.go (2 hunks)
- tests/core-providers/scenarios/image_base64.go (0 hunks)
- tests/core-providers/scenarios/image_url.go (0 hunks)
- tests/core-providers/scenarios/list_models.go (3 hunks)
- tests/core-providers/scenarios/multi_turn_conversation.go (0 hunks)
- tests/core-providers/scenarios/multiple_images.go (0 hunks)
- tests/core-providers/scenarios/reasoning.go (0 hunks)
- tests/core-providers/scenarios/response_validation.go (10 hunks)
- tests/core-providers/scenarios/responses_stream.go (9 hunks)
- tests/core-providers/scenarios/speech_synthesis.go (3 hunks)
- tests/core-providers/scenarios/speech_synthesis_stream.go (1 hunks)
- tests/core-providers/scenarios/test_retry_framework.go (28 hunks)
- tests/core-providers/scenarios/text_completion.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (3 hunks)
- tests/core-providers/scenarios/tool_calls_streaming.go (4 hunks)
- tests/core-providers/scenarios/transcription.go (7 hunks)
- tests/core-providers/scenarios/transcription_stream.go (1 hunks)
- tests/core-providers/scenarios/utils.go (1 hunks)
- tests/core-providers/scenarios/validation_presets.go (2 hunks)
- tests/core-providers/tests.go (1 hunks)
- transports/bifrost-http/lib/config.go (13 hunks)
💤 Files with no reviewable changes (8)
- tests/core-providers/scenarios/reasoning.go
- framework/streaming/responses.go
- tests/core-providers/scenarios/multi_turn_conversation.go
- tests/core-providers/scenarios/complete_end_to_end.go
- tests/core-providers/scenarios/image_base64.go
- tests/core-providers/scenarios/end_to_end_tool_calling.go
- tests/core-providers/scenarios/image_url.go
- tests/core-providers/scenarios/multiple_images.go
🚧 Files skipped from review as they are similar to previous changes (5)
- plugins/mocker/main.go
- tests/core-providers/tests.go
- framework/streaming/accumulator.go
- transports/bifrost-http/lib/config.go
- core/providers/gemini/utils.go
🧰 Additional context used
🧬 Code graph analysis (11)
core/providers/openai/openai.go (1)
- core/schemas/responses.go (2): `ResponsesStreamResponseTypeCompleted` (1359-1359), `ResponsesStreamResponseTypeIncomplete` (1361-1361)

core/providers/bedrock/bedrock.go (1)
- plugins/mocker/main.go (1): `Latency` (126-130)

tests/core-providers/scenarios/text_completion_stream.go (1)
- tests/core-providers/scenarios/test_retry_framework.go (3): `StreamingRetryConfig` (821-841), `TestRetryContext` (170-175), `WithStreamRetry` (582-690)

tests/core-providers/scenarios/speech_synthesis.go (4)
- tests/core-providers/scenarios/test_retry_framework.go (5): `GetTestRetryConfigForScenario` (1118-1150), `TestRetryContext` (170-175), `SpeechRetryConfig` (218-225), `SpeechRetryCondition` (146-149), `WithSpeechTestRetry` (1328-1478)
- core/schemas/provider.go (1): `Provider` (207-234)
- core/schemas/speech.go (1): `BifrostSpeechResponse` (22-29)
- core/utils.go (1): `GetErrorMessage` (239-276)

tests/core-providers/scenarios/speech_synthesis_stream.go (3)
- tests/core-providers/scenarios/test_retry_framework.go (2): `WithSpeechStreamValidationRetry` (1990-2128), `SpeechStreamValidationResult` (1980-1986)
- core/schemas/bifrost.go (2): `BifrostStream` (318-325), `BifrostError` (353-362)
- core/schemas/speech.go (1): `BifrostSpeechStreamResponse` (133-138)

tests/core-providers/scenarios/responses_stream.go (2)
- tests/core-providers/scenarios/test_retry_framework.go (5): `WithResponsesStreamValidationRetry` (2141-2290), `ResponsesStreamValidationResult` (2131-2137), `StreamingRetryConfig` (821-841), `TestRetryContext` (170-175), `WithStreamRetry` (582-690)
- core/schemas/bifrost.go (3): `BifrostStream` (318-325), `BifrostError` (353-362), `ResponsesStreamRequest` (92-92)

tests/core-providers/scenarios/tool_calls_streaming.go (4)
- core/schemas/chatcompletions.go (4): `BifrostChatRequest` (11-18), `ChatParameters` (154-183), `ChatTool` (201-205), `BifrostChatResponse` (25-40)
- tests/core-providers/scenarios/test_retry_framework.go (2): `StreamingRetryConfig` (821-841), `TestRetryContext` (170-175)
- core/schemas/bifrost.go (2): `BifrostStream` (318-325), `BifrostError` (353-362)
- core/schemas/responses.go (1): `BifrostResponsesStreamResponse` (1421-1459)

tests/core-providers/scenarios/test_retry_framework.go (5)
- tests/core-providers/scenarios/response_validation.go (1): `ResponseExpectations` (18-43)
- tests/core-providers/scenarios/utils.go (1): `GetErrorMessage` (588-612)
- core/utils.go (1): `GetErrorMessage` (239-276)
- tests/core-providers/scenarios/test_retry_conditions.go (4): `StreamErrorCondition` (537-537), `IncompleteStreamCondition` (582-582), `EmptySpeechCondition` (667-667), `EmptyResponseCondition` (16-16)
- tests/core-providers/scenarios/error_parser.go (1): `ParseBifrostError` (47-104)

tests/core-providers/scenarios/response_validation.go (3)
- core/schemas/models.go (1): `BifrostListModelsResponse` (36-45)
- core/schemas/bifrost.go (3): `BifrostError` (353-362), `RequestType` (83-83), `ListModelsRequest` (86-86)
- tests/core-providers/scenarios/error_parser.go (3): `ParseBifrostError` (47-104), `FormatErrorConcise` (353-356), `LogError` (359-366)

tests/core-providers/scenarios/utils.go (3)
- tests/core-providers/scenarios/test_retry_framework.go (5): `DefaultSpeechRetryConfig` (860-873), `TestRetryContext` (170-175), `SpeechRetryConfig` (218-225), `SpeechRetryCondition` (146-149), `WithSpeechTestRetry` (1328-1478)
- tests/core-providers/scenarios/validation_presets.go (2): `SpeechExpectations` (148-164), `ModifyExpectationsForProvider` (301-348)
- core/utils.go (1): `GetErrorMessage` (239-276)

tests/core-providers/scenarios/embedding.go (4)
- tests/core-providers/scenarios/test_retry_framework.go (3): `GetTestRetryConfigForScenario` (1118-1150), `EmbeddingRetryConfig` (238-245), `WithEmbeddingTestRetry` (1503-1653)
- tests/core-providers/scenarios/validation_presets.go (2): `EmbeddingExpectations` (103-115), `ModifyExpectationsForProvider` (301-348)
- core/schemas/bifrost.go (2): `BifrostError` (353-362), `EmbeddingRequest` (93-93)
- tests/core-providers/scenarios/utils.go (1)
🔇 Additional comments (11)
core/providers/bedrock/bedrock.go (1)
909-927: PR metadata inconsistency detected.

The PR title indicates "retry logic enhancements" and references Issue #123 about "Files API Support," but the actual changes modify latency reporting for streaming responses. This mismatch suggests the PR description may be incorrect or incomplete.
Please verify:
- Is this the correct PR description for these changes?
- Should the PR title reflect the actual latency reporting changes?
- Is Issue #123 the correct linked issue, or should it reference a different issue about streaming latency improvements?
core/providers/openai/openai.go (1)
1363-1368: Verify intentional behavior difference and test coverage for Incomplete response handling.

The logic change is sound: Incomplete responses should terminate the stream. However, verification uncovered a potential inconsistency:

- `HandleOpenAIChatCompletionStreaming` (line 935) terminates only on `Completed`
- `HandleOpenAIResponsesStreaming` (line 1363) now terminates on `Completed` OR `Incomplete`

Please confirm:

- Is this intentional architectural difference between the two handlers, or should they behave consistently?
- Do tests cover the `Incomplete` response type terminating the stream at line 1363?
- Why is the `Failed` response type defined but not triggering stream termination? Should it also end the stream?

tests/core-providers/scenarios/text_completion_stream.go (1)
5-5: Text completion streaming tests now correctly reuse the streaming retry wrapper
The `fmt` import is justified by the new `fmt.Sprintf`-based scenario names, and both the varied-prompts and parameters subtests correctly wrap `TextCompletionStreamRequest` with `WithStreamRetry` using `StreamingRetryConfig` and rich `TestRetryContext` metadata. This keeps streaming tests consistent with the global retry strategy without changing validation semantics.

Also applies to: 253-270, 395-412
tests/core-providers/scenarios/response_validation.go (1)
402-445: Content keyword/pattern checks becoming warnings + ❌-prefixing errors is appropriate for flaky LLM outputs

Downgrading `ShouldContainKeywords` / `ShouldContainAnyOf` / forbidden words / regex mismatches to warnings (with truncated actual content) for chat, text completion, and responses, while still failing on truly empty content, is a sensible tradeoff for non-deterministic LLM behavior. The `logValidationResults` change that ensures each error line gets a `❌` prefix keeps logs readable, especially when combined with the new retry framework.

Also applies to: 588-633, 734-777, 1355-1360
tests/core-providers/scenarios/embedding.go (1)
75-101: Embedding test now correctly routes through the generic retry framework

The embedding scenario now:

- derives a `TestRetryConfig` from `GetTestRetryConfigForScenario("Embedding", ...)`,
- maps its attempt/backoff and hooks into an `EmbeddingRetryConfig`,
- wraps `EmbeddingRequest` in `WithEmbeddingTestRetry` with provider-aware expectations.

This keeps embedding tests aligned with the rest of the retry/validation framework while still allowing embedding-specific semantic checks afterward.

Also applies to: 103-109
tests/core-providers/scenarios/test_retry_framework.go (1)
16-41: Retry framework + timeout handling are coherent and consistent across test domains

The refactored retry framework:

- documents the "validation failures always trigger retries" rule clearly,
- unifies retry configs per domain (chat/text/responses/speech/transcription/embedding/list-models),
- consistently:
  - validates responses first,
  - always retries on timeouts via `isTimeoutError`,
  - then uses domain-specific `check*RetryConditions` only as secondary signals,
  - prefixes validation/timeout messages with ❌ for easier log scanning,
  - applies exponential backoff with sensible caps.

`WithChatTestRetry`, `WithResponsesTestRetry`, `WithStreamRetry`, and the dual-API wrapper all follow this pattern without introducing obvious logic or concurrency issues. This is a solid foundation for stabilizing flaky LLM tests.

Also applies to: 275-426, 429-579, 581-690, 940-957, 968-1115
tests/core-providers/scenarios/list_models.go (1)
31-52: List-models tests correctly leverage the new retry helper and keep strong post-retry assertions

Both `RunListModelsTest` and `RunListModelsPaginationTest` now:

- derive a scenario-specific `TestRetryConfig` via `GetTestRetryConfigForScenario`,
- map it into a `ListModelsRetryConfig` (retry-on-any-failure) for `WithListModelsTestRetry`,
- pass clear `TestRetryContext` metadata,
- enforce structural correctness after retries (non-nil response, non-empty data where appropriate, page size constraints) and reasonable latency bounds,
- surface final failures with explicit ❌ messages using `GetErrorMessage`.

Once the `expected_provider` type alignment is fixed, this should give robust, low-flake coverage for both listing and pagination behavior.

Also applies to: 131-167, 183-209
tests/core-providers/scenarios/transcription.go (1)
80-112: TTS retry integration looks consistent and safe

The SpeechSynthesis retry wiring (config, context, expectations, and `WithSpeechTestRetry` call) is coherent and maintains clear failure semantics (`t.Fatalf` on final error plus an explicit empty-audio guard). No functional issues from this block on its own.

tests/core-providers/scenarios/tool_calls_streaming.go (1)
230-342: Chat tool-calls streaming test uses accumulator and retry correctly

The chat completions streaming test:

- uses `StreamingRetryConfig` + `WithStreamRetry` appropriately,
- accumulates tool calls chunk-by-chunk via the accumulator,
- and asserts a non-nil channel plus at-least-one-chunk before validating final tool calls.

The additional per-chunk logging (`hasID`/`hasName`/`hasArgs` plus lengths) should be very helpful when debugging provider-specific streaming quirks. No functional issues spotted here.

tests/core-providers/scenarios/validation_presets.go (1)
15-29: Latency expectation wiring is consistent across presets

Adding `ShouldHaveLatency: true` to the basic, embedding, speech, and transcription presets, and then threading it through `CombineExpectations`, ensures latency presence is treated as a first-class expectation across scenarios.

Given that the validation logic in streaming tests now asserts latency on the last chunk, this alignment looks intentional and consistent. No issues here.

Also applies to: 103-115, 147-188, 387-447
tests/core-providers/scenarios/responses_stream.go (1)
307-327: Consistent use of `StreamingRetryConfig` + `WithStreamRetry` in auxiliary streaming tests

For the `ResponsesStreamWithTools`, `ResponsesStreamWithReasoning`, and `ResponsesStreamLifecycle` subtests, switching to `StreamingRetryConfig` with `WithStreamRetry` keeps retry behavior and logging consistent with the rest of the streaming suite, while preserving the existing per-chunk assertions.

The retry contexts also capture relevant metadata (provider/model/flags) for debugging. No issues here.
Also applies to: 440-459, 565-583
57aa424 to 78cb6be
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/core-providers/scenarios/transcription_stream.go (1)
176-321: Stream transcription only accumulates `Delta`, ignoring `Text`, which can cause false failures

Inside the main streaming loop you only append `transcribeData.Delta` to `fullTranscriptionText`, while `transcribeData.Text` is only logged. If a provider streams via `Text` (or sets `Text` on final chunks) and never uses `Delta`, `fullTranscriptionText` will remain empty and the later checks:

- `if fullTranscriptionText == "" { ... }`
- word-match round-trip validation

will incorrectly fail even though valid transcription data was received.

Consider also appending non-empty `Text` when building `fullTranscriptionText`, e.g.:

```diff
-	// Handle delta vs complete text chunks
-	if transcribeData.Delta != nil {
-		// This is a delta chunk
-		deltaText := *transcribeData.Delta
-		fullTranscriptionText += deltaText
-		t.Logf("✅ Received transcription delta chunk %d with latency %d ms: '%s'", chunkIndex, response.BifrostTranscriptionStreamResponse.ExtraFields.Latency, deltaText)
-	}
+	// Handle delta vs complete text chunks
+	if transcribeData.Delta != nil {
+		// This is a delta chunk
+		deltaText := *transcribeData.Delta
+		fullTranscriptionText += deltaText
+		t.Logf("✅ Received transcription delta chunk %d with latency %d ms: '%s'",
+			chunkIndex,
+			response.BifrostTranscriptionStreamResponse.ExtraFields.Latency,
+			deltaText,
+		)
+	} else if transcribeData.Text != "" {
+		// Fallback for providers that stream full text instead of deltas
+		fullTranscriptionText += transcribeData.Text
+	}
```

This keeps current delta behavior but also supports text-based streaming providers.
tests/core-providers/scenarios/embedding.go (1)
135-144: Fix `range`-over-int portability and potential out-of-bounds in embedding validation loop

`validateEmbeddingSemantics` currently has:

```go
responseDataLength := len(response.Data)
// ...
for i := range responseDataLength {
	vec, extractErr := getEmbeddingVector(response.Data[i])
	// ...
}
```

Two problems:

- `range` over an `int` requires Go 1.22+; on older toolchains this does not compile.
- `responseDataLength` may be reassigned to `len(response.Data[0].Embedding.Embedding2DArray)` while `len(response.Data)` is still 1; indexing `response.Data[i]` for `i >= len(response.Data)` will panic.

A minimal, safer fix is to bound the loop explicitly and guard the index:

```diff
-	for i := range responseDataLength {
-		vec, extractErr := getEmbeddingVector(response.Data[i])
+	for i := 0; i < responseDataLength; i++ {
+		if i >= len(response.Data) {
+			t.Fatalf("Embedding response shape mismatch: response.Data has length %d but expected at least %d elements", len(response.Data), responseDataLength)
+		}
+		vec, extractErr := getEmbeddingVector(response.Data[i])
```

Depending on how 2D embeddings are represented in `EmbeddingData`, you may instead want to branch explicitly between "flat `Data` per input" and "single `Data` item with `Embedding2DArray`" and extract vectors from the correct structure, but the current form is not portable across Go versions and can panic in 2D cases.
🧹 Nitpick comments (19)
tests/core-providers/tests.go (1)
104-104: Consider removing duplicate status bracketing.

The status indicator appears twice on each line: once at the beginning and once after the scenario name. For example, line 104 produces output like `[ENABLED] SUPPORTED: TextCompletion [ENABLED] Configured to run`. The second occurrence seems redundant, especially since the summary lines (113-115) only use single bracketing.

Apply this diff to simplify the output:

```diff
- t.Logf("[ENABLED] SUPPORTED: %-25s [ENABLED] Configured to run", scenario.name)
+ t.Logf("[ENABLED] SUPPORTED: %-25s Configured to run", scenario.name)
```

```diff
- t.Logf("[SKIPPED] UNSUPPORTED: %-25s [SKIPPED] Not supported by provider", scenario.name)
+ t.Logf("[SKIPPED] UNSUPPORTED: %-25s Not supported by provider", scenario.name)
```

Also applies to: 107-107
core/providers/gemini/utils.go (1)
416-431: Nil-guard fixes for Gemini message content are correct; consider centralizing text extraction

The additional `message.Content != nil` checks around both standard content and tool-response handling remove a real nil-deref risk when Content is absent, while preserving existing behavior for `ContentStr` and text `ContentBlocks`. This is a solid fix.

Optionally, the repeated "extract text from `ChatMessageContent` / `ContentBlocks`" logic in these two branches could be pulled into a small helper (e.g., `extractTextFromChatContent(...)`) to keep future changes in one place.

Also applies to: 477-493
tests/core-providers/scenarios/speech_synthesis.go (1)
279-314: Per-voice speech synthesis tests are now robustly retried

The `AllVoiceOptions` subtests now use `WithSpeechTestRetry` with a voice-specific `SpeechRetryConfig` and `TestRetryContext`, plus minimal audio-size thresholds. This should significantly reduce flaky voice-specific failures while still failing when a voice delivers no or too-small audio.

Consider extracting a small helper to build `SpeechRetryConfig` from a `TestRetryConfig` to avoid repeating the same field-copy boilerplate in multiple places.
375-377: Streaming multi‑voice test: loop capture fix and validation‑driven retries look solidThe
voiceCopy := voicepattern and exclusive use ofvoiceCopyinside the parallel subtest correctly fix the prior loop‑variable capture issue. The newWithSpeechStreamValidationRetryintegration plus the per‑voice validator—which checks for stream errors, presence of audio data, and non‑zeroLastLatency—provides a clear, retry‑aware pass/fail signal per voice. This is a clean and robust upgrade to the streaming speech tests.If you see similar validators emerging elsewhere, it may be worth extracting a reusable “basic speech stream validator” helper that takes a voice label, but it’s fine as‑is for now.
Also applies to: 397-408, 412-469, 471-477
tests/core-providers/scenarios/transcription.go (1)
80-103: Retry integration for speech/transcription tests looks correct; consider deduplicating config mappingThe new use of
GetTestRetryConfigForScenario+WithSpeechTestRetry/WithTranscriptionTestRetryis consistent with the shared framework and the expectations objects, and the error handling withGetErrorMessage+❌prefixing is coherent.One small improvement: every call site manually maps
TestRetryConfig→SpeechRetryConfig/TranscriptionRetryConfigwith identical field copies andConditions: []…{}. A thin helper (e.g.,NewSpeechRetryConfigFromTestConfig(...),NewTranscriptionRetryConfigFromTestConfig(...)) would reduce repetition and make it harder to forget to propagate new fields or conditions in the future.Also applies to: 142-171, 231-269, 313-335, 383-421, 453-492
tests/core-providers/scenarios/transcription_stream.go (1)
147-167: Streaming transcription retry configs use generic defaults for some scenarios

For the main round-trip stream test you call:

```go
retryConfig := GetTestRetryConfigForScenario("TranscriptionStream", testConfig)
```

which maps to `DefaultTranscriptionRetryConfig()`. For JSON/lang/prompt variants you use scenario names like `"TranscriptionStreamJSON"`, `"TranscriptionStreamLang"`, `"TranscriptionStreamPrompt"` that currently fall through to `DefaultTestRetryConfig()`.

Given `StreamingRetryConfig()` exists with stream-specific conditions (`StreamErrorCondition`, `IncompleteStreamCondition`) and comments explicitly avoiding `EmptyResponseCondition` for streams, it may be clearer and more consistent to:

- Either add dedicated cases for these `"TranscriptionStream*"` scenario names in `GetTestRetryConfigForScenario` that return `StreamingRetryConfig()`, or
- Bypass `GetTestRetryConfigForScenario` here and call `StreamingRetryConfig()` directly, as is done in some other streaming tests.

Not a blocker, but tightening this mapping would make streaming retry behavior more predictable across scenarios.

Also applies to: 357-373, 448-463, 530-547
tests/core-providers/scenarios/test_retry_framework.go (1)
257-273: Retry framework is coherent; consider tightening streaming configs and error handling assumptions

The expanded framework (validation-first retry wrappers, timeout-aware backoff, unified logging with ❌ prefixes) is internally consistent and a good fit for flaky LLM behavior across chat, responses, embeddings, transcription, and list models.

Two things to keep in mind as this grows:

1. Streaming configs vs conditions
   - `WithStreamRetry`, `WithSpeechStreamValidationRetry`, and `WithResponsesStreamValidationRetry` assume that `TestRetryConfig.Conditions` are safe to evaluate in a streaming context (only have `err` and `TestRetryContext`, no single response object).
   - `StreamingRetryConfig()` follows that contract (only `StreamErrorCondition` / `IncompleteStreamCondition`) and explicitly avoids `EmptyResponseCondition`.
   - Ensure all stream callers either use `StreamingRetryConfig()`, or provide a `TestRetryConfig` whose conditions are stream-safe; otherwise, condition evaluators written for non-stream responses may panic or behave unexpectedly when called with `nil` responses.

2. Defensive error parsing for `isTimeoutError`
   - `isTimeoutError` now gates the "ALWAYS retry" path across many wrappers and calls `GetErrorMessage(err)` from `utils.go`. That helper assumes `err.Error` is non-nil and fully populated.
   - If any code path produces a `BifrostError` with a nil `Error` field, the new, more frequent use of `GetErrorMessage` (including inside streaming retry logic) could start panicking tests that previously only logged `err.Error.Message` conditionally.
   - Consider hardening `GetErrorMessage` (and/or `isTimeoutError`) with additional `err.Error != nil` checks before accessing nested fields to make the retry framework robust against incomplete error objects.

These are refinements rather than blockers; the overall structure and cross-type consistency of the retry logic look solid.

Also applies to: 581-689, 710-764, 820-841, 896-910, 940-957, 1117-1149, 1843-1966, 1980-2128, 2130-2289
tests/core-providers/scenarios/embedding.go (1)
75-103: Embedding retry wiring is consistent; consider preserving embedding-specific conditions

The use of `GetTestRetryConfigForScenario("Embedding", ...)` and `WithEmbeddingTestRetry` is consistent with the shared retry framework, and the expectations wiring looks good.

Right now, though, `EmbeddingRetryConfig` is constructed with an empty `Conditions` slice:

```go
embeddingRetryConfig := EmbeddingRetryConfig{
	MaxAttempts: retryConfig.MaxAttempts,
	BaseDelay:   retryConfig.BaseDelay,
	MaxDelay:    retryConfig.MaxDelay,
	Conditions:  []EmbeddingRetryCondition{},
	// ...
	OnRetry:     retryConfig.OnRetry,
	OnFinalFail: retryConfig.OnFinalFail,
}
```

If you intended to reuse any embedding-specific retry conditions from `DefaultEmbeddingRetryConfig` (e.g., empty/invalid embeddings), consider either:

- Exposing them as `[]EmbeddingRetryCondition` and populating `Conditions` here, or
- Removing the field from `EmbeddingRetryConfig` until you actually need per-scenario conditions, to avoid implying that something is wired when it isn't yet.

Not blocking, just a clarity/DRY tweak.
tests/core-providers/scenarios/chat_completion_stream.go (2)
43-68: Retry wrapper integration for basic chat streaming looks correct

- `StreamingRetryConfig` + `WithStreamRetry` are wired correctly and preserve the original `ChatCompletionStreamRequest` behavior while adding retries.
- Post-retry checks (`RequireNoError`, `responseChannel != nil`) are sufficient before entering the read loop.
- Timeout handling via `streamCtx`/`select` is good and prevents hangs.

You might optionally switch the remaining `t.Fatal(...)`s here (Lines 65, 90, 141, 145, 197, 201, 205) to `t.Fatalf` with a ❌ prefix for consistency with other tests, but behavior is already fine.

Also applies to: 73-88
242-333: Tool‑call streaming with retries is well‑structured; consider adding a timeout guard
- Good use of `StreamingRetryConfig` + `WithStreamRetry` and detailed error reporting via `GetErrorMessage` and ❌-prefixed messages.
- Validation that `response != nil` and that at least one chunk and at least one tool call were seen is solid.
- Logging of tool IDs, names, and arguments will be very helpful when debugging provider behavior.

One gap versus `RunChatCompletionStreamTest`: this loop has no per-test timeout context and relies on the channel eventually closing or the global `go test` timeout. Mirroring the `streamCtx` timeout pattern (like Lines 73-146 in this file) would prevent hangs if a provider never closes the stream.
36-38: Text completion retry wiring and validation look sound; minor optional safety
GetTestRetryConfigForScenario("TextCompletion", ...)→TextCompletionRetryConfig→WithTextCompletionTestRetryis consistent with the shared retry framework.- Expectations are passed into the retry helper so validation happens per‑attempt, which is what you want.
GetTextCompletionContent(response)is only called after checkingbifrostErr != nil, which matchesWithTextCompletionTestRetry’s contract of returning a non‑nil response on success.If you ever loosen that contract, adding a defensive
if response == nil { t.Fatalf(...) }before callingGetTextCompletionContentwould make the test more robust, but it’s not required with the current helper semantics.Also applies to: 51-77
tests/core-providers/scenarios/list_models.go (1)
131-152: Pagination + next‑page retry flow is coherent; double‑check semantics for empty pages
- Pagination test now uses the same retry wrapper for both the first page and the next page, with dedicated contexts and expectations.
- For pagination expectations you explicitly allow
"min_model_count": 0, which is sensible whenpage_sizecan exceed the total model count.- Note that
WithListModelsTestRetryalso retries on emptyDataand treats that as a “❌ empty model list” reason; after retries, the test itself does not fail onlen(response.Data) == 0, so a genuinely empty page can still be considered success (with warnings/logs).If the intent is that an empty next page is fully acceptable (not just logged as a final‑failure reason), you might want to relax the “empty model list” retry condition when
min_model_countis 0 for this scenario, to avoid noisy retries in the perfectly‑fine “no more pages” case.Also applies to: 154-167, 183-224
tests/core-providers/scenarios/tool_calls_streaming.go (3)
17-41: Streaming tool‑call accumulator design is reasonable; watch for duplicate entries
- Chat accumulator uses either tool‑call ID (preferred) or index as the key and correctly merges name/ID/arguments across chunks.
- Responses accumulator’s keying logic (itemID →
"item:"+id"mapping, then migrating tocallIDwhen available) is flexible enough to handle different provider event shapes.- Because temporary keys like
"default"or"name:..."are not always removed when migrating tocallID,GetFinalResponsesToolCallscan return duplicates pointing to the same logical call.The duplicates don’t break validation (they all still must have ID/Name/Args), but if you start asserting on counts, consider deduplicating by
IDinGetFinalResponsesToolCallsto keep metrics and logs cleaner.Also applies to: 43-92, 181-209
218-342: Chat Completions streaming tool‑call test is robust; add timeout like other stream tests
`StreamingRetryConfig` + `WithStreamRetry` usage is consistent, and you validate:

- at least one streaming response,
- presence of at least one tool call,
- completeness of each final tool call (ID, name, arguments) via `validateStreamingToolCalls`.

Per-chunk logging (ID/name/args presence + lengths) is excellent for debugging partial or malformed provider events.

As with `RunChatCompletionStreamTest`, this `for range responseChannel` loop lacks a timeout and could hang if the stream never closes or stalls. Adopting a `context.WithTimeout` + `select` pattern here (and in the Responses test below) would improve resilience of the test suite.
350-663: Responses API streaming tool‑call accumulation and validation are thorough
The test now:

- Uses `WithStreamRetry` with a dedicated context and expectations.
- Handles all three relevant event types (`FunctionCallArgumentsDelta`, `OutputItemAdded`, `FunctionCallArgumentsDone`) and unifies arguments via the accumulator.
- Logs rich debug info for each chunk (which fields are present, where they came from, argument lengths, IDs, names, item IDs).

`validateStreamingToolCalls` enforces that every final tool call has non-empty ID/name/arguments and attempts to parse arguments as JSON while only warning on parse failures, which is an appropriate compromise for streaming partials.

Same as above, this loop relies on channel closure or global test timeout; adding an explicit per-test timeout would prevent hangs when a provider misbehaves.
Also applies to: 666-714
tests/core-providers/scenarios/response_validation.go (1)
45-53: Extended ToolCallExpectation + validateSingleToolCall look coherent; be aware of `ValidateArgsJSON` semantics

- New fields (`ArgumentTypes`, `ArgumentValues`, `ValidateArgsJSON`) give you fine-grained control over argument validation.
- `validateSingleToolCall` handles multiple shapes for `arguments` (map, map[interface{}]interface{}, string, []byte) and only attempts JSON parsing when `ValidateArgsJSON` is true, which is appropriate.

Note that even when `ValidateArgsJSON` is false, the later checks for `RequiredArgs`, `ForbiddenArgs`, `ArgumentTypes`, and `ArgumentValues` still operate on `args`, which will be `nil` unless you've manually provided a parsed map. In practice that's fine as long as you only set those expectations when `ValidateArgsJSON` is true; if you plan to support non-JSON argument formats, you may want to short-circuit those checks when `ValidateArgsJSON` is false.

Also applies to: 1255-1345
tests/core-providers/scenarios/responses_stream.go (3)
224-241: `expectations` for ResponsesStream are built but never used

The `expectations` value created here is not fed into any validation or retry logic, which can be misleading and adds a bit of noise.

If these expectations are no longer part of the streaming validation surface, consider removing this block to keep the callback focused and avoid implying additional assertions that do not actually run.

```diff
- // Enhanced validation expectations for responses streaming
- expectations := GetExpectationsForScenario("ResponsesStream", testConfig, map[string]interface{}{})
- expectations = ModifyExpectationsForProvider(expectations, testConfig.Provider)
- expectations.ShouldContainKeywords = append(expectations.ShouldContainKeywords, []string{"paris"}...) // Should include story elements
```
774-813: Sequence-number check uses `Fatalf` but is labeled as a warning and bypasses retry

`validateResponsesStreamingStructure` calls `t.Fatalf` when sequence numbers regress, but the log string is prefixed with "⚠️ Warning", and this hard-fails the test rather than surfacing as a validation failure through `WithResponsesStreamValidationRetry`.

If the intent is to keep this a hard failure, it would be clearer either to adjust the message to indicate an error, or to move this check into the validation-result path so the retry wrapper can decide whether to retry or fail.
815-917: Streaming response validation is thorough; consider softening the latency requirement if needed

The `validateResponsesStreamingResponse` checks cover content presence, event types, sequence ordering, provider consistency, and latency on the last chunk. That's a good level of scrutiny.

One potential tuning: the requirement that `ExtraFields.Latency > 0` will cause every provider that doesn't populate latency to fail validation (and thus be retried / eventually fail). If any providers legitimately omit or zero this field, you might want to downgrade this to a logged warning instead of a hard error:

```diff
- if lastResponse != nil && lastResponse.BifrostResponsesStreamResponse != nil {
-     if lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency <= 0 {
-         errors = append(errors, fmt.Sprintf("Last streaming chunk missing latency information (got %d ms)", lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency))
-     } else {
-         t.Logf("✅ Total streaming latency: %d ms", lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency)
-     }
- }
+ if lastResponse != nil && lastResponse.BifrostResponsesStreamResponse != nil {
+     if lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency <= 0 {
+         t.Logf("⚠️ Warning: Last streaming chunk missing latency information (got %d ms)", lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency)
+     } else {
+         t.Logf("✅ Total streaming latency: %d ms", lastResponse.BifrostResponsesStreamResponse.ExtraFields.Latency)
+     }
+ }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (34)
- core/changelog.md (1 hunks)
- core/providers/bedrock/bedrock.go (2 hunks)
- core/providers/gemini/utils.go (2 hunks)
- core/providers/openai/openai.go (1 hunks)
- framework/changelog.md (1 hunks)
- framework/streaming/accumulator.go (3 hunks)
- framework/streaming/responses.go (0 hunks)
- plugins/mocker/main.go (3 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (5 hunks)
- tests/core-providers/scenarios/complete_end_to_end.go (0 hunks)
- tests/core-providers/scenarios/embedding.go (1 hunks)
- tests/core-providers/scenarios/end_to_end_tool_calling.go (0 hunks)
- tests/core-providers/scenarios/error_parser.go (2 hunks)
- tests/core-providers/scenarios/image_base64.go (0 hunks)
- tests/core-providers/scenarios/image_url.go (0 hunks)
- tests/core-providers/scenarios/list_models.go (3 hunks)
- tests/core-providers/scenarios/multi_turn_conversation.go (0 hunks)
- tests/core-providers/scenarios/multiple_images.go (0 hunks)
- tests/core-providers/scenarios/reasoning.go (0 hunks)
- tests/core-providers/scenarios/response_validation.go (10 hunks)
- tests/core-providers/scenarios/responses_stream.go (9 hunks)
- tests/core-providers/scenarios/speech_synthesis.go (3 hunks)
- tests/core-providers/scenarios/speech_synthesis_stream.go (2 hunks)
- tests/core-providers/scenarios/test_retry_framework.go (28 hunks)
- tests/core-providers/scenarios/text_completion.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (3 hunks)
- tests/core-providers/scenarios/tool_calls_streaming.go (4 hunks)
- tests/core-providers/scenarios/transcription.go (7 hunks)
- tests/core-providers/scenarios/transcription_stream.go (1 hunks)
- tests/core-providers/scenarios/utils.go (1 hunks)
- tests/core-providers/scenarios/validation_presets.go (2 hunks)
- tests/core-providers/tests.go (1 hunks)
- transports/bifrost-http/lib/config.go (13 hunks)
- transports/changelog.md (1 hunks)
💤 Files with no reviewable changes (8)
- framework/streaming/responses.go
- tests/core-providers/scenarios/reasoning.go
- tests/core-providers/scenarios/image_base64.go
- tests/core-providers/scenarios/image_url.go
- tests/core-providers/scenarios/end_to_end_tool_calling.go
- tests/core-providers/scenarios/complete_end_to_end.go
- tests/core-providers/scenarios/multi_turn_conversation.go
- tests/core-providers/scenarios/multiple_images.go
✅ Files skipped from review due to trivial changes (1)
- transports/changelog.md
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/core-providers/scenarios/validation_presets.go
- core/providers/openai/openai.go
- tests/core-providers/scenarios/error_parser.go
- framework/streaming/accumulator.go
- plugins/mocker/main.go
🧰 Additional context used
🧬 Code graph analysis (11)
tests/core-providers/scenarios/utils.go (5)
- tests/core-providers/scenarios/test_retry_framework.go (5): DefaultSpeechRetryConfig (860-873), TestRetryContext (170-175), SpeechRetryConfig (218-225), SpeechRetryCondition (146-149), WithSpeechTestRetry (1328-1478)
- tests/core-providers/scenarios/validation_presets.go (1): SpeechExpectations (148-164)
- core/schemas/speech.go (1): BifrostSpeechResponse (22-29)
- core/schemas/bifrost.go (2): BifrostError (353-362), SpeechRequest (94-94)
- core/utils.go (1): GetErrorMessage (239-276)

tests/core-providers/scenarios/text_completion_stream.go (1)
- tests/core-providers/scenarios/test_retry_framework.go (3): StreamingRetryConfig (821-841), TestRetryContext (170-175), WithStreamRetry (582-690)

tests/core-providers/scenarios/text_completion.go (5)
- tests/core-providers/scenarios/test_retry_framework.go (3): TextCompletionRetryConfig (198-205), TextCompletionRetryCondition (134-137), WithTextCompletionTestRetry (1175-1325)
- plugins/mocker/main.go (1): Conditions (89-94)
- core/schemas/textcompletions.go (1): BifrostTextCompletionResponse (69-77)
- core/schemas/bifrost.go (2): BifrostError (353-362), TextCompletionRequest (87-87)
- tests/core-providers/scenarios/utils.go (2): GetErrorMessage (588-612), GetTextCompletionContent (341-354)

tests/core-providers/scenarios/embedding.go (4)
- tests/core-providers/scenarios/test_retry_framework.go (5): GetTestRetryConfigForScenario (1118-1150), TestRetryContext (170-175), EmbeddingRetryConfig (238-245), EmbeddingRetryCondition (158-161), WithEmbeddingTestRetry (1503-1653)
- core/schemas/embedding.go (1): BifrostEmbeddingResponse (22-28)
- core/schemas/bifrost.go (1): BifrostError (353-362)
- tests/core-providers/scenarios/utils.go (1): GetErrorMessage (588-612)

tests/core-providers/scenarios/transcription_stream.go (7)
- tests/core-providers/scenarios/test_retry_framework.go (5): GetTestRetryConfigForScenario (1118-1150), TestRetryContext (170-175), SpeechRetryConfig (218-225), SpeechRetryCondition (146-149), WithSpeechTestRetry (1328-1478)
- core/schemas/provider.go (1): Provider (207-234)
- tests/core-providers/scenarios/validation_presets.go (1): SpeechExpectations (148-164)
- core/schemas/speech.go (1): BifrostSpeechResponse (22-29)
- core/schemas/bifrost.go (2): BifrostError (353-362), SpeechRequest (94-94)
- tests/core-providers/scenarios/utils.go (1): GetErrorMessage (588-612)
- core/utils.go (1): GetErrorMessage (239-276)

tests/core-providers/scenarios/chat_completion_stream.go (3)
- tests/core-providers/scenarios/test_retry_framework.go (3): StreamingRetryConfig (821-841), TestRetryContext (170-175), WithStreamRetry (582-690)
- core/schemas/bifrost.go (3): BifrostStream (318-325), BifrostError (353-362), ChatCompletionStreamRequest (90-90)
- tests/core-providers/scenarios/utils.go (1): GetErrorMessage (588-612)

tests/core-providers/scenarios/response_validation.go (3)
- core/schemas/models.go (1): BifrostListModelsResponse (36-45)
- core/schemas/bifrost.go (3): BifrostError (353-362), RequestType (83-83), ListModelsRequest (86-86)
- tests/core-providers/scenarios/error_parser.go (3): ParseBifrostError (47-104), FormatErrorConcise (353-356), LogError (359-366)

tests/core-providers/scenarios/list_models.go (7)
- tests/core-providers/scenarios/test_retry_framework.go (5): GetTestRetryConfigForScenario (1118-1150), TestRetryContext (170-175), ListModelsRetryConfig (248-255), ListModelsRetryCondition (164-167), WithListModelsTestRetry (1846-1966)
- core/schemas/provider.go (1): Provider (207-234)
- tests/core-providers/scenarios/response_validation.go (1): ResponseExpectations (18-43)
- plugins/mocker/main.go (1): Conditions (89-94)
- core/schemas/models.go (2): BifrostListModelsResponse (36-45), BifrostListModelsRequest (23-34)
- core/schemas/bifrost.go (2): BifrostError (353-362), ListModelsRequest (86-86)
- tests/core-providers/scenarios/utils.go (1): GetErrorMessage (588-612)

core/providers/gemini/utils.go (1)
- core/providers/gemini/types.go (2): Content (871-879), Part (885-909)

tests/core-providers/scenarios/speech_synthesis_stream.go (3)
- tests/core-providers/scenarios/test_retry_framework.go (2): WithSpeechStreamValidationRetry (1990-2128), SpeechStreamValidationResult (1980-1986)
- core/schemas/bifrost.go (2): BifrostStream (318-325), BifrostError (353-362)
- core/schemas/speech.go (1): BifrostSpeechStreamResponse (133-138)

transports/bifrost-http/lib/config.go (2)
- core/schemas/mcp.go (1): MCPConfig (11-13)
- framework/configstore/clientconfig.go (1): GovernanceConfig (67-74)
🔇 Additional comments (17)
tests/core-providers/tests.go (1)
113-115: LGTM! Clean structured logging format. The bracketed status indicators for the configuration summary are clear and consistent, providing both machine-parseable structure and human readability.
framework/changelog.md (1)
1-2: Changelog entries look consistent and scoped correctly. The new/updated entries accurately describe the corresponding framework changes and follow the existing bullet style.
core/changelog.md (1)
1-4: Core changelog entries correctly summarize the fixes. The added lines clearly document Bedrock/Gemini/OpenAI streaming fixes and provider test enhancements; no further adjustments needed.
core/providers/bedrock/bedrock.go (1)
909-927: Final Bedrock ResponsesStream latency now correctly reflects total stream duration. Using `time.Since(startTime)` only for the last `finalResponse` while keeping per-response `Latency` based on `lastChunkTime` aligns the "last chunk" metrics with overall stream time without disturbing earlier chunk timing. This looks correct and backward-compatible with existing chunk-level semantics.

tests/core-providers/scenarios/speech_synthesis.go (2)
85-113: Base speech synthesis test's retry integration looks correct. Wrapping the core `SpeechRequest` in `WithSpeechTestRetry` with a `SpeechRetryConfig` derived from `GetTestRetryConfigForScenario("SpeechSynthesis", …)` plus `SpeechExpectations` is consistent with the retry framework and keeps scenario metadata (`TestRetryContext`) rich for debugging. The error handling and subsequent validations are well-ordered.

Also applies to: 116-123
210-243: HD long-text speech synthesis scenario uses the retry framework safely. The HD variant correctly reuses the retry config pattern, validates `speechResponse` and `Audio` before size checks, and logs model mismatches as warnings instead of hard failures. The explicit `audioSize` threshold and logging make this scenario robust and debuggable.
3-5: Text-completion streaming retry wiring looks solid. The new use of `StreamingRetryConfig` + `TestRetryContext` with `WithStreamRetry` for the varied-prompts and parameterized streaming tests is consistent with the base streaming test and the shared retry framework. Scenario names and metadata are descriptive, and existing read/validation loops remain unchanged.

No issues from my side here.
Also applies to: 253-268, 395-412
tests/core-providers/scenarios/chat_completion_stream.go (1)
187-201: Adding "paris" expectation is safe given content checks are now warnings. `expectations.ShouldContainAnyOf = append(..., "paris")` relies on `ShouldContainAnyOf` now only generating warnings (not hard failures) in `validateChatContent`. This preserves test stability while still surfacing when providers omit the expected city mention.
31-52: ListModels retry + expectations are wired correctly and fix prior `expected_provider` typing.

- Retry config/context and `WithListModelsTestRetry` usage are consistent with the shared retry framework.
- `ResponseExpectations` now sets `"expected_provider": string(testConfig.Provider)`, which matches the string assertion in `validateListModelsFields` and addresses the earlier bug where the provider check was effectively disabled.
- Post-retry guards on `bifrostErr`, `response`, and `len(response.Data) == 0` clearly surface hard failures after retries.

Looks good as a robust, retry-aware ListModels test.
Also applies to: 53-81
tests/core-providers/scenarios/response_validation.go (3)
384-449: Converting strict content keyword/pattern checks to warnings aligns with non‑deterministic LLM behavior
- For chat, text completion, and Responses content validators, missing
ShouldContainKeywords/ShouldContainAnyOf, presence ofShouldNotContainWords, and pattern mismatches now add warnings instead of failing the test.- Empty content and structural failures are still treated as hard errors.
This strikes a good balance: tests still fail on clearly bad responses but won’t flake when providers phrase things differently than expected.
Also applies to: 570-636, 715-781
276-311: List‑models validation function and helpers are consistent with retry usage; provider check now works
`ValidateListModelsResponse` follows the same pattern as other validators: early-out on error/nil response, then field validation + metrics + logging. `validateListModelsFields` enforces:

- non-empty `Data` and non-empty model IDs (for the base ListModels test),
- provider match using `expected_provider` (string) against `response.ExtraFields.Provider`,
- correct `RequestType == schemas.ListModelsRequest`,
- non-negative latency when `ShouldHaveLatency`,
- optional `min_model_count` constraint.

`collectListModelsResponseMetrics` records useful flags/metrics (model_count, next_page_token, provider, request_type, latency presence), which integrate nicely with the broader metrics framework. The earlier type mismatch for `expected_provider` (string vs `ModelProvider`) is now resolved at the call sites by passing `string(testConfig.Provider)`, so this check will actually execute.
1347-1370: Error logging now guarantees ❌‑prefixed lines, which improves retry diagnostics
- `logValidationResults` still logs failures via `t.Logf` (not `t.Errorf`), which is important for allowing retries.
- Each error line is normalized to include a leading ❌, ensuring consistency even when individual validators omit the prefix.
This should make it much easier to scan CI logs for failures and integrates nicely with the retry framework’s use of ❌‑prefixed reasons.
tests/core-providers/scenarios/responses_stream.go (5)
48-213: ResponsesStream validation wrapper integration looks correct. The use of `WithResponsesStreamValidationRetry` with a per-chunk reader, safety cap (>500 chunks), and timeout-based failure path is coherent and should work well with the new retry framework; nothing blocking here from a correctness perspective.
244-252: Final validation aggregation and error formatting look good. Combining `validationResult.Errors` and `validationResult.StreamErrors` and then normalizing the message with a single ❌ prefix gives clear failure output while preserving individual error messages; no issues here.
307-327: Adopting `WithStreamRetry` for ResponsesStreamWithTools is appropriate. Routing the tools streaming scenario through `WithStreamRetry` with a populated `TestRetryContext` aligns it with the broader retry framework and should help reduce flakiness without changing the assertion semantics below.
440-460: Reasoning stream now consistently goes through the retry wrapper. Using `WithStreamRetry` for the reasoning stream test (with reasoning-specific flags in `ExpectedBehavior`/`TestMetadata`) is a clean integration and keeps this scenario consistent with other streaming tests.
565-583: Lifecycle stream test's migration to `WithStreamRetry` looks solid. The lifecycle scenario now benefits from the common streaming retry behavior while preserving the existing, detailed lifecycle assertions below; this is a straightforward and correct application of the retry helper.
78cb6be to dd8cd0c (Compare)
Merge activity

Summary
Type of change
Affected areas
How to test
Describe the steps to validate this change. Include commands and expected outcomes.
If adding new configs or environment variables, document them here.
Screenshots/Recordings
If UI changes, add before/after screenshots or short clips.
Breaking changes
If yes, describe impact and migration instructions.
Related issues
Link related issues and discussions. Example: Closes #123
Security considerations
Note any security implications (auth, secrets, PII, sandboxing, etc.).
Checklist
docs/contributing/README.md and followed the guidelines