Skip to content

test: add async job TTL integration tests#2861

Merged
akshaydeo merged 3 commits intomainfrom
04-20-tests_async_inference_e2e_tests_added
Apr 20, 2026
Merged

test: add async job TTL integration tests#2861
akshaydeo merged 3 commits intomainfrom
04-20-tests_async_inference_e2e_tests_added

Conversation

@Pratham-Mishra04
Copy link
Copy Markdown
Collaborator

Summary

Adds a dedicated end-to-end test suite for Bifrost's async inference feature, covering all /v1/async/* endpoints and the integration route header mechanism (x-bf-async / x-bf-async-id).

Changes

  • Added tests/async/ as a standalone Go module with six test files:
    • submit_test.go — verifies all 11 async endpoints return 202 with a well-formed job envelope and that immediate polling returns the correct status code
    • lifecycle_test.go — polls jobs to terminal state, validates response shape for completed jobs, confirms 404 for non-existent jobs and wrong-endpoint-type polling
    • auth_test.go — covers VK scoping (same key succeeds, different key returns 404, missing key on VK-scoped job returns 404, public jobs are accessible without a VK), and all three auth header formats (Authorization: Bearer, x-api-key, x-goog-api-key)
    • ttl_test.go — validates default TTL (~3600s), custom TTL via x-bf-async-job-result-ttl, fallback to default for invalid/zero/negative values, and 404 after TTL expiry
    • validation_test.go — confirms rejection of stream=true, stream_format=sse on speech, malformed JSON, missing required fields, missing multipart files, and wrong HTTP method
    • integration_route_test.go — tests x-bf-async submit and x-bf-async-id retrieve on the /openai/v1/responses integration route, including stream rejection and VK isolation
  • Added fixtures_test.go with endpoint fixtures for all 11 async endpoints, model overrides via ASYNC_*_MODEL environment variables, and generated test media (sample PNG, sample MP3 reference)
  • Added helpers_test.go with shared HTTP helpers, config sourced from environment variables, and a TestMain health-check gate that skips the suite when the gateway is unreachable
  • Added README.md documenting how to run the tests, all environment variables, model override keys, and per-file coverage descriptions
  • Tests run in two modes — global (no VK) and with_vk — wherever VK presence is not the focus of the test

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (React)
  • Docs

How to test

# From the repo root — requires a running Bifrost gateway on localhost:8080
cd tests/async
go test ./... -timeout 300s

# With virtual keys to enable VK-scoped auth and cross-VK isolation tests
BIFROST_VK=sk-bf-... BIFROST_ALT_VK=sk-bf-... go test ./... -timeout 300s

# Override the gateway URL or individual models
BIFROST_BASE_URL=http://my-gateway:8080 \
ASYNC_CHAT_COMPLETION_MODEL=openai/gpt-4o \
go test ./... -timeout 300s

Environment variables:

Variable Default Description
BIFROST_BASE_URL http://localhost:8080 Bifrost gateway URL
BIFROST_VK Primary virtual key; enables VK-mode tests
BIFROST_ALT_VK Second virtual key; enables cross-VK isolation tests
BIFROST_POLL_TIMEOUT 30s Max time to wait for a job to reach terminal state
BIFROST_POLL_INTERVAL 500ms Polling cadence
BIFROST_INTEGRATION_PATH /openai/v1/responses Override integration route path
BIFROST_INTEGRATION_MODEL openai/gpt-4o-mini Override model for integration route tests
ASYNC_*_MODEL see README Per-endpoint model overrides

If the gateway is unreachable at startup, the entire suite exits cleanly with a skip message rather than failing.

Screenshots/Recordings

N/A

Breaking changes

  • No

Related issues

N/A

Security considerations

Virtual keys passed via BIFROST_VK and BIFROST_ALT_VK are read from environment variables and never logged or written to disk. Cross-VK isolation tests confirm that job IDs cannot be accessed across VK boundaries.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Documentation

    • Added an end-to-end async testing guide with configuration, run modes (VK/global), per-endpoint notes, and gateway health/skip behavior.
  • Tests

    • Added comprehensive async e2e suite and helpers covering submit/poll, lifecycle, integration-route async behavior, VK auth/isolation, TTL/expiry, validation (JSON/multipart/stream/model), multipart media cases, and result-shape assertions.
  • Bug Fixes

    • Async create/retrieve now return clearer, explicit HTTP status codes and improved internal error classification.

Walkthrough

Adds a comprehensive end-to-end async inference test suite (README, go.mod, helpers, fixtures, and many test files) covering submission, polling, VK auth scoping, integration-route async flow, TTL/expiry, validation, and lifecycle across endpoints; and adjusts router/logstore error paths to distinguish internal vs not-found failures.

Changes

Cohort / File(s) Summary
Documentation & Module
tests/async/README.md, tests/async/go.mod
New README describing running async E2E tests and env vars; new module file for the test directory (go 1.26.2).
Shared Helpers & Fixtures
tests/async/helpers_test.go, tests/async/fixtures_test.go
New test harness and fixtures: env/config parsing, TestMain health check, AsyncJobResponse type and helpers, HTTP submit/poll/multipart helpers, endpoint case factory, model overrides, and sample audio/PNG helpers.
Auth / VK Scoping Tests
tests/async/auth_test.go
Adds VK authorization matrix tests for submit/poll across header forms (Authorization, x-api-key, x-goog-api-key) covering present/absent and matching/mismatching VK combinations.
Integration-Route Async Tests
tests/async/integration_route_test.go
Tests integration-route async behavior via x-bf-async / x-bf-async-id on the Responses path: create/poll flows, non-existent job 4xx, stream rejection, and VK isolation checks.
Lifecycle & Submit Tests
tests/async/lifecycle_test.go, tests/async/submit_test.go
Covers submission (202/pending envelope), poll-to-terminal across endpoints/modes, non-existent/wrong-endpoint 404s, and per-endpoint result-shape validations.
TTL & Expiry Tests
tests/async/ttl_test.go
Validates result TTL semantics (default, custom header, invalid/zero/negative fallbacks) and expiry-to-404 behavior after TTL.
Validation Tests
tests/async/validation_test.go
Extensive request validation: reject stream:true (JSON and multipart), malformed JSON, missing multipart parts, missing model, and wrong-method handling for poll endpoints.
Router Error Handling & Helper
transports/bifrost-http/integrations/router.go, transports/bifrost-http/integrations/utils.go
Router now returns explicit HTTP status codes for streaming-create (400) and job-retrieval errors (500 for internal, 404 for not found) and adds newBifrostErrorWithCode helper.
Logstore Error Sentinel
framework/logstore/errors.go, framework/logstore/asyncjob.go
Introduces exported ErrJobInternal sentinel and updates RetrieveJob to wrap internal store errors with ErrJobInternal (preserves ErrNotFound behavior), enabling router to map to 500 vs 404.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Router as Router
    participant Exec as AsyncExecutor
    participant Store as LogStore

    Client->>Router: GET /v1/async/.../{id}
    Router->>Exec: RetrieveJob(id)
    Exec->>Store: FindAsyncJobByID(id)
    alt Store returns job
        Store-->>Exec: job
        Exec-->>Router: job (nil error)
        Router-->>Client: 200 OK + job
    else Store returns ErrNotFound
        Store-->>Exec: ErrNotFound
        Exec-->>Router: ErrNotFound
        Router-->>Client: 404 Not Found
    else Store returns internal error
        Store-->>Exec: internal error
        Exec-->>Router: wraps as ErrJobInternal
        Router-->>Client: 500 Internal Server Error
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through tests with eager paws,
Submitting jobs and checking laws.
VKs guarded, TTLs timed just right,
Polling till day turns into night.
Async bunnies celebrate the light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The PR title is overly narrow, focusing only on TTL tests when the changeset adds a comprehensive async job test suite covering 11 endpoints, auth/VK scoping, validation, lifecycle, integration routes, and more. Clarify the title to reflect the scope—e.g., 'test: add comprehensive async job e2e test suite' or 'test: add async inference end-to-end tests'—to accurately represent all 11 async endpoints and test categories added.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description follows the template structure, clearly documents all six test files, environment variables, and running instructions with concrete examples.
Docstring Coverage ✅ Passed Docstring coverage is 95.52% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 04-20-tests_async_inference_e2e_tests_added

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 20, 2026

Confidence Score: 5/5

Safe to merge; all findings are minor style/consistency suggestions with no blocking issues.

Both production code changes are minimal and correct. The test suite is comprehensive and prior review feedback has been fully addressed (UUID segment-length validation, shortTTL increased to 10s, relative path replacing runtime.Caller, pollIntegration sending {}). Only two P2 suggestions remain.

transports/bifrost-http/integrations/utils.go line 420 (zero-TTL guard inconsistency)

Important Files Changed

Filename Overview
tests/async/helpers_test.go Core HTTP helpers and TestMain health-check gate; well-structured with good config sourcing from env vars
tests/async/fixtures_test.go Endpoint fixtures for all 11 async endpoints; sampleAudio uses relative path (fixing the prior runtime.Caller issue), samplePNG generates in-memory image
tests/async/ttl_test.go TTL test suite covering default, custom, invalid/zero/negative, and expiry; shortTTL raised to 10s addressing the prior race concern
tests/async/auth_test.go Comprehensive VK isolation and multi-auth-header tests; all VK-dependent tests guard with cfg.VK checks
tests/async/submit_test.go Submit tests for all 11 endpoints with full UUID segment-length validation and immediate poll status checks
tests/async/lifecycle_test.go Lifecycle tests including terminal state, non-existent job 404, result shape checks, and wrong-endpoint-type 404
tests/async/validation_test.go Validation tests for stream rejection, malformed JSON, missing required fields, and wrong HTTP method across all endpoint types
tests/async/integration_route_test.go Integration route tests for x-bf-async / x-bf-async-id headers with UUID validation and VK isolation; pollIntegration correctly sends {} body
framework/logstore/asyncjob.go AsyncJobExecutor and cleaner logic; well-guarded with resultTTL <= 0 defaulting and panic recovery in background goroutine
framework/logstore/errors.go Simple sentinel errors for async job store; no issues
transports/bifrost-http/integrations/utils.go getResultTTLFromHeaderWithDefault uses < 0 guard instead of <= 0, inconsistent with SubmitJob's own guard; functionally correct today but a maintenance hazard

Reviews (5): Last reviewed commit: "tests: async inference e2e tests added" | Re-trigger Greptile

Comment thread tests/async/ttl_test.go
Comment thread tests/async/fixtures_test.go Outdated
Comment thread tests/async/submit_test.go
Comment thread tests/async/integration_route_test.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (3)
tests/async/fixtures_test.go (1)

119-121: Cache generated/read fixture bytes to reduce repeated work across tests.

sampleAudio() and samplePNG() are re-executed every endpointCases() call; caching once per process will speed up and stabilize heavy test runs.

♻️ Suggested refactor
 import (
 	"bytes"
 	"image"
 	"image/color"
 	"image/png"
 	"os"
 	"path/filepath"
 	"runtime"
+	"sync"
 )
+
+var (
+	sampleAudioOnce sync.Once
+	sampleAudioData []byte
+	sampleAudioErr  error
+
+	samplePNGOnce sync.Once
+	samplePNGData []byte
+	samplePNGErr  error
+)

 func sampleAudio() []byte {
-	_, thisFile, _, _ := runtime.Caller(0)
-	mediaPath := filepath.Join(filepath.Dir(thisFile), "..", "..", "core", "internal", "llmtests", "scenarios", "media", "sample.mp3")
-	data, err := os.ReadFile(mediaPath)
-	if err != nil {
-		panic("sampleAudio: cannot read " + mediaPath + ": " + err.Error())
-	}
-	return data
+	sampleAudioOnce.Do(func() {
+		_, thisFile, _, _ := runtime.Caller(0)
+		mediaPath := filepath.Join(filepath.Dir(thisFile), "..", "..", "core", "internal", "llmtests", "scenarios", "media", "sample.mp3")
+		sampleAudioData, sampleAudioErr = os.ReadFile(mediaPath)
+	})
+	if sampleAudioErr != nil {
+		panic("sampleAudio: " + sampleAudioErr.Error())
+	}
+	return sampleAudioData
 }

 func samplePNG() []byte {
-	img := image.NewRGBA(image.Rect(0, 0, 256, 256))
-	white := color.RGBA{R: 255, G: 255, B: 255, A: 255}
-	for y := range 256 {
-		for x := range 256 {
-			img.Set(x, y, white)
-		}
-	}
-	var buf bytes.Buffer
-	if err := png.Encode(&buf, img); err != nil {
-		panic("samplePNG: encode failed: " + err.Error())
-	}
-	return buf.Bytes()
+	samplePNGOnce.Do(func() {
+		img := image.NewRGBA(image.Rect(0, 0, 256, 256))
+		white := color.RGBA{R: 255, G: 255, B: 255, A: 255}
+		for y := range 256 {
+			for x := range 256 {
+				img.Set(x, y, white)
+			}
+		}
+		var buf bytes.Buffer
+		samplePNGErr = png.Encode(&buf, img)
+		samplePNGData = buf.Bytes()
+	})
+	if samplePNGErr != nil {
+		panic("samplePNG: encode failed: " + samplePNGErr.Error())
+	}
+	return samplePNGData
 }

Also applies to: 146-147, 161-162, 196-221

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/async/fixtures_test.go` around lines 119 - 121, The tests repeatedly
call sampleAudio() and samplePNG() within endpointCases(), causing redundant
work; change sampleAudio() and samplePNG() to populate and return cached byte
slices (e.g., package-level vars like cachedSampleAudio, cachedSamplePNG) using
sync.Once or an init helper so the heavy decoding/read happens once per process,
then update endpointCases() to use these cached variables instead of calling the
generator functions repeatedly; ensure function names sampleAudio, samplePNG and
endpointCases are updated accordingly and keep thread-safety with sync.Once if
tests run in parallel.
tests/async/lifecycle_test.go (1)

43-55: Add a default assertion for unexpected status values.

Right now, a non-completed/failed status can slip through without an explicit failure in this switch.

Suggested patch
 					switch job.Status {
 					case "completed":
 						if len(job.Result) == 0 || string(job.Result) == "null" {
 							t.Error("completed job must have a non-null result")
 						}
 					case "failed":
 						if len(job.Error) == 0 || string(job.Error) == "null" {
 							t.Error("failed job must have a non-null error")
 						}
 						if job.StatusCode == 0 {
 							t.Error("failed job must carry a non-zero status_code")
 						}
+					default:
+						t.Errorf("unexpected terminal status %q", job.Status)
 					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/async/lifecycle_test.go` around lines 43 - 55, The switch handling
job.Status currently only checks "completed" and "failed" and can silently
ignore unexpected statuses; add a default case in the switch that calls t.Errorf
(or t.Fatalf) to fail the test for any unrecognized job.Status and include the
unexpected job.Status (and optionally job.ID) in the error message so tests fail
loudly when new/invalid statuses appear.
tests/async/validation_test.go (1)

188-188: Avoid live external URL in missing-model validation input.

Line 188 introduces an unnecessary external dependency for a test that only asserts 400 on missing model.

Suggested patch
-					"image_url": envOr("ASYNC_OCR_IMAGE_URL", "https://pestworldcdn-dcf2a8gbggazaghf.z01.azurefd.net/media/561791/carpenter-ant4.jpg"),
+					"image_url": envOr("ASYNC_OCR_IMAGE_URL", "https://example.com/test-image.jpg"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/async/validation_test.go` at line 188, The test injects a live external
URL via envOr for the "image_url" field which creates an unnecessary external
dependency; change the value in tests/async/validation_test.go to a stable local
or neutral placeholder (e.g., a local test asset path like
"file://testdata/fixture.jpg" or a non-network placeholder like
"https://example.com/image.jpg") instead of envOr("ASYNC_OCR_IMAGE_URL", "...");
update the JSON input where "image_url" is set (refer to the "image_url" key and
the envOr call) so the missing-model validation still triggers a 400 without
making network requests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/async/auth_test.go`:
- Around line 39-41: The current test polling checks (e.g., the conditional that
tests `if code == http.StatusNotFound`) treat any non-404 response as success
and miss error statuses like 500; create a small test helper named
`assertPollSuccess(t *testing.T, code int, body string)` and replace each `if
code == http.StatusNotFound` (and the similar checks at the other locations)
with calls to `assertPollSuccess` that assert the status code is one of the
expected success codes (http.StatusOK or http.StatusAccepted) and fail
otherwise, including the response body in the error message so regressions are
caught reliably.

In `@tests/async/fixtures_test.go`:
- Around line 185-188: The default OCR test input currently uses an external CDN
URL (the "image_url" value in the "document" map) which causes flaky tests;
change the envOr fallback to point to a local bundled test fixture instead of
the third‑party URL. Replace the string default in the envOr call used for
"image_url" with a path or file:// URL to a repository test asset (or a data URI
/ embedded fixture) and ensure the test harness loads local files accordingly;
update the usage around the "document" map and the envOr helper so tests no
longer rely on the external CDN.

In `@tests/async/helpers_test.go`:
- Around line 99-100: The code currently discards the error returned by
http.NewRequest when building requests (e.g., the call that assigns to req, _),
which masks problems like malformed cfg.BaseURL; change these to capture the
error (req, err := http.NewRequest(...)) and immediately handle it (e.g., fail
the test with t.Fatalf("http.NewRequest failed: %v", err) or return the error)
before calling req.Header.Set; apply the same fix to the other instances
mentioned (the calls around lines creating requests at the other sites
referenced) so all http.NewRequest uses (variables req and err, using path, raw,
cfg.BaseURL) check and handle err instead of discarding it.
- Around line 224-229: Replace uses of http.DefaultClient.Do and http.Get with
an explicit http.Client that sets a reasonable Timeout (e.g., create client :=
&http.Client{Timeout: <duration>}) and call client.Do(req) so requests can’t
hang; also ensure every response body is closed after use (close resp.Body in
TestMain’s health-check and any helper that calls io.ReadAll on resp.Body)
rather than leaking it—look for the resp variable and the TestMain and helper
functions in tests/async/helpers_test.go to apply these changes.

In `@tests/async/integration_route_test.go`:
- Around line 124-126: Replace the loose "if code < 400" checks with a focused
helper that asserts the response is a 4xx client error: add a test helper
function assert4xx(t *testing.T, code int, body string, context string) that
calls t.Helper() and fails unless 400 <= code && code < 500, including the
response body in the failure message, then update the three places currently
using "if code < 400" (the checks around lines where the integration route
non-existent-job assertions occur) to call assert4xx(t, code, body,
"non-existent job on integration route") (or an appropriate context string) so
tests fail on 5xx responses instead of treating them as acceptable.

In `@tests/async/README.md`:
- Line 45: The table row for ASYNC_OCR_IMAGE_URL has three pipe-separated cells
but the model-overrides table expects two columns; update the row in
tests/async/README.md by combining the current second and third cells into a
single description cell (or remove the extra cell) so the row has exactly two
columns—identify and edit the row containing the `ASYNC_OCR_IMAGE_URL` entry to
correct the pipe count and ensure the description includes the CDN URL and
purpose in one cell.

In `@tests/async/submit_test.go`:
- Around line 60-63: The test should assert the submit response status is 202
before proceeding to polling: after calling submitCase(t, ec, mode.headers) and
before checking submitted.ID, add a check that submitted indicates HTTP 202 (use
submitted.StatusCode == http.StatusAccepted or submitted.Status == "202
Accepted"/"202" depending on the response type) and call t.Fatalf with a clear
message if it is not 202; keep the existing ID check afterwards. This targets
the submitCase result named submitted to fail fast on submit errors.

In `@tests/async/ttl_test.go`:
- Around line 121-127: Replace the fixed Sleep with deadline-based polling: loop
until a deadline (e.g., start := time.Now(); timeout :=
time.Duration(shortTTL+10)*time.Second or a fixed extra buffer) and repeatedly
call pollOnce(t, pollPath, mode.headers) at short intervals (e.g., 100-200ms)
until it returns http.StatusNotFound or the deadline passes; if the deadline
passes without a 404, fail the test with t.Fatalf/t.Errorf. Use the existing
symbols pollOnce, pollPath, mode.headers, shortTTL and http.StatusNotFound to
implement the timeout-based poll and remove the hard sleep.

In `@tests/async/validation_test.go`:
- Around line 294-296: The test currently ignores the error from http.NewRequest
and calls http.DefaultClient.Do with no timeout, risking CI hangs; update the
request to use a context with timeout (context.WithTimeout) and either
http.NewRequestWithContext or attach the context to the request, check and
handle the error returned by http.NewRequest, defer cancel the context, and use
the context-aware request when calling the HTTP client (or set a client timeout)
so the test fails fast on transport stalls; reference the existing
http.NewRequest/http.DefaultClient.Do calls and cfg.BaseURL in your changes.

---

Nitpick comments:
In `@tests/async/fixtures_test.go`:
- Around line 119-121: The tests repeatedly call sampleAudio() and samplePNG()
within endpointCases(), causing redundant work; change sampleAudio() and
samplePNG() to populate and return cached byte slices (e.g., package-level vars
like cachedSampleAudio, cachedSamplePNG) using sync.Once or an init helper so
the heavy decoding/read happens once per process, then update endpointCases() to
use these cached variables instead of calling the generator functions
repeatedly; ensure function names sampleAudio, samplePNG and endpointCases are
updated accordingly and keep thread-safety with sync.Once if tests run in
parallel.

In `@tests/async/lifecycle_test.go`:
- Around line 43-55: The switch handling job.Status currently only checks
"completed" and "failed" and can silently ignore unexpected statuses; add a
default case in the switch that calls t.Errorf (or t.Fatalf) to fail the test
for any unrecognized job.Status and include the unexpected job.Status (and
optionally job.ID) in the error message so tests fail loudly when new/invalid
statuses appear.

In `@tests/async/validation_test.go`:
- Line 188: The test injects a live external URL via envOr for the "image_url"
field which creates an unnecessary external dependency; change the value in
tests/async/validation_test.go to a stable local or neutral placeholder (e.g., a
local test asset path like "file://testdata/fixture.jpg" or a non-network
placeholder like "https://example.com/image.jpg") instead of
envOr("ASYNC_OCR_IMAGE_URL", "..."); update the JSON input where "image_url" is
set (refer to the "image_url" key and the envOr call) so the missing-model
validation still triggers a 400 without making network requests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0aa133d-eb34-4cee-9e37-6077869acc08

📥 Commits

Reviewing files that changed from the base of the PR and between 7f7228a and f311b40.

📒 Files selected for processing (10)
  • tests/async/README.md
  • tests/async/auth_test.go
  • tests/async/fixtures_test.go
  • tests/async/go.mod
  • tests/async/helpers_test.go
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/validation_test.go

Comment thread tests/async/auth_test.go Outdated
Comment thread tests/async/fixtures_test.go
Comment thread tests/async/helpers_test.go Outdated
Comment thread tests/async/helpers_test.go Outdated
Comment thread tests/async/integration_route_test.go Outdated
Comment thread tests/async/README.md Outdated
Comment thread tests/async/submit_test.go Outdated
Comment thread tests/async/ttl_test.go Outdated
Comment thread tests/async/validation_test.go Outdated
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 04-20-fix_ocr_logging_fixes branch from 7f7228a to 75d28ed Compare April 20, 2026 14:05
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 04-20-tests_async_inference_e2e_tests_added branch 2 times, most recently from f311b40 to 8b96adb Compare April 20, 2026 14:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/async/helpers_test.go (1)

46-55: Treat non-positive poll durations as invalid env input.

0 or negative values can cause immediate timeout or aggressive polling loops. Falling back to defaults improves resilience for misconfigured CI env vars.

♻️ Proposed hardening
 func parseDuration(s string, fallback time.Duration) time.Duration {
 	if s == "" {
 		return fallback
 	}
 	d, err := time.ParseDuration(s)
-	if err != nil {
+	if err != nil || d <= 0 {
 		return fallback
 	}
 	return d
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/async/helpers_test.go` around lines 46 - 55, The parseDuration function
currently accepts zero or negative durations as valid; change its validation so
that parseDuration(s string, fallback time.Duration) treats parsed durations <=
0 as invalid and returns fallback instead. Update the logic in parseDuration to
parse the string with time.ParseDuration, then if err != nil OR d <= 0 return
fallback; otherwise return d, referencing the parseDuration function to locate
the change and keep the existing string-empty guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/async/validation_test.go`:
- Around line 303-305: The test currently treats any non-404/405 as passing by
only failing on 200/202, which masks server errors; change the assertion in
tests/async/validation_test.go around the POST poll path check to explicitly
assert that resp.StatusCode is either http.StatusNotFound (404) or
http.StatusMethodNotAllowed (405) and fail the test for any other status,
updating the t.Errorf message to report the unexpected status code (use
resp.StatusCode) so only the expected router outcomes pass; locate the check
using resp.StatusCode and t.Errorf to make the change.

---

Nitpick comments:
In `@tests/async/helpers_test.go`:
- Around line 46-55: The parseDuration function currently accepts zero or
negative durations as valid; change its validation so that parseDuration(s
string, fallback time.Duration) treats parsed durations <= 0 as invalid and
returns fallback instead. Update the logic in parseDuration to parse the string
with time.ParseDuration, then if err != nil OR d <= 0 return fallback; otherwise
return d, referencing the parseDuration function to locate the change and keep
the existing string-empty guard.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a50cb4c4-10c5-418f-87fe-37b28acfe523

📥 Commits

Reviewing files that changed from the base of the PR and between f311b40 and 8b96adb.

📒 Files selected for processing (10)
  • tests/async/README.md
  • tests/async/auth_test.go
  • tests/async/fixtures_test.go
  • tests/async/go.mod
  • tests/async/helpers_test.go
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/validation_test.go
✅ Files skipped from review due to trivial changes (4)
  • tests/async/README.md
  • tests/async/go.mod
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/async/fixtures_test.go
  • tests/async/submit_test.go
  • tests/async/auth_test.go
  • tests/async/ttl_test.go

Comment thread tests/async/validation_test.go Outdated
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 04-20-tests_async_inference_e2e_tests_added branch from 8b96adb to b24af34 Compare April 20, 2026 14:27
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/async/helpers_test.go`:
- Line 147: The multipart writer Close() error is ignored at the end of the
test; capture and handle it to avoid silent malformed bodies by checking the
return value of w.Close() and failing the test on error (e.g. replace the bare
w.Close() with if err := w.Close(); err != nil { t.Fatalf("closing multipart
writer: %v", err) }). Ensure you reference the multipart writer variable w and
the test handler t so any failure aborts the test.
- Line 244: The test currently ignores errors from io.ReadAll(resp.Body) which
can hide transport/truncation issues; update the read to capture the error
(body, err := io.ReadAll(resp.Body)) and fail fast on error (e.g. if err != nil
{ t.Fatalf("read response body: %v", err) } or use your test helper/assertion
library) so any body-read failure in the test using resp is reported
immediately; locate the read usage of io.ReadAll(resp.Body) in
tests/async/helpers_test.go to apply this change.

In `@transports/bifrost-http/integrations/router.go`:
- Around line 1539-1543: The current handler always maps any error from
executor.RetrieveJob to 404; change the logic in the router where
executor.RetrieveJob(bifrostCtx, jobID, vkValue, config.GetHTTPRequestType(ctx))
is called so it examines the returned error and only creates a NotFound bifrost
error when the error indicates "job not found or expired" (or matches the
executor's not-found sentinel/message); for any other error produce an internal
server error (use newBifrostErrorWithCode(..., "failed to retrieve async job",
fasthttp.StatusInternalServerError)) and pass that to g.sendError instead.
Ensure you reference the same symbols: executor.RetrieveJob, g.sendError,
newBifrostErrorWithCode.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bffb444d-5002-4f0d-a0c4-5255819c7eb0

📥 Commits

Reviewing files that changed from the base of the PR and between 8b96adb and b24af34.

📒 Files selected for processing (12)
  • tests/async/README.md
  • tests/async/auth_test.go
  • tests/async/fixtures_test.go
  • tests/async/go.mod
  • tests/async/helpers_test.go
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/validation_test.go
  • transports/bifrost-http/integrations/router.go
  • transports/bifrost-http/integrations/utils.go
✅ Files skipped from review due to trivial changes (4)
  • tests/async/go.mod
  • tests/async/ttl_test.go
  • tests/async/README.md
  • tests/async/auth_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/async/submit_test.go
  • tests/async/fixtures_test.go
  • tests/async/lifecycle_test.go
  • tests/async/integration_route_test.go

Comment thread tests/async/helpers_test.go Outdated
Comment thread tests/async/helpers_test.go Outdated
Comment thread transports/bifrost-http/integrations/router.go
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 04-20-tests_async_inference_e2e_tests_added branch from b24af34 to f4e589d Compare April 20, 2026 15:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/async/fixtures_test.go (1)

184-186: ⚠️ Potential issue | 🟠 Major

Avoid a third-party URL as the default OCR fixture.

This makes the async suite flaky for reasons unrelated to Bifrost: CDN outages, network policy differences, and remote content drift. Please switch the default to a bundled/local asset or require ASYNC_OCR_IMAGE_URL explicitly. The same fallback is duplicated in tests/async/validation_test.go, so centralizing it in one helper would remove both copies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/async/fixtures_test.go` around lines 184 - 186, The default OCR fixture
uses an external CDN URL via envOr("ASYNC_OCR_IMAGE_URL", "...") which makes
tests flaky; replace the fallback with a bundled/local test asset (e.g., a file
under tests/testdata or fixtures) or require ASYNC_OCR_IMAGE_URL to be set (no
remote default), and remove the duplicate fallback in
tests/async/validation_test.go by centralizing the logic into a single helper
(e.g., create a new helper function GetAsyncOCRImageURL or Update envOr to
accept a sentinel to force missing) and update fixtures_test.go and
validation_test.go to call that helper instead of inlining the same envOr(...)
default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@transports/bifrost-http/integrations/router.go`:
- Around line 1542-1547: The 404 branch is leaking the original error (which may
reveal VK/type mismatch details) by passing err into newBifrostErrorWithCode;
update the else branch handling the RetrieveJob result so it does not forward
the original err to sendError/newBifrostErrorWithCode — instead pass a sanitized
error (e.g., a freshly created error with the message "job not found or expired"
or nil) so callers only see the generic 404 message; modify the else branch
where sendError is called (involving RetrieveJob, sendError,
newBifrostErrorWithCode, and logstore.ErrJobInternal) to use the masked error.

---

Duplicate comments:
In `@tests/async/fixtures_test.go`:
- Around line 184-186: The default OCR fixture uses an external CDN URL via
envOr("ASYNC_OCR_IMAGE_URL", "...") which makes tests flaky; replace the
fallback with a bundled/local test asset (e.g., a file under tests/testdata or
fixtures) or require ASYNC_OCR_IMAGE_URL to be set (no remote default), and
remove the duplicate fallback in tests/async/validation_test.go by centralizing
the logic into a single helper (e.g., create a new helper function
GetAsyncOCRImageURL or Update envOr to accept a sentinel to force missing) and
update fixtures_test.go and validation_test.go to call that helper instead of
inlining the same envOr(...) default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 742ae991-5b0d-4730-88ab-0821fbaf48c8

📥 Commits

Reviewing files that changed from the base of the PR and between b24af34 and f4e589d.

📒 Files selected for processing (14)
  • framework/logstore/asyncjob.go
  • framework/logstore/errors.go
  • tests/async/README.md
  • tests/async/auth_test.go
  • tests/async/fixtures_test.go
  • tests/async/go.mod
  • tests/async/helpers_test.go
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/validation_test.go
  • transports/bifrost-http/integrations/router.go
  • transports/bifrost-http/integrations/utils.go
✅ Files skipped from review due to trivial changes (6)
  • transports/bifrost-http/integrations/utils.go
  • tests/async/go.mod
  • framework/logstore/errors.go
  • tests/async/README.md
  • tests/async/submit_test.go
  • tests/async/auth_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/async/lifecycle_test.go
  • tests/async/ttl_test.go

Comment thread transports/bifrost-http/integrations/router.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
framework/logstore/asyncjob.go (1)

57-64: ⚠️ Potential issue | 🟠 Major

Propagate ErrJobInternal to the regular async poll handler too.

This new wrapper only fixes callers that explicitly check errors.Is(err, logstore.ErrJobInternal). transports/bifrost-http/handlers/asyncinference.go:508-512 still turns every RetrieveJob error into a 404 and sends err.Error() to the client, so store failures on the regular /v1/async/*/{id} path remain misclassified and now expose the wrapped backend message.

Suggested follow-up in the async handler
+import "errors"
+
 job, err := h.executor.RetrieveJob(bifrostCtx, jobID, getVirtualKeyFromContext(bifrostCtx), operationType)
 if err != nil {
-	SendError(ctx, fasthttp.StatusNotFound, err.Error())
+	if errors.Is(err, logstore.ErrJobInternal) {
+		SendError(ctx, fasthttp.StatusInternalServerError, "failed to retrieve async job")
+	} else {
+		SendError(ctx, fasthttp.StatusNotFound, "job not found or expired")
+	}
 	return
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@framework/logstore/asyncjob.go` around lines 57 - 64, RetrieveJob currently
wraps store errors as ErrJobInternal but the async HTTP handler that calls
RetrieveJob still maps every error to 404 and returns err.Error() to clients;
update that handler to check errors.Is(err, logstore.ErrJobInternal) (or
ErrJobInternal) and return a 500/appropriate server error instead of 404, and
avoid echoing the wrapped backend error to the client (return a generic
"internal server error retrieving job" message). Concretely: in the async
inference HTTP handler where RetrieveJob is called, replace the unconditional
404 branch with an errors.Is check for ErrJobInternal to produce a 500, keep the
not-found branch (errors.Is(err, ErrNotFound)) as 404, and change the response
body to a safe, non-sensitive error string rather than err.Error().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/async/helpers_test.go`:
- Around line 46-55: The parseDuration helper currently returns parsed durations
even if they are zero or negative; update parseDuration(s string, fallback
time.Duration) so that after calling time.ParseDuration it also checks if d <= 0
and returns fallback in that case (in addition to returning fallback for empty
string and parse errors), ensuring parseDuration always returns a positive
duration for use by pollUntilTerminal and timeout logic.

---

Outside diff comments:
In `@framework/logstore/asyncjob.go`:
- Around line 57-64: RetrieveJob currently wraps store errors as ErrJobInternal
but the async HTTP handler that calls RetrieveJob still maps every error to 404
and returns err.Error() to clients; update that handler to check errors.Is(err,
logstore.ErrJobInternal) (or ErrJobInternal) and return a 500/appropriate server
error instead of 404, and avoid echoing the wrapped backend error to the client
(return a generic "internal server error retrieving job" message). Concretely:
in the async inference HTTP handler where RetrieveJob is called, replace the
unconditional 404 branch with an errors.Is check for ErrJobInternal to produce a
500, keep the not-found branch (errors.Is(err, ErrNotFound)) as 404, and change
the response body to a safe, non-sensitive error string rather than err.Error().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97b45962-1269-4460-a87b-dbb81b9901fb

📥 Commits

Reviewing files that changed from the base of the PR and between f4e589d and 6d6ca5e.

📒 Files selected for processing (14)
  • framework/logstore/asyncjob.go
  • framework/logstore/errors.go
  • tests/async/README.md
  • tests/async/auth_test.go
  • tests/async/fixtures_test.go
  • tests/async/go.mod
  • tests/async/helpers_test.go
  • tests/async/integration_route_test.go
  • tests/async/lifecycle_test.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/validation_test.go
  • transports/bifrost-http/integrations/router.go
  • transports/bifrost-http/integrations/utils.go
✅ Files skipped from review due to trivial changes (4)
  • transports/bifrost-http/integrations/utils.go
  • tests/async/go.mod
  • tests/async/README.md
  • tests/async/fixtures_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • framework/logstore/errors.go
  • transports/bifrost-http/integrations/router.go
  • tests/async/submit_test.go
  • tests/async/ttl_test.go
  • tests/async/lifecycle_test.go

Comment thread tests/async/helpers_test.go
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 20, 2026
Copy link
Copy Markdown
Contributor

akshaydeo commented Apr 20, 2026

Merge activity

  • Apr 20, 6:20 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 20, 6:23 PM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo changed the base branch from 04-20-fix_ocr_logging_fixes to graphite-base/2861 April 20, 2026 18:22
@akshaydeo akshaydeo changed the base branch from graphite-base/2861 to main April 20, 2026 18:23
@akshaydeo akshaydeo dismissed coderabbitai[bot]’s stale review April 20, 2026 18:23

The base branch was changed.

@akshaydeo akshaydeo merged commit 8e6f652 into main Apr 20, 2026
6 of 14 checks passed
@akshaydeo akshaydeo deleted the 04-20-tests_async_inference_e2e_tests_added branch April 20, 2026 18:23
dominictayloruk pushed a commit to dominictayloruk/bifrost that referenced this pull request Apr 21, 2026
## Summary

Adds a dedicated end-to-end test suite for Bifrost's async inference feature, covering all `/v1/async/*` endpoints and the integration route header mechanism (`x-bf-async` / `x-bf-async-id`).

## Changes

- Added `tests/async/` as a standalone Go module with six test files:
  - `submit_test.go` — verifies all 11 async endpoints return 202 with a well-formed job envelope and that immediate polling returns the correct status code
  - `lifecycle_test.go` — polls jobs to terminal state, validates response shape for completed jobs, confirms 404 for non-existent jobs and wrong-endpoint-type polling
  - `auth_test.go` — covers VK scoping (same key succeeds, different key returns 404, missing key on VK-scoped job returns 404, public jobs are accessible without a VK), and all three auth header formats (`Authorization: Bearer`, `x-api-key`, `x-goog-api-key`)
  - `ttl_test.go` — validates default TTL (~3600s), custom TTL via `x-bf-async-job-result-ttl`, fallback to default for invalid/zero/negative values, and 404 after TTL expiry
  - `validation_test.go` — confirms rejection of `stream=true`, `stream_format=sse` on speech, malformed JSON, missing required fields, missing multipart files, and wrong HTTP method
  - `integration_route_test.go` — tests `x-bf-async` submit and `x-bf-async-id` retrieve on the `/openai/v1/responses` integration route, including stream rejection and VK isolation
- Added `fixtures_test.go` with endpoint fixtures for all 11 async endpoints, model overrides via `ASYNC_*_MODEL` environment variables, and generated test media (sample PNG, sample MP3 reference)
- Added `helpers_test.go` with shared HTTP helpers, config sourced from environment variables, and a `TestMain` health-check gate that skips the suite when the gateway is unreachable
- Added `README.md` documenting how to run the tests, all environment variables, model override keys, and per-file coverage descriptions
- Tests run in two modes — global (no VK) and `with_vk` — wherever VK presence is not the focus of the test

## Type of change

- [ ] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Documentation
- [x] Chore/CI

## Affected areas

- [x] Core (Go)
- [x] Transports (HTTP)
- [ ] Providers/Integrations
- [ ] Plugins
- [ ] UI (React)
- [ ] Docs

## How to test

```sh
# From the repo root — requires a running Bifrost gateway on localhost:8080
cd tests/async
go test ./... -timeout 300s

# With virtual keys to enable VK-scoped auth and cross-VK isolation tests
BIFROST_VK=sk-bf-... BIFROST_ALT_VK=sk-bf-... go test ./... -timeout 300s

# Override the gateway URL or individual models
BIFROST_BASE_URL=http://my-gateway:8080 \
ASYNC_CHAT_COMPLETION_MODEL=openai/gpt-4o \
go test ./... -timeout 300s
```

**Environment variables:**

| Variable | Default | Description |
|---|---|---|
| `BIFROST_BASE_URL` | `http://localhost:8080` | Bifrost gateway URL |
| `BIFROST_VK` | — | Primary virtual key; enables VK-mode tests |
| `BIFROST_ALT_VK` | — | Second virtual key; enables cross-VK isolation tests |
| `BIFROST_POLL_TIMEOUT` | `30s` | Max time to wait for a job to reach terminal state |
| `BIFROST_POLL_INTERVAL` | `500ms` | Polling cadence |
| `BIFROST_INTEGRATION_PATH` | `/openai/v1/responses` | Override integration route path |
| `BIFROST_INTEGRATION_MODEL` | `openai/gpt-4o-mini` | Override model for integration route tests |
| `ASYNC_*_MODEL` | see README | Per-endpoint model overrides |

If the gateway is unreachable at startup, the entire suite exits cleanly with a skip message rather than failing.

## Screenshots/Recordings

N/A

## Breaking changes

- [x] No

## Related issues

N/A

## Security considerations

Virtual keys passed via `BIFROST_VK` and `BIFROST_ALT_VK` are read from environment variables and never logged or written to disk. Cross-VK isolation tests confirm that job IDs cannot be accessed across VK boundaries.

## Checklist

- [ ] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [ ] I verified the CI pipeline passes locally if applicable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants