feat(cosmos): add semantic rerank API#37981
Conversation
Implements the Semantic Rerank feature (ported from .NET SDK PR #5445) that enables users to rerank documents using the Cosmos DB Inference Service for semantic relevance scoring. New features: - Container.semanticRerank() public method for reranking documents - InferenceService internal class managing HTTP calls to inference endpoint - SemanticRerankResult, RerankScore, and SemanticRerankOptions types - inferenceEndpoint option in CosmosClientOptions - Separate AAD-authenticated pipeline with inference scope The inference service uses a dedicated HTTP pipeline with its own AAD scope (https://dbinference.azure.com/.default) and does not share the main SDK request pipeline or retry policies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
API Change CheckAPIView identified API level changes in this PR and created the following API reviews |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The update-snippets tool requires all ts code blocks in JSDoc to have a snippet name (e.g. \\\ s snippet:Name). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add integration test for semantic rerank feature that mirrors the .NET SDK SemanticRerankingIntegrationTests. Tests against the inferencee2etest Cosmos DB account with full-text search query followed by semantic reranking. Verifies: rerank scores, result ordering, latency, and token usage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed RerankScore.document type from Record<string, unknown> to string to match actual inference API response format - Updated unit test mocks to use string documents - Rewrote integration tests with simple rerank test that works against semantic-reranker-test.eastus2.dbinference.azure.com - Added second test for reranking without returnDocuments - Moved full-text-search + rerank test to skipped (requires pre-existing data) - Both live integration tests pass against the real inference endpoint Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed RerankScore.document type from Record<string, unknown> to string
to match actual inference API response
- Updated unit test mocks to use string documents
- Rewrote integration tests:
- Test 1: Simple rerank with hardcoded docs (passes live)
- Test 2: Rerank without returnDocuments (passes live)
- Test 3: Full e2e FTS query + rerank using pre-created rerank-test/products
container on semantic-reranker-test account
- Removed describe.only to avoid breaking other tests
- Added forceQueryPlan + 10s delay for FTS index readiness
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The shared vitest config enables fakeTimers (setTimeout, Date) which caused integration tests to hang. Fixed by overriding fakeTimers in vitest.int.config.ts with toFake: []. Also replaced the FTS-based test 3 with a standard Cosmos DB query to avoid a pre-existing vitest/SDK incompatibility with FullTextSearch queries under vitest's module transform pipeline. The test still exercises the full E2E flow: upsert query semantic rerank verify. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rengthen unit tests - Revert vitest.int.config.ts fakeTimers override (not needed with standard query) - Add vi.useRealTimers() in integration test beforeAll for localized timer fix - Strengthen env var fallback and precedence unit tests to verify resolved URL - Remove outdated FTS incompatibility comment from integration test JSDoc Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a new semantic reranking capability to @azure/cosmos by introducing Container.semanticRerank() backed by a new InferenceService that calls the Cosmos DB Inference Service using an AAD-authenticated HTTP pipeline (separate from the data-plane pipeline).
Changes:
- Introduces
InferenceService+ new public types (SemanticRerankOptions,SemanticRerankResult,RerankScore) and wiresContainer.semanticRerank()throughClientContext. - Adds client configuration (
CosmosClientOptions.inferenceEndpoint) and disposes the lazily created inference service onCosmosClient.dispose(). - Adds unit + integration coverage and updates API review + spelling dictionary.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmosdb/cosmos/src/inference/InferenceService.ts | New AAD-authenticated pipeline and POST call implementation for semantic rerank. |
| sdk/cosmosdb/cosmos/src/inference/SemanticRerankOptions.ts | New options bag for rerank requests. |
| sdk/cosmosdb/cosmos/src/inference/SemanticRerankResult.ts | New public result types for rerank responses. |
| sdk/cosmosdb/cosmos/src/inference/index.ts | Re-exports inference-related public types. |
| sdk/cosmosdb/cosmos/src/client/Container/Container.ts | Adds Container.semanticRerank() public API + docs snippet. |
| sdk/cosmosdb/cosmos/src/ClientContext.ts | Lazily creates/uses InferenceService and exposes semanticRerank() + disposal hook. |
| sdk/cosmosdb/cosmos/src/CosmosClient.ts | Disposes inference service on client disposal. |
| sdk/cosmosdb/cosmos/src/CosmosClientOptions.ts | Adds inferenceEndpoint?: string option. |
| sdk/cosmosdb/cosmos/src/index.ts | Exports the new public inference types from the package entrypoint. |
| sdk/cosmosdb/cosmos/test/internal/unit/inference/inferenceService.spec.ts | Unit tests for request payload/response parsing and constructor validation. |
| sdk/cosmosdb/cosmos/test/internal/unit/inference/semanticRerank.spec.ts | Unit tests for container-level semanticRerank entrypoints/errors. |
| sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts | New live integration tests for rerank behavior (requires AAD + inference endpoint). |
| sdk/cosmosdb/cosmos/test/snippets.spec.ts | Adds the docs snippet used in Container.semanticRerank() TSDoc. |
| sdk/cosmosdb/cosmos/review/cosmos-node.api.md | API review updates for the new public surface area. |
| .vscode/cspell.json | Adds “rerank” to spelling dictionary. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
niteshvijay1995
left a comment
There was a problem hiding this comment.
Code Review — Validated Against Cosmos SDK Patterns
I verified each finding against the actual patterns in the Cosmos DB SDK codebase. Two initially flagged items (OperationOptions and RestError) were withdrawn because the Cosmos SDK uses its own conventions (SharedOptions and ErrorResponse). The remaining findings below are confirmed valid.
niteshvijay1995
left a comment
There was a problem hiding this comment.
Code Review — Validated Against Cosmos SDK Patterns
I verified each finding against the actual patterns in the Cosmos DB SDK codebase. Two initially flagged items (OperationOptions and RestError) were withdrawn because the Cosmos SDK uses its own conventions (SharedOptions and ErrorResponse). The remaining findings below are confirmed valid.
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-js/sessions/dfbcd509-d8d8-4cb0-97df-b70cb3fdf949 Co-authored-by: aditishree1 <141712869+aditishree1@users.noreply.github.com>
fee8b3a to
3a833d7
Compare
…ons to Record<string, unknown> - Rename first parameter from rerankContext to context across all method signatures - Convert SemanticRerankOptions from typed interface to Record<string, unknown> type alias - Simplify buildPayload to pass options through as-is (no camelCase to snake_case) - Strip abortSignal from payload (request-level option, not service payload) - Document known service options in Container.ts JSDoc with snake_case keys - Document document_type values as 'string' or 'json' - Update all unit tests, integration tests, and snippets to use snake_case keys Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…buildHeaders - Move INFERENCE_BASE_PATH, INFERENCE_USER_AGENT, INFERENCE_DEFAULT_SCOPE, INFERENCE_DEFAULT_TIMEOUT_MS, INFERENCE_ENDPOINT_ENV_VAR to Constants object in common/constants.ts - Use StatusCodes.Ok from common/statusCodes.ts in parseResponse - Extract header setup into private buildHeaders(request) method Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ferenceService Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…changes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add inferenceEndpoint option to CosmosClientOptions for browser/portal scenarios where environment variables are not available. Client options take priority over the AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT environment variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| @@ -317,6 +317,7 @@ export class ClientContext { | |||
| diagnosticNode: DiagnosticNodeInternal; | |||
| partitionKeyRangeId?: string; | |||
| }): Promise<Response_2<T & Resource>>; | |||
| semanticRerank(context: string, documents: string[], options?: SemanticRerankOptions): Promise<SemanticRerankResult>; | |||
| @@ -317,6 +317,7 @@ export class ClientContext { | |||
| diagnosticNode: DiagnosticNodeInternal; | |||
| partitionKeyRangeId?: string; | |||
| }): Promise<Response_2<T & Resource>>; | |||
| semanticRerank(context: string, documents: string[], options?: SemanticRerankOptions): Promise<SemanticRerankResult>; | |||
There was a problem hiding this comment.
check if context can be renamed.
| @@ -700,6 +701,11 @@ export const Constants: { | |||
| DefaultEncryptionCacheTimeToLiveInSeconds: number; | |||
| EncryptionCacheRefreshIntervalInMs: number; | |||
| RequestTimeoutForReadsInMs: number; | |||
| InferenceBasePath: string; | |||
There was a problem hiding this comment.
can we have a separate class for InferenceConstants.
| @@ -824,6 +831,7 @@ export interface CosmosClientOptions { | |||
| diagnosticLevel?: CosmosDbDiagnosticLevel; | |||
| endpoint?: string; | |||
| httpClient?: HttpClient; | |||
| inferenceEndpoint?: string; | |||
| @@ -2133,6 +2141,13 @@ export interface RequestOptions extends SharedOptions { | |||
| urlConnection?: string; | |||
| } | |||
|
|
|||
| // @public | |||
| export interface RerankScore { | |||
| document: string | null; | |||
| @@ -2462,6 +2488,8 @@ export interface StatusCodesType { | |||
| // (undocumented) | |||
| MethodNotAllowed: 405; | |||
| // (undocumented) | |||
| MultipleChoices: 300; | |||
There was a problem hiding this comment.
if internally handling it, then do not add it here
| @@ -304,6 +304,13 @@ export const Constants = { | |||
| EncryptionCacheRefreshIntervalInMs: 60000, // 1 minute | |||
|
|
|||
| RequestTimeoutForReadsInMs: 2000, // 2 seconds | |||
|
|
|||
| // Inference Service | |||
| InferenceBasePath: "/inference/semanticReranking", | |||
There was a problem hiding this comment.
move to InferenceConstants
| @@ -11,6 +11,9 @@ export interface StatusCodesType { | |||
| Accepted: 202; | |||
| NoContent: 204; | |||
| MultiStatus: 207; | |||
|
|
|||
| // Redirection | |||
| MultipleChoices: 300; | |||
| * Disposes the InferenceService if it was created. | ||
| * @internal | ||
| */ | ||
| public disposeInferenceService(): void { |
| if (!this.inferenceService) { | ||
| this.inferenceService = new InferenceService(this.cosmosClientOptions); | ||
| } | ||
| return this.inferenceService; |
|
Add Diagnostics for inference |
| /** | ||
| * Sets the required HTTP headers on an inference service request. | ||
| */ | ||
| private buildHeaders(request: PipelineRequest): void { |
There was a problem hiding this comment.
nit: rename to SetHeaders
| private parseResponse(response: PipelineResponse): SemanticRerankResult { | ||
| if (response.status < StatusCodes.Ok || response.status >= StatusCodes.MultipleChoices) { | ||
| let serviceCode: string | number = response.status; | ||
| let serviceMessage = `Semantic rerank request failed with status ${response.status}`; |
There was a problem hiding this comment.
This will reset on like 177
| } catch { | ||
| // If parsing fails, fall back to raw body text | ||
| serviceMessage += `: ${response.bodyAsText}`; | ||
| } |
| throw errorResponse; | ||
| } | ||
|
|
||
| const body = JSON.parse(response.bodyAsText || "{}"); |
There was a problem hiding this comment.
Handle null bodyAsText as error
| } | ||
| } catch { | ||
| // If parsing fails, fall back to raw body text | ||
| serviceMessage += `: ${response.bodyAsText}`; |
|
|
||
| // Parse the error payload to surface the service's code, message, and details | ||
| try { | ||
| const errorBody = JSON.parse(response.bodyAsText || "{}"); |
There was a problem hiding this comment.
can we use bodyAsText directly in ErrorResponse.message
| const body = JSON.parse(response.bodyAsText || "{}"); | ||
|
|
||
| const rerankScores: RerankScore[] = []; | ||
| if (Array.isArray(body.Scores)) { |
There was a problem hiding this comment.
throw if it is not an array
Packages impacted by this PR
@azure/cosmos
Issues associated with this PR
Describe the problem that is addressed by this PR
Adds container.semanticRerank() — a method that sends documents to the Cosmos DB Inference Service for semantic reranking using AI models. Given a query and a list of document strings, the service returns relevance scores so the most semantically relevant documents surface first. The feature introduces an InferenceService class under
src/inference/that manages its own AAD-authenticated HTTP pipeline, separate from the main Cosmos DB data plane.Flow
Container.semanticRerank() → ClientContext → InferenceService → HTTP POST to inference endpoint
The service is lazily initialized on the first semanticRerank() call, shared across all containers via ClientContext, and cleaned up when CosmosClient.dispose() is called.
The inference endpoint is configured via the AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT environment variable (consistent with .NET and Python SDKs).
Example
dotnet PR: Azure/azure-cosmos-dotnet-v3#5445
Are there test cases added in this PR? (If not, why?)
Yes