
feat(cosmos): add semantic rerank API#37981

Open
aditishree1 wants to merge 24 commits into main from feature/cosmos-semantic-rerank

Conversation

@aditishree1
Member

@aditishree1 aditishree1 commented Apr 6, 2026

Packages impacted by this PR

@azure/cosmos

Issues associated with this PR

Describe the problem that is addressed by this PR

Adds container.semanticRerank() — a method that sends documents to the Cosmos DB Inference Service for semantic reranking using AI models. Given a query and a list of document strings, the service returns relevance scores so the most semantically relevant documents surface first. The feature introduces an InferenceService class under src/inference/ that manages its own AAD-authenticated HTTP pipeline, separate from the main Cosmos DB data plane.

Flow
Container.semanticRerank() → ClientContext → InferenceService → HTTP POST to inference endpoint

The service is lazily initialized on the first semanticRerank() call, shared across all containers via ClientContext, and cleaned up when CosmosClient.dispose() is called.

The inference endpoint is configured via the AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT environment variable (consistent with .NET and Python SDKs).

Example

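 import { CosmosClient } from "@azure/cosmos";
 import { DefaultAzureCredential } from "@azure/identity";

 const client = new CosmosClient({
   endpoint: "https://myaccount.documents.azure.com:443/",
   aadCredentials: new DefaultAzureCredential(),
 });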


 // Set inference endpoint via environment variable
 // AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT=https://myaccount.eastus2.dbinference.azure.com

 const container = client.database("mydb").container("products");

 // FTS query
 const { resources } = await container.items
   .query(
     `SELECT c.id, c.name, c.description FROM c
      WHERE FullTextContains(c.description, 'home gym')
      ORDER BY RANK FullTextScore(c.description, 'home gym')`,
   )
   .fetchAll();

 const documents = resources.map((item) => JSON.stringify(item));

 //  Rerank by semantic relevance
 const result = await container.semanticRerank(
   "affordable home gym with pull-up bar",
   documents,
   { returnDocuments: true, topK: 5 },
 );

 // result.rerankScores → [{ index: 1, score: 0.99, document: "..." }, ...]
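
To act on the result, a caller typically maps each score back to the item it was computed for. The following is a minimal sketch, assuming that `index` refers to the position of the document in the array passed to `semanticRerank()`. The `RerankScoreLike` interface and the `orderByRerankScore` helper are hypothetical names used only for illustration and are not part of the SDK.

```typescript
// Shape copied from the example comment above; illustrative only.
interface RerankScoreLike {
  index: number;
  score: number;
  document?: string | null;
}

// Hypothetical helper: returns the original items ordered by descending score.
// Assumes `index` is the position of the document in the input array.
function orderByRerankScore<T>(items: T[], scores: RerankScoreLike[]): T[] {
  return [...scores]
    .sort((a, b) => b.score - a.score)
    .filter((s) => s.index >= 0 && s.index < items.length)
    .map((s) => items[s.index]);
}

const items = [{ id: "a" }, { id: "b" }, { id: "c" }];
const scores: RerankScoreLike[] = [
  { index: 2, score: 0.41 },
  { index: 1, score: 0.99 },
];

const ordered = orderByRerankScore(items, scores);
console.log(ordered.map((item) => item.id)); // ["b", "c"]
```

Items that receive no score (for example, when a top-K limit is applied) are simply absent from the output.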

dotnet PR: Azure/azure-cosmos-dotnet-v3#5445

Are there test cases added in this PR? (If not, why?)

Yes

Implements the Semantic Rerank feature (ported from .NET SDK PR #5445) that
enables users to rerank documents using the Cosmos DB Inference Service for
semantic relevance scoring.

New features:
- Container.semanticRerank() public method for reranking documents
- InferenceService internal class managing HTTP calls to inference endpoint
- SemanticRerankResult, RerankScore, and SemanticRerankOptions types
- inferenceEndpoint option in CosmosClientOptions
- Separate AAD-authenticated pipeline with inference scope

The inference service uses a dedicated HTTP pipeline with its own AAD scope
(https://dbinference.azure.com/.default) and does not share the main SDK
request pipeline or retry policies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos label Apr 6, 2026
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

API Change Check

APIView identified API level changes in this PR and created the following API reviews

@azure/cosmos

Aditishree . and others added 8 commits April 6, 2026 15:10
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The update-snippets tool requires all ts code blocks in JSDoc
to have a snippet name (e.g. ```ts snippet:Name).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add integration test for semantic rerank feature that mirrors the .NET SDK
SemanticRerankingIntegrationTests. Tests against the inferencee2etest Cosmos DB
account with full-text search query followed by semantic reranking.

Verifies: rerank scores, result ordering, latency, and token usage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed RerankScore.document type from Record<string, unknown> to string
  to match actual inference API response format
- Updated unit test mocks to use string documents
- Rewrote integration tests with simple rerank test that works against
  semantic-reranker-test.eastus2.dbinference.azure.com
- Added second test for reranking without returnDocuments
- Moved full-text-search + rerank test to skipped (requires pre-existing data)
- Both live integration tests pass against the real inference endpoint

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed RerankScore.document type from Record<string, unknown> to string
  to match actual inference API response
- Updated unit test mocks to use string documents
- Rewrote integration tests:
  - Test 1: Simple rerank with hardcoded docs (passes live)
  - Test 2: Rerank without returnDocuments (passes live)
  - Test 3: Full e2e FTS query + rerank using pre-created rerank-test/products
    container on semantic-reranker-test account
- Removed describe.only to avoid breaking other tests
- Added forceQueryPlan + 10s delay for FTS index readiness

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The shared vitest config enables fakeTimers (setTimeout, Date) which
caused integration tests to hang. Fixed by overriding fakeTimers in
vitest.int.config.ts with toFake: [].

Also replaced the FTS-based test 3 with a standard Cosmos DB query
to avoid a pre-existing vitest/SDK incompatibility with FullTextSearch
queries under vitest's module transform pipeline. The test still
exercises the full E2E flow: upsert → query → semantic rerank → verify.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rengthen unit tests

- Revert vitest.int.config.ts fakeTimers override (not needed with standard query)
- Add vi.useRealTimers() in integration test beforeAll for localized timer fix
- Strengthen env var fallback and precedence unit tests to verify resolved URL
- Remove outdated FTS incompatibility comment from integration test JSDoc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aditishree1 aditishree1 marked this pull request as ready for review April 10, 2026 04:34
Copilot AI review requested due to automatic review settings April 10, 2026 04:34
Contributor

Copilot AI left a comment


Pull request overview

Adds a new semantic reranking capability to @azure/cosmos by introducing Container.semanticRerank() backed by a new InferenceService that calls the Cosmos DB Inference Service using an AAD-authenticated HTTP pipeline (separate from the data-plane pipeline).

Changes:

  • Introduces InferenceService + new public types (SemanticRerankOptions, SemanticRerankResult, RerankScore) and wires Container.semanticRerank() through ClientContext.
  • Adds client configuration (CosmosClientOptions.inferenceEndpoint) and disposes the lazily created inference service on CosmosClient.dispose().
  • Adds unit + integration coverage and updates API review + spelling dictionary.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
sdk/cosmosdb/cosmos/src/inference/InferenceService.ts New AAD-authenticated pipeline and POST call implementation for semantic rerank.
sdk/cosmosdb/cosmos/src/inference/SemanticRerankOptions.ts New options bag for rerank requests.
sdk/cosmosdb/cosmos/src/inference/SemanticRerankResult.ts New public result types for rerank responses.
sdk/cosmosdb/cosmos/src/inference/index.ts Re-exports inference-related public types.
sdk/cosmosdb/cosmos/src/client/Container/Container.ts Adds Container.semanticRerank() public API + docs snippet.
sdk/cosmosdb/cosmos/src/ClientContext.ts Lazily creates/uses InferenceService and exposes semanticRerank() + disposal hook.
sdk/cosmosdb/cosmos/src/CosmosClient.ts Disposes inference service on client disposal.
sdk/cosmosdb/cosmos/src/CosmosClientOptions.ts Adds inferenceEndpoint?: string option.
sdk/cosmosdb/cosmos/src/index.ts Exports the new public inference types from the package entrypoint.
sdk/cosmosdb/cosmos/test/internal/unit/inference/inferenceService.spec.ts Unit tests for request payload/response parsing and constructor validation.
sdk/cosmosdb/cosmos/test/internal/unit/inference/semanticRerank.spec.ts Unit tests for container-level semanticRerank entrypoints/errors.
sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts New live integration tests for rerank behavior (requires AAD + inference endpoint).
sdk/cosmosdb/cosmos/test/snippets.spec.ts Adds the docs snippet used in Container.semanticRerank() TSDoc.
sdk/cosmosdb/cosmos/review/cosmos-node.api.md API review updates for the new public surface area.
.vscode/cspell.json Adds “rerank” to spelling dictionary.

Comment thread sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts Outdated
Comment thread sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts
Comment thread sdk/cosmosdb/cosmos/test/internal/unit/inference/inferenceService.spec.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/test/snippets.spec.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Aditishree . and others added 3 commits April 10, 2026 10:24
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aditishree1 aditishree1 changed the title feat(cosmos): add semantic rerank API for Cosmos DB Inference Service feat(cosmos): add semantic rerank API Apr 10, 2026
Member

@niteshvijay1995 niteshvijay1995 left a comment


Code Review — Validated Against Cosmos SDK Patterns

I verified each finding against the actual patterns in the Cosmos DB SDK codebase. Two initially flagged items (OperationOptions and RestError) were withdrawn because the Cosmos SDK uses its own conventions (SharedOptions and ErrorResponse). The remaining findings below are confirmed valid.

Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts
Comment thread sdk/cosmosdb/cosmos/test/internal/unit/inference/inferenceService.spec.ts Outdated
Comment thread sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts Outdated
Comment thread sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts
Comment thread sdk/cosmosdb/cosmos/test/snippets.spec.ts
Comment thread sdk/cosmosdb/cosmos/src/inference/SemanticRerankResult.ts
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
@aditishree1
Member Author

@copilot

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-js/sessions/dfbcd509-d8d8-4cb0-97df-b70cb3fdf949

Co-authored-by: aditishree1 <141712869+aditishree1@users.noreply.github.com>
Comment thread sdk/cosmosdb/cosmos/review/cosmos-node.api.md Outdated
Comment thread sdk/cosmosdb/cosmos/review/cosmos-node.api.md Outdated
Comment thread sdk/cosmosdb/cosmos/review/cosmos-node.api.md Outdated
Comment thread sdk/cosmosdb/cosmos/review/cosmos-node.api.md Outdated
Comment thread sdk/cosmosdb/cosmos/review/cosmos-node.api.md Outdated
@aditishree1
Member Author

@copilot

…ons to Record<string, unknown>

- Rename first parameter from rerankContext to context across all method signatures
- Convert SemanticRerankOptions from typed interface to Record<string, unknown> type alias
- Simplify buildPayload to pass options through as-is (no camelCase to snake_case)
- Strip abortSignal from payload (request-level option, not service payload)
- Document known service options in Container.ts JSDoc with snake_case keys
- Document document_type values as 'string' or 'json'
- Update all unit tests, integration tests, and snippets to use snake_case keys

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
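
The pass-through behaviour described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the payload field names `query` and `documents` and the `top_k` option key are assumed, while `document_type` and the stripping of `abortSignal` come from the commit message.

```typescript
// Per the commit message, options are an open-ended bag of service keys.
type RerankOptionsSketch = Record<string, unknown>;

// Hypothetical payload builder: forwards options unchanged (no camelCase to
// snake_case conversion) and removes the request-level abortSignal.
function buildPayloadSketch(
  context: string,
  documents: string[],
  options: RerankOptionsSketch = {},
): Record<string, unknown> {
  // Spread options first so the explicit arguments cannot be overridden.
  // "query" and "documents" are assumed field names.
  const payload: Record<string, unknown> = { ...options, query: context, documents };
  delete payload.abortSignal;
  return payload;
}

const payload = buildPayloadSketch("affordable home gym", ["doc one", "doc two"], {
  document_type: "string", // documented value: 'string' or 'json'
  top_k: 5, // assumed key name
  abortSignal: {}, // placeholder standing in for a real AbortSignal
});
console.log(Object.keys(payload).sort());
```

Because the options are passed as-is, a new service option needs no SDK change, at the cost of losing compile-time checks on key names.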
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/InferenceService.ts Outdated
Comment thread sdk/cosmosdb/cosmos/src/inference/SemanticRerankOptions.ts
Comment thread sdk/cosmosdb/cosmos/test/public/integration/semanticRerank.spec.ts
Aditishree . and others added 4 commits April 22, 2026 11:19
…buildHeaders

- Move INFERENCE_BASE_PATH, INFERENCE_USER_AGENT, INFERENCE_DEFAULT_SCOPE,
  INFERENCE_DEFAULT_TIMEOUT_MS, INFERENCE_ENDPOINT_ENV_VAR to Constants object
  in common/constants.ts
- Use StatusCodes.Ok from common/statusCodes.ts in parseResponse
- Extract header setup into private buildHeaders(request) method

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ferenceService

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…changes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add inferenceEndpoint option to CosmosClientOptions for browser/portal
scenarios where environment variables are not available. Client options
take priority over the AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT
environment variable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
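
The precedence described in this commit (client option first, environment variable second) can be sketched as below. This is a minimal sketch, not the PR's implementation: `resolveInferenceEndpoint` and `EndpointOptionsSketch` are hypothetical names, and the environment is passed in as a plain object so the example also runs where `process.env` is unavailable.

```typescript
const ENDPOINT_ENV_VAR = "AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT";

// Only the option relevant to this sketch.
interface EndpointOptionsSketch {
  inferenceEndpoint?: string;
}

// Hypothetical resolver: the client option wins, the environment variable is
// the fallback, and a missing endpoint is reported as an error.
function resolveInferenceEndpoint(
  options: EndpointOptionsSketch,
  env: Record<string, string | undefined>,
): string {
  const endpoint = options.inferenceEndpoint?.trim() || env[ENDPOINT_ENV_VAR]?.trim();
  if (!endpoint) {
    throw new Error(`No inference endpoint: set inferenceEndpoint or ${ENDPOINT_ENV_VAR}`);
  }
  return endpoint;
}

const env = { [ENDPOINT_ENV_VAR]: "https://env.example" };
const fromOption = resolveInferenceEndpoint({ inferenceEndpoint: "https://option.example" }, env);
const fromEnv = resolveInferenceEndpoint({}, env);
console.log(fromOption, fromEnv);
```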
@@ -317,6 +317,7 @@ export class ClientContext {
diagnosticNode: DiagnosticNodeInternal;
partitionKeyRangeId?: string;
}): Promise<Response_2<T & Resource>>;
semanticRerank(context: string, documents: string[], options?: SemanticRerankOptions): Promise<SemanticRerankResult>;
Member

nit: Remove

@@ -317,6 +317,7 @@ export class ClientContext {
diagnosticNode: DiagnosticNodeInternal;
partitionKeyRangeId?: string;
}): Promise<Response_2<T & Resource>>;
semanticRerank(context: string, documents: string[], options?: SemanticRerankOptions): Promise<SemanticRerankResult>;
Member

check if context can be renamed.

@@ -700,6 +701,11 @@ export const Constants: {
DefaultEncryptionCacheTimeToLiveInSeconds: number;
EncryptionCacheRefreshIntervalInMs: number;
RequestTimeoutForReadsInMs: number;
InferenceBasePath: string;
Member

can we have a separate class for InferenceConstants.

@@ -824,6 +831,7 @@ export interface CosmosClientOptions {
diagnosticLevel?: CosmosDbDiagnosticLevel;
endpoint?: string;
httpClient?: HttpClient;
inferenceEndpoint?: string;
Member

Remove it for now?

Member

Confirm it with Sajee

@@ -2133,6 +2141,13 @@ export interface RequestOptions extends SharedOptions {
urlConnection?: string;
}

// @public
export interface RerankScore {
document: string | null;
Member

null or undefined

@@ -2462,6 +2488,8 @@ export interface StatusCodesType {
// (undocumented)
MethodNotAllowed: 405;
// (undocumented)
MultipleChoices: 300;
Member

if internally handling it, then do not add it here

@@ -304,6 +304,13 @@ export const Constants = {
EncryptionCacheRefreshIntervalInMs: 60000, // 1 minute

RequestTimeoutForReadsInMs: 2000, // 2 seconds

// Inference Service
InferenceBasePath: "/inference/semanticReranking",
Member

@niteshvijay1995 niteshvijay1995 May 15, 2026


move to InferenceConstants

@@ -11,6 +11,9 @@ export interface StatusCodesType {
Accepted: 202;
NoContent: 204;
MultiStatus: 207;

// Redirection
MultipleChoices: 300;
Member

avoid if possible

* Disposes the InferenceService if it was created.
* @internal
*/
public disposeInferenceService(): void {
Member

Redundant

if (!this.inferenceService) {
this.inferenceService = new InferenceService(this.cosmosClientOptions);
}
return this.inferenceService;
Member

Use ??
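
One reading of this suggestion is to collapse the `if` check into a nullish assignment. The sketch below uses the `??=` operator with stand-in classes; `FakeInferenceService` and `ContextSketch` are hypothetical and only mimic the lazy, shared-instance behaviour described in the PR.

```typescript
// Stand-in for the real service; counts how many instances were constructed.
class FakeInferenceService {
  static created = 0;
  constructor() {
    FakeInferenceService.created += 1;
  }
}

// Stand-in for ClientContext: creates the service on first use and reuses it.
class ContextSketch {
  private inferenceService?: FakeInferenceService;

  getInferenceService(): FakeInferenceService {
    // Assigns only when the field is null or undefined, then returns it.
    return (this.inferenceService ??= new FakeInferenceService());
  }
}

const ctx = new ContextSketch();
const first = ctx.getInferenceService();
const second = ctx.getInferenceService();
console.log(first === second, FakeInferenceService.created); // true 1
```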

@amanrao23
Member

Add Diagnostics for inference

/**
* Sets the required HTTP headers on an inference service request.
*/
private buildHeaders(request: PipelineRequest): void {
Member

nit: rename to SetHeaders

private parseResponse(response: PipelineResponse): SemanticRerankResult {
if (response.status < StatusCodes.Ok || response.status >= StatusCodes.MultipleChoices) {
let serviceCode: string | number = response.status;
let serviceMessage = `Semantic rerank request failed with status ${response.status}`;
Member

This will reset on line 177

} catch {
// If parsing fails, fall back to raw body text
serviceMessage += `: ${response.bodyAsText}`;
}
Member

avoid try catch

throw errorResponse;
}

const body = JSON.parse(response.bodyAsText || "{}");
Member

Handle null bodyAsText as error

}
} catch {
// If parsing fails, fall back to raw body text
serviceMessage += `: ${response.bodyAsText}`;
Member

Do not use +=


// Parse the error payload to surface the service's code, message, and details
try {
const errorBody = JSON.parse(response.bodyAsText || "{}");
Member

can we use bodyAsText directly in ErrorResponse.message

const body = JSON.parse(response.bodyAsText || "{}");

const rerankScores: RerankScore[] = [];
if (Array.isArray(body.Scores)) {
Member

throw if it is not an array
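
Taken together, the review comments on `parseResponse` (no `+=`, no `try`/`catch` around error parsing, treat a null body as an error, throw when `Scores` is not an array) could be addressed along the lines below. This is a sketch under assumptions: it throws a plain `Error` where the SDK would use its own `ErrorResponse`, `ResponseLike` is a hypothetical stand-in for `PipelineResponse`, and the per-entry field names `index`, `score`, and `document` are assumed.

```typescript
// Hypothetical stand-in for the pipeline response type.
interface ResponseLike {
  status: number;
  bodyAsText?: string | null;
}

interface RerankScoreSketch {
  index: number;
  score: number;
  document?: string;
}

function parseScoresSketch(response: ResponseLike): RerankScoreSketch[] {
  if (response.status < 200 || response.status >= 300) {
    // Build the message once and use the raw body text directly.
    const detail = response.bodyAsText ? `: ${response.bodyAsText}` : "";
    throw new Error(`Semantic rerank request failed with status ${response.status}${detail}`);
  }
  if (!response.bodyAsText) {
    throw new Error("Semantic rerank response had an empty body");
  }
  const body: unknown = JSON.parse(response.bodyAsText);
  const scores = (body as { Scores?: unknown } | null)?.Scores;
  if (!Array.isArray(scores)) {
    throw new Error("Semantic rerank response did not contain a Scores array");
  }
  // Entry field names are assumptions for illustration.
  return scores.map((entry) => ({
    index: Number(entry.index),
    score: Number(entry.score),
    document: typeof entry.document === "string" ? entry.document : undefined,
  }));
}

const parsed = parseScoresSketch({
  status: 200,
  bodyAsText: JSON.stringify({ Scores: [{ index: 1, score: 0.99 }] }),
});
console.log(parsed);
```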


7 participants