
Semantic Rerank: Adds Semantic Rerank API #5445

Merged
microsoft-github-policy-service[bot] merged 43 commits into master from users/nalutripician/semanticRerank on Nov 24, 2025

@NaluTripician (Contributor) commented Oct 13, 2025

Pull Request Template

Description

This pull request introduces a new semantic reranking feature to the Azure Cosmos DB .NET SDK, enabling users to rerank documents using an inference service that leverages Azure Active Directory (AAD) authentication. The main changes include the addition of the InferenceService class, new API surface for semantic reranking, and appropriate integration into the SDK's authorization and client context infrastructure. Notably, this functionality is only available when using AAD authentication.

Semantic Reranking Feature Integration:

  • Added the InferenceService class, which handles communication with the Cosmos DB Inference Service for semantic reranking, including HTTP client configuration, payload construction, and response handling. This service enforces AAD authentication and manages its own authorization and disposal.
  • Introduced a new API, SemanticRerankAsync, on the Container class (public in PREVIEW builds, internal otherwise), allowing users to rerank a list of documents against a context/query string. This is implemented in ContainerInlineCore and routed through the client context. [1] [2]
  • To use this feature, the environment variable "AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT" must be set to the inference endpoint provided by the service.
  • Optionally, the environment variable "AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_SERVICE_MAX_CONNECTION_LIMIT" can be set to change the inference client's maximum connection limit.
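For illustration, a client-side helper could read the two variables above as follows. This is a minimal sketch, not the SDK's code: the class name is hypothetical, the fallback of 50 connections is an assumed value (the PR does not state the default), and applying the limit through SocketsHttpHandler.MaxConnectionsPerServer is an assumption about how such a limit might be enforced.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper for illustration only; it is not part of the Cosmos DB SDK.
public static class InferenceSettingsSketch
{
    public const string EndpointVariable = "AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_ENDPOINT";
    public const string MaxConnectionVariable = "AZURE_COSMOS_SEMANTIC_RERANKER_INFERENCE_SERVICE_MAX_CONNECTION_LIMIT";

    // Assumed fallback when the limit variable is unset or invalid; the SDK's real default may differ.
    public const int AssumedDefaultMaxConnections = 50;

    // Reads the required endpoint and fails fast if it is missing or not an absolute URI.
    public static Uri GetInferenceEndpoint()
    {
        string value = Environment.GetEnvironmentVariable(EndpointVariable);
        if (string.IsNullOrWhiteSpace(value) || !Uri.TryCreate(value, UriKind.Absolute, out Uri endpoint))
        {
            throw new InvalidOperationException(
                $"Set {EndpointVariable} to the inference endpoint before calling SemanticRerankAsync.");
        }

        return endpoint;
    }

    // Reads the optional connection limit, falling back to the assumed default for missing or non-positive values.
    public static int GetMaxConnectionLimit()
    {
        string value = Environment.GetEnvironmentVariable(MaxConnectionVariable);
        return int.TryParse(value, out int limit) && limit > 0 ? limit : AssumedDefaultMaxConnections;
    }

    // Creates an HttpClient whose handler caps concurrent connections to the inference endpoint.
    public static HttpClient CreateHttpClient()
    {
        Uri endpoint = GetInferenceEndpoint(); // resolve first so a missing endpoint does not leak a handler
        var handler = new SocketsHttpHandler { MaxConnectionsPerServer = GetMaxConnectionLimit() };
        return new HttpClient(handler, disposeHandler: true) { BaseAddress = endpoint };
    }
}
```

Failing fast on a missing endpoint surfaces the configuration error at the first rerank call rather than as an opaque HTTP failure later.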

Authorization and Token Handling Updates:

  • Extended the AuthorizationTokenProvider abstraction and its implementations to support a new method, AddInferenceAuthorizationHeaderAsync, which is only valid for AAD-based token providers. Non-AAD providers throw a NotImplementedException for this method. [1] [2] [3] [4] [5] [6]
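The provider contract described above, where only the AAD provider implements the new method and every other provider throws, can be sketched with a virtual method on the base class. Every type name here is hypothetical, the real method's parameters are not shown in this PR, and the "Bearer" header scheme is an assumption. A plain token delegate stands in for an Azure.Identity TokenCredential so the sketch runs without external packages.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the SDK's internal AuthorizationTokenProvider abstraction.
public abstract class TokenProviderSketch
{
    // Default behavior: non-AAD providers cannot authorize inference calls.
    public virtual Task AddInferenceAuthorizationHeaderAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        throw new NotImplementedException("Semantic reranking requires AAD authentication.");
    }
}

// Key-based provider: inherits the throwing default.
public sealed class KeyTokenProviderSketch : TokenProviderSketch
{
}

// AAD provider: attaches a token obtained from a caller-supplied delegate (assumed Bearer scheme).
public sealed class AadTokenProviderSketch : TokenProviderSketch
{
    private readonly Func<CancellationToken, Task<string>> getTokenAsync;

    public AadTokenProviderSketch(Func<CancellationToken, Task<string>> getTokenAsync)
    {
        this.getTokenAsync = getTokenAsync ?? throw new ArgumentNullException(nameof(getTokenAsync));
    }

    public override async Task AddInferenceAuthorizationHeaderAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        string token = await this.getTokenAsync(cancellationToken).ConfigureAwait(false);
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
    }
}
```

Putting the throwing default in the base class means new non-AAD providers are rejected automatically, without each one having to opt out.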

Client Context and Resource Management:

  • Updated ClientContextCore and CosmosClientContext to manage the lifecycle of the InferenceService, including creation, caching, and disposal. Added methods for invoking semantic reranking and for retrieving or creating the inference service instance. [1] [2] [3] [4] [5] [6]
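One way to get the create-once, cache, and dispose-once behavior described above is a lock-guarded get-or-create, sketched below. The class names are hypothetical, and the SDK's actual ClientContextCore may use a different mechanism (for example Lazy<T>); the sketch does not model the inference service's HTTP client or any background work.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for InferenceService: an object that owns resources and must be disposed.
public sealed class InferenceServiceSketch : IDisposable
{
    private static int createdCount;

    public InferenceServiceSketch() => Interlocked.Increment(ref createdCount);

    // Number of instances constructed so far (for illustration and testing).
    public static int CreatedCount => Volatile.Read(ref createdCount);

    public bool IsDisposed { get; private set; }

    public void Dispose() => this.IsDisposed = true;
}

// Hypothetical stand-in for the client context's get-or-create and dispose logic.
public sealed class ClientContextSketch : IDisposable
{
    private readonly object gate = new object();
    private InferenceServiceSketch inferenceService;
    private bool disposed;

    // Returns the cached instance, creating it on first use; the lock makes concurrent callers share one instance.
    public InferenceServiceSketch GetOrCreateInferenceService()
    {
        lock (this.gate)
        {
            if (this.disposed)
            {
                throw new ObjectDisposedException(nameof(ClientContextSketch));
            }

            return this.inferenceService ??= new InferenceServiceSketch();
        }
    }

    // Disposes the cached instance exactly once; later calls are no-ops.
    public void Dispose()
    {
        lock (this.gate)
        {
            if (this.disposed)
            {
                return;
            }

            this.disposed = true;
            this.inferenceService?.Dispose();
            this.inferenceService = null;
        }
    }
}
```

Taking the same lock in both methods closes the window in which a caller could create a new instance after disposal has begun, at the cost of a short lock on every lookup.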

Dependency Updates:

  • Added a dependency on the Azure.Identity package in the test project to support AAD authentication scenarios.

Example

using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Cosmos;

// Sample code demonstrating semantic reranking.
// Assumes 'container' is a Microsoft.Azure.Cosmos.Container from a client created with AAD credentials.
// This example queries items from a fitness store with full-text search and then reranks them semantically.

string searchText = "integrated pull-up bar";

string queryString = $@"
    SELECT TOP 15 c.id, c.Name, c.Brand, c.Description
    FROM c
    WHERE FullTextContains(c.Description, ""{searchText}"")
    ORDER BY RANK FullTextScore(c.Description, ""{searchText}"")
    ";

string rerankingContext = "most economical with multiple pulley adjustments and ideal for home gyms";

List<string> documents = new List<string>();
using FeedIterator<dynamic> resultSetIterator = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition(queryString),
    requestOptions: new QueryRequestOptions()
    {
        MaxItemCount = 15,
    });

while (resultSetIterator.HasMoreResults)
{
    FeedResponse<dynamic> response = await resultSetIterator.ReadNextAsync();
    // 'dynamic' works whether the configured serializer yields JObject or JsonElement items
    foreach (dynamic item in response)
    {
        // each document is passed to the reranker as its JSON string
        documents.Add(item.ToString());
    }
}

Dictionary<string, dynamic> options = new Dictionary<string, dynamic>
{
    { "return_documents", true },
    { "top_k", 10 },
    { "batch_size", 32 },
    { "sort", true }
};

SemanticRerankResult results = await container.SemanticRerankAsync(
    rerankingContext,
    documents,
    options);

// with "sort" set to true, the first entry should be the highest-scoring document
var topResult = results.RerankScores.First();
// the reranked document itself
var bestDocument = topResult.Document;
// or the index of the document in the original list
var originalIndex = topResult.Index;
// or the reranking score
var score = topResult.Score;

// get the latency information from the reranking operation
Dictionary<string, object> latencyInfo = results.Latency;

// get the token usage information from the reranking operation
Dictionary<string, object> tokenUsageInfo = results.TokenUseage;

  • [] New feature (non-breaking change which adds functionality)


@NaluTripician NaluTripician marked this pull request as draft October 13, 2025 17:52
@NaluTripician NaluTripician marked this pull request as ready for review October 22, 2025 22:37
@milismsft previously approved these changes Oct 22, 2025

@milismsft left a comment

Please try to address the potential for multiple background tasks related to the Inference object (and proper disposal of that task as well) :-)

@aayush3011 (Member) left a comment

@NaluTripician LGTM, added the comments that we discussed offline.

This was referenced Feb 3, 2026

Labels

auto-merge Enables automation to merge PRs


5 participants