Skip to content

Query: Adds ability to choose global vs local/focused statistics for FullTextScore #5582

Merged
microsoft-github-policy-service[bot] merged 8 commits intomasterfrom
users/ndeshpan/ftsLocalStatistics
Feb 6, 2026
Merged

Query: Adds ability to choose global vs local/focused statistics for FullTextScore #5582
microsoft-github-policy-service[bot] merged 8 commits intomasterfrom
users/ndeshpan/ftsLocalStatistics

Conversation

@neildsh
Copy link
Copy Markdown
Contributor

@neildsh neildsh commented Jan 30, 2026

Enabling users to choose global vs local/focused statistics for FullTextScore

Why?

Cosmos DB’s implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.

While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases.

In multi-tenant scenarios, it is often necessary to isolate queries to data belonging to a specific tenant, typically defined by the partition key or a component of a hierarchical partition key. This enables scoring to reflect statistics that are accurate for that tenant’s dataset, rather than for the entire container. For customers such as Veeam and Sitecore, which operate large multi-tenant containers, this is not just an optimization but a requirement. Their tenants often operate in very different domains, which can significantly change the distribution and importance of keywords and phrases. Using global statistics in these cases leads to distorted relevance rankings.

In other scenarios involving hundreds or thousands of physical partitions, computing statistics across the entire container can become both time-consuming and expensive. Customers may prefer to use statistics derived from only a subset of partitions to improve performance and reduce RU consumption. Indeed, there is precedence for this as Azure AI Search defaults to this “local” method.

What?

We propose extending the flexibility of BM25 scoring in Cosmos DB so that developers can choose between a global FullTextScore (existing behavior) or Scoped FullTextScore (statistics computed restricted to the partition key(s) used in the query). The key aspects:

For global BM25, FullTextScore retains its existing behavior and computes BM25 statistics, such as IDF and average document length, across all documents in the container regardless of any partition key filters in the query. In scoped BM25, when a query includes a partition key filter or explicitly requests scoped scoring, the engine computes these statistics only over the subset of documents within the specified partition key values. Query results are still returned only from the filtered partitions, and the resulting scores and ranking reflect relevance within that partition-specific slice of data.

How?

The user issues query like:

SELECT TOP 10 * FROM c   
WHERE c.tenantId = @tenantId   
ORDER BY RANK FullTextScore(c.text, "keywords") 

And sets a new QueryRequestOption called FullTextScoreScope which can be set to one of two values: local or global. The request option is inspected, and the query uses scoped/full stats accordingly.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Comment thread docs/query/local_statistics_for_hybrid_search.md
@adityasa
Copy link
Copy Markdown
Contributor

adityasa commented Jan 30, 2026

Is it possible to add some emulator based e2e tests?
One validation that is of interest is partition filtering based on :

  1. query filter
  2. QueryRequestOptions

In both cases, the query should honor FullTextScore Scope (local v/s global).

Comment thread Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs Outdated
adityasa
adityasa previously approved these changes Feb 3, 2026
Copy link
Copy Markdown
Contributor

@adityasa adityasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

sc978345
sc978345 previously approved these changes Feb 4, 2026
Comment thread Microsoft.Azure.Cosmos/src/Resource/Settings/FullTextScoreScope.cs Outdated
sboshra
sboshra previously approved these changes Feb 4, 2026
Copy link
Copy Markdown
Contributor

@sboshra sboshra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@neildsh neildsh dismissed stale reviews from sboshra, sc978345, and adityasa via 44ba1c6 February 4, 2026 21:00
@neildsh neildsh force-pushed the users/ndeshpan/ftsLocalStatistics branch from 07a86a1 to 44ba1c6 Compare February 4, 2026 21:00
Copy link
Copy Markdown
Contributor

@adityasa adityasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copilot AI mentioned this pull request Mar 18, 2026
4 tasks
microsoft-github-policy-service Bot pushed a commit that referenced this pull request Mar 20, 2026
## Version Changes

| Property | Old | New |
|---|---|---|
| `ClientOfficialVersion` | 3.57.0 | **3.58.0** |
| `ClientPreviewVersion` | 3.58.0 | **3.59.0** |
| `ClientPreviewSuffixVersion` | preview.0 | **preview.0** |

## Changelog

### 3.59.0-preview.0 (Preview)

#### Added
- [#5502](#5502)
VectorIndex Policy: Adds Support for QuantizerType in IndexingPolicy
- [#5634](#5634)
Semantic Reranking: Adds response body in semantic reranking error
responses
- [#5685](#5685)
Read Consistency Strategy: Adds Read Consistency Strategy option for
read requests

### 3.58.0 (GA)

#### Added
- [#5447](#5447) Per
Partition Automatic Failover: Adds Hub Region Processing Only While
Routing Requests Failed with 404/1002 for single master accounts
- [#5551](#5551)
HPK: Adds internal CosmosClientOptions flag UseLengthAwareRangeComparer
for length aware range comparer rollout
- [#5582](#5582)
Query: Adds ability to choose global vs local/focused statistics for
FullTextScore
- [5610](#5610)
Refactors N-Region Synchronous Commit feature to use
IServiceConfigurationReaderVNext interface.
- [#5693](#5693)
ThinClient Integration: Adds Enable Multiple Http2 connection on
SocketsHttpHandler
- [#5614](#5614)
ThinClient Integration: Adds support for QueryPlan in thinclient mode

#### Fixed
- [#5597](#5597)
CosmosClient: Fixes ObjectDisposedException message when client is
disposed during request
- [#5613](#5613)
CrossRegionHedgingAvailabilityStrategy: Fixes ArgumentNullException race
condition in hedging cancellation
- [#5650](#5650)
Batch: Fixes null ErrorMessage when promoting status from MultiStatus
response
- [#5651](#5651)
Serializer: Fixes unsafe stream cast in FromStream<T>
- [#5697](#5697)
ResourceThrottleRetryPolicy: Fixes cumulativeRetryDelay tracking when
x-ms-retry-after-ms header is absent

### API Contract Diff (GA)

```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.57.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt"
index a1fa19e..1b74a69 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.57.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0.txt"
@@ -639,6 +639,11 @@ namespace Microsoft.Azure.Cosmos
         public string DefaultLanguage { get; set; }
         public Collection<FullTextPath> FullTextPaths { get; set; }
     }
+    public enum FullTextScoreScope
+    {
+        Global = 0,
+        Local = 1,
+    }
     public sealed class GeospatialConfig
     {
         public GeospatialConfig();
@@ -869,6 +874,7 @@ namespace Microsoft.Azure.Cosmos
         public Nullable<bool> EnableLowPrecisionOrderBy { get; set; }
         public bool EnableOptimisticDirectExecution { get; set; }
         public Nullable<bool> EnableScanInQuery { get; set; }
+        public FullTextScoreScope FullTextScoreScope { get; set; }
         public Nullable<int> MaxBufferedItemCount { get; set; }
         public Nullable<int> MaxConcurrency { get; set; }
         public Nullable<int> MaxItemCount { get; set; }
```

### API Contract Diff (Preview)

```diff
diff --git "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0-preview.0.txt" "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt"
index af57dd8..1ae52c0 100644
--- "a/Microsoft.Azure.Cosmos\\contracts\\API_3.58.0-preview.0.txt"
+++ "b/Microsoft.Azure.Cosmos\\contracts\\API_3.59.0-preview.0.txt"
@@ -128,6 +128,7 @@ namespace Microsoft.Azure.Cosmos
         public new string IfMatchEtag { get; set; }
         public new string IfNoneMatchEtag { get; set; }
         public Nullable<int> PageSizeHint { get; set; }
+        public Nullable<ReadConsistencyStrategy> ReadConsistencyStrategy { get; set; }
     }
     public abstract class ChangeFeedStartFrom
     {
@@ -414,6 +415,7 @@ namespace Microsoft.Azure.Cosmos
         public Nullable<TimeSpan> OpenTcpConnectionTimeout { get; set; }
         public Nullable<PortReuseMode> PortReuseMode { get; set; }
         public Nullable<PriorityLevel> PriorityLevel { get; set; }
+        public Nullable<ReadConsistencyStrategy> ReadConsistencyStrategy { get; set; }
         public TimeSpan RequestTimeout { get; set; }
         public CosmosSerializer Serializer { get; set; }
         public CosmosSerializationOptions SerializerOptions { get; set; }
@@ -746,6 +748,11 @@ namespace Microsoft.Azure.Cosmos
         public string DefaultLanguage { get; set; }
         public Collection<FullTextPath> FullTextPaths { get; set; }
     }
+    public enum FullTextScoreScope
+    {
+        Global = 0,
+        Local = 1,
+    }
     public sealed class GeospatialConfig
     {
         public GeospatialConfig();
@@ -825,6 +832,7 @@ namespace Microsoft.Azure.Cosmos
         public Nullable<IndexingDirective> IndexingDirective { get; set; }
         public IEnumerable<string> PostTriggers { get; set; }
         public IEnumerable<string> PreTriggers { get; set; }
+        public Nullable<ReadConsistencyStrategy> ReadConsistencyStrategy { get; set; }
         public string SessionToken { get; set; }
     }
     public class ItemResponse<T> : Response<T>
@@ -972,6 +980,11 @@ namespace Microsoft.Azure.Cosmos
         High = 1,
         Low = 2,
     }
+    public enum QuantizerType
+    {
+        Product = 0,
+        Spherical = 1,
+    }
     public class QueryDefinition
     {
         public QueryDefinition(string query);
@@ -988,6 +1001,7 @@ namespace Microsoft.Azure.Cosmos
         public Nullable<bool> EnableLowPrecisionOrderBy { get; set; }
         public bool EnableOptimisticDirectExecution { get; set; }
         public Nullable<bool> EnableScanInQuery { get; set; }
+        public FullTextScoreScope FullTextScoreScope { get; set; }
         public Nullable<int> MaxBufferedItemCount { get; set; }
         public Nullable<int> MaxConcurrency { get; set; }
         public Nullable<int> MaxItemCount { get; set; }
@@ -995,6 +1009,7 @@ namespace Microsoft.Azure.Cosmos
         public Nullable<bool> PopulateIndexMetrics { get; set; }
         public Nullable<bool> PopulateQueryAdvice { get; set; }
         public QueryTextMode QueryTextMode { get; set; }
+        public Nullable<ReadConsistencyStrategy> ReadConsistencyStrategy { get; set; }
         public Nullable<int> ResponseContinuationTokenLimitInKb { get; set; }
         public string SessionToken { get; set; }
     }
@@ -1004,10 +1019,18 @@ namespace Microsoft.Azure.Cosmos
         None = 0,
         ParameterizedOnly = 1,
     }
+    public enum ReadConsistencyStrategy
+    {
+        Eventual = 1,
+        GlobalStrong = 4,
+        LatestCommitted = 3,
+        Session = 2,
+    }
     public class ReadManyRequestOptions : RequestOptions
     {
         public ReadManyRequestOptions();
         public Nullable<ConsistencyLevel> ConsistencyLevel { get; set; }
+        public Nullable<ReadConsistencyStrategy> ReadConsistencyStrategy { get; set; }
         public string SessionToken { get; set; }
     }
     public static class Regions
@@ -1383,6 +1406,7 @@ namespace Microsoft.Azure.Cosmos
         public int IndexingSearchListSize { get; set; }
         public string Path { get; set; }
         public int QuantizationByteSize { get; set; }
+        public Nullable<QuantizerType> QuantizerType { get; set; }
         public VectorIndexType Type { get; set; }
         public string[] VectorIndexShardKey { get; set; }
     }
@@ -1482,6 +1506,7 @@ namespace Microsoft.Azure.Cosmos.Fluent
         public CosmosClientBuilder WithHttpClientFactory(Func<HttpClient> httpClientFactory);
         public CosmosClientBuilder WithLimitToEndpoint(bool limitToEndpoint);
         public CosmosClientBuilder WithPriorityLevel(PriorityLevel priorityLevel);
+        public CosmosClientBuilder WithReadConsistencyStrategy(ReadConsistencyStrategy readConsistencyStrategy);
         public CosmosClientBuilder WithRequestTimeout(TimeSpan requestTimeout);
         public CosmosClientBuilder WithSerializerOptions(CosmosSerializationOptions cosmosSerializerOptions);
         public CosmosClientBuilder WithSystemTextJsonSerializerOptions(JsonSerializerOptions serializerOptions);
@@ -1540,6 +1565,7 @@ namespace Microsoft.Azure.Cosmos.Fluent
         public VectorIndexDefinition<T> Path(string path, VectorIndexType indexType);
         public VectorIndexDefinition<T> WithIndexingSearchListSize(int indexingSearchListSize);
         public VectorIndexDefinition<T> WithQuantizationByteSize(int quantizationByteSize);
+        public VectorIndexDefinition<T> WithQuantizerType(QuantizerType quantizerType);
         public VectorIndexDefinition<T> WithVectorIndexShardKey(string[] vectorIndexShardKey);
     }
 }
```

## Checklist
- [ ] Changelog review by team
- [ ] Email `azurecosmossdkdotnet@microsoft.com` for preview API review
- [ ] API contract diff approval (Kiran & Kirill)
- [ ] Kiran sign-off (required)
- [ ] Determine if "Recommended Version" needs further updating

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge Enables automation to merge PRs QUERY

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants