Skip to content

EmbeddingGenerator: Adds ICosmosEmbeddingGenerator client-wide configuration (preview)#5838

Merged
ananth7592 merged 1 commit into
mainfrom
users/amudumba/embedding-generator-interface
May 19, 2026
Merged

EmbeddingGenerator: Adds ICosmosEmbeddingGenerator client-wide configuration (preview)#5838
ananth7592 merged 1 commit into
mainfrom
users/amudumba/embedding-generator-interface

Conversation

@ananth7592
Copy link
Copy Markdown
Member

@ananth7592 ananth7592 commented May 5, 2026

Adds the public surface only for V0 query-time embedding generation in the .NET SDK (preview). No runtime behavior is wired up yet — pipeline resolver and binding land in a follow-up PR.

Customer-facing API (preview)

Interface and result type

public interface ICosmosEmbeddingGenerator
{
    Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(
        IReadOnlyList<string> texts,
        string endpoint,
        string deploymentName,
        int dimensions,
        CancellationToken cancellationToken = default);
}

public sealed class CosmosEmbeddingResult
{
    public IReadOnlyList<ReadOnlyMemory<float>> Vectors { get; }
    public int? TotalTokens { get; }
    public TimeSpan? Latency { get; }
}

Signature mirrors Python SDK PR #46902.

Client-wide configuration

var client = new CosmosClientBuilder(endpoint, cred)
    .WithEmbeddingGenerator(myGenerator)
    .Build();

ICosmosEmbeddingGenerator current = client.EmbeddingGenerator;

Or via options:

var client = new CosmosClient(endpoint, key, new CosmosClientOptions
{
    EmbeddingGenerator = myGenerator,
});

Files

Source (new)

  • Microsoft.Azure.Cosmos/src/ICosmosEmbeddingGenerator.cs
  • Microsoft.Azure.Cosmos/src/CosmosEmbeddingResult.cs

Source (modified)

  • CosmosClientOptions.cs — adds EmbeddingGenerator ([JsonIgnore])
  • Fluent/CosmosClientBuilder.cs — adds WithEmbeddingGenerator(...)
  • CosmosClient.cs — adds EmbeddingGenerator read-only accessor

Tests

  • CosmosClientOptionsUnitTests.cs:
    • VerifyEmbeddingGeneratorBuilderProperties — builder round-trip + null-arg guard
    • CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstance — read-only accessor surfaces builder- and options-set instances
    • CosmosClientOptions_Clone_PreservesEmbeddingGenerator — locks down Clone() preservation
  • Contracts/DotNetPreviewSDKAPI.net6.json — regenerated for the new surface

Validation

  • ✅ PREVIEW build (/p:IsPreview=true): 0 errors
  • ✅ Non-PREVIEW build: 0 errors
  • ContractEnforcement + SettingsContractTests + CosmosClientOptionsUnitTests: PREVIEW 220/220 ; non-PREVIEW 221/221
  • ✅ Baseline regenerated with LF line endings preserved

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines:
Successfully started running 1 pipeline(s).

@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from d984737 to 42db6a6 Compare May 6, 2026 23:32
Comment thread Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs Outdated
Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client Options need this as well?

@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from 42db6a6 to e257209 Compare May 7, 2026 00:46
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ananth7592 , overall looks good, I have two non-blocking comments:

  1. Thread-safety. When set at the client level, the SDK will call GenerateEmbeddingsAsync concurrently from parallel queries/partition reads on a single
    instance. Please document on the interface that implementations must be safe to invoke concurrently — otherwise customers may wire up stateful implementations and may hit
    race conditions.

  2. No per-request opt-out. QueryRequestOptions.EmbeddingGenerator = null falls through to the client-level default; there's no way to say "use no
    generator for this call." Either document that as intentional or we can add a sentinel (cf. NullEmbeddingGenerator in Microsoft.Extensions.AI).

Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xinlian12
Copy link
Copy Markdown
Member

@sdkReviewAgent

Comment thread Microsoft.Azure.Cosmos/src/IEmbeddingGenerator.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/IEmbeddingGenerator.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/IEmbeddingGenerator.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/Fluent/CosmosClientBuilder.cs Outdated
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch 2 times, most recently from 40e7ba6 to 4903558 Compare May 18, 2026 19:22
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ananth7592, thanks for the iteration on this!

Two things I wanted to confirm with you before we land this:

  1. Per-request override — In an earlier revision, this PR added QueryRequestOptions.EmbeddingGenerator for per-request scope, and the PR description still references that API. The HEAD diff has moved to client + container scope only, and I don't see a per-request layer anywhere. Could you please confirm that the per-request override is intentionally deferred to a follow-up (e.g., one of #5832#5837), and not lost in the pivot? If so, a one-line callout in the PR description would help future readers understand the V0 scope.

  2. PR description vs HEAD — Relatedly, the description still shows the old surface (IEmbeddingGenerator without the Cosmos prefix, ReadOnlyMemory<double>, property on QueryRequestOptions). Could you please refresh it to match the current HEAD before merge? Since the merge commit body is permanent in git history, it'd be great to have it reflect what actually shipped.

Comment thread Microsoft.Azure.Cosmos/src/ICosmosEmbeddingGenerator.cs
Comment thread Microsoft.Azure.Cosmos/src/CosmosClient.cs
Comment thread Microsoft.Azure.Cosmos/src/CosmosEmbeddingResult.cs
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from 4903558 to 3671f8a Compare May 18, 2026 21:29
@ananth7592 ananth7592 changed the title Query: Adds IEmbeddingGenerator interface and EmbeddingGenerator option (preview) EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and Database.GetContainer(id, generator) per-container binding (preview) May 18, 2026
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch 3 times, most recently from 9ff95ca to 4011f00 Compare May 18, 2026 22:51
kushagraThapar
kushagraThapar previously approved these changes May 18, 2026
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ananth7592 for iterating on this — looks great overall, approving!

One small documentation fix I'd like to land before merge: the latest commit's <remarks> on a few sites pivoted to talk about ContainerProperties.EmbeddingGenerator, but I couldn't find that property anywhere on the current HEAD. The actual per-container override surface is still Database.GetContainer(string, ICosmosEmbeddingGenerator) (and the equivalent CosmosClient.GetContainer(db, c, generator) overload). The xmldoc on CosmosClient.EmbeddingGenerator (CosmosClient.cs L614-615) already references the correct member, so we just need to align the other three sites with that same wording. Pinned them as inline comments below — they're tiny edits.

The rest looks good:

  • <remarks> block now covers lifecycle/disposal, error semantics, cancellation, and concurrency — and concurrency is also called out in <summary>, which nicely picks up @Pilchie's IDE-tooltip note 👍
  • CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstance test covers the new accessor
  • CosmosEmbeddingResult picked up Latency — thanks for that
  • Unrelated using-cleanup in CosmosQueryRequestOptionsUniTests.cs is reverted
  • PR description now matches HEAD
  • Changelog entry added

Approving — please push the three docstring fixes before merging. Thank you!

Comment thread Microsoft.Azure.Cosmos/src/ICosmosEmbeddingGenerator.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/CosmosClientOptions.cs Outdated
Comment thread Microsoft.Azure.Cosmos/src/Fluent/CosmosClientBuilder.cs Outdated
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from 4011f00 to 5bfa71c Compare May 18, 2026 23:41
Comment thread Microsoft.Azure.Cosmos/src/Resource/Database/Database.cs Outdated
Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not follow established config (override) pattern - client-level + operation level

@ananth7592 ananth7592 changed the title EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and Database.GetContainer(id, generator) per-container binding (preview) EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and RequestOptions.EmbeddingGenerator per-request override (preview) May 19, 2026
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from 5bfa71c to d2e18d1 Compare May 19, 2026 00:28
@ananth7592 ananth7592 changed the title EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and RequestOptions.EmbeddingGenerator per-request override (preview) EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and QueryRequestOptions.EmbeddingGenerator per-query override (preview) May 19, 2026
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch 2 times, most recently from 948227c to 4748027 Compare May 19, 2026 03:57
@ananth7592 ananth7592 changed the title EmbeddingGenerator: Adds ICosmosEmbeddingGenerator with client-wide default and QueryRequestOptions.EmbeddingGenerator per-query override (preview) EmbeddingGenerator: Adds ICosmosEmbeddingGenerator client-wide configuration (preview) May 19, 2026
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from 4748027 to ef6ccc0 Compare May 19, 2026 04:45
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ananth7592 for the productive iteration — the squash addressed a lot in one go:

  • per-container layer removed (and the associated ContainerProperties / Database.GetContainer(.., ICosmosEmbeddingGenerator) / SettingsContractTests / CosmosQueryRequestOptionsUniTests churn went with it)
  • comprehensive <remarks> block on ICosmosEmbeddingGenerator covering Lifecycle and disposal / Error semantics / Cancellation / Idempotency and concurrency
  • rebased cleanly onto da4efa8c8 (3.60.0)
  • PR description rewritten to match HEAD (no more pointers at the old IEmbeddingGenerator / double / QueryRequestOptions.EmbeddingGenerator shape)
  • two new tests landed (CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstance covering the new accessor across builder-set + options-set paths, and CosmosClientOptions_Clone_PreservesEmbeddingGenerator covering direct-assignment + clone isolation)

That covers 8 of the 10 follow-ups I had after my prior pass. Three small things I'd love your take on before this lands — none are merge blockers from my side, and all three are easier to relax now than after preview ships.


1. Could you please help me understand the V0 thinking on EmbeddingSource consolidation + VectorDataType widening on the interface signature?

Two related observations on Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(IReadOnlyList<string> texts, string endpoint, string deploymentName, int dimensions, CancellationToken cancellationToken = default):

  • Adjacent merged PR #5848 already shipped EmbeddingSource on main with 5 fields (SourcePaths, DeploymentName, ModelName, Endpoint, AuthType). The new method takes 2 of those as flat strings (endpoint, deploymentName) and drops AuthType and ModelName on the floor — so implementations have no signal for how to authenticate, and no way to distinguish a deployment name from a model name when the underlying service cares about the difference. Future fields on EmbeddingSource would also become breaking-shape changes on the interface.

  • On the data-type side, Cosmos supports four VectorDataType values (Float32, Uint8, Int8, Float16). The interface only surfaces Float32 via IReadOnlyList<ReadOnlyMemory<float>> and the xmldoc explicitly says "float32" — a customer whose container is configured for Uint8 (quantized vectors, cost optimization) silently cannot use this feature. This is also @Pilchie's still-unresolved P1 thread upthread.

Both of these are easier to relax in V0 than after preview ships — once shipped, broadening either dimension is a breaking change. A strongly-typed signature like

Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(
    IReadOnlyList<string> texts,
    EmbeddingSource source,
    int dimensions,
    VectorDataType dataType,
    CancellationToken cancellationToken = default);

would let the SDK pass the configured container's EmbeddingSource straight through (no field-by-field unpacking) and let implementations branch on dataType so all four VectorDataType values are reachable. The result type could stay IReadOnlyList<ReadOnlyMemory<float>> for now — or evolve to ReadOnlyMemory<byte>-per-vector once dataType selects the layout.

Could you please share whether this strongly-typed shape is on the table for V0, or whether there's a reason to keep the flat-string + float32-only contract for preview? If "float32-only V0" is the intentional answer, would it work to document that explicitly in <remarks> and link the tracking follow-up for the other three datatypes — that would let me close out the Pilchie thread cleanly.


2. Could you please soften the forward-promising tense in the ICosmosEmbeddingGenerator <summary>?

The first paragraph (ICosmosEmbeddingGenerator.cs:14-15) currently says:

The SDK invokes this when a query plan contains GenerateEmbeddings(...) literals (for example VectorDistance(GenerateEmbeddings("big brown cat"), c.embedding)).

Today, though, no call site exists yet:

$ grep -rn 'GenerateEmbeddingsAsync\b' Microsoft.Azure.Cosmos/src/ --include='*.cs' | grep -v ICosmosEmbeddingGenerator.cs
(no output)

That's intentional — wiring is deferred to a follow-up — but the present-tense <summary> will make customers reading IDE tooltips believe the SDK calls this today. Could you please prepend a one-sentence note to the <remarks> so the IDE-tooltip story stays honest about V0 scope? Something like:

Preview surface; the SDK call site is delivered in a follow-up release. Setting this generator has no effect today.

That keeps the existing forward-promising wording for the eventual GA story but flags the V0 gap clearly for anyone hovering on the type in 3.61.0-preview.0.


3. Housekeeping: could you please leave a short reply on the prior inline threads?

The force-push addressed (or made moot) 6 of the 7 inline comments I left earlier (the lifecycle/disposal <remarks> ask, the CosmosClient.EmbeddingGenerator direct-test ask, the CosmosQueryRequestOptionsUniTests.cs revert, and the three xmldoc-references-to-ContainerProperties.EmbeddingGenerator asks — all now addressed or moot since the per-container layer is gone). The threads on GitHub are still mechanically open, though, because the squash didn't post per-thread replies.

Could you please leave a one-line reply on each (or one summary comment listing fixed / moot status)? Saves the next reviewer pass and keeps the conversation tidy — purely housekeeping, not a merge blocker from my side.


To be explicit about scope: I am intentionally not re-raising @FabianMeiswinkel's CHANGES_REQUESTED ask about operation-level override here — that's a separate design conversation that belongs in a follow-up PR rather than this one. I am also setting aside the smaller minor / nit items from my prior pass ([JsonIgnore] regression-test guard, ClientOptionJsonConverter diagnostics surfacing, #if PREVIEW symmetry, Microsoft.Extensions.AI adapter sample, cross-SDK pioneering coordination, propose-design-doc-extension) — none of those need to land in this PR.

Thank you for the iteration!

@ananth7592
Copy link
Copy Markdown
Member Author

@kushagraThapar
Thanks — both are fair concerns. We chose this shape deliberately.

  1. Could you please help me understand the V0 thinking on EmbeddingSource consolidation + VectorDataType widening on the interface signature?
    On EmbeddingSource consolidation. Two reasons we kept the flat (endpoint, deploymentName, dimensions) shape instead of passing EmbeddingSource:

    1. Auth is provider-owned, not SDK-owned. Customers bootstrap authentication in their ICosmosEmbeddingGenerator constructor — same pattern as OpenAIClient / EmbeddingClient / AzureOpenAIClient. Surfacing AuthType on the call is redundant signal since the credential decision has already been made at construction time, and EmbeddingSource.AuthType is a service-side enum, not a runtime credential.
    2. Interface mirrors the embedding SDKs customers already know. EmbeddingClient.GenerateEmbeddingsAsync(inputs, EmbeddingGenerationOptions { Dimensions }) — a customer can forward (texts, deploymentName, options { Dimensions = dimensions }) directly. ModelName is a service-side hint for the Cosmos write-path embedding generator, not consumed by client-side providers. SourcePaths and AuthType are similarly server-DTO concerns.

If a future EmbeddingSource field genuinely needs to reach client-side providers, that's an additive parameter on the interface — a breaking change inside preview, which is what preview is for. We'd rather take that risk than leak the whole server DTO onto a client contract permanently.

On VectorDataType widening. Float32-only by signature is intentional for V0:

  1. All embedding services produce float32. Azure OpenAI, OpenAI, Cohere, Voyage, Mistral — none expose a "give me int8" knob. Quantization is a Cosmos storage-side concern.
  2. The wire protocol is float32 regardless of stored type. VectorDistance takes float32 query vectors and the service quantizes at write time per VectorEmbeddingPolicy.DataType. Surfacing VectorDataType on the interface would imply the generator can produce non-float32, which the wire doesn't accept anyway.
  3. Customers pre-quantizing locally can still do so inside their generator and return the float32 representation.

To close @Pilchie's thread cleanly, I'll add a paragraph on GenerateEmbeddingsAsync:

Query-time vectors are sent to the Azure Cosmos DB gateway as float32 regardless of the container's stored VectorDataType. Implementations targeting Uint8 / Int8 / Float16 storage should still produce float32 vectors; the service applies quantization at write time. This contract may be revisited if customer-side models that natively produce non-float32 vectors become common.

Direct answer: the strongly-typed (EmbeddingSource, VectorDataType) shape is not on the table for V0 — flat float32 is deliberate, justified by industry convention, provider-owned auth, and the float32-on-the-wire protocol.

@ananth7592
Copy link
Copy Markdown
Member Author

  1. Could you please soften the forward-promising tense in the ICosmosEmbeddingGenerator ?
    Done,

…uration (preview)

Adds public preview surface (gated on #if PREVIEW):
- ICosmosEmbeddingGenerator interface
- CosmosEmbeddingResult (Vectors, TotalTokens, Latency)
- CosmosClientOptions.EmbeddingGenerator
- CosmosClientBuilder.WithEmbeddingGenerator
- CosmosClient.EmbeddingGenerator

Surface-only. Pipeline wiring lands in a follow-up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ananth7592 ananth7592 force-pushed the users/amudumba/embedding-generator-interface branch from ef6ccc0 to c33e69f Compare May 19, 2026 18:13
Copy link
Copy Markdown
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ananth7592 ananth7592 enabled auto-merge (squash) May 19, 2026 18:27
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ananth7592 for the thoughtful iteration — the new float32 wire-format <para> (lines 95-104) is a great architectural framing of the multi-datatype concern: clarifying that the SDK boundary stays float32 because the service applies quantization server-side at write time is cleaner than the "document V0 limitation" fallback I'd offered, and it cleanly covers all four VectorDataType storage configurations in one shot. The preview-surface <para> (lines 22-27) also reads exactly right for the IDE tooltip story.

Approving — looks great. Thank you!

@ananth7592 ananth7592 merged commit 695683d into main May 19, 2026
33 checks passed
@ananth7592 ananth7592 deleted the users/amudumba/embedding-generator-interface branch May 19, 2026 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants