EmbeddingGenerator: Adds ICosmosEmbeddingGenerator client-wide configuration (preview)#5838
Conversation
|
Azure Pipelines: Successfully started running 1 pipeline(s). |
6fbb787 to
d984737
Compare
d984737 to
42db6a6
Compare
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Client Options need this as well?
42db6a6 to
e257209
Compare
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @ananth7592 , overall looks good, I have two non-blocking comments:
-
Thread-safety. When set at the client level, the SDK will call
GenerateEmbeddingsAsyncconcurrently from parallel queries/partition reads on a single
instance. Please document on the interface that implementations must be safe to invoke concurrently — otherwise customers may wire up stateful implementations and may hit
race conditions. -
No per-request opt-out.
QueryRequestOptions.EmbeddingGenerator = nullfalls through to the client-level default; there's no way to say "use no
generator for this call." Either document that as intentional or we can add a sentinel (cf.NullEmbeddingGeneratorinMicrosoft.Extensions.AI).
|
@sdkReviewAgent |
40e7ba6 to
4903558
Compare
kushagraThapar
left a comment
There was a problem hiding this comment.
Hi @ananth7592, thanks for the iteration on this!
Two things I wanted to confirm with you before we land this:
-
Per-request override — In an earlier revision, this PR added
QueryRequestOptions.EmbeddingGeneratorfor per-request scope, and the PR description still references that API. The HEAD diff has moved to client + container scope only, and I don't see a per-request layer anywhere. Could you please confirm that the per-request override is intentionally deferred to a follow-up (e.g., one of #5832–#5837), and not lost in the pivot? If so, a one-line callout in the PR description would help future readers understand the V0 scope. -
PR description vs HEAD — Relatedly, the description still shows the old surface (
IEmbeddingGeneratorwithout theCosmosprefix,ReadOnlyMemory<double>, property onQueryRequestOptions). Could you please refresh it to match the current HEAD before merge? Since the merge commit body is permanent in git history, it'd be great to have it reflect what actually shipped.
4903558 to
3671f8a
Compare
9ff95ca to
4011f00
Compare
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @ananth7592 for iterating on this — looks great overall, approving!
One small documentation fix I'd like to land before merge: the latest commit's <remarks> on a few sites pivoted to talk about ContainerProperties.EmbeddingGenerator, but I couldn't find that property anywhere on the current HEAD. The actual per-container override surface is still Database.GetContainer(string, ICosmosEmbeddingGenerator) (and the equivalent CosmosClient.GetContainer(db, c, generator) overload). The xmldoc on CosmosClient.EmbeddingGenerator (CosmosClient.cs L614-615) already references the correct member, so we just need to align the other three sites with that same wording. Pinned them as inline comments below — they're tiny edits.
The rest looks good:
<remarks>block now covers lifecycle/disposal, error semantics, cancellation, and concurrency — and concurrency is also called out in<summary>, which nicely picks up @Pilchie's IDE-tooltip note 👍CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstancetest covers the new accessorCosmosEmbeddingResultpicked upLatency— thanks for that- Unrelated
using-cleanup inCosmosQueryRequestOptionsUniTests.csis reverted - PR description now matches HEAD
- Changelog entry added
Approving — please push the three docstring fixes before merging. Thank you!
4011f00 to
5bfa71c
Compare
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
Why not follow established config (override) pattern - client-level + operation level
5bfa71c to
d2e18d1
Compare
948227c to
4748027
Compare
4748027 to
ef6ccc0
Compare
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @ananth7592 for the productive iteration — the squash addressed a lot in one go:
- per-container layer removed (and the associated
ContainerProperties/Database.GetContainer(.., ICosmosEmbeddingGenerator)/SettingsContractTests/CosmosQueryRequestOptionsUniTestschurn went with it) - comprehensive
<remarks>block onICosmosEmbeddingGeneratorcovering Lifecycle and disposal / Error semantics / Cancellation / Idempotency and concurrency - rebased cleanly onto
da4efa8c8(3.60.0) - PR description rewritten to match HEAD (no more pointers at the old
IEmbeddingGenerator/double/QueryRequestOptions.EmbeddingGeneratorshape) - two new tests landed (
CosmosClient_EmbeddingGenerator_ReturnsConfiguredInstancecovering the new accessor across builder-set + options-set paths, andCosmosClientOptions_Clone_PreservesEmbeddingGeneratorcovering direct-assignment + clone isolation)
That covers 8 of the 10 follow-ups I had after my prior pass. Three small things I'd love your take on before this lands — none are merge blockers from my side, and all three are easier to relax now than after preview ships.
1. Could you please help me understand the V0 thinking on EmbeddingSource consolidation + VectorDataType widening on the interface signature?
Two related observations on Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(IReadOnlyList<string> texts, string endpoint, string deploymentName, int dimensions, CancellationToken cancellationToken = default):
-
Adjacent merged PR #5848 already shipped
EmbeddingSourceon main with 5 fields (SourcePaths,DeploymentName,ModelName,Endpoint,AuthType). The new method takes 2 of those as flat strings (endpoint,deploymentName) and dropsAuthTypeandModelNameon the floor — so implementations have no signal for how to authenticate, and no way to distinguish a deployment name from a model name when the underlying service cares about the difference. Future fields onEmbeddingSourcewould also become breaking-shape changes on the interface. -
On the data-type side, Cosmos supports four
VectorDataTypevalues (Float32,Uint8,Int8,Float16). The interface only surfacesFloat32viaIReadOnlyList<ReadOnlyMemory<float>>and the xmldoc explicitly says "float32" — a customer whose container is configured forUint8(quantized vectors, cost optimization) silently cannot use this feature. This is also @Pilchie's still-unresolved P1 thread upthread.
Both of these are easier to relax in V0 than after preview ships — once shipped, broadening either dimension is a breaking change. A strongly-typed signature like
Task<CosmosEmbeddingResult> GenerateEmbeddingsAsync(
IReadOnlyList<string> texts,
EmbeddingSource source,
int dimensions,
VectorDataType dataType,
CancellationToken cancellationToken = default);would let the SDK pass the configured container's EmbeddingSource straight through (no field-by-field unpacking) and let implementations branch on dataType so all four VectorDataType values are reachable. The result type could stay IReadOnlyList<ReadOnlyMemory<float>> for now — or evolve to ReadOnlyMemory<byte>-per-vector once dataType selects the layout.
Could you please share whether this strongly-typed shape is on the table for V0, or whether there's a reason to keep the flat-string + float32-only contract for preview? If "float32-only V0" is the intentional answer, would it work to document that explicitly in <remarks> and link the tracking follow-up for the other three datatypes — that would let me close out the Pilchie thread cleanly.
2. Could you please soften the forward-promising tense in the ICosmosEmbeddingGenerator <summary>?
The first paragraph (ICosmosEmbeddingGenerator.cs:14-15) currently says:
The SDK invokes this when a query plan contains
GenerateEmbeddings(...)literals (for exampleVectorDistance(GenerateEmbeddings("big brown cat"), c.embedding)).
Today, though, no call site exists yet:
$ grep -rn 'GenerateEmbeddingsAsync\b' Microsoft.Azure.Cosmos/src/ --include='*.cs' | grep -v ICosmosEmbeddingGenerator.cs
(no output)
That's intentional — wiring is deferred to a follow-up — but the present-tense <summary> will make customers reading IDE tooltips believe the SDK calls this today. Could you please prepend a one-sentence note to the <remarks> so the IDE-tooltip story stays honest about V0 scope? Something like:
Preview surface; the SDK call site is delivered in a follow-up release. Setting this generator has no effect today.
That keeps the existing forward-promising wording for the eventual GA story but flags the V0 gap clearly for anyone hovering on the type in 3.61.0-preview.0.
3. Housekeeping: could you please leave a short reply on the prior inline threads?
The force-push addressed (or made moot) 6 of the 7 inline comments I left earlier (the lifecycle/disposal <remarks> ask, the CosmosClient.EmbeddingGenerator direct-test ask, the CosmosQueryRequestOptionsUniTests.cs revert, and the three xmldoc-references-to-ContainerProperties.EmbeddingGenerator asks — all now addressed or moot since the per-container layer is gone). The threads on GitHub are still mechanically open, though, because the squash didn't post per-thread replies.
Could you please leave a one-line reply on each (or one summary comment listing fixed / moot status)? Saves the next reviewer pass and keeps the conversation tidy — purely housekeeping, not a merge blocker from my side.
To be explicit about scope: I am intentionally not re-raising @FabianMeiswinkel's CHANGES_REQUESTED ask about operation-level override here — that's a separate design conversation that belongs in a follow-up PR rather than this one. I am also setting aside the smaller minor / nit items from my prior pass ([JsonIgnore] regression-test guard, ClientOptionJsonConverter diagnostics surfacing, #if PREVIEW symmetry, Microsoft.Extensions.AI adapter sample, cross-SDK pioneering coordination, propose-design-doc-extension) — none of those need to land in this PR.
Thank you for the iteration!
|
@kushagraThapar
If a future EmbeddingSource field genuinely needs to reach client-side providers, that's an additive parameter on the interface — a breaking change inside preview, which is what preview is for. We'd rather take that risk than leak the whole server DTO onto a client contract permanently. On VectorDataType widening. Float32-only by signature is intentional for V0:
To close @Pilchie's thread cleanly, I'll add a paragraph on GenerateEmbeddingsAsync: Query-time vectors are sent to the Azure Cosmos DB gateway as float32 regardless of the container's stored VectorDataType. Implementations targeting Uint8 / Int8 / Float16 storage should still produce float32 vectors; the service applies quantization at write time. This contract may be revisited if customer-side models that natively produce non-float32 vectors become common. Direct answer: the strongly-typed (EmbeddingSource, VectorDataType) shape is not on the table for V0 — flat float32 is deliberate, justified by industry convention, provider-owned auth, and the float32-on-the-wire protocol. |
|
…uration (preview) Adds public preview surface (gated on #if PREVIEW): - ICosmosEmbeddingGenerator interface - CosmosEmbeddingResult (Vectors, TotalTokens, Latency) - CosmosClientOptions.EmbeddingGenerator - CosmosClientBuilder.WithEmbeddingGenerator - CosmosClient.EmbeddingGenerator Surface-only. Pipeline wiring lands in a follow-up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ef6ccc0 to
c33e69f
Compare
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @ananth7592 for the thoughtful iteration — the new float32 wire-format <para> (lines 95-104) is a great architectural framing of the multi-datatype concern: clarifying that the SDK boundary stays float32 because the service applies quantization server-side at write time is cleaner than the "document V0 limitation" fallback I'd offered, and it cleanly covers all four VectorDataType storage configurations in one shot. The preview-surface <para> (lines 22-27) also reads exactly right for the IDE tooltip story.
Approving — looks great. Thank you!
Adds the public surface only for V0 query-time embedding generation in the .NET SDK (preview). No runtime behavior is wired up yet — pipeline resolver and binding land in a follow-up PR.
Customer-facing API (preview)
Interface and result type
Signature mirrors Python SDK PR #46902.
Client-wide configuration
Or via options:
Files
Source (new)
Microsoft.Azure.Cosmos/src/ICosmosEmbeddingGenerator.csMicrosoft.Azure.Cosmos/src/CosmosEmbeddingResult.csSource (modified)
CosmosClientOptions.cs— addsEmbeddingGenerator([JsonIgnore])Fluent/CosmosClientBuilder.cs— addsWithEmbeddingGenerator(...)CosmosClient.cs— addsEmbeddingGeneratorread-only accessorTests
CosmosClientOptionsUnitTests.cs:VerifyEmbeddingGeneratorBuilderProperties— builder round-trip + null-arg guardCosmosClient_EmbeddingGenerator_ReturnsConfiguredInstance— read-only accessor surfaces builder- and options-set instancesCosmosClientOptions_Clone_PreservesEmbeddingGenerator— locks downClone()preservationContracts/DotNetPreviewSDKAPI.net6.json— regenerated for the new surfaceValidation
/p:IsPreview=true): 0 errorsContractEnforcement+SettingsContractTests+CosmosClientOptionsUnitTests: PREVIEW 220/220 ; non-PREVIEW 221/221