@kmruiz
Collaborator

@kmruiz kmruiz commented Oct 16, 2025

Proposed changes

This PR introduces the EmbeddingsProvider, a new component responsible for generating embeddings by accessing external APIs like VoyageAI. This provider is designed for exclusive use within the VectorSearchEmbeddingsManager. While it currently only supports Voyage models, it is structured to make adding new models straightforward.

The provider exposes its contracts as Zod schemas that can be reused within the tool schema. This gives us a single source of truth for the contracts, which is also useful for agents. Quantization is not yet supported, but users can specify the data type they need. By default, the provider uses the float data type and no quantization.
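
As a rough illustration of the defaults described above, here is a minimal, dependency-free sketch. Every name in it (`EmbeddingParameters`, `OutputDtype`, `outputDtype`, `quantization`, `withDefaults`) is hypothetical and not taken from this PR, and the list of data types is an assumption. The real contracts are Zod schemas, which this sketch does not reproduce.

```typescript
// Hypothetical data types a caller might request; the actual set in the PR may differ.
type OutputDtype = "float" | "int8" | "uint8" | "binary" | "ubinary";

// Hypothetical parameter shape; field names are illustrative only.
interface EmbeddingParameters {
  model: string;
  outputDtype?: OutputDtype;
  quantization?: "none"; // quantization is not yet supported, so "none" is the only value
}

interface ResolvedEmbeddingParameters {
  model: string;
  outputDtype: OutputDtype;
  quantization: "none";
}

// Fill in the defaults the description mentions: float data type, no quantization.
function withDefaults(params: EmbeddingParameters): ResolvedEmbeddingParameters {
  return {
    model: params.model,
    outputDtype: params.outputDtype ?? "float",
    quantization: params.quantization ?? "none",
  };
}
```

A caller that only names a model would get float embeddings with no quantization, while a caller that sets `outputDtype` keeps its choice.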

Checklist

@kmruiz kmruiz marked this pull request as ready for review October 21, 2025 13:42
@kmruiz kmruiz requested a review from a team as a code owner October 21, 2025 13:42
Copilot AI review requested due to automatic review settings October 21, 2025 13:42
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces the EmbeddingsProvider component to generate embeddings using external APIs (initially VoyageAI) for use in vector search queries. It enables the aggregate tool to automatically generate embeddings from raw string queries when performing $vectorSearch operations, with the embedding parameters specified in the query stage.

Key changes:

  • Added EmbeddingsProvider with Voyage model support for generating embeddings
  • Modified aggregate tool to detect and process $vectorSearch stages with string query vectors
  • Added comprehensive integration tests for vector search with different data types and quantization methods
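
The detection step summarized above could look roughly like the following sketch. It is not the code from `aggregate.ts`: the helper names (`needsEmbedding`, `resolveVectorSearchQueries`), the `EmbedFn` signature, and the `embeddingParameters` field name are all assumptions made for illustration. Only `$vectorSearch` and `queryVector` are standard MongoDB names.

```typescript
type Stage = Record<string, unknown>;

// Hypothetical embedding function; in the PR this role is played by the embeddings provider.
type EmbedFn = (text: string, params: unknown) => Promise<number[]>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True when a stage is a $vectorSearch whose queryVector is a raw string.
function needsEmbedding(stage: Stage): boolean {
  const search = stage["$vectorSearch"];
  return isRecord(search) && typeof search["queryVector"] === "string";
}

// Replace string query vectors with generated embeddings; other stages pass through untouched.
async function resolveVectorSearchQueries(
  pipeline: Stage[],
  embed: EmbedFn
): Promise<Stage[]> {
  return Promise.all(
    pipeline.map(async (stage) => {
      const search = stage["$vectorSearch"];
      if (!isRecord(search)) return stage;
      const query = search["queryVector"];
      if (typeof query !== "string") return stage;
      // "embeddingParameters" is an assumed field name; it is stripped before the
      // stage is sent to the server because it is not part of $vectorSearch.
      const { embeddingParameters, ...rest } = search;
      const queryVector = await embed(query, embeddingParameters);
      return { ...stage, $vectorSearch: { ...rest, queryVector } };
    })
  );
}
```

Passing the embedding function in as an argument keeps the sketch testable with a stub and independent of any particular API client.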

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/common/search/embeddingsProvider.ts | New provider implementing embeddings generation using VoyageAI API |
| src/common/search/vectorSearchEmbeddingsManager.ts | Added generateEmbeddings method to manage embedding generation workflow |
| src/tools/mongodb/read/aggregate.ts | Enhanced to support $vectorSearch stages and automatic embedding generation |
| src/common/errors.ts | Added new error codes for embeddings-related failures |
| tests/integration/tools/mongodb/read/aggregate.test.ts | Added comprehensive integration tests for vector search functionality |
| tests/integration/tools/mongodb/read/vyai/embeddings.ts | Test fixture containing pre-generated embeddings in different formats |
| tests/accuracy/aggregate.test.ts | Added accuracy tests for vector search tool calls |
| package.json | Moved ai package to dependencies and added voyage-ai-provider |
| .github/workflows/code-health.yml | Added VoyageAI API key environment variable for CI tests |

We are using Zod to define the types so we can use the schemas later for tool contracts

chore: minor renaming

chore: Only destruct what we actually need

chore: add voyage provider to the package.json

chore: draft integration of embeddings with the aggregate tool

chore: fix style issues and typings

chore: add accuracy test

chore: add an accuracy test where the index name is provided by the user

chore: fix metadata

chore: add some integration tests with voyage AI

chore: tests for basic quantization in the search itself

chore: fix yaml

chore: style check fixes

chore: fix issue with the embedding transformation

chore: simplify integration with embeddings and make it more configurable

chore: fix accuracy tests and add defaults

Update tests/integration/tools/mongodb/read/aggregate.test.ts

Co-authored-by: Copilot <[email protected]>

Update tests/integration/tools/mongodb/read/aggregate.test.ts

Co-authored-by: Copilot <[email protected]>

Update src/tools/mongodb/read/aggregate.ts

Co-authored-by: Copilot <[email protected]>

chore: improvements on documentation
@github-actions
Contributor

📊 Accuracy Test Results

📈 Summary

| Metric | Value |
| --- | --- |
| Commit SHA | a64abb3c2797882b4bc6fe042d4d84880d0d8e12 |
| Run ID | 87a03477-acd1-4db8-aef8-ba9332bb40ac |
| Status | done |
| Total Prompts Evaluated | 79 |
| Models Tested | 1 |
| Average Accuracy | 93.8% |
| Responses with 0% Accuracy | 4 |
| Responses with 75% Accuracy | 4 |
| Responses with 100% Accuracy | 73 |

📊 Baseline Comparison

| Metric | Value |
| --- | --- |
| Baseline Commit | 8a5da23269267523b6196ed85a42f57713451c3f |
| Baseline Run ID | 7889df8a-d68c-4900-82cc-0f0acc92873b |
| Baseline Run Status | done |
| Responses Improved | 4 |
| Responses Regressed | 6 |

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 10/21/2025, 2:26:05 PM

@mongodb-js mongodb-js deleted a comment from github-actions bot Oct 22, 2025
@mongodb-js mongodb-js deleted a comment from github-actions bot Oct 22, 2025
@coveralls
Collaborator

coveralls commented Oct 22, 2025

Pull Request Test Coverage Report for Build 18710860243

Details

  • 97 of 182 (53.3%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 81.332%.

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/common/search/embeddingsProvider.ts | 22 | 48 | 45.83% |
| src/tools/mongodb/read/aggregate.ts | 69 | 96 | 71.88% |
| src/common/search/vectorSearchEmbeddingsManager.ts | 3 | 35 | 8.57% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 18709444254 | -0.6% |
| Covered Lines | 6289 |
| Relevant Lines | 7553 |

💛 - Coveralls

@kmruiz kmruiz enabled auto-merge (squash) October 22, 2025 13:00
@kmruiz kmruiz merged commit d189734 into main Oct 22, 2025
17 checks passed
@kmruiz kmruiz deleted the chore/mcp-245 branch October 22, 2025 13:06