chore: When querying with vectorSearch use the generated embeddings MCP-245 #662
Pull Request Overview
This PR introduces the EmbeddingsProvider component to generate embeddings using external APIs (initially VoyageAI) for use in vector search queries. It enables the aggregate tool to automatically generate embeddings from raw string queries when performing $vectorSearch operations, with the embedding parameters specified in the query stage.
Key changes:
- Added EmbeddingsProvider with Voyage model support for generating embeddings
- Modified the aggregate tool to detect and process $vectorSearch stages with string query vectors
- Added comprehensive integration tests for vector search with different data types and quantization methods
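To illustrate the second change, here is a minimal sketch of how a pipeline could be scanned for $vectorSearch stages whose queryVector is a raw string. It is not the code from this PR: the function name, the `Embedder` callback, and the `embeddingParameters` field name are assumptions made for illustration, and the real embedding call is replaced by an injected function.

```typescript
// Hypothetical shapes for illustration only; the PR's actual types may differ.
type Stage = Record<string, unknown>;
type Embedder = (
  input: string,
  params: Record<string, unknown>
) => Promise<number[]>;

// Returns a new pipeline in which every $vectorSearch stage that carries a
// string queryVector has it replaced by a generated embedding.
async function replaceRawVectorSearchQueries(
  pipeline: Stage[],
  embed: Embedder
): Promise<Stage[]> {
  const result: Stage[] = [];
  for (const stage of pipeline) {
    const vectorSearch = stage["$vectorSearch"];
    if (typeof vectorSearch !== "object" || vectorSearch === null) {
      result.push(stage); // not a $vectorSearch stage
      continue;
    }
    // "embeddingParameters" is an assumed field name for the per-stage options.
    const { embeddingParameters, ...rest } = vectorSearch as Record<string, unknown>;
    const query = rest["queryVector"];
    if (typeof query !== "string") {
      result.push(stage); // already a vector, leave the stage untouched
      continue;
    }
    const params = (embeddingParameters ?? {}) as Record<string, unknown>;
    const vector = await embed(query, params);
    // Drop the embedding options so only server-side fields remain in the stage.
    result.push({ ...stage, $vectorSearch: { ...rest, queryVector: vector } });
  }
  return result;
}
```

Injecting the embedder keeps the sketch independent of any particular provider, so it can be exercised with a stub in place of the VoyageAI call.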
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/common/search/embeddingsProvider.ts | New provider implementing embeddings generation using VoyageAI API |
| src/common/search/vectorSearchEmbeddingsManager.ts | Added generateEmbeddings method to manage embedding generation workflow |
| src/tools/mongodb/read/aggregate.ts | Enhanced to support $vectorSearch stages and automatic embedding generation |
| src/common/errors.ts | Added new error codes for embeddings-related failures |
| tests/integration/tools/mongodb/read/aggregate.test.ts | Added comprehensive integration tests for vector search functionality |
| tests/integration/tools/mongodb/read/vyai/embeddings.ts | Test fixture containing pre-generated embeddings in different formats |
| tests/accuracy/aggregate.test.ts | Added accuracy tests for vector search tool calls |
| package.json | Moved ai package to dependencies and added voyage-ai-provider |
| .github/workflows/code-health.yml | Added VoyageAI API key environment variable for CI tests |
- We are using Zod to define the types so we can use the schemas later for tool contracts
- chore: minor renaming
- chore: Only destruct what we actually need
- chore: add voyage provider to the package.json
- chore: draft integration of embeddings with the aggregate tool
- chore: fix style issues and typings
- chore: add accuracy test
- chore: add an accuracy test where the index name is provided by the user
- chore: fix metadata
- chore: add some integration tests with voyage AI
- chore: tests for basic quantization in the search itself
- chore: fix yaml
- chore: style check fixes
- chore: fix issue with the embedding transformation
- chore: simplify integration with embeddings and make it more configurable
- chore: fix accuracy tests and add defaults
- Update tests/integration/tools/mongodb/read/aggregate.test.ts (Co-authored-by: Copilot <[email protected]>)
- Update tests/integration/tools/mongodb/read/aggregate.test.ts (Co-authored-by: Copilot <[email protected]>)
- Update src/tools/mongodb/read/aggregate.ts (Co-authored-by: Copilot <[email protected]>)
- chore: improvements on documentation
📊 Accuracy Test Results
📈 Summary
📊 Baseline Comparison
📎 Download Full HTML Report - Look for the Report generated on: 10/21/2025, 2:26:05 PM
Pull Request Test Coverage Report for Build 18710860243
💛 - Coveralls
Proposed changes
This PR introduces the EmbeddingsProvider, a new component responsible for generating embeddings by accessing external APIs like VoyageAI. This provider is designed for exclusive use within the VectorSearchEmbeddingsManager. While it currently only supports Voyage models, it is structured to make adding new models straightforward.

It exposes contracts as Zod schemas, which can be used within the tool schema. This approach creates a single source of truth for our contracts that is also useful for agents. Quantization is not yet supported, but users can provide their required data types. By default, the provider uses a float data type and none for quantization.
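The defaulting behaviour described above can be sketched as follows. To stay dependency-free, this sketch uses a plain function rather than a Zod schema, so it only illustrates the defaults, not the PR's actual contract. The field names (`model`, `outputDtype`, `quantization`) and the list of data types are assumptions, not taken from the PR.

```typescript
// Hypothetical data type names; the set accepted by the real provider may differ.
type OutputDtype = "float" | "int8" | "uint8" | "binary" | "ubinary";

// Hypothetical user-facing parameters: only the model is required.
interface EmbeddingParameters {
  model: string;
  outputDtype?: OutputDtype;
}

interface ResolvedEmbeddingParameters {
  model: string;
  outputDtype: OutputDtype;
  quantization: "none"; // quantization is not yet supported per the description
}

// Fills in the defaults stated in the description:
// float data type and no quantization.
function resolveEmbeddingParameters(
  params: EmbeddingParameters
): ResolvedEmbeddingParameters {
  if (params.model.trim() === "") {
    throw new Error("A model name is required to generate embeddings");
  }
  return {
    model: params.model,
    outputDtype: params.outputDtype ?? "float",
    quantization: "none",
  };
}
```

For example, `resolveEmbeddingParameters({ model: "voyage-3-large" })` would resolve to a float data type with no quantization, while an explicitly provided `outputDtype` is passed through unchanged. The model name here is just an example value.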
Checklist