[CLOUDP-367240] Add support of auto embeddings for vector search using mongot config#680

Merged
viveksinghggits merged 10 commits into master from auto-embeddings-vector-search on Jan 10, 2026

@viveksinghggits (Collaborator) commented Jan 6, 2026

Summary

This PR adds support for auto embeddings for vector search in MCK. Without auto embeddings, a customer who wants to use vector search has to generate the vector embeddings for their data themselves, and can use vector search only after those embeddings exist.
With auto embeddings, customers who opt in no longer have to generate the vector embeddings themselves. If configured properly, mongot generates them when the vector search index is created. mongot looks for a specific field (embedding) in its config to decide whether it has to do auto embedding.

This PR lets users configure the mongot config so that mongot can do auto embedding. To do that, we expose the autoEmbedding field in the MongoDBSearch CR. Customers use this field to configure the embedding field of the mongot config.

k explain mongodbsearch.spec.autoEmbedding
GROUP:      mongodb.com
KIND:       MongoDBSearch
VERSION:    v1

FIELD: autoEmbedding <Object>


DESCRIPTION:
    Configure MongoDB Search's automatic generation of vector embeddings using
    an embedding model service.
    `embedding` field of mongot config is generated using the values provided
    here.

FIELDS:
  embeddingModelAPIKeySecret    <Object> -required-
    EmbeddingModelAPIKeySecret would have the name of the secret that has two
    keys
    query-key and indexing-key for embedding model's API keys.

  providerEndpoint      <string>
    <no description>


k explain mongodbsearch.spec.autoEmbedding.embeddingModelAPIKeySecret
GROUP:      mongodb.com
KIND:       MongoDBSearch
VERSION:    v1

FIELD: embeddingModelAPIKeySecret <Object>


DESCRIPTION:
    EmbeddingModelAPIKeySecret would have the name of the secret that has two
    keys
    query-key and indexing-key for embedding model's API keys.

FIELDS:
  name  <string>
    Name of the referent.
    This field is effectively required, but due to backwards compatibility is
    allowed to be empty. Instances of this type with an empty value here are
    almost certainly wrong.
    More info:
    https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names

Should we make EmbeddingModelAPIKeySecret mandatory in CR if the ProviderEndpoint is specified?
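
One possible answer to the question above is a validation rule on the CR. The sketch below is only an illustration: the Go type names, the `validateAutoEmbedding` helper, and the rule itself are assumptions and are not taken from the operator's code. Only the field names `providerEndpoint`, `embeddingModelAPIKeySecret`, and `name` come from the `explain` output above.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretRef is a hypothetical mirror of the `name` field shown above.
type SecretRef struct {
	Name string `json:"name"`
}

// AutoEmbedding is a hypothetical mirror of spec.autoEmbedding.
type AutoEmbedding struct {
	ProviderEndpoint           string    `json:"providerEndpoint,omitempty"`
	EmbeddingModelAPIKeySecret SecretRef `json:"embeddingModelAPIKeySecret"`
}

// validateAutoEmbedding rejects a spec that sets autoEmbedding but leaves
// the API key secret name empty. A nil spec means the feature is off.
func validateAutoEmbedding(ae *AutoEmbedding) error {
	if ae == nil {
		return nil
	}
	if ae.EmbeddingModelAPIKeySecret.Name == "" {
		return errors.New("autoEmbedding.embeddingModelAPIKeySecret.name must not be empty")
	}
	return nil
}

func main() {
	// providerEndpoint is set, but the secret reference is missing.
	spec := &AutoEmbedding{ProviderEndpoint: "https://api.voyageai.com/v1/embeddings"}
	fmt.Println(validateAutoEmbedding(spec))
}
```

Whether such a check belongs in CRD validation, a webhook, or the reconcile loop is left open here.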

Proof of Work

Install the locally built version of search using a locally running MCK operator, and make sure the MongoDBSearch resource is in the Running state. Then create the search resource using the manifest below:

kubectl apply --context "${K8S_CTX}" -n "${MDB_NS}" -f - <<EOF
apiVersion: mongodb.com/v1
kind: MongoDBSearch
metadata:
  name: ${MDB_RESOURCE_NAME}
spec:
  # no need to specify source.mongodbResourceRef if the MongoDBSearch CR has the same name as the MongoDB CR;
  # the operator infers it automatically
  resourceRequirements:
    limits:
      cpu: "3"
      memory: 5Gi
    requests:
      cpu: "2"
      memory: 3Gi
  autoEmbedding:
    providerEndpoint: https://api.voyageai.com/v1/embeddings
    embeddingModelAPIKeySecret:
      name: voyage-api-keys
EOF

Then create the vector search index:

use sample_mflix;

db.movies.createSearchIndex("vector_index", "vectorSearch",
      { "fields": [ {
        "type": "autoEmbed",
        "path": "plot",
        "modality": "text",
        "model": "voyage-3.5-lite"
      } ] });

Query the data to make sure search is working:

db.movies.aggregate([
   {
     "$vectorSearch": {
       "index": "vector_index",
       "path": "plot",
       "query": "spy crime",
       "numCandidates": 150,
       "limit": 10,
       "quantization": "scalar"
     }
   },
   {
     "$project": {
       "_id": 0,
       "plot": 1,
       "title": 1,
       "score": { "$meta": "vectorSearchScore" }
     }
   }
 ]);

E2E test will be part of another PR.

go test  -run ^TestEnsureEmbeddingConfig -v
=== RUN   TestEnsureEmbeddingConfig_APIKeySecretAndProviderEndpont
--- PASS: TestEnsureEmbeddingConfig_APIKeySecretAndProviderEndpont (0.00s)
=== RUN   TestEnsureEmbeddingConfig_WOAutoEmbedding
--- PASS: TestEnsureEmbeddingConfig_WOAutoEmbedding (0.00s)
=== RUN   TestEnsureEmbeddingConfig_JustAPIKeys
--- PASS: TestEnsureEmbeddingConfig_JustAPIKeys (0.00s)
PASS
ok      github.com/mongodb/mongodb-kubernetes/controllers/searchcontroller      0.704s

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

github-actions bot commented Jan 6, 2026

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.2 Release Notes

Bug Fixes

  • Fixed an issue where monitoring agents would fail after disabling TLS on a MongoDB deployment.
  • Persistent Volume Claim resize fix: Fixed an issue where the Operator ignored namespaces when listing PVCs, causing conflicts with resizing PVCs of the same name. Now, PVCs are filtered by both name and namespace for accurate resizing.
  • Fixed a panic that occurred when the domain names for a horizon were empty. Now, if the domain names are not valid (RFC 1123), the validation will fail before reconciling.
  • Fixed an issue where the Operator could crash when TLS certificates are configured using the certificatesSecretsPrefix field without additional TLS settings.

@viveksinghggits viveksinghggits force-pushed the auto-embeddings-vector-search branch from a094f57 to 1e00443 Compare January 7, 2026 18:57
@viveksinghggits viveksinghggits changed the title Enable auto embeddings for vector search using mongot config [CLOUDP-367240] Enable auto embeddings for vector search using mongot config Jan 7, 2026
@viveksinghggits viveksinghggits marked this pull request as ready for review January 7, 2026 21:51
@viveksinghggits viveksinghggits requested a review from a team as a code owner January 7, 2026 21:51
@viveksinghggits viveksinghggits changed the title [CLOUDP-367240] Enable auto embeddings for vector search using mongot config [CLOUDP-367240] Add support of auto embeddings for vector search using mongot config Jan 7, 2026
1. Make sure the reconciliation happens for search resource if the data of the secret that has api keys is changed
2. Validate the api key secret is present before reconciliation
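
The second item above (checking the API key secret before reconciling) could look roughly like the sketch below. It is not the operator's implementation: `missingAPIKeys` and `validateAPIKeySecret` are hypothetical helpers, and the `map[string][]byte` argument stands in for the data of a Kubernetes Secret. The key names `query-key` and `indexing-key` come from the field description in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Key names taken from the embeddingModelAPIKeySecret field description.
const (
	queryKeyName    = "query-key"
	indexingKeyName = "indexing-key"
)

// missingAPIKeys returns the required keys that are absent or empty in
// the secret data, sorted alphabetically.
func missingAPIKeys(data map[string][]byte) []string {
	var missing []string
	for _, k := range []string{queryKeyName, indexingKeyName} {
		if len(data[k]) == 0 {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

// validateAPIKeySecret returns an error naming every missing key, so the
// caller can stop before touching the mongot config.
func validateAPIKeySecret(name string, data map[string][]byte) error {
	if m := missingAPIKeys(data); len(m) > 0 {
		return fmt.Errorf("secret %q is missing keys: %s", name, strings.Join(m, ", "))
	}
	return nil
}

func main() {
	// Only one of the two required keys is present.
	data := map[string][]byte{"query-key": []byte("example")}
	fmt.Println(validateAPIKeySecret("voyage-api-keys", data))
}
```

Treating an empty value the same as a missing key is a design choice of this sketch, not something the PR states.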
@fealebenpae (Contributor) left a comment

I think this looks good, thanks for your hard work!

@anandsyncs (Contributor)

LGTM, please fix the linting issues.

1. Improve test to make sure even zero values are not present in mongot config if autoEmbedding is not provided in CR
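
A common Go technique for keeping zero values out of serialized config is a pointer field tagged `omitempty`. The sketch below illustrates only that technique under stated assumptions: `MongotConfig`, `EmbeddingConfig`, and their field names are hypothetical, and JSON is used only to keep the example within the standard library. The real mongot config format and schema are not shown in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EmbeddingConfig is a hypothetical stand-in for the `embedding` section.
type EmbeddingConfig struct {
	ProviderEndpoint string `json:"providerEndpoint,omitempty"`
}

// MongotConfig is a hypothetical stand-in for the generated config.
type MongotConfig struct {
	// A nil pointer with omitempty drops the whole "embedding" key, so no
	// zero-valued section is written when autoEmbedding is not set.
	Embedding *EmbeddingConfig `json:"embedding,omitempty"`
}

// render serializes the config to a JSON string.
func render(cfg MongotConfig) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	off, _ := render(MongotConfig{})
	on, _ := render(MongotConfig{
		Embedding: &EmbeddingConfig{ProviderEndpoint: "https://api.voyageai.com/v1/embeddings"},
	})
	fmt.Println(off) // {}
	fmt.Println(on)  // {"embedding":{"providerEndpoint":"https://api.voyageai.com/v1/embeddings"}}
}
```

A test along these lines can then assert that the rendered output contains no `embedding` key at all when the CR omits autoEmbedding.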
@viveksinghggits viveksinghggits merged commit 5a087d6 into master Jan 10, 2026
34 checks passed
@viveksinghggits viveksinghggits deleted the auto-embeddings-vector-search branch January 10, 2026 12:09
viveksinghggits added a commit that referenced this pull request Jan 16, 2026
# Summary

Adds release note for the support for vector search auto embeddings PR
#680

## Proof of Work

NA

## Checklist

- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you added changelog file?
    - use `skip-changelog` label if not needed
- refer to [Changelog files and Release
Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes)
section in CONTRIBUTING.md for more details
viveksinghggits added a commit that referenced this pull request Jan 19, 2026
…arch (#701)

# Summary

As part of PR #680 we added support
for auto embedding for vector search. In this PR we are adding code
snippets for the docs.
These code snippets update the MongoDBSearch resource with
`autoEmbedding`, then create a vector search index of type `autoEmbed`,
and then run a query using the `autoEmbed` vector search index.

## Proof of Work


https://spruce.mongodb.com/version/69669fa728f4aa0007937bec/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC


https://spruce.mongodb.com/version/6968c948a64269000735da96/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

## Checklist

- [x] Have you linked a jira ticket and/or is the ticket in the title?
- [x] Have you checked whether your jira ticket required DOCSP changes?
- [x] Have you added changelog file?
    - use `skip-changelog` label if not needed
- refer to [Changelog files and Release
Notes](https://github.com/mongodb/mongodb-kubernetes/blob/master/CONTRIBUTING.md#changelog-files-and-release-notes)
section in CONTRIBUTING.md for more details
lsierant pushed a commit that referenced this pull request Jan 23, 2026
# Summary

Adds release note for the support for vector search auto embeddings PR
#680

lsierant pushed a commit that referenced this pull request Jan 23, 2026
…arch (#701)

# Summary

As part of PR #680 we added support
for auto embedding for vector search. In this PR we are adding code
snippets for the docs.
These code snippets update the MongoDBSearch resource with
`autoEmbedding`, then create a vector search index of type `autoEmbed`,
and then run a query using the `autoEmbed` vector search index.
