MongoDB Atlas VectorDB [clean] #2996

Closed
wants to merge 64 commits into from

ranfysvalle02
Contributor

Why are these changes needed?

MongoDB Atlas Vector Search received the highest developer NPS among vector databases in the Retool State of AI 2023 survey (https://www.mongodb.com/blog/post/atlas-vector-search-commands-highest-developer-nps-retool-state-ai-2023-survey), so it is worth adding MongoDB vector search as an option for AutoGen RAG.

You can start using MongoDB vector search on a free-tier M0 MongoDB Atlas cluster, which provides the full functionality of MongoDB vector search: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/
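For readers who have not used Atlas Vector Search, a query is an aggregation pipeline whose first stage is `$vectorSearch`. The sketch below only builds that pipeline as plain Python data. The index name `vector_index` and the field names `embedding` and `content` are assumptions for illustration, not names taken from this PR.

```python
from typing import Any, Dict, List


def build_vector_search_pipeline(
    query_vector: List[float],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field that stores the vectors
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that starts with a $vectorSearch stage."""
    # Atlas requires numCandidates to be at least as large as limit.
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the similarity score next to each document
        # ("content" is a hypothetical text field).
        {"$project": {"content": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With PyMongo, the result would be passed to `collection.aggregate(...)` on a collection that already has a matching vector search index.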

But why is MongoDB such a standout? Well, there are a few key reasons.

MongoDB Atlas integrates smoothly with existing databases. For organizations already using MongoDB, this means a seamless expansion into vector storage, with no major system overhauls required.
MongoDB Atlas is built to handle operational heavy-lifting. It excels when serving large-scale, mission-critical applications, offering robustness and reliability where it counts.
MongoDB's flexibility in handling a variety of data types and structures makes it perfectly suited to the complexity of vector embeddings.

As such, implementing MongoDB as a Retrieval Agent can unlock new potential in your AI applications, bringing the full power of vector storage to bear.

Related issue number: 711

Closes #711

ranfysvalle02 and others added 30 commits June 11, 2024 22:23
First steps towards MongoDB as a VectorDB.
update PREDEFINED_VECTOR_DB and change name to MongoDBAtlasVectorDB; upsert=True update logic; no more index/collection name check.
With MongoDB Atlas Vector Search indexes, things work a little differently than with traditional MongoDB indexes. Atlas Search indexes are separate entities managed by the Atlas Search service. Deleting a collection doesn't automatically remove the associated Atlas Search index, which leads to errors.
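One way to avoid the leftover-index problem described in the last commit message is to drop the search indexes explicitly before dropping the collection. This is a minimal sketch, not the code from this PR. It assumes a PyMongo `Collection` from a version that provides `list_search_indexes()` and `drop_search_index()` (4.5 or later), and the helper name is hypothetical.

```python
def drop_collection_with_search_indexes(collection):
    """Drop every Atlas Search index on `collection`, then the collection itself.

    `collection` is expected to behave like a pymongo Collection.
    Returns the names of the search indexes that were dropped.
    """
    # Each search index document carries its name under the "name" key.
    names = [index["name"] for index in collection.list_search_indexes()]
    for name in names:
        collection.drop_search_index(name)
    collection.drop()
    return names
```

Index deletion on Atlas may not complete immediately, so code that recreates an index with the same name right away may still need to wait or retry.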
@ranfysvalle02
Contributor Author

@Hk669 @thinkall made a fresh pull request, with cleaner commit history. I did a lot of "learning" on that last pull request :)

I think we are pretty close to getting MongoDB into Autogen

autogen/agentchat/contrib/vectordb/mongodb.py (outdated, resolved)
test/agentchat/contrib/vectordb/test_mongodb.py (outdated, resolved)
test/agentchat/contrib/vectordb/test_mongodb.py (outdated, resolved)
@thinkall
Collaborator

thinkall commented Jun 22, 2024

Test is still skipped:

https://github.com/microsoft/autogen/actions/runs/9621000866/job/26540852406?pr=2996#step:11:26

Need to update contrib-tests.yml

    - name: Install mongodb
      run: |
        pip install -e .[retrievechat-mongodb]
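The thread does not show the test file, but a test that depends on an optional extra is commonly skipped when that extra cannot be imported, which would explain why installing `retrievechat-mongodb` in the workflow matters. The sketch below illustrates that pattern under that assumption. The helper name and the `SKIP_MONGODB_TESTS` flag are hypothetical.

```python
import importlib.util


def missing_packages(*names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# True when the optional dependency is absent, e.g. because the CI job
# never ran `pip install -e .[retrievechat-mongodb]`.
SKIP_MONGODB_TESTS = bool(missing_packages("pymongo"))

# In a test module this flag would typically feed a skip marker:
#
#   @pytest.mark.skipif(SKIP_MONGODB_TESTS, reason="pymongo is not installed")
#   def test_create_collection(): ...
```

With a guard like this, a skipped test in the CI log means the dependency was missing. It does not mean the test passed.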

@codecov-commenter

codecov-commenter commented Jun 22, 2024

Codecov Report

Attention: Patch coverage is 0.92593% with 107 lines in your changes missing coverage. Please review.

Project coverage is 26.01%. Comparing base (89c2f20) to head (5f89f21).
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|------|------|------|
| autogen/agentchat/contrib/vectordb/mongodb.py | 0.00% | 104 Missing ⚠️ |
| autogen/agentchat/contrib/vectordb/base.py | 25.00% | 3 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (89c2f20) and HEAD (5f89f21).

HEAD has 27 uploads more than BASE

| Flag | BASE (89c2f20) | HEAD (5f89f21) |
|------|------|------|
| unittests | 1 | 28 |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2996      +/-   ##
==========================================
- Coverage   32.49%   26.01%   -6.49%     
==========================================
  Files          93      100       +7     
  Lines       10097    10299     +202     
  Branches     2167     2356     +189     
==========================================
- Hits         3281     2679     -602     
- Misses       6532     7318     +786     
- Partials      284      302      +18     
| Flag | Coverage Δ |
|------|------|
| unittest | 12.28% <0.00%> (?) |
| unittests | 25.21% <0.92%> (-7.28%) ⬇️ |



gitguardian bot commented Jun 22, 2024

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
- MongoDB Credentials 54655e8 notebook/agentchat_mongodb_RetrieveChat.ipynb
- MongoDB Credentials 3122301 notebook/agentchat_mongodb_RetrieveChat.ipynb
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely, following best practices for secret storage.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
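For step 2, a common way to keep credentials out of a notebook is to read the connection string from the environment at run time. This is a minimal sketch, not the change made in this PR. The variable name `MONGODB_URI` and the helper name are assumptions.

```python
import os
from typing import Mapping, Optional


def get_connection_string(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "MONGODB_URI",  # hypothetical environment variable name
) -> str:
    """Read the MongoDB connection string from the environment.

    Raises RuntimeError instead of falling back to a hardcoded value,
    so a secret never needs to appear in the notebook source.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return value
```

The notebook would then pass the returned string to the client instead of a literal URI. Credentials that have already been committed still need to be rotated, as step 3 says.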

To avoid such incidents in the future consider



@ranfysvalle02
Contributor Author

@thinkall - I think there is something going on with testing retrieval?

test/test_retrieve_utils.py ............s.                               [ 56%]
test/agentchat/contrib/retrievechat/test_pgvector_retrievechat.py s      [ 60%]
test/agentchat/contrib/retrievechat/test_qdrant_retrievechat.py s..      [ 72%]
test/agentchat/contrib/retrievechat/test_retrievechat.py s.              [ 80%]
test/agentchat/contrib/vectordb/test_mongodb.py s                        [ 88%]



---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml


======================== 20 passed, 5 skipped in 52.41s ========================

@ranfysvalle02
Contributor Author

I polluted this PR :( sorry -- let's try this one last time

@thinkall
Collaborator

> I polluted this PR :( sorry -- let's try this one last time

There is no need to worry about the commit history. Opening a new PR will lose the tracked history.

4 participants