-
Notifications
You must be signed in to change notification settings - Fork 175
Feature/mongodb atlas memory tool #281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feature/mongodb atlas memory tool #281
Conversation
- Implement mongodb_memory.py following elasticsearch_memory.py patterns - Add MongoDB Atlas vector search with Amazon Bedrock Titan v2 embeddings - Support all CRUD operations: record, retrieve, list, get, delete - Include namespace-based data isolation and pagination - Add comprehensive unit tests (27 tests) with full coverage - Update pyproject.toml with pymongo optional dependency - Graceful error handling for vector index creation - Production-ready with proper logging and validation
- Implement mongodb_memory.py following elasticsearch_memory.py design patterns - Add MongoDB Atlas Vector Search with aggregation pipeline - Support Amazon Bedrock Titan v2 embeddings (1024 dimensions, cosine similarity) - Include namespace-based data isolation for multi-tenant scenarios - Add pagination support with next_token (skip/limit pattern) - Comprehensive error handling and logging - Add 27 unit tests covering all CRUD operations and edge cases - Create detailed documentation following elasticsearch_memory_tool.md pattern - Update README.md with MongoDB Atlas Memory usage examples - Add environment variables configuration for MongoDB Atlas - Add mongodb_memory optional dependency to pyproject.toml
- Integration test file contains sensitive credentials and should remain local only
src/strands_tools/mongodb_memory.py
Outdated
DEFAULT_VECTOR_INDEX_NAME = "vector_index" | ||
|
||
|
||
def _ensure_vector_search_index(collection, index_name: str = DEFAULT_VECTOR_INDEX_NAME): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each parameter we should specify type
annotation
src/strands_tools/mongodb_memory.py
Outdated
logger.info(f"Created vector search index: {index_name}") | ||
|
||
# Wait a moment for index to be ready | ||
import time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not top
src/strands_tools/mongodb_memory.py
Outdated
# Don't raise exception - allow the tool to work without vector search | ||
|
||
|
||
def _generate_embedding(bedrock_runtime, text: str, embedding_model: str) -> List[float]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here and remaining functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also functions shouldn't receive infrastructure dependencies as parameters
except json.JSONDecodeError as e: | ||
raise MongoDBEmbeddingError(f"Invalid JSON response from Bedrock: {str(e)}") from e | ||
|
||
embedding = response_body["embedding"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comment where bedrock doc mentioned response contains "embedding"?
src/strands_tools/mongodb_memory.py
Outdated
skip_count = int(next_token) if next_token else 0 | ||
|
||
# Query for memories in namespace | ||
cursor = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: does this naming consist with other memory tool?
src/strands_tools/mongodb_memory.py
Outdated
# Query for memories in namespace | ||
cursor = ( | ||
collection.find( | ||
{"namespace": namespace}, {"memory_id": 1, "content": 1, "timestamp": 1, "metadata": 1, "_id": 0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these numbers are hardcoded?
action="record", | ||
content="User prefers vegetarian pizza with extra cheese", | ||
metadata={"category": "food_preferences", "type": "dietary"}, | ||
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe cluster_uri can be defined upfront (using class based init) so the agents have no access to possible sensitive information like password of cluster
src/strands_tools/mongodb_memory.py
Outdated
# Wait a moment for index to be ready | ||
import time | ||
|
||
time.sleep(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the time.sleep(2)
from here ^^
src/strands_tools/mongodb_memory.py
Outdated
"index": index_name, | ||
"path": "embedding", | ||
"queryVector": query_embedding, | ||
"numCandidates": max_results * 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason we're gathering x3 results?
src/strands_tools/mongodb_memory.py
Outdated
) | ||
return { | ||
"status": "success", | ||
"content": [{"text": f"Memories retrieved successfully: {json.dumps(response, default=str)}"}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can place the json under "content": [{"json": ..., }] block instead of serializing into text here ^^
- Implement complete MongoDB Atlas memory tool with vector search capabilities - Add semantic search using Amazon Bedrock Titan embeddings (1024 dimensions) - Support full CRUD operations: record, retrieve, list, get, delete - Add namespace support for multi-user memory isolation - Include environment variable configuration support - Add security features including connection string masking - Implement JSON response format optimization - Add comprehensive test suite with 27 test cases covering all functionality - Follow same design principles as existing Elasticsearch memory tool
@JackYPCOnline , @cagataycali Addressed all the feedback in the latest commit |
) | ||
# Create agent with secure tool usage | ||
agent = Agent(tools=[memory_tool.mongodb_memory]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be : mongodb_memory
from strands_tools.mongodb_memory import MongoDBMemoryTool | ||
# RECOMMENDED: Secure class-based approach (credentials hidden from agents) | ||
memory_tool = MongoDBMemoryTool( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is different from doc. Please keep all docs consitent.
src/strands_tools/mongodb_memory.py
Outdated
|
||
try: | ||
# Pattern to match mongodb+srv://username:password@host/... | ||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to the top
docs/mongodb_memory_tool.md
Outdated
result = agent.tool.mongodb_memory( | ||
action="record", | ||
content="User prefers vegetarian pizza with extra cheese", | ||
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we don't need ?retryWrites=true&w=majority
docs/mongodb_memory_tool.md
Outdated
## Prerequisites | ||
|
||
1. **MongoDB Atlas**: You need a MongoDB Atlas cluster with: | ||
- Connection URI (mongodb+srv format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume user have no knowledge with MongoDB, we should point them where to find this URI.
- Fix documentation examples to use consistent class-based approach - Remove unnecessary query parameters from connection string examples - Add comprehensive MongoDB Atlas connection URI guidance - Add explanatory code comments for numCandidates usage - Ensure all examples follow the same pattern throughout documentation - All 27 unit tests continue to pass - Tested with real MongoDB Atlas credentials successfully
@JackYPCOnline Addressed the latest feedback |
Description
This PR adds a comprehensive MongoDB Atlas Memory tool that provides semantic memory management capabilities using MongoDB Atlas as the backend with vector embeddings for semantic search.
Key Features:
Implementation Details:
$vectorSearch
aggregation pipelineRelated Issues
Documentation PR
Type of Change
New Tool
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
I ran
hatch run prepare
All 27 unit tests pass successfully
Integration tests with real MongoDB Atlas cluster verified functionality
Code formatting and linting checks pass
No breaking changes to existing functionality
Test Coverage:
Checklist
I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature
My changes generate no new warnings
Any dependent changes have been merged and published
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.