Skip to content

Conversation

ayanray089
Copy link
Contributor

@ayanray089 ayanray089 commented Oct 9, 2025

Description

This PR adds a comprehensive MongoDB Atlas Memory tool that provides semantic memory management capabilities using MongoDB Atlas as the backend with vector embeddings for semantic search.

Key Features:

  • Semantic Search: Automatic embedding generation using Amazon Bedrock Titan for vector similarity search
  • Memory Management: Store, retrieve, list, get, and delete memory operations
  • Index Management: Automatic vector search index creation with proper configuration
  • Namespace Support: Organize memories by namespace for multi-user scenarios
  • Pagination: Support for paginated results in list and retrieve operations
  • Error Handling: Comprehensive error handling with clear error messages

Implementation Details:

  • Uses MongoDB Atlas Vector Search with $vectorSearch aggregation pipeline
  • Supports Amazon Bedrock Titan v2 embeddings (1024 dimensions, cosine similarity)
  • Includes namespace-based data isolation for multi-tenant scenarios
  • Implements pagination support with next_token (skip/limit pattern)
  • Follows the same action-based interface as elasticsearch_memory (record, retrieve, list, get, delete)

Related Issues

Documentation PR

Type of Change

New Tool

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

  • All 27 unit tests pass successfully

  • Integration tests with real MongoDB Atlas cluster verified functionality

  • Code formatting and linting checks pass

  • No breaking changes to existing functionality

Test Coverage:

  • Complete unit test suite with 27 test functions covering all CRUD operations
  • Mocking of MongoDB client and Bedrock client for isolated testing
  • Error handling scenarios, pagination, namespaces, and metadata handling
  • Integration test with real MongoDB Atlas credentials (kept local)

Checklist

  • I have read the CONTRIBUTING document

  • I have added any necessary tests that prove my fix is effective or my feature works

  • I have updated the documentation accordingly

  • I have added an appropriate example to the documentation to outline the feature

  • My changes generate no new warnings

  • Any dependent changes have been merged and published


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Ayan Ray added 3 commits October 9, 2025 11:51
- Implement mongodb_memory.py following elasticsearch_memory.py patterns
- Add MongoDB Atlas vector search with Amazon Bedrock Titan v2 embeddings
- Support all CRUD operations: record, retrieve, list, get, delete
- Include namespace-based data isolation and pagination
- Add comprehensive unit tests (27 tests) with full coverage
- Update pyproject.toml with pymongo optional dependency
- Graceful error handling for vector index creation
- Production-ready with proper logging and validation
- Implement mongodb_memory.py following elasticsearch_memory.py design patterns
- Add MongoDB Atlas Vector Search with  aggregation pipeline
- Support Amazon Bedrock Titan v2 embeddings (1024 dimensions, cosine similarity)
- Include namespace-based data isolation for multi-tenant scenarios
- Add pagination support with next_token (skip/limit pattern)
- Comprehensive error handling and logging
- Add 27 unit tests covering all CRUD operations and edge cases
- Create detailed documentation following elasticsearch_memory_tool.md pattern
- Update README.md with MongoDB Atlas Memory usage examples
- Add environment variables configuration for MongoDB Atlas
- Add mongodb_memory optional dependency to pyproject.toml
- Integration test file contains sensitive credentials and should remain local only
DEFAULT_VECTOR_INDEX_NAME = "vector_index"


def _ensure_vector_search_index(collection, index_name: str = DEFAULT_VECTOR_INDEX_NAME):
Copy link
Contributor

@JackYPCOnline JackYPCOnline Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for each parameter we should specify type annotation

logger.info(f"Created vector search index: {index_name}")

# Wait a moment for index to be ready
import time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not top

# Don't raise exception - allow the tool to work without vector search


def _generate_embedding(bedrock_runtime, text: str, embedding_model: str) -> List[float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and remaining functions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also functions shouldn't receive infrastructure dependencies as parameters

except json.JSONDecodeError as e:
raise MongoDBEmbeddingError(f"Invalid JSON response from Bedrock: {str(e)}") from e

embedding = response_body["embedding"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comment where bedrock doc mentioned response contains "embedding"?

skip_count = int(next_token) if next_token else 0

# Query for memories in namespace
cursor = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: does this naming consist with other memory tool?

# Query for memories in namespace
cursor = (
collection.find(
{"namespace": namespace}, {"memory_id": 1, "content": 1, "timestamp": 1, "metadata": 1, "_id": 0}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these numbers are hardcoded?

action="record",
content="User prefers vegetarian pizza with extra cheese",
metadata={"category": "food_preferences", "type": "dietary"},
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe cluster_uri can be defined upfront (using class based init) so the agents have no access to possible sensitive information like password of cluster

# Wait a moment for index to be ready
import time

time.sleep(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the time.sleep(2) from here ^^

"index": index_name,
"path": "embedding",
"queryVector": query_embedding,
"numCandidates": max_results * 3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason we're gathering x3 results?

)
return {
"status": "success",
"content": [{"text": f"Memories retrieved successfully: {json.dumps(response, default=str)}"}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can place the json under "content": [{"json": ..., }] block instead of serializing into text here ^^

- Implement complete MongoDB Atlas memory tool with vector search capabilities
- Add semantic search using Amazon Bedrock Titan embeddings (1024 dimensions)
- Support full CRUD operations: record, retrieve, list, get, delete
- Add namespace support for multi-user memory isolation
- Include environment variable configuration support
- Add security features including connection string masking
- Implement JSON response format optimization
- Add comprehensive test suite with 27 test cases covering all functionality
- Follow same design principles as existing Elasticsearch memory tool
@ayanray089
Copy link
Contributor Author

@JackYPCOnline , @cagataycali Addressed all the feedback in the latest commit

)
# Create agent with secure tool usage
agent = Agent(tools=[memory_tool.mongodb_memory])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be : mongodb_memory

from strands_tools.mongodb_memory import MongoDBMemoryTool
# RECOMMENDED: Secure class-based approach (credentials hidden from agents)
memory_tool = MongoDBMemoryTool(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is different from doc. Please keep all docs consitent.


try:
# Pattern to match mongodb+srv://username:password@host/...
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to the top

result = agent.tool.mongodb_memory(
action="record",
content="User prefers vegetarian pizza with extra cheese",
cluster_uri="mongodb+srv://user:[email protected]/?retryWrites=true&w=majority",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually we don't need ?retryWrites=true&w=majority

## Prerequisites

1. **MongoDB Atlas**: You need a MongoDB Atlas cluster with:
- Connection URI (mongodb+srv format)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume user have no knowledge with MongoDB, we should point them where to find this URI.

- Fix documentation examples to use consistent class-based approach
- Remove unnecessary query parameters from connection string examples
- Add comprehensive MongoDB Atlas connection URI guidance
- Add explanatory code comments for numCandidates usage
- Ensure all examples follow the same pattern throughout documentation
- All 27 unit tests continue to pass
- Tested with real MongoDB Atlas credentials successfully
@ayanray089
Copy link
Contributor Author

@JackYPCOnline Addressed the latest feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants