feat: retrieval function #410

Closed
wants to merge 81 commits into from

Wendong-Fan
Member

@Wendong-Fan Wendong-Fan commented Dec 11, 2023

Description

Add a retrieval function built on the unstructured IO, embedding, and vector storage modules.

Motivation and Context

The function takes the query, the path of the content to retrieve from, and the vector storage path as input, and returns the retrieved text as a string.
It can work with both local and cloud vector storage.
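
A minimal sketch of the interface described above, under stated assumptions: the function name `retrieve`, its parameter names, the bag-of-words `embed` helper, and the JSON file used as local storage are all hypothetical stand-ins, not the PR's actual code, which builds on the unstructured IO, embedding, and vector storage modules.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, content_path: str, storage_path: str) -> str:
    """Return the stored chunk most similar to `query` (hypothetical signature)."""
    store = Path(storage_path)
    if store.exists():
        # Reuse chunks that were processed on an earlier call.
        chunks = json.loads(store.read_text(encoding="utf-8"))
    else:
        # Split on blank lines; this stands in for unstructured IO chunking.
        text = Path(content_path).read_text(encoding="utf-8")
        chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
        store.write_text(json.dumps(chunks), encoding="utf-8")
    if not chunks:
        return ""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))
```

The storage file is created on the first call and reused afterwards, which mirrors the idea of passing a storage path alongside the content path.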

close #411

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

Summary by CodeRabbit

  • New Features
    • Introduced new retriever classes for handling various types of data retrieval, including a base class, vector retriever, and an automatic retriever.
    • Added a comprehensive guide on using the Retrieve Module, covering customized and automated ways to utilize it.
    • Enhanced utility module with new functions for lazy imports and facilitating structured role-playing scenarios.
  • Bug Fixes
    • Corrected typos in import statements across several modules, ensuring correct module usage.
  • Documentation
    • Added a new notebook guide for Retrieve Module usage, including setup and integration examples.
  • Refactor
    • Refactored role-playing functionality into a separate function for improved modularity.
  • Tests
    • Added new test cases for the AutoRetriever and VectorRetriever classes, ensuring their functionality with vector storage and retrieval processes.

@Wendong-Fan Wendong-Fan requested a review from FUYICC January 29, 2024 13:59
@dosubot dosubot bot added the lgtm label Jan 30, 2024
coderabbitai bot commented Feb 22, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The recent updates introduce a robust framework for data retrieval, focusing on correcting minor errors and significantly enhancing functionality. Key features include the introduction of various retrievers, such as vector and auto retrievers, alongside improvements in unstructured I/O operations and utility functions. These changes collectively aim to streamline the retrieval of information, whether from local or cloud storage, based on query inputs and vector storage paths.

Changes

File Path | Change Summary
camel/functions/... | Corrected a typo in the import statement for a module related to unstructured I/O functions.
camel/retrievers/__init__.py | Introduced retriever classes for handling different types of data retrieval.
camel/retrievers/... | Added AutoRetriever, BaseRetriever, and VectorRetriever classes with specific functionalities.
camel/utils/__init__.py | Added role_playing_with_function to the exported entities.
camel/utils/commons.py | Introduced new functions for lazy imports and facilitating structured role-playing scenarios.
docs/... and examples/... | Provided guides and examples on utilizing the new retrieval functionalities.
test/retrievers/... | Introduced tests for AutoRetriever and VectorRetriever classes.
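
The summary above mentions new lazy-import helpers in camel/utils/commons.py without showing them. As a rough illustration of the general technique only, here is a small sketch; the class name `LazyModule` is hypothetical and is not taken from the PR.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str) -> Any:
        # __getattr__ runs only for attributes not found on the proxy itself,
        # so _name and _module never trigger an import.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Usage: nothing is imported until an attribute is read.
lazy_json = LazyModule("json")
encoded = lazy_json.dumps({"a": 1})
```

Deferring the import this way keeps optional, heavy dependencies from slowing down package import for users who never call the code that needs them.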

Assessment against linked issues

Objective Addressed Explanation
Add retrieval function for agent to use [#411]

Poem

🐇📚✨
In the land of code and query,
A rabbit hopped, its vision clear,
It sought the data, far and near,
With new tools, no task too dreary.
Through vectors and texts, it leaped,
Retrieving what was deeply sown,
'Till answers were gracefully shown.
🌟📖🐾


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 9c0f598 and c3b4ae9.
Files ignored due to path filters (2)
  • poetry.lock is excluded by: !**/*.lock
  • pyproject.toml is excluded by: !**/*.toml
Files selected for processing (13)
  • camel/functions/__init__.py (1 hunks)
  • camel/retrievers/__init__.py (1 hunks)
  • camel/retrievers/auto_retriever.py (1 hunks)
  • camel/retrievers/base.py (1 hunks)
  • camel/retrievers/vector_retriever.py (1 hunks)
  • camel/utils/__init__.py (2 hunks)
  • camel/utils/commons.py (2 hunks)
  • docs/retrieval_augmented_generation/rag_cookbook.ipynb (1 hunks)
  • examples/function_call/role_playing_with_function.py (1 hunks)
  • examples/io/unstructured_modules_example.py (1 hunks)
  • test/functions/test_unstructured_io_functions.py (1 hunks)
  • test/retrievers/test_auto_retriever.py (1 hunks)
  • test/retrievers/test_vector_retriever.py (1 hunks)
Files skipped from review due to trivial changes (3)
  • camel/retrievers/__init__.py
  • examples/function_call/role_playing_with_function.py
  • test/functions/test_unstructured_io_functions.py
Additional comments: 16
camel/functions/__init__.py (1)
  • 23-23: The addition of UnstructuredModules to the __all__ list correctly exposes it for public use, aligning with the PR's objectives to enhance unstructured I/O functionalities.
camel/utils/__init__.py (1)
  • 49-49: The addition of role_playing_with_function to the __all__ list correctly exposes it for public use, aligning with the PR's objectives to introduce new utility functions.
camel/retrievers/base.py (1)
  • 23-68: The BaseRetriever class and its abstract methods process_and_store and query_and_compile_results are correctly defined and documented, providing a solid foundation for the retrieval functionality. The use of abstract methods enforces implementation in subclasses, aligning with the PR's objectives.
test/retrievers/test_vector_retriever.py (1)
  • 42-93: The tests for the VectorRetriever class are well-structured and comprehensive, covering initialization, process_and_store, and query_and_compile_results methods. The use of fixtures and mocking ensures that the tests are isolated and focused on the class's behavior.
test/retrievers/test_auto_retriever.py (1)
  • 43-92: The tests for the AutoRetriever class are comprehensive, covering initialization, handling file modified dates, and the retrieval process. The structured approach and use of mocking ensure focused testing on the class's behavior.
docs/retrieval_augmented_generation/rag_cookbook.ipynb (11)
  • 35-43: The code segment for loading the CAMEL paper has been updated to use Python's requests library and handle directory creation in a way that's compatible with both Unix-like and Windows systems. This change addresses the previous comment regarding compatibility issues on Windows.
  • 88-90: The instantiation of OpenAIEmbedding is straightforward and correct. However, ensure that the OPENAI_API_KEY is securely set as mentioned in the previous comment.
  • 107-113: The setup for QdrantStorage is correctly implemented, with appropriate parameters passed to the constructor. This code segment effectively demonstrates how to configure vector storage for use with the retrieval system.
  • 130-132: The instantiation of VectorRetriever with the embedding_model parameter correctly demonstrates how to set up a retriever instance using the previously configured embedding model. This is a key step in setting up the retrieval pipeline.
  • 158-161: The process_and_store method call on vector_retriever demonstrates how to process and store the content of the CAMEL paper in vector storage. This is an essential step for enabling subsequent retrieval operations. However, the warning message about IProgress not found suggests that there might be compatibility issues with the progress bar in certain environments. It's a minor issue but worth noting for environments where this might cause confusion.
  • 192-195: The query_and_compile_results method demonstrates querying the vector storage and compiling the results based on a given query. The example provided is clear and shows how to retrieve information relevant to the query. This is a practical demonstration of the retrieval functionality.
  • 220-224: This code segment demonstrates handling an irrelevant query by using the query_and_compile_results method. It effectively shows how the system responds when no suitable information is retrieved, which is important for understanding the system's behavior in various scenarios.
  • 266-273: The use of AutoRetriever with default settings and the demonstration of its capability to handle both local paths and remote URLs is well-implemented. This segment effectively showcases the flexibility and ease of use of the AutoRetriever class.
  • 315-341: The single_agent function demonstrates combining AutoRetriever with a ChatAgent to respond to queries based on retrieved information. This is a creative use of the retrieval and agent functionalities to simulate a conversational agent. The implementation is clear and serves as a good example of how to integrate different components of the system.
  • 373-396: The local_retriever function, intended for role-playing scenarios, is well-documented and demonstrates the use of AutoRetriever in a function that can be called by a language model. The docstring provides clear instructions on the function's purpose and usage, which is crucial for its integration into role-playing scenarios.
  • 505-513: The role_playing_with_function call demonstrates an advanced use case of combining retrieval functionality with role-playing scenarios. This segment effectively shows how to set up a task prompt and use a list of functions, including the previously defined local_retriever, to facilitate interactive role-playing sessions.
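
The review comments above name an abstract `BaseRetriever` with `process_and_store` and `query_and_compile_results`, and describe a fallback when a query matches nothing. The sketch below illustrates that shape under stated assumptions: the method signatures, the `KeywordRetriever` subclass, its `min_overlap` parameter, and the fallback message are hypothetical and do not reproduce the PR's code.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseRetriever(ABC):
    """Interface sketch; method names come from the review, signatures are assumed."""

    @abstractmethod
    def process_and_store(self, content: str) -> None:
        """Split content into chunks and keep them for later queries."""

    @abstractmethod
    def query_and_compile_results(self, query: str) -> str:
        """Return the relevant stored chunks as a single string."""


class KeywordRetriever(BaseRetriever):
    """Hypothetical toy subclass that scores chunks by word overlap."""

    def __init__(self, min_overlap: int = 1) -> None:
        self.min_overlap = min_overlap
        self.chunks: List[str] = []

    def process_and_store(self, content: str) -> None:
        # Paragraphs separated by blank lines become individual chunks.
        self.chunks.extend(p.strip() for p in content.split("\n\n") if p.strip())

    def query_and_compile_results(self, query: str) -> str:
        words = set(query.lower().split())
        scored = [(len(words & set(c.lower().split())), c) for c in self.chunks]
        hits = [c for score, c in scored if score >= self.min_overlap]
        # An irrelevant query yields an explicit message instead of noise.
        return "\n".join(hits) if hits else "No suitable information retrieved."
```

Because both methods are abstract, instantiating `BaseRetriever` directly raises `TypeError`, so every concrete retriever has to provide them.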

camel/utils/commons.py: 3 review comments (resolved)
@Appointat
Member

Hi @Wendong-Fan, if you are breaking down the huge PR, do you think it's necessary to convert it to draft status?

@Wendong-Fan Wendong-Fan marked this pull request as draft March 23, 2024 07:00
@Wendong-Fan
Member Author

Closing this PR since it has already been split into smaller PRs.

@Wendong-Fan Wendong-Fan closed this Apr 6, 2024
@Wendong-Fan Wendong-Fan deleted the feature/retrieval_function branch May 4, 2024 05:46
Labels
New Feature size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Merged or Closed
Development

Successfully merging this pull request may close these issues.

[Feature Request] add retrieval function for agent to use
5 participants