feat: Add GitLoader Component with advanced filtering options #2850
This commit introduces the GitLoaderComponent, enabling users to load files from a Git repository with advanced filtering options.

GitLoader Component:
- Implementation of the GitLoaderComponent to load files from a Git repository using the `langchain_community.document_loaders.git.GitLoader` module.
- Advanced filtering option using `file_filter` to include or exclude specific files based on their extensions or other criteria.

Examples of `file_filter` usage:
- Include only .py files: `lambda file_path: file_path.endswith('.py')`
- Exclude .py files: `lambda file_path: not file_path.endswith('.py')`

This component ensures a flexible and customizable approach for loading documents from Git repositories, enhancing the user experience with advanced filtering capabilities.

Features:
- Support for loading documents from Git repositories.
- Advanced file filtering options to include or exclude specific files.
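A minimal sketch of how the two `file_filter` lambdas above behave, assuming the filter is called with each file path as a string. The helper `load_repo` and the sample paths are hypothetical and only illustrate passing the filter to `GitLoader`; the import is deferred so the predicates can be tried without `langchain_community` installed.

```python
def load_repo(repo_path, file_filter=None, branch="main"):
    """Hypothetical helper: load documents from a local Git checkout."""
    # Imported lazily so the filter demo below runs without the dependency.
    from langchain_community.document_loaders.git import GitLoader

    loader = GitLoader(repo_path=repo_path, branch=branch, file_filter=file_filter)
    return loader.load()


# The two predicates from the description above.
include_py = lambda file_path: file_path.endswith(".py")
exclude_py = lambda file_path: not file_path.endswith(".py")

# Illustrative paths, not taken from any real repository.
paths = ["src/app.py", "README.md", "docs/index.rst"]

kept_py = [p for p in paths if include_py(p)]
kept_other = [p for p in paths if exclude_py(p)]
print(kept_py, kept_other)
```

With a real checkout, something like `load_repo("./my-repo", file_filter=include_py)` would return only the documents whose paths pass the predicate.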
Pull Request Validation Report
This comment is automatically generated by Conventional PR

Whitelist Report
Result: Pull request does not satisfy any enabled whitelist criteria. Pull request will be validated.

Validation Report
Result: Pull request satisfies all enabled pull request rules.

Last Modified at 20 Jul 24 01:00 UTC
This pull request is automatically being deployed by Amplify Hosting.
…derComponent

This commit fixes the issue where the GitLoaderComponent would fail if the file_filter input was not evaluated correctly.

Changes include:
- Added a check to ensure that file_filter is a valid string before calling eval.
- Ensured that the evaluated file_filter is callable, otherwise it defaults to None.
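The validation described in this commit could look roughly like the sketch below. The function name `parse_file_filter` is hypothetical, and the empty-builtins argument to `eval` is an added precaution that the commit message does not mention; it is not a real sandbox, so `eval` on untrusted input remains risky.

```python
from typing import Callable, Optional


def parse_file_filter(expr: object) -> Optional[Callable[[str], bool]]:
    """Turn a user-supplied expression into a filter, or return None."""
    # Step 1: only non-empty strings are passed to eval.
    if not isinstance(expr, str) or not expr.strip():
        return None
    try:
        # Assumption: builtins are stripped to limit what the expression can reach.
        candidate = eval(expr, {"__builtins__": {}}, {})
    except Exception:
        # Syntax errors and runtime errors both fall back to "no filter".
        return None
    # Step 2: anything that is not callable defaults to None.
    return candidate if callable(candidate) else None


py_only = parse_file_filter("lambda file_path: file_path.endswith('.py')")
print(py_only("main.py"), parse_file_filter("42"), parse_file_filter(None))
```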
src/backend/base/langflow/components/documentloaders/GitLoader.py
- Changed inputs from `StrInput` to `MessageTextInput` to enable dynamic use with agents.
- Added `content_filter` field to allow additional content filtering using regex.
- Updated `file_filter` to support glob format, simplifying usage for users.
- Implemented binary file removal filter to exclude binary files from queries, aligning with the agent's purpose.
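The commit message does not say how binary files are detected. One common heuristic, shown here purely as an assumption, is to treat a file as binary when its first bytes contain a NUL byte. The name `is_binary` and the 1024-byte sample size are illustrative choices, not the component's actual code.

```python
import tempfile
from pathlib import Path


def is_binary(path: str, sample_size: int = 1024) -> bool:
    """Heuristic: a NUL byte in the first `sample_size` bytes means binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "notes.txt"
    text_file.write_text("plain text\n", encoding="utf-8")
    blob_file = Path(tmp) / "image.bin"
    blob_file.write_bytes(b"\x89PNG\x00\x01")
    text_result = is_binary(str(text_file))
    blob_result = is_binary(str(blob_file))
    print(text_result, blob_result)
```

A predicate like this can be combined with the path filter so that a file is kept only when it matches the pattern and is not binary.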
I applied the suggestion you recommended and took the opportunity to make a small improvement to the agent based on my use case. I look forward to your feedback.
That's a lot nicer.
Thanks for all the help, @danielgines
…ow-ai#2850)

* feat: Add GitLoader Component with advanced filtering options
* feat: Add GitLoader Component with advanced filtering options
* fix: Ensure proper evaluation and validation of file_filter in GitLoaderComponent
* [autofix.ci] apply automated fixes
* feat: Enhance GitLoaderComponent with dynamic inputs, content filtering
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
(cherry picked from commit d108ca1)
This commit enhances the GitLoaderComponent, providing users with the ability to load files from a Git repository with advanced filtering options, including dynamic input handling and content-based filtering.
GitLoader Component:
- Uses the `langchain_community.document_loaders.git.GitLoader` module to load files from a Git repository.
- Changed inputs from `StrInput` to `MessageTextInput` to support dynamic usage with agents.
- Added `content_filter` field to enable content-based filtering using regex patterns.

Examples of `file_filter` usage:
- Include only .py files: `*.py`
- Exclude .py files: `!*.py`

Examples of `content_filter` usage:
- `pathlib`
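A rough sketch of how the glob and regex examples above could be turned into predicates, assuming a leading `!` negates the glob and that `content_filter` is applied with a regex search over the file text. The builder names are hypothetical, and the use of `fnmatch` is an assumption rather than the component's confirmed implementation.

```python
import fnmatch
import re


def build_file_filter(pattern: str):
    """`*.py` keeps matching paths; `!*.py` keeps everything except them."""
    negate = pattern.startswith("!")
    glob = pattern[1:] if negate else pattern

    def file_filter(file_path: str) -> bool:
        matched = fnmatch.fnmatch(file_path, glob)
        return not matched if negate else matched

    return file_filter


def build_content_filter(regex: str):
    """Keep a document only if the regex is found somewhere in its text."""
    compiled = re.compile(regex)
    return lambda text: compiled.search(text) is not None


only_py = build_file_filter("*.py")
no_py = build_file_filter("!*.py")
mentions_pathlib = build_content_filter("pathlib")
print(only_py("src/app.py"), no_py("src/app.py"), mentions_pathlib("import os"))
```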
Features: