feat: Add GitLoader Component with advanced filtering options #2850

Merged
14 commits merged on Jul 30, 2024

Contributor

@danielgines danielgines commented Jul 20, 2024

This commit enhances the GitLoaderComponent, providing users with the ability to load files from a Git repository with advanced filtering options, including dynamic input handling and content-based filtering.

GitLoader Component:

  • Implementation: Utilizes the langchain_community.document_loaders.git.GitLoader module to load files from a Git repository.
  • Dynamic Inputs: Changed inputs from StrInput to MessageTextInput to support dynamic usage with agents.
  • Advanced Filtering:
    • File Filter: Supports glob patterns for including or excluding specific files based on their extensions or other criteria.
    • Content Filter: Added content_filter field to enable content-based filtering using regex patterns.
    • Binary File Exclusion: Implemented a filter to remove binary files from queries, ensuring relevance to the agent's purpose.

Examples of file_filter usage:

  • Include only .py files: *.py
  • Exclude .py files: !*.py
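As a rough illustration of the glob syntax above, the sketch below turns a pattern string into a path predicate using Python's standard `fnmatch` module. The helper name `build_file_filter` and the treatment of a leading `!` as negation are assumptions made for this example; the component's actual implementation may differ.

```python
import fnmatch
from typing import Callable


def build_file_filter(pattern: str) -> Callable[[str], bool]:
    """Return a predicate for a glob pattern; a leading '!' negates it.

    Hypothetical helper for illustration, not the component's actual code.
    """
    negate = pattern.startswith("!")
    glob = pattern[1:] if negate else pattern

    def matches(file_path: str) -> bool:
        # fnmatchcase avoids OS-dependent case folding; '*' also spans '/'.
        hit = fnmatch.fnmatchcase(file_path, glob)
        return not hit if negate else hit

    return matches


only_py = build_file_filter("*.py")
no_py = build_file_filter("!*.py")
paths = ["src/app.py", "README.md", "setup.py"]
print([p for p in paths if only_py(p)])  # ['src/app.py', 'setup.py']
print([p for p in paths if no_py(p)])    # ['README.md']
```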

Examples of content_filter usage:

  • Search for 'pathlib' in any file: pathlib
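A minimal sketch of how a regex-based `content_filter` might be applied, assuming files are available as (path, text) pairs. The function name `filter_by_content` and the sample data are hypothetical and only show the idea of keeping files whose content matches the pattern.

```python
import re
from typing import Iterable, List, Tuple


def filter_by_content(
    files: Iterable[Tuple[str, str]], content_filter: str
) -> List[str]:
    """Return the paths whose text contains a match for the regex."""
    pattern = re.compile(content_filter)
    return [path for path, text in files if pattern.search(text)]


sample = [
    ("a.py", "from pathlib import Path\n"),
    ("b.py", "import os\n"),
    ("notes.md", "Uses pathlib for paths.\n"),
]
print(filter_by_content(sample, "pathlib"))  # ['a.py', 'notes.md']
```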

Features:

  • Support for loading documents from Git repositories.
  • Dynamic input handling for better flexibility with agents.
  • Advanced file filtering options using glob patterns.
  • Content-based filtering using regex.
  • Automatic exclusion of binary files to focus on relevant text files.
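The description does not say how binary files are detected. One common heuristic, shown here purely as an assumption, is to treat a file as binary when its first bytes contain a NUL byte. The function name `is_binary` and the sample files are hypothetical.

```python
import os
import tempfile


def is_binary(file_path: str, sample_size: int = 1024) -> bool:
    """Heuristic: a file is binary if its first bytes contain a NUL byte."""
    with open(file_path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "notes.txt")
    blob_path = os.path.join(tmp, "image.bin")
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("plain text\n")
    with open(blob_path, "wb") as f:
        f.write(b"\x89PNG\x00\x01\x02")
    # Keep only the files the heuristic classifies as text.
    kept = [os.path.basename(p) for p in (text_path, blob_path) if not is_binary(p)]
    print(kept)  # ['notes.txt']
```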

This commit introduces the GitLoaderComponent, enabling users to load files from a Git repository with advanced filtering options.

GitLoader Component:

- Implementation of the GitLoaderComponent to load files from a Git repository using the `langchain_community.document_loaders.git.GitLoader` module.
- Advanced filtering option using `file_filter` to include or exclude specific files based on their extensions or other criteria.

Examples of `file_filter` usage:
- Include only .py files: `lambda file_path: file_path.endswith('.py')`
- Exclude .py files: `lambda file_path: not file_path.endswith('.py')`
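The lambda predicates above can be tried on plain path lists before handing them to the loader. In the sketch below, `select_paths` and `load_python_docs` are hypothetical helpers; `load_python_docs` assumes `langchain-community` and GitPython are installed and that `repo_path` points at a local clone, and it imports the loader lazily so the rest of the example runs without those packages.

```python
from typing import Callable, List, Optional


def select_paths(
    paths: List[str], file_filter: Optional[Callable[[str], bool]]
) -> List[str]:
    """Keep the paths accepted by file_filter; None means keep everything."""
    if file_filter is None:
        return list(paths)
    return [p for p in paths if file_filter(p)]


def load_python_docs(repo_path: str, branch: str = "main"):
    """Load only .py files from a local clone (assumes langchain-community)."""
    from langchain_community.document_loaders import GitLoader  # lazy import

    loader = GitLoader(
        repo_path=repo_path,
        branch=branch,
        file_filter=lambda file_path: file_path.endswith(".py"),
    )
    return loader.load()


paths = ["src/app.py", "README.md"]
print(select_paths(paths, lambda file_path: file_path.endswith(".py")))
print(select_paths(paths, lambda file_path: not file_path.endswith(".py")))
```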

This component ensures a flexible and customizable approach for loading documents from Git repositories, enhancing the user experience with advanced filtering capabilities.

Features:

- Support for loading documents from Git repositories.
- Advanced file filtering options to include or exclude specific files.
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels Jul 20, 2024
Contributor

Pull Request Validation Report

This comment is automatically generated by Conventional PR

Whitelist Report

Whitelist criteria:
- Pull request is a draft and should be ignored
- Pull request is made by a whitelisted user and should be ignored
- Pull request is submitted by a bot and should be ignored
- Pull request is submitted by administrators and should be ignored

Result

Pull request does not satisfy any enabled whitelist criteria. Pull request will be validated.

Validation Report

Validation rules:
- All commits in this pull request have valid messages
- Pull request does not introduce too many changes
- Pull request has a valid title
- Pull request has mentioned issues
- Pull request has valid branch name
- Pull request should have a non-empty body

Result

Pull request satisfies all enabled pull request rules.

Last Modified at 20 Jul 24 01:00 UTC

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 20, 2024
This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-2850.dmtpw4p5recq1.amplifyapp.com

…derComponent

This commit fixes the issue where the GitLoaderComponent would fail if the file_filter input was not evaluated correctly. Changes include:

- Added a check to ensure that file_filter is a valid string before calling eval.
- Ensured that the evaluated file_filter is callable, otherwise it defaults to None.
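
The validation described in this commit could look roughly like the sketch below. The function name `parse_file_filter` and the `try`/`except` around `eval` are assumptions for illustration, not the component's actual code. Because `eval` executes arbitrary code, this pattern is only reasonable for trusted input.

```python
from typing import Callable, Optional


def parse_file_filter(raw: object) -> Optional[Callable[[str], bool]]:
    """Evaluate a filter expression; return None unless it yields a callable."""
    # Only non-empty strings are passed to eval.
    if not isinstance(raw, str) or not raw.strip():
        return None
    try:
        candidate = eval(raw)  # illustrative only; unsafe for untrusted input
    except Exception:
        return None
    # Anything that is not callable falls back to "no filter".
    return candidate if callable(candidate) else None


py_filter = parse_file_filter("lambda file_path: file_path.endswith('.py')")
print(py_filter("a.py"))        # True
print(parse_file_filter("42"))  # None
```
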
danielgines and others added 8 commits July 25, 2024 11:05
- Changed inputs from `StrInput` to `MessageTextInput` to enable dynamic use with agents.
- Added `content_filter` field to allow additional content filtering using regex.
- Updated `file_filter` to support glob format, simplifying usage for users.
- Implemented binary file removal filter to exclude binary files from queries, aligning with the agent's purpose.
Contributor Author

@danielgines danielgines left a comment

@ogabrielluiz

I applied the suggestion you recommended and took the opportunity to make a small improvement to the agent based on my use case. I look forward to your feedback.

@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 29, 2024
Contributor

@ogabrielluiz ogabrielluiz left a comment

That's a lot nicer.

Thanks for all the help, @danielgines

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 30, 2024
@ogabrielluiz ogabrielluiz enabled auto-merge (squash) July 30, 2024 01:50
@ogabrielluiz ogabrielluiz merged commit d108ca1 into langflow-ai:main Jul 30, 2024
59 of 60 checks passed
nicoloboschi pushed a commit to datastax/ragstack-ai-langflow that referenced this pull request Jul 30, 2024
…ow-ai#2850)

* feat: Add GitLoader Component with advanced filtering options

* feat: Add GitLoader Component with advanced filtering options

* fix: Ensure proper evaluation and validation of file_filter in GitLoaderComponent

This commit fixes the issue where the GitLoaderComponent would fail if the file_filter input was not evaluated correctly. Changes include:

- Added a check to ensure that file_filter is a valid string before calling eval.
- Ensured that the evaluated file_filter is callable, otherwise it defaults to None.

* [autofix.ci] apply automated fixes

* feat: Enhance GitLoaderComponent with dynamic inputs, content filtering

- Changed inputs from `StrInput` to `MessageTextInput` to enable dynamic use with agents.
- Added `content_filter` field to allow additional content filtering using regex.
- Updated `file_filter` to support glob format, simplifying usage for users.
- Implemented binary file removal filter to exclude binary files from queries, aligning with the agent's purpose.

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
(cherry picked from commit d108ca1)
Labels
- enhancement: New feature or request
- lgtm: This PR has been approved by a maintainer
- size:L: This PR changes 100-499 lines, ignoring generated files.