
feat: Add BSHTMLLoader support and enhance error handling for document loading #1166

Merged: 2 commits, merged Feb 19, 2025


@LavX commented Feb 18, 2025

This PR enhances the DocumentLoader class by adding support for HTML documents and improving error handling. The changes include:

- Added BSHTMLLoader support:
  - Imported BSHTMLLoader from langchain_community.document_loaders.
  - Added handlers for both the .html and .htm file extensions.
  - BSHTMLLoader parses the HTML with BeautifulSoup and extracts its text, which gives cleaner content than loading the raw file, markup included, as plain text.
- Improved error handling:
  - Added a dedicated try/except block around document loading.
  - Error messages now distinguish HTML-specific loading failures from general document loading failures.
  - The more specific messages make it easier to see why a given file failed to load.

These changes make the document loader more robust and let it handle HTML documents more effectively.
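The extension-based dispatch described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from the PR: the real change uses BSHTMLLoader from langchain_community, whereas this sketch swaps in Python's standard html.parser as a stand-in so it runs without langchain or BeautifulSoup installed. The names Document, load_html, load_text, LOADER_DICT and load_file are hypothetical, and the extension keys are assumed to be written without the leading dot.

```python
import os
import tempfile
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Document:
    # Hypothetical stand-in for langchain's Document: text plus metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects visible text and the <title> text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.title = ""
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.parts.append(data)


def load_html(path):
    # Stand-in for BSHTMLLoader (assumption): strip tags, keep <title> as metadata.
    parser = _TextExtractor()
    with open(path, encoding="utf-8") as f:
        parser.feed(f.read())
    parser.close()
    # Collapse the whitespace left behind by removed tags.
    text = " ".join(" ".join(parser.parts).split())
    return [Document(text, {"source": path, "title": parser.title.strip()})]


def load_text(path):
    # Plain-text loading: any markup is kept verbatim.
    with open(path, encoding="utf-8") as f:
        return [Document(f.read(), {"source": path})]


# Hypothetical extension -> loader mapping in the spirit of loader_dict:
# .html and .htm both route to the same HTML-aware loader.
LOADER_DICT = {
    "txt": load_text,
    "md": load_text,
    "html": load_html,
    "htm": load_html,
}


def load_file(path):
    """Pick a loader from the file extension and return its documents."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    loader = LOADER_DICT.get(ext)
    if loader is None:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    return loader(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "page.htm")
        with open(page, "w", encoding="utf-8") as f:
            f.write("<html><head><title>Demo</title></head>"
                    "<body><h1>Hello</h1><p>world</p></body></html>")
        doc = load_file(page)[0]
        print(doc.page_content)       # Hello world
        print(doc.metadata["title"])  # Demo
```

The point of the mapping is that .html and .htm resolve to the same HTML-aware loader, while other text formats keep their markup untouched; supporting another alias is a one-line change to the dictionary.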

Technical Details:

- File modified: gpt_researcher/document/document.py
- Added BSHTMLLoader to the imported loaders.
- Updated loader_dict to include the .html and .htm file extensions.
- Implemented specific error handling for HTML document loading failures.
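The HTML-specific error handling listed above might look roughly like the sketch below, assuming the loader raises an ordinary exception on failure. The PR does not quote its actual messages or say whether it re-raises, so the helper names (read_text, failure_message, load_document), the message wording, and the choice to log and return None are all hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Extensions treated as HTML, mirroring the .html/.htm handlers in the PR.
HTML_EXTENSIONS = {"html", "htm"}


def read_text(path):
    # Hypothetical default loader: returns the raw file contents.
    with open(path, encoding="utf-8") as f:
        return f.read()


def failure_message(path, exc):
    """Build an error message that says whether an HTML file was involved."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    kind = "HTML document" if ext in HTML_EXTENSIONS else "document"
    return f"Failed to load {kind} {path}: {exc}"


def load_document(path, loader=read_text):
    """Run the loader; on failure log a specific message and return None.

    Returning None instead of re-raising is an assumption of this sketch,
    chosen so that one bad file does not abort a batch of loads.
    """
    try:
        return loader(path)
    except Exception as exc:  # report any loader error with a specific message
        logger.error(failure_message(path, exc))
        return None
```

Keeping the message construction in its own function makes the HTML-versus-general distinction easy to check without triggering real load failures.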

@ElishaKay merged commit 9a06536 into assafelovic:master on Feb 19, 2025