HTMLHeaderTextSplitter won't run (maxHead) #13149
Comments
🤖 Hello @AndyMik90, I'm an assistive bot here to help you with your issue while we wait for a human maintainer. I'm able to help address bugs, answer questions, and even guide you on how to become a contributor to the project. Rest assured, I'm here to help! I'm currently reviewing the problem you've encountered with HTMLHeaderTextSplitter in the LangChain repository. I'll be back shortly with a more comprehensive solution! Stay tuned! |
🤖 The error you're encountering, [...] The [...] One possible solution could be to split your HTML file into smaller chunks before processing it with the HTMLHeaderTextSplitter. Another possible solution could be to increase the maximum size limit of the XSLT processor. However, this would involve modifying the source code of the libxslt library and recompiling it, which may not be feasible or desirable. Please note that these are just possible solutions and may not work in all cases. If you continue to encounter issues, please provide more details about the size and structure of your HTML file, and any other relevant information that could help in troubleshooting the issue. |
Hi, @AndyMik90, I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you raised an issue regarding the HTMLHeaderTextSplitter not running due to a maxHead error when attempting to split a large HTML file. Dosubot provided a detailed response explaining the error and suggested potential solutions for resolving the issue, such as splitting the HTML file into smaller chunks or modifying the underlying libraries. They also requested more details for further troubleshooting. Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, kindly let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you! |
I'm having the same problem and I'm using the latest version of the LangChain repository. |
Me too. Have you found any solutions? |
This is indeed a problem from the [...] It is high time we switch to another dependency. |
I've just hit this problem too. Anyone know of any workarounds? |
+1 Looking for a workaround |
My workaround is to subclass HTMLHeaderTextSplitter and overwrite split_text_from_file. In there I call xsltproc via subprocess.run just for the transformation using the same html_chunks_with_headers.xslt. |
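A minimal sketch of the transformation step described in the comment above, assuming the `xsltproc` command-line tool is installed and on the PATH. The helper names `build_xsltproc_command` and `transform_with_xsltproc` are hypothetical, and the location of `html_chunks_with_headers.xslt` is left to the caller because it depends on the installed LangChain version. The returned string would still need to be parsed into documents inside the overridden `split_text_from_file`; that part is not shown.

```python
import subprocess


def build_xsltproc_command(xslt_path, html_path):
    """Build the argument list for xsltproc (hypothetical helper).

    --html tells xsltproc to parse the input as HTML rather than XML.
    """
    return ["xsltproc", "--html", str(xslt_path), str(html_path)]


def transform_with_xsltproc(xslt_path, html_path, timeout=60):
    """Run the stylesheet on the HTML file outside the Python process.

    Returns the transformed document as a string. Raises
    subprocess.CalledProcessError if xsltproc exits with an error.
    """
    result = subprocess.run(
        build_xsltproc_command(xslt_path, html_path),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output to str
        check=True,           # raise on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout
```

A subclass of HTMLHeaderTextSplitter could call `transform_with_xsltproc` in place of the in-process lxml transformation and then continue with the transformed output as before.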
I have overridden the class and used bs4 instead of lxml, which is working for me so far. |
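The commenter's bs4 code is not shown in the thread. As a rough illustration of the same idea (splitting on headers without going through the XSLT transformation), here is a sketch that swaps in the standard library's `html.parser` so it runs without extra dependencies. It is not the commenter's implementation and not LangChain's; the class name `HeaderSectionParser`, the function `split_html_by_headers`, and the output shape are all assumptions made for this example.

```python
from html.parser import HTMLParser


class HeaderSectionParser(HTMLParser):
    """Group text under the headers currently in scope (hypothetical sketch)."""

    def __init__(self, header_tags=("h1", "h2", "h3")):
        super().__init__()
        self.header_tags = header_tags
        self.sections = []      # list of {"content": str, "metadata": dict}
        self._active = {}       # header tag -> header text currently in scope
        self._in_header = None  # tag name while inside a header element
        self._header_buf = []
        self._text_buf = []
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def flush(self):
        """Emit the buffered text as one section, if there is any."""
        text = " ".join(" ".join(self._text_buf).split())
        if text:
            self.sections.append({"content": text, "metadata": dict(self._active)})
        self._text_buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.header_tags:
            self.flush()  # a new header closes the previous section
            self._in_header = tag
            self._header_buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._in_header:
            level = self.header_tags.index(tag)
            # A new header replaces headers at the same or deeper level.
            for deeper in self.header_tags[level:]:
                self._active.pop(deeper, None)
            self._active[tag] = " ".join("".join(self._header_buf).split())
            self._in_header = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_header:
            self._header_buf.append(data)
        else:
            self._text_buf.append(data)


def split_html_by_headers(html):
    """Return sections of text, each tagged with the headers above it."""
    parser = HeaderSectionParser()
    parser.feed(html)
    parser.close()  # process any text still buffered by the parser
    parser.flush()  # emit the final section
    return parser.sections
```

For example, `split_html_by_headers("<h1>Intro</h1><p>Hello</p>")` yields one section whose metadata maps `"h1"` to `"Intro"`. Real-world HTML may need more care (nested headers, whitespace inside inline tags), which is where a library such as bs4 helps.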
System Info
Notebook with latest langchain
Who can help?
No response
Information
Related Components
Reproduction
Trying in notebook the HTMLHeaderTextSplitter