-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the issue where HTML elements cannot be dropped from the text selector returned by Selector.jmespath() #298
base: master
Are you sure you want to change the base?
Conversation
This feels like a workaround, I wonder if there is not some root issue that needs to be addressed here. Maybe Could you add a test for the issue, so it is easier to experiment with alternative solutions? |
Sure, I'll add a test for this issue after work. My main concern is that direct modifications to the selector might break the forward compatibility of the entire library. In the current version, the implementation of converting text to HtmlElement is not very elegant either. It doesn't cache the HtmlElement generated from text, nor does it associate the HtmlElement with the selector itself. This leads to the creation of a new copy every time a query is made on a text node. Any modifications made to this copy will not affect the original Selector at all. |
# Conflicts: # parsel/selector.py # tests/test_selector.py
@Gallaecio I tried to remove |
return etree.tostring( | ||
self._text_lazy_html_root, encoding="unicode", with_tail=False | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem with this approach is that we are assuming HTML, when it could be XML.
So, I have created #299 as an alternative, but I have no strong opinion on which way to go. To be honest, I think both could work, i.e. this approach makes things work by default for HTML (which is also assumed to be default when no type is specified), while #299 would provide a way to make things work with XML by specifying |
When using the .xpath method to create nodes from a text type selector, it appears that these nodes are actually copies generated from the text, rather than being generated based on the original root node. As a result, when executing the .drop method, it doesn't affect the content of the original HTML tree. This issue is mostly observed when using jmespath and xpath in combination.
This pull request fixes an issue where HTML elements were not being dropped correctly from a text selector. The ·_text_lazy_html_root· attribute has been added to store a temporary root node, which prevents the creation of a new root HTMLElement copy each time a text selector is used
Fixes #297, resolves #299