Fix pickling problem in LinkedDataMapping
#655
@addie9800 Oh wow, thanks a lot for catching this. That's actually a rather big thing, since it affects all parsers that make use of the now officially supported [...]. I'm not quite sure about the road I want to go down here, but that aside, your suggestion looks like a perfect solution if we're going to keep the XML persistent to save some time. Edit: I'm also going to add an additional test to the [...]. Edit2: I think it's better to go with your solution and keep [...]
@addie9800 Thanks a lot for fixing this! I completely overlooked that. That's a huge miss on my side.
src/fundus/parser/data.py
Outdated
@@ -61,6 +61,17 @@ def __init__(self, lds: Iterable[Dict[str, Any]] = ()):
            self.add_ld(ld)
        self.__xml: Optional[lxml.etree._Element] = None

    def __getstate__(self):
        picklable_dict = self.__dict__.copy()
I would rename picklable_dict -> state
src/fundus/parser/data.py
Outdated
@@ -61,6 +61,17 @@ def __init__(self, lds: Iterable[Dict[str, Any]] = ()):
            self.add_ld(ld)
        self.__xml: Optional[lxml.etree._Element] = None

    def __getstate__(self):
        picklable_dict = self.__dict__.copy()
        if (xml_element := picklable_dict.get("_LinkedDataMapping__xml")) is not None:
It's less error-prone if we check in self._xml instead of the dict.
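For context, the key "_LinkedDataMapping__xml" in the hunk above follows from Python's name mangling of attributes with two leading underscores. The sketch below illustrates why going through the attribute can be less fragile than spelling out the mangled key. The class MappingSketch and both helper methods are hypothetical and are not part of Fundus.

```python
class MappingSketch:
    """Hypothetical stand-in; not the Fundus LinkedDataMapping class."""

    def __init__(self) -> None:
        # Two leading underscores trigger name mangling: the attribute is
        # stored in __dict__ under "_MappingSketch__xml".
        self.__xml = None

    def xml_is_set_via_dict(self) -> bool:
        # The mangled key has to be written out by hand. A typo or a class
        # rename would make .get() return None without raising an error.
        return self.__dict__.get("_MappingSketch__xml") is not None

    def xml_is_set_via_attribute(self) -> bool:
        # Inside the class body, self.__xml is mangled automatically, so the
        # lookup keeps working if the class is renamed.
        return self.__xml is not None


mapping = MappingSketch()
mangled_keys = list(vars(mapping))
```

Here mangled_keys holds the single mangled name, which is what the dict-based check in the diff has to match exactly.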
As of now, CC-Crawling is crashing due to a pickling error, with a message along the lines of "cannot pickle lxml.etree._Element". I traced the error back to this article: https://de.euronews.com/2023/02/17/chinas-botschaft-in-paris-twittert-falsche-behauptungen-uber-das-erdbeben-in-der-turkei where the issue seems to be the use of the __as_xml__() function, which creates an attribute of type lxml.etree._Element in the LinkedDataMapping object. To avoid crashing, I suggest overriding the __getstate__() and __setstate__() functions, converting the Elements to a string before pickling. This is only an issue in multithreading situations.
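A minimal sketch of the suggested pattern, under stated assumptions: the class MappingSketch, its _xml attribute, and the as_xml() method are hypothetical and do not reproduce the Fundus code. To keep the sketch runnable without third-party packages it uses the standard library's xml.etree.ElementTree as a stand-in. lxml.etree provides tostring() and fromstring() under the same names, so the same shape should carry over.

```python
import pickle
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional


class MappingSketch:
    """Hypothetical stand-in; not the Fundus LinkedDataMapping class."""

    def __init__(self) -> None:
        # Cached XML view, built lazily on first access.
        self._xml: Optional[ET.Element] = None

    def as_xml(self) -> ET.Element:
        # Placeholder for the real conversion: build the tree once, cache it.
        if self._xml is None:
            root = ET.Element("result")
            ET.SubElement(root, "headline").text = "example"
            self._xml = root
        return self._xml

    def __getstate__(self) -> Dict[str, Any]:
        # Work on a copy so the live object keeps its element.
        state = self.__dict__.copy()
        if self._xml is not None:
            # Swap the element for its serialized bytes before pickling.
            state["_xml"] = ET.tostring(self._xml)
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        if state.get("_xml") is not None:
            # Parse the bytes back into an element after unpickling.
            state["_xml"] = ET.fromstring(state["_xml"])
        self.__dict__.update(state)


original = MappingSketch()
original.as_xml()  # populate the cache so there is an element to pickle
restored = pickle.loads(pickle.dumps(original))
```

An alternative would be to drop the cached element in __getstate__ and rebuild it on demand. Serializing it instead keeps the XML persistent across the pickle round trip, at the cost of a larger payload.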