Excessive memory usage #57
Comments
Hello Louis, |
Hi @awetzel, thanks for looking into this. Here's the streaming code. We've not used
data =
  xml_stream
  |> stream_tags(:CommercialDetail)
  |> Stream.map(fn {_, doc} ->
    %{
      clock_number: xpath(doc, ~x"./ClockNumber/text()"s),
      actioned_at: xpath(doc, ~x"./ActionDate/text()"s),
      presentation_codes: xpath(doc, ~x"./CommercialPresentations/Presentation/Code/text()"sl),
      restriction_codes: xpath(doc, ~x"./CommercialRestrictions/Restriction/Code/text()"sl),
      status: xpath(doc, ~x"./Status/text()"s),
      vod_final_action_id: xpath(doc, ~x"./VODFinalActionId/text()"s),
      final_action_id: xpath(doc, ~x"./FinalActionId/text()"s)
    }
  end)
  |> Enum.to_list()
Cheers, |
Hi, thanks for pointing out this issue. I just updated the documentation to mention this option. For your specific case, you should use the
Can you keep me posted so that I can close the issue? Thanks. |
@antoinereyt would you mind explaining why this is? From reading the code, I'd guess that it is meant to free the memory used for the current iteration once that one is done and we're on our way to the next one. Is that somewhat correct? |
@awetzel do you have any details after your investigation? |
Hi @antoinereyt, I'm afraid we were unable to find a solution with the streaming code before your message there, so we rewrote to use another XML library to meet our deadline. |
Hi @antoinereyt, just wanted to let you know we managed to reduce our memory consumption while parsing ~10MB XML files by up to 800MB thanks to the

We were really surprised by the need for this option, since bounded memory consumption was the reason we used the streaming interface in the first place (we found out thanks to this issue, as we shipped our code months ago, before the doc mentioned the option).

I understand there is a trade-off and it's hard to avoid surprising behavior: either you discard tags by default, but then the streaming output might differ from the non-streaming output for no obvious reason, or you do not discard by default to have a consistent output, but memory usage blows up. Assuming the main reason people end up using the streaming API is to get bounded memory usage, wouldn't it be fair to discard by default, with documentation warning of the consequences on the output?

Thanks for the work. |
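A minimal sketch of the discard behaviour discussed in the comment above, assuming the option in question is SweetXml's `discard` option to `stream_tags/3`. The root element `Commercials`, the sample values, and the version constraint are hypothetical and only there to make the snippet self-contained; this is not the reporter's actual code.

```elixir
# Assumption: run as a standalone script, so the dependency is fetched here.
Mix.install([{:sweet_xml, "~> 0.7"}])

import SweetXml

# Hypothetical sample input; the files discussed in this thread are far larger.
xml = """
<Commercials>
  <CommercialDetail><ClockNumber>A1</ClockNumber><Status>ok</Status></CommercialDetail>
  <CommercialDetail><ClockNumber>B2</ClockNumber><Status>held</Status></CommercialDetail>
</Commercials>
"""

data =
  [xml]
  # discard: asks the parser to drop each :CommercialDetail element from the
  # tree it is building once that element has been emitted to the stream,
  # instead of keeping every parsed element in memory.
  |> stream_tags(:CommercialDetail, discard: [:CommercialDetail])
  |> Stream.map(fn {_, doc} ->
    %{
      clock_number: xpath(doc, ~x"./ClockNumber/text()"s),
      status: xpath(doc, ~x"./Status/text()"s)
    }
  end)
  |> Enum.to_list()

IO.inspect(data)
```

For real files, something like `File.stream!("some_file.xml")` would take the place of `[xml]`. Note the trade-off raised above: discarded tags no longer appear in the accumulated document, so the streaming output can differ from the non-streaming output.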
@lpil which one did you choose that supports xpath? |
@lpil so I looked into that but how did you get that to support xpath? |
I don't believe we used xpath though I've left the company so I can't be sure. |
I'm also having memory issues... I don't understand the comment:
The I read there:
I'm guessing this still hasn't been implemented, has it? As far as I'm concerned, this doesn't feel very complicated: I have a huge list of
Any ideas? Suggestions? Should I also move to another library? |
Since I made this issue, an excellent XML pull parser that used a very small amount of memory was released. It worked really well for us. I've been searching for 15 minutes but I can't find it now (I've forgotten the name), but it exists! |
@lpil by any chance, is it this one ? https://github.com/zadean/yaccety_sax (I haven't tested it yet, but it came across my radar) |
Yes! It was head and shoulders better than the rest memory-wise. A shame it's not well known. |
Hello! We're using SweetXML in production and we've been having some excessive memory usage that we've not been able to debug.
In our latest test, an 80MB XML file uses >9GB of memory, which causes the VM to crash.
We are parsing XML in a format like this:
We parse this XML like so:
Here are the load charts while iterating over and parsing XML files, and then discarding the result. It spikes each time the XML is parsed.
What are we doing wrong here?
Extra note: We rewrote this code to use the streaming API, which used slightly less memory. Most of our XML will not have newlines in it, so this did not seem to be the right path for a solution, and we would expect lower memory usage from both the eager and streaming APIs.
After digging into the source, it seems that memory spikes when :xmerl_scan.string/1 is called.
Thanks,
Louis