Performance update: Refactor loader add json, yamllib support #471
I'm still thinking about this... give me a bit more time, please.
Some first review notes: I don't much like this one big commit handling a few independent tasks. You get rid of … Now on to the real topic. I totally agree we need to improve … Another idea of mine would be to add cache support. It should be quite simple: …
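A minimal sketch of the cache idea, assuming parsed templates are stored in a single JSON cache file keyed by template path and invalidated when a file's modification time changes. The names `load_templates_cached`, `parse` and the cache layout are hypothetical, not invoice2data's API; in practice `parse` would be something like `yaml.safe_load`.

```python
import json
import os
from typing import Any, Callable, Dict, Iterable


def load_templates_cached(
    paths: Iterable[str],
    parse: Callable[[str], Any],
    cache_file: str,
) -> Dict[str, Any]:
    """Parse template files, reusing cached results for unchanged files.

    Hypothetical helper: the cache maps each template path to
    {"mtime": ..., "data": ...}. A template is re-parsed only when its
    modification time differs from the cached one.
    """
    try:
        with open(cache_file, encoding="utf-8") as fh:
            cache = json.load(fh)
    except (OSError, ValueError):
        cache = {}  # missing or corrupt cache: start fresh

    result: Dict[str, Any] = {}
    new_cache: Dict[str, Any] = {}
    for path in paths:
        mtime = os.path.getmtime(path)
        entry = cache.get(path)
        if entry is not None and entry["mtime"] == mtime:
            data = entry["data"]  # cache hit: skip the slow parser
        else:
            with open(path, encoding="utf-8") as fh:
                data = parse(fh.read())  # cache miss: parse and remember
        result[path] = data
        new_cache[path] = {"mtime": mtime, "data": data}

    # Rewrite the cache so deleted templates drop out of it.
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(new_cache, fh)
    return result
```

The cached data must be JSON-serializable. PyYAML turns unquoted dates such as `2021-01-31` into `datetime.date` objects, so a real cache might need `pickle` or a similar format instead.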
Finally, I don't really like the idea of adding a new template format. I understand JSON is faster to load, but with a cache implemented (which we need anyway, as we want to keep YAML support) it shouldn't matter. I'd rather stick to one format to avoid compatibility issues later, to avoid increasing the maintenance cost, and to avoid discussions about further formats. We already had a request for storing templates as database tables. Someone may prefer XML over YAML/JSON and request support for it too.
Let's first respond to the JSON support. TL;DR: YAML and JSON are basically the same thing (YAML 1.2 is designed as a superset of JSON). If the PyYAML library gets updated to support the "new" (from 2009) YAML 1.2 standard, invoice2data gets support for JSON documents either way, whether we are conscious of it or not. But then it would be realized by the external lib (note the performance issue). Rationale:
Source: https://realpython.com/python-yaml/ Basically, if PyYAML started to support the YAML 1.2 standard, it could parse JSON documents as well (if we really needed or wanted to, we could exchange PyYAML for another library). Conclusion: …
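A quick way to check the claim, assuming PyYAML is installed: `yaml.safe_load` already accepts a plain JSON document like the hypothetical template below and returns the same structure as `json.loads`. This is not full JSON compatibility, though. PyYAML implements YAML 1.1, so an exponent float such as `1e10` (no dot, no exponent sign) is resolved as a string rather than a number.

```python
import json

import yaml  # PyYAML, assumed to be installed

# A small, hypothetical template written as JSON.
DOC = """
{
  "issuer": "ACME Corp",
  "keywords": ["ACME", "Invoice"],
  "options": {"currency": "EUR", "decimal_separator": ","},
  "priority": 5,
  "remove_whitespace": false,
  "note": null
}
"""

from_json = json.loads(DOC)
from_yaml = yaml.safe_load(DOC)  # read as a YAML flow mapping
print(from_json == from_yaml)

# One place where PyYAML's YAML 1.1 resolver differs from JSON:
print(json.loads("1e10"), repr(yaml.safe_load("1e10")))
```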
I think this is a different topic. It could be quite interesting; I personally don't have an opinion on this yet.
I have seen an implementation of this. The database side was handled by a third-party application.
Very nice solution. IMO this is something for the third-party application, not for the "core" invoice2data parser.
Ordered load is a leftover from Python 2.7 support. It is no longer needed: since Python 3.7, plain dicts preserve insertion order, which is a cleaner/better option.
The rationale behind this is that the … The best option (performance-wise) would be to use the base loader. If we really wanted to, we could use an alternative implementation: if the key remove_accents is present, we could call that function. It doesn't make sense to use …
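A sketch of the "only when the key is present" idea, assuming the template's `options` mapping may contain a `remove_accents` flag. The helper names are hypothetical, and the accent stripping uses the standard library's `unicodedata` (NFKD decomposition, then dropping combining marks) as a stand-in for whatever function invoice2data actually calls.

```python
import unicodedata
from typing import Any, Mapping


def strip_accents(text: str) -> str:
    """Remove diacritics: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def maybe_remove_accents(text: str, template: Mapping[str, Any]) -> str:
    """Hypothetical helper: only pay for accent removal if the template asks."""
    options = template.get("options") or {}
    if options.get("remove_accents"):
        return strip_accents(text)
    return text  # default path: no extra processing at all
```

Templates that never set the flag skip the normalization entirely, which is the point of checking for the key first. NFKD also folds compatibility characters (for example the ligature "ﬁ" becomes "fi"), which may or may not be wanted.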
Some more rationale behind the removal of ordered load:
Source: https://python.plainenglish.io/should-we-still-use-ordereddict-in-python-f223c85a01d5 There are probably more places where we can move away from OrderedDict.
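Since Python 3.7 the language guarantees that a plain `dict` keeps insertion order, so forcing `OrderedDict` just to keep keys in file order is redundant. The standard-library `json` module shows the difference with its `object_pairs_hook` parameter; the same reasoning applies to an ordered-load wrapper around PyYAML, which is not reproduced here. The template string is a hypothetical example.

```python
import json
from collections import OrderedDict

TEMPLATE = (
    '{"issuer": "ACME", '
    '"fields": {"date": "x", "amount": "y", "invoice_number": "z"}, '
    '"keywords": ["ACME"]}'
)

# Old style: force an OrderedDict for every JSON object.
legacy = json.loads(TEMPLATE, object_pairs_hook=OrderedDict)

# New style: plain dicts already keep the keys in file order.
plain = json.loads(TEMPLATE)

print(list(plain["fields"]))
# Compare key lists: dict == OrderedDict ignores order, so it proves nothing here.
print(list(plain["fields"]) == list(legacy["fields"]))
```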
@rmilecki I've split the commits so it is easier to review. I hope we can merge this soon.
@rmilecki Can you have a look at this one?
This is an important PR, as it increases speed and usability a lot. We can debate further whether to drop the 'native json' support commit and wait until it's merged in PyYAML.
Ordereddict no longer needed in python3
This one has been open for some time. Two weeks have passed since my last question.
This PR is a big refactor of the loader.
Sorry for that ;)
invoice2data was very slow to use. The main cause is PyYAML's loading of the YAML templates.
This PR adds support for the C library libyaml, which speeds up template loading by 10x ✨
To use it, one needs to install this library (PyYAML then uses its libyaml bindings when they are available):
$ apt-get install libyaml-dev
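One common way to use libyaml only when it is present follows the pattern from the PyYAML documentation: import the C-based loader and fall back to the pure-Python one. `CSafeLoader` exists only if PyYAML was built with its libyaml bindings, which `yaml.__with_libyaml__` reports. The helper name `load_template_text` is hypothetical; PyYAML itself is assumed to be installed.

```python
import yaml  # PyYAML

try:
    # C implementation backed by libyaml (much faster on large inputs).
    from yaml import CSafeLoader as SafeLoader
except ImportError:
    # Pure-Python fallback: same results, just slower.
    from yaml import SafeLoader


def load_template_text(text: str):
    """Parse one YAML template with the fastest available safe loader."""
    return yaml.load(text, Loader=SafeLoader)


print("libyaml available:", yaml.__with_libyaml__)
print(load_template_text("issuer: ACME\nkeywords:\n  - ACME\n"))
```

If `__with_libyaml__` is still `False` after installing `libyaml-dev`, PyYAML was probably installed without its C extension and may need to be reinstalled from source so the bindings get built.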
The use of YAML templates was a design choice. They offer good human readability and might be easier to write than JSON files.
YAML is also very flexible, as it can embed JSON.
(I discovered that while developing the camelot plugin.)
However there is a good case to be made to support json templates as well.
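A minimal sketch of how a loader could accept both formats by dispatching on the file extension. The function name `load_template` and the extension-to-parser table are hypothetical. JSON files go through the standard-library `json` module; YAML files import PyYAML lazily (preferring `CSafeLoader`), so a JSON-only setup would not need it.

```python
import json
import os
from typing import Any, Callable, Dict


def _parse_yaml(text: str) -> Any:
    import yaml  # imported lazily: only needed for YAML templates

    try:
        from yaml import CSafeLoader as Loader  # libyaml-backed, if built
    except ImportError:
        from yaml import SafeLoader as Loader
    return yaml.load(text, Loader=Loader)


# Hypothetical mapping from file extension to parser.
PARSERS: Dict[str, Callable[[str], Any]] = {
    ".json": json.loads,
    ".yml": _parse_yaml,
    ".yaml": _parse_yaml,
}


def load_template(path: str) -> Any:
    """Load one template file, choosing the parser from its extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        parse = PARSERS[ext]
    except KeyError:
        raise ValueError(f"unsupported template format: {path}") from None
    with open(path, encoding="utf-8") as fh:
        return parse(fh.read())
```

A dispatch table keeps the format decision in one place, so supporting or dropping a format touches a single entry rather than the rest of the loader.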