JSON list as Python generator? #24
Comments
I just pushed a release (0.2) an hour ago that implements lazy querysets (they will probably get some improvements soon). You can also pass generators to the manager, and they will only be iterated when queryset data is accessed (note that the resulting data will still be held in memory, though). I think it partially addresses your issue, at least the part regarding memory usage. However, once the generator is consumed, lifter won't be able to consume it again. The solution that comes to mind is to allow passing a callable:

    def return_json_generator():
        return generator

    manager = lifter.load(return_json_generator)

This seems easier to implement than the blueprint you suggested. Once #25 is fixed, it will also improve performance (the generator will only be looped over once, regardless of the number of filters/excludes applied). I'm not really fond of the index, at least for now: the package is still in alpha, and I'd rather not reinvent a whole database system at this point. Also, in your present situation, I think any effort you put into reducing the memory footprint of your queries will be wasted if you need to maintain an index of your whole data in memory.
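A minimal sketch of the callable idea described above. It is not lifter's actual implementation: `ReiterableSource` is a hypothetical helper, and the rows returned by `return_json_generator` are stand-in data. The point is only that a zero-argument callable can hand out a fresh generator on every pass, so an exhausted generator is never reused.

```python
class ReiterableSource:
    """Hypothetical wrapper around a zero-argument callable returning an iterable."""

    def __init__(self, factory):
        if not callable(factory):
            raise TypeError("factory must be callable")
        self._factory = factory

    def __iter__(self):
        # Ask the factory for a brand-new generator on every iteration.
        return iter(self._factory())


def return_json_generator():
    # Stand-in data; a real loader would stream parsed JSON objects here.
    rows = [{"name": "a", "age": 10}, {"name": "b", "age": 30}]
    return (row for row in rows)


source = ReiterableSource(return_json_generator)

# Two independent passes over the same source both see every row.
adults = [row for row in source if row["age"] >= 18]
names = [row["name"] for row in source]
```

The trade-off is that the underlying data is read again on each pass, in exchange for never holding more than one row at a time on the producer side.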
I was thinking of something like:
This way you can save memory by using a generator for all the filtering operations you apply. The manager creates the generator; each time a filter operation is required, a deep copy of the generator is made and consumed.
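For context on the deep-copy step, here is a small check of how generator objects behave in CPython. The `numbers` function is just an illustrative placeholder. As far as I know, `copy.deepcopy` refuses generator objects with a `TypeError`, and a generator that has been consumed once yields nothing on a second pass.

```python
import copy


def numbers():
    # Placeholder generator standing in for a stream of JSON records.
    yield from range(3)


gen = numbers()

# Try to deep-copy the generator; this does not consume it.
try:
    copy.deepcopy(gen)
    deepcopy_supported = True
except TypeError:
    deepcopy_supported = False

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing is left to read
```

This is why some form of re-creation (a callable) or buffering is needed when the same data has to be read more than once.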
Yes, this is exactly that; the only difference is that you won't even need to
The main advantage over your proposal is that you can call the manager a thousand times if you want, without providing a different copy each time, and it will still work. With your example, after you run
I'll leave this open since I still need to implement the callable feature ;)
I am collecting information about the possibility of using a generator instead of loading the full JSON into memory when manager gets called:
Possible algorithm:
I couldn't find any memory- or CPU-friendly way in the standard library to clone or deep-copy a generator in memory. The only option is tee(), but it seems to have downsides for our use case: list() can be better if you plan to consume the generator to the end.
Does this sound like a good idea?
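A short sketch of the tee() versus list() trade-off mentioned above. The `records` generator is a made-up placeholder. `itertools.tee` keeps every item that one branch has read and the other has not, so when one branch runs to the end first, the whole stream ends up buffered anyway; the itertools documentation notes that `list()` is faster than `tee()` in that case.

```python
import itertools


def records():
    # Placeholder stream of parsed JSON objects.
    for i in range(5):
        yield {"id": i}


# tee(): branch_a runs to the end first, so tee buffers all items for branch_b.
branch_a, branch_b = itertools.tee(records())
evens = [r["id"] for r in branch_a if r["id"] % 2 == 0]
odds = [r["id"] for r in branch_b if r["id"] % 2 == 1]

# list(): same memory cost in this access pattern, simpler and reusable.
cached = list(records())
evens_from_list = [r["id"] for r in cached if r["id"] % 2 == 0]
odds_from_list = [r["id"] for r in cached if r["id"] % 2 == 1]
```

Both variants produce the same results here; tee() only saves memory when the branches advance roughly in step with each other.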