Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance fix for delta generation #2

Open
pavel-odintsov opened this issue Jul 26, 2014 · 7 comments
Open

Performance fix for delta generation #2

pavel-odintsov opened this issue Jul 26, 2014 · 7 comments

Comments

@pavel-odintsov
Copy link

Hello!

I plans to test delta generation performance but found this in your code:

268: buf = changedfd.read() # Just read the whole damn file into memory

Could you fix this code to real file reading instead putting whole file to memory?

Thank you so much :)

@therealmik
Copy link
Owner

Ha, forgot about that.

I did have some horrible buffering algorithm before, but I was trying to make readable code so I just did that :)

I've committed a change that doesn't do that anymore - but it does issue lots of small reads. I also added support for buffering when opening a file (just the 3rd argument to open), so hopefully that's not too slow.

@pavel-odintsov
Copy link
Author

Thank you so much :) You are very kindly!

@therealmik
Copy link
Owner

Just a warning - you might want to write some test cases for this if you actually use it on real data. I tried a few, but who knows what crazy stuff I've done...

@vps2fast
Copy link

vps2fast commented Oct 3, 2014

Hello, Michael!

I'm compiled (oh, I need for it about 8 hours!) and installed pypy-3 for testing mainline version of pyrdiff.

I run it for 4Gb file and it eat almost all my memory :)

695218 root      20   0 8380m 6.6g  12m R 100.0  1.7   3:47.99 pypy-c                                                                                                                                                               

Because of this:

    def _generate_delta(signatures, changedfd):
        """Given a file object and signatures from another file,
           generate a set of deltas (LiteralChange / CopyChange)"""
        buf = changedfd.read() # Just read the whole damn file into memory

Could you fix it in Python-3 branch?

@vps2fast
Copy link

vps2fast commented Oct 3, 2014

And this code:

buf = buf[offset+blocksize:]

Looks like memory allocator killer.. But I'm not sure about underlying implementation.

@therealmik
Copy link
Owner

Those two are related. I don't have any time to do this work at the moment, but feel free to send a pull request if you make the change.

@vps2fast
Copy link

vps2fast commented Oct 8, 2014

Hello!

Did it :) Please check this: #3 But please review patch very thoroughly because I did not execute tests for enough amount of test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants