CRC calculation can be improved #306
Comments
Sounds nice, and it does seem a shame to add an extra external dependency for what looks like a small amount of code. Would it be possible and/or reasonable to "internalize" it by adding it as a submodule and then including the source file for the internal class in the SZL project? Maybe that's overkill compared to just copying one or two files, though.
We could use it as a Paket dependency. One of the "core" directives I have for this repo is not to add any consumer dependencies.
On a related note, the Crc32.NET issue force-net/Crc32.NET#10 refers to a CRC32 implementation that uses intrinsics for extra performance.
That intrinsics implementation looks very fast indeed! A pity the license does not match. I have not worked with Paket before, but it seems possible to reference a specific GitHub file, so a reference to SafeProxy.cs could be added that way. Since that class is internal, it would also meet the no-consumer-dependencies requirement. It seems somewhat overkill to introduce Paket just to reference a single file, though. Wouldn't it be easier to just copy that one file into SharpZipLib?
Sounds like the simplest idea if it is just one file, and not something that would be complicated to update if a newer version were picked up later.
Yes, perhaps it's a bit overkill. We could just add the source repo and commit hash to the file header as a means of referencing upstream.
OK, I will create a pull request for further discussion, based on copying the Crc32.NET code while referencing the original code in the header. Thanks for all the input so far!
I created a pull request for this: #318. Some remarks:
Please let me know what you think.
I don't know much about the implementation details of the CRC bits, but fwiw, I tested the change with a simple unit test I wrote earlier that creates a Zip64 file containing several gigabytes of zeros. Creating the file took 1 min 42 s on the master branch and 1 min 20 s with this change (about 22% less time), so a pretty good speedup.
Thanks for testing! I'm no CRC expert by any means either, but the writeup from Michealangel007 is pretty good, with some nice code examples. I made another pull request (#319) for this, which is basically the same change but uses subclassing instead of Func<> delegates. I suspect some people might prefer that approach.
fwiw, I added a couple of BenchmarkDotNet benchmarks comparing this PR to the current code in #328.
Current implementation (master):
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.508 (2004/?/20H1)
AMD Ryzen 7 3800X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.8.20417.9
[Host] : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-AVMSLM : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-HCWAMU : .NET Framework 4.8 (4.8.4220.0), X64 RyuJIT
PR #319:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.508 (2004/?/20H1)
AMD Ryzen 7 3800X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.8.20417.9
[Host] : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-RQIUQR : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-QIKFQA : .NET Framework 4.8 (4.8.4220.0), X64 RyuJIT
PR #516:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.508 (2004/?/20H1)
AMD Ryzen 7 3800X, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.8.20417.9
[Host] : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-DYBSVV : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-SNAPLI : .NET Framework 4.8 (4.8.4220.0), X64 RyuJIT
Looks good; both pull requests give a significant performance improvement 👍
@decipherer Yeah, the difference between them is not really significant. Either would be a great improvement in performance.
The CRC32 saga concluded with v1.3.1, which included #516. Sorry for taking this long and for not using your PRs, @decipherer and @dpethes. I got a little burned by #202 and shied away from CRC changes after that. Thanks for your contributions!
After making pull request #301 (a performance enhancement that skips the unnecessary checks in ArraySegment), I started to wonder whether the CRC calculation could be made even faster. Since CRC calculation speed has a big influence on overall unzipping speed (and probably zipping too), this could substantially improve the performance of SharpZipLib.
Sure enough, I found an algorithm that is significantly faster than the one SharpZipLib currently uses. Total unzipping time dropped by approximately 33% after I switched SharpZipLib to this algorithm. The algorithm itself can be found here: https://github.com/force-net/Crc32.NET/blob/develop/Crc32.NET/SafeProxy.cs
Please note that the CRC lookup table is generated during the first call, so the first call is slightly slower (although I can hardly measure any difference on an 8 KB zipped file).
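For readers unfamiliar with why a different table-driven CRC can be so much faster: the classic approach does one dependent table lookup per input byte, while "slicing" variants process several bytes per loop iteration using extra precomputed tables. The sketch below is a hypothetical Python illustration of one such variant, slicing-by-8; it is not a port of SafeProxy.cs, and whether SafeProxy uses exactly this variant is not claimed here. The function names `crc32_bytewise` and `crc32_slicing8` are made up for illustration. It assumes the standard reflected ZIP polynomial `0xEDB88320` and, like the behavior described above, builds its lookup tables lazily on the first call. Python is used only to show the table layout and loop structure; it says nothing about .NET performance.

```python
# Hypothetical illustration of table-driven CRC-32 (ZIP polynomial, reflected form).
# Not a port of SafeProxy.cs; function names are made up for this sketch.

POLY = 0xEDB88320  # reflected form of the CRC-32 polynomial used by ZIP

_tables = None  # 8 tables of 256 entries each, built lazily on first use


def _build_tables():
    # Table 0 is the classic one-byte-at-a-time lookup table.
    t0 = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        t0.append(c)
    tables = [t0]
    # Table k = table k-1 advanced by one extra zero byte.
    for k in range(1, 8):
        prev = tables[k - 1]
        tables.append([(prev[n] >> 8) ^ t0[prev[n] & 0xFF] for n in range(256)])
    return tables


def _get_tables():
    global _tables
    if _tables is None:  # the first call pays the table-building cost
        _tables = _build_tables()
    return _tables


def crc32_bytewise(data: bytes, crc: int = 0) -> int:
    """Baseline: one dependent table lookup per input byte."""
    t0 = _get_tables()[0]
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ t0[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def crc32_slicing8(data: bytes, crc: int = 0) -> int:
    """Slicing-by-8: eight independent lookups per 8-byte chunk."""
    t = _get_tables()
    crc ^= 0xFFFFFFFF
    i, n = 0, len(data)
    while n - i >= 8:
        # Fold the running CRC into the first 4 bytes (little-endian).
        one = crc ^ int.from_bytes(data[i:i + 4], "little")
        two = int.from_bytes(data[i + 4:i + 8], "little")
        crc = (t[7][one & 0xFF] ^ t[6][(one >> 8) & 0xFF]
               ^ t[5][(one >> 16) & 0xFF] ^ t[4][one >> 24]
               ^ t[3][two & 0xFF] ^ t[2][(two >> 8) & 0xFF]
               ^ t[1][(two >> 16) & 0xFF] ^ t[0][two >> 24])
        i += 8
    # Finish any remaining tail bytes one at a time.
    for b in data[i:]:
        crc = (crc >> 8) ^ t[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

For example, `crc32_slicing8(b"123456789")` returns the standard CRC-32 check value `0xCBF43926`, and passing a previous result as `crc` continues a running checksum across buffers. The trade-off is memory for speed: eight 256-entry tables (8 KB of 32-bit values) instead of one, in exchange for lookups within each 8-byte chunk that no longer depend on each other.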
I would like to make a pull request to incorporate this faster algorithm in SharpZipLib. But I'm not sure how best to incorporate this since SharpZipLib at the moment does not have a lot of external dependencies. I can either
Can you please advise?