
A faster alternative to pako.js #136

Closed
photopea opened this issue May 24, 2018 · 9 comments

Comments

@photopea

photopea commented May 24, 2018

Hi guys, I was curious whether I could make a faster alternative to pako.js, so I made UZIP.js. Actually, it is a whole ZIP parser and generator (an alternative to, e.g., JSZip), all within 28 kB unminified.

I made it from scratch, without rewriting other implementations (like ZLIB).

  • Inflate (decompress) is 20% to 50% faster than pako.
  • Deflate (compress) is almost always faster, but sometimes produces larger files.

Here I compress an 11 MB BMP image. UZIP.js is faster and produces a smaller file than pako here, but it usually does not do as well on, e.g., plain text.
[Screenshot: UZIP.js vs. pako benchmark]

It is not an issue, but could we keep it open here for a while, so that I can maybe attract some people interested in improving it independently of ZLIB? We could add Zopfli-like optimizations, etc.

@puzrin
Member

puzrin commented May 24, 2018

Please understand me correctly: I have a lot of good (I hope) projects to do and only 24 hours in a day. Every time, I need to decide whether I can add significant value or not. Polishing is an endless process, and at some point we have to say "stop" to ourselves and move on to more valuable things. This project is very stable and battle-tested, so before starting any improvements, I need very strong proof that the time will not be wasted in the end.

What I mean is that it's not easy to make benchmarks correct. Results depend on many things, even on the node.js version used (the V8 engine version). If you would like me to participate in evaluating your achievements, I strongly recommend applying your patches to a pako fork and using the existing benchmark. That gives some guarantee that everything else is correct. If your code is completely independent, it will take a huge amount of time to analyze what is happening, which is very inefficient. I also recommend rechecking the benchmarks in node 8.2.1, because later V8 versions have a significant performance loss. This approach will help to answer these questions:

  • is this difference related to the algorithm?
  • is this difference related to the V8 JIT?
  • is this difference specific to some V8 version?
  • is this difference related to specific wrapper call options (string conversion, for example)?
  • what is the total gain, and is it worth spending resources to get it?
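As one way to act on this advice, here is a minimal sketch of a Node benchmark harness that warms up the JIT before timing, takes the median of several runs, checks that each candidate's output round-trips before timing it, and prints the Node and V8 versions next to the numbers. It is not pako's own benchmark suite; `benchMedian` and `compare` are hypothetical helper names, and two levels of Node's built-in `zlib` stand in for the two implementations being compared.

```javascript
const zlib = require("zlib");

// Current time in milliseconds; process.hrtime() also exists on old releases such as node 8.2.1.
function nowMs() {
  const [sec, nsec] = process.hrtime();
  return sec * 1e3 + nsec / 1e6;
}

// Hypothetical helper: median time (ms) of fn(input) over `runs` calls, after `warmup` untimed calls.
function benchMedian(fn, input, warmup = 5, runs = 15) {
  // Untimed warm-up calls give V8's JIT a chance to optimize fn before measuring.
  for (let i = 0; i < warmup; i++) fn(input);
  const times = [];
  for (let i = 0; i < runs; i++) {
    const t0 = nowMs();
    fn(input);
    times.push(nowMs() - t0);
  }
  times.sort((x, y) => x - y);
  const mid = Math.floor(times.length / 2);
  // The median is less sensitive than the mean to GC pauses and other outliers.
  return times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}

// Hypothetical helper: verify and time several compressors on the same input.
// Assumes every candidate emits a zlib (RFC 1950) stream, so zlib.inflateSync can check it.
function compare(input, candidates) {
  console.log(`node ${process.version}, V8 ${process.versions.v8}`);
  const results = {};
  for (const name of Object.keys(candidates)) {
    const compress = candidates[name];
    const packed = compress(input);
    // Correctness first: a fast but wrong result is not a benchmark win.
    if (!zlib.inflateSync(packed).equals(input)) {
      throw new Error(`${name}: round trip failed`);
    }
    results[name] = { ms: benchMedian(compress, input), bytes: packed.length };
    console.log(`${name}: ${results[name].ms.toFixed(2)} ms, ${packed.length} bytes`);
  }
  return results;
}

// Usage: two zlib levels stand in for two implementations (e.g. pako vs. UZIP.js).
const sample = Buffer.from("lorem ipsum dolor sit amet ".repeat(20000));
compare(sample, {
  level1: (buf) => zlib.deflateSync(buf, { level: 1 }),
  level9: (buf) => zlib.deflateSync(buf, { level: 9 }),
});
```

Running the same script under several Node releases helps separate algorithmic differences from V8-specific ones, which is what the questions above are getting at. `process.hrtime()` is used instead of `performance.now()` so the script also runs on old releases like node 8.2.1.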

I don't mean that you did something useless, but I would like to see deeper and more careful investigation results. It may look stupid of me to reject "obvious WOW benchmarks", but I have reasons to do so, based on my experience. Trust me, I ran into a huge number of benchmarking issues and experiments while developing this package (and spent a lot of time on them). There are many cases where "fantastic" results are caused by the environment, not by the measured code. So I don't reject your work, but I suggest using a more accurate/safe/predictable benchmarking approach.

PS. Of course, if you need help promoting your initiative, I will be happy to help, even if I don't participate myself. Just don't forget to close the issue at some point, in 1-2 months for example, or if you lose interest :)

@photopea
Author

I am not asking to change pako.js in any way. I understand that it is a rewrite of ZLIB, and that it is stable and properly tested.

My library is like a new playground for experiments, for people who would like to "get away" from ZLIB and its structure. Actually, I think changing ZLIB may be very hard (that is why I did not do it). I will definitely close it at some point :)

@puzrin
Member

puzrin commented May 24, 2018

Why don't you want to experiment with the pako sources in a fork?

  • It will be easier to understand a small diff with the changes
  • It will be easier to measure the difference caused by such a patch, and to test multiple V8 versions
  • Benchmarks will be easy to reproduce (that's more convincing than a screenshot)
  • If the result is interesting, it will be easier to promote it upstream (to the original zlib)

In theory: less effort for more "added value".

@photopea
Author

I cannot read or rewrite other people's code; it is extremely uncomfortable for me :( I can only use such code through a well-defined interface, or add new parts to it that cooperate through an interface.

I also have no experience with node.js and the tools that you use for processing and testing your code (I run my JS in a browser, the same way I wrote it). I am still using pako.js for compression (because UZIP.js is quite unpredictable for now), but I use UZIP.js for decompression (because it is always faster, and the difference is even bigger in Firefox). I use different algorithms, but I guess I should suggest them to ZLIB, not to you.

@ssb22
Copy link

ssb22 commented May 27, 2018

Maybe add a "Related Projects" section to the Pako Readme and mention UZip there? I'm not sure about the benchmarking issue, but UZip does have the advantage of being entirely within one file, which makes it easier to include in browser scripts etc. without having to get browserify to work. I'd like to try using UZip for the JavaScript output of my Annotator Generator. Currently, if you pass --javascript and --zlib, it puts require(pako) into its generated code, which could be a bit awkward if you're not on Node.js, so I'd rather include a self-contained chunk of JavaScript that just provides the inflate function (even better if it's a cut-down version that only provides inflate, as my use case doesn't need the rest of the API). UZip looks promising in this respect.

One slight problem with UZip, though: it's not quite correct to say in its Readme "The API is the same as the API of pako.js", because currently UZip's inflate accepts only a Uint8Array, whereas Pako can also accept a string. So with Pako I can write data=pako.inflate("x\xda\xcb\xc8\x04\0\x01;\0\xd2"), which is more compact than data=UZip.inflate(new Uint8Array([120, 218, 203, 200, 4, 0, 1, 59, 0, 210])). You full-on JavaScript devs will immediately see something I missed, but the inability of UZip's inflate to take a string argument is the main thing stopping me from using it instead of Pako right now.
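Until inflate accepts strings directly, a caller can convert a "binary string" (one character per byte, codes 0 to 255) into the Uint8Array that UZip's inflate expects. This is a minimal sketch under that input convention: `binaryStringToBytes` is a hypothetical helper, and Node's built-in `zlib.inflateSync` stands in for `UZip.inflate` so the example runs without either library.

```javascript
const zlib = require("zlib");

// Hypothetical helper: convert a binary string (one char per byte) into a Uint8Array.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    // Reject characters that cannot be a single byte instead of silently wrapping them.
    if (code > 0xff) throw new RangeError(`character ${i} (code ${code}) is not a byte`);
    bytes[i] = code;
  }
  return bytes;
}

// Usage: the zlib stream from the comment, written compactly as a string literal.
const packed = binaryStringToBytes("x\xda\xcb\xc8\x04\0\x01;\0\xd2");
// Node's zlib stands in for UZip.inflate(packed) here.
const text = zlib.inflateSync(packed).toString("latin1");
console.log(text);
```

The range check is a deliberate choice: `Uint8Array.from(str, (c) => c.charCodeAt(0))` is shorter, but it would silently wrap characters above 255 instead of reporting them.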

@ssb22

ssb22 commented May 27, 2018

Following up on my previous comment, I think I picked the wrong example, because "x\xda\xcb\xc8\x04\0\x01;\0\xd2" is the same length as [120,218,203,200,4,0,1,59,0,210]. In the general case, though, the 256 possible byte values take a total of 729 source bytes if represented as a string, versus 915 if represented as an array:

"\0\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255]

so I expect a string representation to save about 20% on average (729 vs. 915 bytes). It's a pity JavaScript doesn't come with Base64 or something as standard. (You're more restricted if you can't assume libraries will be there.)
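The 729 and 915 figures can be reproduced by generating both literal forms and measuring them. This sketch assumes the escaping conventions visible in the string above (`\0`, the single-letter escapes `\b \t \n \v \f \r`, escaped `"` and `\`, printable ASCII as-is, and `\xHH` for everything else); `stringLiteral` and `arrayLiteral` are hypothetical helper names.

```javascript
// Single-character escapes usable inside a double-quoted JS string literal.
const SHORT_ESCAPES = { 8: "\\b", 9: "\\t", 10: "\\n", 11: "\\v", 12: "\\f", 13: "\\r", 34: '\\"', 92: "\\\\" };

// Hypothetical helper: double-quoted string literal for the bytes, using the shortest escapes.
function stringLiteral(bytes) {
  let out = '"';
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0) {
      // "\0" followed by a digit is a legacy octal escape (an error in strict mode), so use "\x00" there.
      const next = bytes[i + 1];
      out += next >= 48 && next <= 57 ? "\\x00" : "\\0";
    } else if (b in SHORT_ESCAPES) {
      out += SHORT_ESCAPES[b];
    } else if (b >= 32 && b <= 126) {
      out += String.fromCharCode(b); // printable ASCII needs no escape
    } else {
      out += "\\x" + b.toString(16).padStart(2, "0");
    }
  }
  return out + '"';
}

// Hypothetical helper: array literal such as [120,218,203] for the same bytes.
function arrayLiteral(bytes) {
  return "[" + Array.from(bytes).join(",") + "]";
}

const allBytes = Array.from({ length: 256 }, (_, i) => i);
const s = stringLiteral(allBytes).length;
const a = arrayLiteral(allBytes).length;
console.log(s, a, `${((1 - s / a) * 100).toFixed(1)}% smaller`);
```

For all 256 byte values this yields the 729 and 915 figures quoted above; for the 10-byte zlib example both forms come out at 32 characters, matching the "same length" observation.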

@puzrin
Member

puzrin commented May 27, 2018

@ssb22 Standalone browser builds are in https://github.com/nodeca/pako/tree/master/dist.

Please, let's keep this repo for issues about pako only, where my personal participation is absolutely necessary.

@photopea
Author

@ssb22 We can discuss UZIP-related stuff at https://github.com/photopea/UZIP.js/issues. Also, if you need to store binary data directly inside JS code, JS has Base64 built in as "btoa" and "atob": https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
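Here is a minimal sketch of that approach: encode the bytes once with `btoa`, paste the result into the source, and decode it at run time with `atob`. Both functions work on binary strings, hence the conversions. `bytesToBase64` and `base64ToBytes` are hypothetical helper names. `btoa` and `atob` are browser globals and are also global in recent Node versions; the `Buffer` fallback for older Node is an assumption of this sketch.

```javascript
// Hypothetical helper: bytes -> Base64 text suitable for embedding in source code.
function bytesToBase64(bytes) {
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b); // binary string, one char per byte
  return typeof btoa === "function" ? btoa(bin) : Buffer.from(bin, "latin1").toString("base64");
}

// Hypothetical helper: embedded Base64 text -> Uint8Array (e.g. input for an inflate function).
function base64ToBytes(b64) {
  const bin = typeof atob === "function" ? atob(b64) : Buffer.from(b64, "base64").toString("latin1");
  return Uint8Array.from(bin, (c) => c.charCodeAt(0));
}

// Usage: the 10-byte zlib stream from the earlier comments.
const packed = [120, 218, 203, 200, 4, 0, 1, 59, 0, 210];
const embedded = bytesToBase64(packed);
console.log(embedded, base64ToBytes(embedded));
```

Base64 costs 4 characters per 3 bytes, so all 256 byte values fit in 344 characters (346 with quotes), compared with the 729 and 915 quoted earlier for the escaped string and the array literal.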

@ssb22

ssb22 commented May 27, 2018

Good point. I have now opened the API-compatibility issue on UZIP's repository, and created Pako pull request #137 for my suggestion to mention UZip in Pako's Readme (and marked the pull request as closing this issue). Hope that was vaguely the right thing to do.


3 participants