A faster alternative to pako.js #136

photopea · 2018-05-24T20:24:04Z

Hi guys, I was curious whether I could make a faster alternative to pako.js, so I made UZIP.js. It is actually a complete ZIP parser and generator (an alternative to e.g. JSZip), all in 28 kB unminified.

I made it from scratch, without rewriting other implementations (like ZLIB).

Inflate (decompress) is 20% to 50% faster than pako.
Deflate (compress) is almost always faster, but sometimes produces larger files.
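A faster deflate only helps if its output is still valid, so one hedged way to check an alternative implementation is to feed its output to a reference decoder and compare the result with the original bytes. The sketch below uses Node.js's built-in `zlib` module as that reference. `candidateDeflate` and `checkRoundTrip` are hypothetical names; the candidate is assumed to emit zlib-wrapped (RFC 1950) data, as `pako.deflate` does, rather than raw deflate.

```javascript
const zlib = require("zlib");

// Compress each named input with the candidate, decode it with Node's zlib,
// and collect a message for every input that fails to round-trip.
function checkRoundTrip(candidateDeflate, inputs) {
  const failures = [];
  for (const [name, input] of Object.entries(inputs)) {
    const compressed = candidateDeflate(input);
    let restored;
    try {
      // inflateSync checks the zlib header and the Adler-32 checksum.
      restored = zlib.inflateSync(Buffer.from(compressed));
    } catch (err) {
      failures.push(`${name}: reference inflate failed (${err.message})`);
      continue;
    }
    if (!Buffer.from(input).equals(restored)) {
      failures.push(`${name}: decompressed output differs from input`);
    }
  }
  return failures; // an empty array means every input round-tripped
}

// Example (UZIP assumed to be loaded; not run here):
// console.log(checkRoundTrip(UZIP.deflate, { bmp: bmpBytes, text: textBytes }));
```

Since the thread reports that images and plain text behave differently, it is worth including both kinds of input.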

Here I compress an 11 MB BMP image. UZIP.js is faster and compresses better than pako, but it usually does not do as well on e.g. plain text.

It is not an issue, but could we keep it here for a while, so that I can maybe attract some people interested in improving it, independently of ZLIB? We could add Zopfli-like optimizations, etc.

puzrin · 2018-05-24T21:26:27Z

Please understand me right: I have a lot of good (I hope) projects to do and only 24 hours in a day. Every time, I need to make a choice: can I add significant value or not? The process of polishing is endless, and we have to say "stop" to ourselves at some point to continue with more valuable things. This project is very stable and battle-tested. So before starting improvements, I need very strong proof that the time will not be wasted in the end.

What I mean is that it's not easy to make benchmarks correct. That depends on many things, even on the node.js version used (the V8 engine version). If you'd like me to participate in evaluating your achievements, I strongly recommend applying your patches to a pako fork and using the existing benchmark. That gives some guarantee that everything else is correct. If your code is completely independent, it will take a huge amount of time to analyze what happens, which is very ineffective. Also, I recommend rechecking the benchmarks in node 8.2.1, because later V8 versions have a significant performance loss. This approach will help to understand these things:

Is this difference related to the algorithm?
Is this difference related to the V8 JIT?
Is this difference specific to some V8 version?
Is this difference related to specific wrapper call options (string conversion, for example)?
What is the total gain, and is it worth spending resources to get it?

I don't mean you are doing something useless. But I would like to see deeper and more careful investigation results. It may look stupid of me to reject "obvious WOW benchmarks", but I have reasons to do so, based on my experience. Trust me, I went through a huge number of benchmarking issues and experiments when developing this package (and spent a lot of time on them). There are many cases where "fantastic" results are caused by the environment, not by the measured code. So I don't reject your work, but I suggest using a more accurate, safe, and predictable benchmarking approach.
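As a minimal sketch of what a more careful benchmark could look like, assuming Node.js: every codec runs on the same input in the same process, a few untimed warm-up calls give the V8 JIT a chance to optimize before timing starts, the median of several timed runs is reported instead of a single measurement, and the Node.js and V8 versions are recorded alongside the results. The `benchmark` function, the `codecs` map, and the `warmup`/`runs` options are hypothetical; plug in e.g. `pako.deflate` and `UZIP.deflate` and rerun under each Node.js version of interest.

```javascript
const { performance } = require("perf_hooks");

// Median of an array of numbers (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// codecs: { name: (input: Uint8Array) => Uint8Array }, all run on the same input.
function benchmark(codecs, input, { warmup = 5, runs = 21 } = {}) {
  if (runs < 1) throw new RangeError("runs must be at least 1");
  const results = {};
  for (const [name, fn] of Object.entries(codecs)) {
    // Untimed warm-up calls give the JIT a chance to optimize the hot paths.
    for (let i = 0; i < warmup; i++) fn(input);
    const times = [];
    let out;
    for (let i = 0; i < runs; i++) {
      const t0 = performance.now();
      out = fn(input);
      times.push(performance.now() - t0);
    }
    results[name] = {
      medianMs: median(times),
      outputBytes: out.length,
      // For a compressor: output size / input size, lower is better.
      sizeRatio: out.length / input.length,
    };
  }
  // Record the environment, since results can differ between V8 versions.
  return { node: process.version, v8: process.versions.v8, results };
}

// Example (pako and UZIP assumed to be loaded; not run here):
// console.log(benchmark({ pako: pako.deflate, uzip: UZIP.deflate }, bmpBytes));
```

The median is used because a single garbage-collection pause or JIT recompilation can distort one run far more than it distorts the middle of the distribution.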

PS. Of course, if you need help promoting your initiative, I will be happy to help, even if I don't participate myself. Just don't forget to close the issue at some point, in 1-2 months for example, or if you lose interest :)

photopea · 2018-05-24T22:05:32Z

I am not asking you to change pako.js in any way. I understand that it is a rewrite of ZLIB, and that it is stable and properly tested.

My library is like new ground for experiments, for people who would like to "get away" from ZLIB and its structure. Actually, I think changing ZLIB may be very hard (that is why I did not do it). I will definitely close this issue at some point :)

puzrin · 2018-05-24T22:29:15Z

Why don't you want to experiment with the pako sources in a fork?

It will be easier to understand a small diff with the changes.
It will be easier to measure the difference caused by such a patch, and to test multiple V8 versions.
Benchmarks will be easy to reproduce (that's more convincing than a screenshot).
If the result is interesting, it will be easier to promote it upstream (to the original zlib).

In theory: less effort for more "added value".

photopea · 2018-05-25T06:55:38Z

I cannot read or rewrite other people's code; it is extremely uncomfortable for me :( I can only use such code through a documented interface, or add new parts to it that cooperate through an interface.

I also have no experience with node.js and the tools that you use for processing and testing your code (I run my JS in a browser, exactly as I wrote it). I am still using pako.js for compression (because UZIP.js is quite unpredictable for now), but I use UZIP.js for decompression (because it is always faster; the difference is even bigger in Firefox). I use different algorithms, but I guess I should suggest them to ZLIB, not to you.

ssb22 · 2018-05-27T15:33:05Z

Maybe add a "Related Projects" section to the Pako Readme, and mention UZip there? I'm not sure about the benchmarking issue, but UZip does have the advantage of being entirely within one file, which makes it easier to include in browser scripts etc. without having to get browserify to work. I'd like to try using UZip for the Javascript output of my Annotator Generator. Currently, if you say --javascript and --zlib, it puts require(pako) into its generated code, which could be a bit awkward if you're not on Node.JS, so I'd rather be able to include some self-contained chunk of Javascript that just provides the inflate function (even better if it's a cut-down version that only provides inflate, as my use-case doesn't need the rest of the API). UZip looks promising in this respect.

One slight problem with UZip, though: it's not quite correct to say in its Readme "The API is the same as the API of pako.js", because currently UZip's inflate accepts only a Uint8Array, whereas Pako can also accept a string. So with Pako I can say data=pako.inflate("x\xda\xcb\xc8\x04\0\x01;\0\xd2"), which is more compact than data=UZip.inflate(new Uint8Array([120, 218, 203, 200, 4, 0, 1, 59, 0, 210])). You full-on Javascript devs will immediately see something I missed, but the inability of UZip to take a string argument to inflate is the main thing stopping me from using it instead of Pako right now.
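Until UZIP's inflate accepts strings, one workaround is to convert the "binary string" (one character per byte, char codes 0 to 255) into a Uint8Array before calling it. A minimal sketch; `binaryStringToBytes` is a hypothetical helper, not part of either library's API.

```javascript
// Convert a binary string (each char code is one byte, 0-255) to a Uint8Array.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      // Characters above 0xFF cannot be represented as a single byte.
      throw new RangeError(`char code ${code} at index ${i} is not a byte`);
    }
    bytes[i] = code;
  }
  return bytes;
}

// Usage (UZIP.js assumed to be loaded; not run here):
// const data = UZIP.inflate(binaryStringToBytes("x\xda\xcb\xc8\x04\0\x01;\0\xd2"));
```

For reference, the ten-byte example above is a complete zlib stream: decoding it (for instance with Node.js's `zlib.inflateSync`) yields the two-character string "hi".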

ssb22 · 2018-05-27T15:43:59Z

Following up on my previous comment, I think I picked the wrong example, because "x\xda\xcb\xc8\x04\0\x01;\0\xd2" is the same length as [120,218,203,200,4,0,1,59,0,210], but in the general case the 256 possible byte values take a total of 729 source bytes if represented as a string, versus 915 if represented as an array:

"\0\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255]

so I expect a string representation to save 20% on average. It's a pity Javascript doesn't come with Base64 or something as standard. (You're more restricted if you can't assume libraries will be there.)
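The 729 vs. 915 figures can be reproduced mechanically. This sketch writes bytes out as JavaScript source, either as a double-quoted string literal using the shortest escape for each byte (the same escapes as the literal above) or as an array literal, and compares the lengths. `toStringLiteral` and `toArrayLiteral` are hypothetical names. The encoder writes `\x00` instead of `\0` when the next character is a digit, because `\0` followed by a digit would be read as a legacy octal escape (a syntax error in strict mode).

```javascript
// Single-character escapes for bytes that have one; all others use \xHH.
const SHORT_ESCAPES = { 8: "\\b", 9: "\\t", 10: "\\n", 11: "\\v", 12: "\\f", 13: "\\r", 34: '\\"', 92: "\\\\" };

function toStringLiteral(bytes) {
  let out = '"';
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    const next = bytes[i + 1];
    if (b === 0 && !(next >= 0x30 && next <= 0x39)) {
      out += "\\0"; // safe only when the next character is not a digit
    } else if (b in SHORT_ESCAPES) {
      out += SHORT_ESCAPES[b];
    } else if (b >= 0x20 && b <= 0x7e) {
      out += String.fromCharCode(b); // printable ASCII goes in as-is
    } else {
      out += "\\x" + b.toString(16).padStart(2, "0");
    }
  }
  return out + '"';
}

function toArrayLiteral(bytes) {
  return "[" + Array.from(bytes).join(",") + "]";
}

const all = Uint8Array.from({ length: 256 }, (_, i) => i);
const s = toStringLiteral(all).length;
const a = toArrayLiteral(all).length;
console.log(s, a, (1 - s / a).toFixed(3)); // → 729 915 0.203
```

The saving of about 20% matches the estimate in the comment; the exact figure for real data depends on how many of its bytes fall in the printable ASCII range.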

puzrin · 2018-05-27T15:46:44Z

@ssb22 https://github.com/nodeca/pako/tree/master/dist.

Please, let's keep this repo for issues about pako only, where my personal participation is absolutely necessary.

photopea · 2018-05-27T15:53:32Z

@ssb22 We can discuss UZIP-related stuff at https://github.com/photopea/UZIP.js/issues . Also, if you need to store binary data directly inside JS code, JS has base64 functions called "btoa" and "atob": https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
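A sketch of that base64 route: encode the bytes once with btoa, paste the resulting string into the source, and decode it back into a Uint8Array with atob at runtime. `base64ToBytes` and `bytesToBase64` are hypothetical helper names; atob and btoa themselves are standard browser globals (and are also global in Node.js 16 and later).

```javascript
// Decode base64 into bytes: atob returns a binary string (char codes 0-255).
function base64ToBytes(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes;
}

// Encode bytes as base64 (run once, offline, to produce the embedded string).
function bytesToBase64(bytes) {
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

// The ten-byte zlib stream from the comments above, embedded as base64:
const packed = base64ToBytes("eNrLyAQAATsA0g==");
// const data = UZIP.inflate(packed); // UZIP.js assumed to be loaded; not run here
```

Base64 spends 4 characters per 3 bytes regardless of content, so all 256 byte values would take 344 characters (plus quotes), against 729 for the escaped string literal.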

ssb22 · 2018-05-27T20:38:32Z

Good point. I have now opened the API-compatibility issue on UZIP's repository, and created Pako pull request #137 for my suggestion to mention UZip in Pako's Readme (and marked the pull request as closing this issue). Hope that was vaguely the right thing to do.

ssb22 mentioned this issue May 27, 2018: Add an 'Other Javascript-based fast (de)compression projects' to READ… #137 (closed)

puzrin mentioned this issue May 29, 2018: Patch 1 #140 (closed)

puzrin closed this as completed Jul 15, 2018

This was referenced Jul 18, 2019:
Benchmark? photopea/UZIP.js#8 (closed)
inflate compressed entries using pako transcend-io/conflux#30 (merged)

101arrowz mentioned this issue Nov 6, 2020: Switch to faster hashing for deflate #195 (closed)
