
Revised compress method #1

Open · wants to merge 3 commits into master

Conversation


@lafncow commented Apr 5, 2013

I made 2 changes to the "compress" method:

  1. it now returns fewer than the target number of bytes when given a digest that is already smaller than the target size (instead of throwing an error)
  2. it spreads the remainder bytes evenly across the output rather than dumping them all into the final byte (I think this might preserve some entropy, no?); see the sketch below
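
Roughly, the revised behaviour looks like this (a minimal sketch for illustration, not the exact patch code; it assumes the XOR fold that humanhash's compress performs, and reuses the compress_new name from the example further down):

from functools import reduce
from operator import xor

def compress_new(digest, target):
    # Sketch of the revised compress: XOR-fold `digest` down to `target`
    # values, spreading any remainder bytes evenly across the segments.
    if len(digest) <= target:
        return list(digest)  # change 1: already small enough, pass it through
    seg_size, remainder = divmod(len(digest), target)
    result, pos = [], 0
    for i in range(target):
        # change 2: the first `remainder` segments take one extra byte,
        # so segment sizes differ by at most 1
        size = seg_size + (1 if i < remainder else 0)
        result.append(reduce(xor, digest[pos:pos + size], 0))
        pos += size
    return result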

Adam Cornille added 2 commits April 5, 2013 16:08
Instead of throwing an error or zero-padding, "compress" now returns the input bytes unchanged if there are no more than "target" of them. I think this is logical, since the goal of compress is to reduce the complexity of the digest before making it human-consumable; in this case the complexity is already low enough to proceed.

Excess bytes are now distributed amongst the compressed bytes, instead of being dumped into the final byte as they were before.

@blag commented Apr 12, 2017

I'm maintaining a Python 3 fork of humanhash on GitHub and PyPI.

Can you add some comments to the code to explain what this is doing? And why it's better than the existing compress method? Sorry to dig this up from four years ago...


@lafncow (Author) commented May 9, 2017

Happy to resurrect this! I've added comments to the compression method.

Why is this better?
The old method divided the input bytes into the target number of segments and, after the even division, placed all remainder bytes into the final segment. The effect of the remainder bytes on the overall entropy was therefore confined to the final output byte.
In the new method, the remainder bytes are drawn from across the input and distributed evenly among the target segments, allowing them to express more entropy. The compression is also more uniform, since the number of input bytes per output byte differs by at most 1 between segments.
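
For reference, the old folding can be sketched in the same style (again illustrative, not the exact library code; the error check for digests smaller than the target is omitted, and the example values below are plain ints rather than true bytes, though XOR behaves the same either way):

from functools import reduce
from operator import xor

def compress_old(digest, target):
    # Sketch of the original compress: split evenly into `target` segments,
    # then dump every remainder byte into the final segment before folding.
    seg_size = len(digest) // target
    segments = [digest[i * seg_size:(i + 1) * seg_size] for i in range(target)]
    segments[-1].extend(digest[target * seg_size:])  # remainder lands at the end
    return [reduce(xor, seg, 0) for seg in segments]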

For example:

compress_old([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_old([123,456,789,147,258,369,321],4)
# -> [123, 456, 789, 417] (only the last byte has changed)

compress_new([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_new([123,456,789,147,258,369,321],4)
# -> [435, 902, 115, 321] (all 4 bytes have changed)

As an aside, I have an equivalent compress method prepared for the JavaScript port, and I will create a pull request there if this is merged.

Thanks!


@blag commented May 9, 2017

I'm maintaining the humanhash3 PyPI package, and if you can create a PR to my repo I'd be happy to merge it in. Thanks for the explanation! 😄
