
Revised Compression Method #2

Merged
blag merged 5 commits into blag:master from lafncow:PyPI on May 11, 2017

Conversation

lafncow commented May 10, 2017

I made 2 changes to the "compress" method:

  1. it will return fewer than the target number of bytes if it is given a digest that is smaller than the target size already (instead of throwing an error)
  2. it spreads the modulo bytes around rather than dumping them all into the final byte

Why is this better?
The old method divided the bytes into the target number of segments and, after even division, placed all remainder bytes into the final segment. This meant that the effect of the remainder bytes on overall entropy was confined to the final output byte.
In the new method, the remainder bytes are drawn from throughout the input and distributed evenly among the target segments, allowing them to express more entropy. The compression per input byte is also more even, since the number of input bytes folded into each output byte differs by at most 1 (see the sketch after the examples below).

For example:

compress_old([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_old([123,456,789,147,258,369,321],4)
# -> [123, 456, 789, 417] (only the last byte has changed)

compress_new([123,456,789,147], 4)
# -> [123, 456, 789, 147]
compress_new([123,456,789,147,258,369,321],4)
# -> [435, 902, 115, 321] (all 4 bytes have changed)
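
For reference, here is a minimal sketch of the revised approach (my own illustration, not the exact merged code: the compress_new name, the list handling, and the boundary formula are assumptions on my part, while the XOR fold per segment mirrors the checksum humanhash already used):

from functools import reduce
from operator import xor

def compress_new(bytes_, target):
    # Sketch only: an assumed reconstruction of the revised method.
    bytes_list = list(bytes_)
    length = len(bytes_list)
    # Change 1: a digest at or below the target size is returned
    # unchanged instead of raising an error.
    if target >= length:
        return bytes_list
    # Change 2: segment boundaries are chosen so segment sizes differ
    # by at most 1, spreading the length % target remainder bytes
    # evenly across the output instead of piling them into the final
    # segment.
    bounds = [i * length // target for i in range(target + 1)]
    segments = [bytes_list[bounds[i]:bounds[i + 1]] for i in range(target)]
    # Fold each segment into a single output byte with XOR.
    return [reduce(xor, seg, 0) for seg in segments]

(The numbers in the examples above are illustrative placeholders, not literal XOR outputs.)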

Adam Cornille and others added 4 commits April 5, 2013 16:08
Instead of throwing an error or zero-padding, "compress" now returns the
input bytes if there are less than or equal to "target" number of them.
I think this is logical since the goal of compress is to reduce the
complexity of the digest before making it human-consumable. In this case
the complexity is already low enough to proceed.
Excess bytes are now distributed amongst the compressed bytes, instead
of being dumped into the final byte as they were before.
coveralls commented May 10, 2017

Coverage Status

Coverage decreased (-2.2%) to 97.826% when pulling 7c850d0 on lafncow:PyPI into 5bed9ac on blag:master.

Set new correct test outputs for revised compression method. Edited comments to be within line limit.
coveralls commented May 10, 2017

Coverage Status

Coverage decreased (-2.2%) to 97.826% when pulling de4fe43 on lafncow:PyPI into 5bed9ac on blag:master.

lafncow (Author) commented May 10, 2017

Changed tests to reflect the new compression method's outputs.

This feature is also discussed in the previous PR and in the original repo:
#1
zacharyvoase#1

blag (Owner) commented May 10, 2017

This is a nitpick, but...can you remove the previous checksum() method? It's useless now and not used anywhere.

blag (Owner) commented May 10, 2017

And thank you for writing out that explanation - this makes complete sense now and I'm all for it.

blag merged commit 1438145 into blag:master on May 11, 2017
blag (Owner) commented May 11, 2017

Nevermind, I got to it after all. Thanks! 😄

"""

bytes_list = list(bytes_)

length = len(bytes_list)
if target > length:
raise ValueError("Fewer input bytes than requested output")
blag (Owner)

I'm curious, why did you remove this check?

I might flip this check back in the next version.

lafncow (Author)

I removed this check to make the function more generous in what it accepts, and I updated the comments to reflect the change. My thinking is that the purpose of compression is to ensure there are no more than the target number of bytes, which is already satisfied when target > length. So now, if target > length, the original bytes are returned, since no compression is needed (see the sketch after the example below).

I do understand that this is a change in behavior. The impact on end users is that hashing small inputs will no longer cause an error, which I think is more intuitive. Another option would be to pad the input bytes up to the target number, but that would lead to padded final outputs.

For example:

digest = '010203'

# Old method
humanhash.humanize(digest, 6)
Traceback (most recent call last):
...
ValueError: Fewer input bytes than requested output

# New method
humanhash.humanize(digest, 6)
'alabama-alanine-alaska'

# Padding method
humanhash.humanize(digest, 6)
'alabama-alanine-alaska-ack-ack-ack'
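
In code terms, the new behavior amounts to an early return in place of the old raise (a hypothetical sketch; the trailing ellipsis stands in for the unchanged segmentation and folding):

def compress(bytes_, target):
    bytes_list = list(bytes_)
    length = len(bytes_list)
    if target >= length:
        # Already at or below the target size: nothing to compress,
        # so return the digest as-is instead of raising ValueError.
        return bytes_list
    ...  # segmentation and XOR folding proceed as before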

It's your call, I'm happy to help if I can. :)
