Revised compress method #1

Open
wants to merge 3 commits into master
47 changes: 26 additions & 21 deletions humanhash.py
@@ -7,6 +7,7 @@

import operator
import uuid as uuidlib
import math


DEFAULT_WORDLIST = (
@@ -99,30 +100,34 @@ def compress(bytes, target):
>>> HumanHasher.compress(bytes, 4)
[205, 128, 156, 96]

Attempting to compress a smaller number of bytes to a larger number is
an error:
If there are fewer bytes than the requested target, the input bytes are returned unchanged:

>>> HumanHasher.compress(bytes, 15) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: Fewer input bytes than requested output
>>> HumanHasher.compress(bytes, 15)
[96, 173, 141, 13, 135, 27, 96, 149, 128, 130, 151]
"""

length = len(bytes)
if target > length:
raise ValueError("Fewer input bytes than requested output")

# Split `bytes` into `target` segments.
seg_size = length // target
segments = [bytes[i * seg_size:(i + 1) * seg_size]
for i in xrange(target)]
# Catch any left-over bytes in the last segment.
segments[-1].extend(bytes[target * seg_size:])

# Use a simple XOR checksum-like function for compression.
checksum = lambda bytes: reduce(operator.xor, bytes, 0)
checksums = map(checksum, segments)
return checksums
# If there are fewer bytes than the requested target, return the input bytes unchanged
if target >= length:
return bytes

# Split `bytes` evenly into `target` segments
# Each segment covers `seg_size` bytes (fractional), so some segments receive one more byte than others
seg_size = float(length) / float(target)
# Initialize `target` segments, each starting as a zero checksum
segments = [0] * target
seg_num = 0

# Use a simple XOR checksum-like function for compression
for i, byte in enumerate(bytes):
# Divide the byte index by the segment size to determine which segment to place it in
# Floor to create a valid segment index
# Min to ensure the index is within `target`
seg_num = min(int(math.floor(i / seg_size)), target-1)
# Apply XOR to the existing segment and the byte
segments[seg_num] = operator.xor(segments[seg_num], byte)

return segments

def uuid(self, **params):

@@ -139,4 +144,4 @@ def uuid(self, **params):

DEFAULT_HASHER = HumanHasher()
uuid = DEFAULT_HASHER.uuid
humanize = DEFAULT_HASHER.humanize
humanize = DEFAULT_HASHER.humanize
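
For context, here is a minimal standalone sketch of the revised approach (plain Python, outside the diff). The function name compress_bytes is illustrative only; the segmentation and XOR logic mirrors the new compress body above. Note that this grouping (3, 3, 3, 2 bytes for the 11-byte doctest input with a target of 4) differs from the original floor-division split, which pushed all leftover bytes into the last segment (2, 2, 2, 5).

import math
import operator


def compress_bytes(data, target):
    """Fold a list of byte values into `target` XOR checksums.

    Standalone sketch of the revised method: inputs no longer than
    `target` are returned unchanged; otherwise each byte is XOR-ed into
    the segment chosen by its position.
    """
    length = len(data)
    if target >= length:
        return data

    # Fractional segment size, so bytes spread as evenly as possible
    seg_size = length / float(target)
    segments = [0] * target
    for i, byte in enumerate(data):
        # Map the byte index to a segment, clamped to the last segment
        seg_num = min(int(math.floor(i / seg_size)), target - 1)
        segments[seg_num] = operator.xor(segments[seg_num], byte)
    return segments


if __name__ == "__main__":
    data = [96, 173, 141, 13, 135, 27, 96, 149, 128, 130, 151]
    # Segments are [96, 173, 141], [13, 135, 27], [96, 149, 128], [130, 151]
    print(compress_bytes(data, 4))
    # More segments requested than bytes available: input comes back unchanged
    print(compress_bytes(data, 15))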