Feat/hash serialization #1382
Conversation
we should avoid it
it is fine
Okay, that's actually fairly easy to do: just throw away the dashes along with the leading digits :D Give me a second
@alexander-akait Done in 101989a (impln) and 817368b (snaps)
Looks like the [email protected] introduced a new hash. On the other hand, …
Updated webpack in package.json to actually match.
Fixed.
Can you rebase?
Sure, a moment please
Force-pushed dd56b6b to 247090d
Update snapshots
Force-pushed 247090d to aba706d
Here. I've accidentally squashed the code changes commit and the snaps update commit into a single one while rebasing. I hope that's fine.
Codecov Report

```diff
@@           Coverage Diff           @@
##           master    #1382   +/-   ##
=======================================
  Coverage   98.43%   98.43%
=======================================
  Files          12       12
  Lines        1084     1088     +4
  Branches      375      375
=======================================
+ Hits         1067     1071     +4
  Misses         14       14
  Partials        3        3
```

Continue to review full report at Codecov.
```js
let localIdentHash = "";
for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
  // eslint-disable-next-line no-underscore-dangle
  const hash = loaderContext._compiler.webpack.util.createHash(hashFunction);
```
I am only worried about perf here; creating the hash multiple times is not a good idea. Can you explain why we need to create it again and again instead of reusing the existing hash?
The 2nd iteration is only required when the digest is too short to produce a pseudorandom string of the requested length.
If we didn't change anything between iterations, each iteration would generate the same output over and over and the string would just cycle: if `hash(str) = "ABCDE"`, the result would be `"ABCDEABCDEABCDEA"`. Even if that string is 16 chars long, its "uniqueness" is only 5 chars long.
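To illustrate, here's a minimal standalone sketch using Node's built-in `crypto` with md5 (webpack's `createHash` and the loader's actual inputs are stand-ins here):

```js
const crypto = require("crypto");

// Hashing the same input twice yields the same digest, so naively
// concatenating digests only cycles the same characters:
const a = crypto.createHash("md5").update("some-module").digest("hex");
const b = crypto.createHash("md5").update("some-module").digest("hex");
console.log(a === b); // true: "a + b" has no more uniqueness than "a"

// Salting each tier with its sequential index makes every digest distinct:
const tier0 = crypto.createHash("md5").update("0" + "some-module").digest("hex");
const tier1 = crypto.createHash("md5").update("1" + "some-module").digest("hex");
console.log(tier0 === tier1); // false
```

Note also that in Node's `crypto`, at least, a `Hash` object cannot be updated again after `digest()` has been called, so a fresh hash object is needed per tier anyway.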
I've run `npm test:only`, logging the number of times this loop iterated per function call:
- 1375 calls ran this loop only once
- 122 ran it twice
- 2 calls took three iterations
Upd: My bad, jest spawns 4 processes and they were concurrently writing to the same file 😓 The corrected counts:
- 1672 calls: 1 iteration
- 125 calls: 2 iterations
- 2 calls: 3 iterations
I've investigated those two suspicious calls where the loop iterated 3 times. It's my own `xxhash64` test from this very PR: it asks for a hex digest of length 20 (= 80 bits) from a hash that can only produce 64 bits, and it has a really high chance of producing a long run of leading digits.

Iteration 0: `digest` is `7268299549203d4d`. The leading digits are dropped; the resulting string so far is `d4d`.

Iteration 1: `digest` is `26a899717294ba28` => `d4d26a899717294ba28`.

Iteration 2: `digest` is `45400c2599876c30`. The string is trimmed at length 20: `d4d26a899717294ba284`.
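A self-contained sketch of the tiered loop being discussed (my reconstruction: Node's `crypto` with md5 stands in for webpack's `createHash`/`xxhash64`, and the salting and trimming details are inferred from this thread, not the PR's exact code):

```js
const crypto = require("crypto");

function tieredHash(input, hashDigestLength) {
  let localIdentHash = "";
  // Keep appending digests until the sanitized string is long enough.
  for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
    const hash = crypto.createHash("md5");
    hash.update(String(tier)); // unique sequential salt per tier
    hash.update(input);
    localIdentHash = (localIdentHash + hash.digest("hex"))
      .replace(/^\d+/, "") // a CSS class name must not start with a digit
      .slice(0, hashDigestLength); // trim once the target length is reached
  }
  return localIdentHash;
}

// E.g. a 20-char identifier built from 32-hex-char (128-bit) digests:
console.log(tieredHash("./src/button.css#root", 20));
```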
src/utils.js
Outdated
```js
for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
  // eslint-disable-next-line no-underscore-dangle
  const hash = loaderContext._compiler.webpack.util.createHash(hashFunction);
  const { hashSalt } = options;
```
Let's move this out of the loop, because we don't change it
Good point! Will fix
Fixed in 725e4a5
src/utils.js
Outdated
```js
localIdentHash = (localIdentHash + hash.digest(hashDigest))
  .replace(/^\d+/, "")
  .replace(/\//g, "_")
  .replace(/\W+/g, "")
```
And I am worried about these changes; can you explain them?
- Drop all leading digits
- Then replace any `/` with `_`
- Then drop anything except `a-zA-Z0-9_` (including the `+` and `=` of base64)

I could change the last one to `/[^a-zA-Z0-9_]+/` for better readability.
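For illustration, here's that chain run on a made-up base64 digest (the input value is invented for the example):

```js
// Hypothetical base64 digest, chosen to exercise every rule:
const digest = "9fT+/aW=";
const cleaned = digest
  .replace(/^\d+/, "")  // drop leading digits -> "fT+/aW="
  .replace(/\//g, "_")  // "/" becomes "_"     -> "fT+_aW="
  .replace(/\W+/g, ""); // strip "+" and "="   -> "fT_aW"
console.log(cleaned); // "fT_aW"
```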
Changed in 725e4a5
Thanks
FFS, why is this in a patch release? Not even a minor release, a PATCH release. This kind of change deserves a major version bump since it breaks e.g. babel-plugin-react-css-modules.
@ThiefMaster it is a MINOR release: v6.3.0...v6.4.0
Indeed, I just realized that I'm still on v5 since I didn't do major version bumps, and I updated the babel plugin (which did not use a major version bump for this change). Sorry for ranting prematurely.
This PR contains a:
Motivation / Use-Case
The current implementation incorrectly uses the hash buffer as an entropy source: instead of tossing away unacceptable values, it replaces them with other ones. This results in a probability "skew" that is most noticeable for the first character. For base64, the alphabet (to pick the pseudorandom value from) becomes skewed: the underscore has significantly more chances to be picked. It raises the collision probability for the first character from 1.56% (1/64) up to 4.79%!
All the subsequent characters are affected, too. They are drawn from an alphabet that effectively contains two underscores, since both `/` and `+` map to `_`.
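A rough way to observe the skew empirically; this is a standalone sketch (md5 and the sampling setup are my stand-ins for the loader's actual code, and it ignores the extra leading-digit rule):

```js
const crypto = require("crypto");

// Old-style substitution: both "/" and "+" are rewritten to "_".
const counts = {};
for (let i = 0; i < 100000; i++) {
  const digest = crypto.createHash("md5").update(String(i)).digest("base64");
  const first = /[\/+]/.test(digest[0]) ? "_" : digest[0];
  counts[first] = (counts[first] || 0) + 1;
}
// "_" turns up roughly twice as often as any single base64 character,
// because two symbols ("/" and "+") collapse into it.
console.log(counts);
```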
The proposed solution is to toss unacceptable characters away instead of substituting them, and to map `/` → `_` and `+` → `-` (instead of mapping both `/` and `+` → `_`).
As parts of the digest are tossed away, there is a (small) chance a single digest won't be enough to generate a string of the desired length.
To overcome this, the string is now hashed several times until it's long enough. Each hashing tier has its own unique sequential salt. For MD4 / MD5 the maximum string length that can be generated this way is 68 719 476 736; that should be enough to absorb any amount of thrown-away characters.
This is essentially PBKDF / PBKDF2, but with 1 round and indefinite output length: the length grows on demand.
Most of the time only one hashing pass is required, so this change should not significantly impact performance.
Breaking Changes
`localIdentHash` may now contain dashes (`-`), including leading and trailing ones. That should be okay for class names.
Additional Info
I've made a synthetic test to show that this PR actually improves something:
https://gist.github.com/subzey/3f1a0efd1634bdf0b03ba2a2e744bc3e
Both implementations are called until the first collision. This is repeated 101 times, then the median value is picked; that value corresponds to a 50% collision chance.
The current implementation can generate ≈ 20200 hashes before hitting the 50% collision chance.
The proposed implementation can generate ≈ 35600 hashes before hitting the 50% collision chance.
The expected value for a true random source is ≈ 35440.
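For what it's worth, the ≈ 35440 figure matches the birthday-bound approximation if one assumes 5-character identifiers with 54 possible first characters (letters plus `_` and `-`) and 64 possibilities for each remaining character; those alphabet sizes are my assumption, not stated above:

```js
// Birthday bound: draws needed for a 50% collision chance ≈ sqrt(2 ln 2 * N),
// where N is the number of distinct possible identifiers.
const N = 54 * 64 ** 4; // assumed: 54 options for the first char, 64 for the rest
const median = Math.sqrt(2 * Math.LN2 * N); // ≈ 1.1774 * sqrt(N)
console.log(N); // 905969664
console.log(Math.round(median)); // ≈ 35440
```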