Feat/hash serialization #1382
Conversation
we should avoid it
it is fine
Okay, that's actually fairly easy to do: just throw away the dashes along with the leading digits :D Give me a second
@alexander-akait Done in 101989a (impln) and 817368b (snaps)
Looks like the [email protected] introduced a new hash. On the other hand, …
Updated webpack in package.json to actually match.
Fixed.
Can you rebase?
Sure, a moment please
Force-pushed dd56b6b to 247090d
Update snapshots
Force-pushed 247090d to aba706d
Here. I've accidentally squashed the code changes commit and the snaps update commit into a single one while rebasing. I hope that's fine.
Codecov Report

```diff
@@           Coverage Diff           @@
##           master    #1382   +/-   ##
=======================================
  Coverage   98.43%   98.43%
=======================================
  Files          12       12
  Lines        1084     1088     +4
  Branches      375      375
=======================================
+ Hits         1067     1071     +4
  Misses         14       14
  Partials        3        3
```

Continue to review full report at Codecov.
```js
let localIdentHash = "";
for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
  // eslint-disable-next-line no-underscore-dangle
  const hash = loaderContext._compiler.webpack.util.createHash(hashFunction);
```
I am only worried about perf here; creating the hash multiple times is not a good idea. Can you explain why we need to create it again and again instead of reusing the existing hash?
The 2nd iteration is only required when the digest is too short to produce a pseudorandom string of the requested length.
If we didn't change anything between iterations, each iteration would generate the same output over and over and the string would just cycle: if `hash(str) = "ABCDE"`, the result would be `"ABCDEABCDEABCDEA"`. Even if that string is 16 chars long, its "uniqueness" is only 5 chars long.
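To illustrate, here's a minimal standalone sketch using Node's built-in `crypto` with md5 (webpack's `createHash` and the loader's actual inputs are stand-ins here):

```js
const crypto = require("crypto");

// Hashing the same input twice yields the same digest, so naively
// concatenating digests only cycles the same characters:
const a = crypto.createHash("md5").update("some-module").digest("hex");
const b = crypto.createHash("md5").update("some-module").digest("hex");
console.log(a === b); // true: "a + b" has no more uniqueness than "a"

// Salting each tier with its sequential index makes every digest distinct:
const tier0 = crypto.createHash("md5").update("0" + "some-module").digest("hex");
const tier1 = crypto.createHash("md5").update("1" + "some-module").digest("hex");
console.log(tier0 === tier1); // false
```

Note also that in Node's `crypto`, at least, a `Hash` object cannot be updated again after `digest()` has been called, so a fresh hash object is needed per tier anyway.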
I've run `npm test:only`, logging the number of times this loop iterated per function call:
- 1375 calls ran this loop only once
- 122 ran it twice
- 2 calls took three iterations
Upd: My bad, jest spawns 4 processes and they were concurrently writing to the same file 😓 The corrected counts:
- 1672 calls: 1 iteration
- 125 calls: 2 iterations
- 2 calls: 3 iterations
I've investigated those two suspicious calls where the loop iterated 3 times. It's my own `xxhash64` test from this very PR: it asks for a hex digest of length 20 (= 80 bits) from a hash that can only produce 64 bits, and it has a really high chance of producing a long run of leading digits.

Iteration 0: `digest` is `7268299549203d4d`. The leading digits are dropped; the resulting string so far is `d4d`.

Iteration 1: `digest` is `26a899717294ba28` => `d4d26a899717294ba28`.

Iteration 2: `digest` is `45400c2599876c30`. The string is trimmed at length 20: `d4d26a899717294ba284`.
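A self-contained sketch of the tiered loop being discussed (my reconstruction: Node's `crypto` with md5 stands in for webpack's `createHash`/`xxhash64`, and the salting and trimming details are inferred from this thread, not the PR's exact code):

```js
const crypto = require("crypto");

function tieredHash(input, hashDigestLength) {
  let localIdentHash = "";
  // Keep appending digests until the sanitized string is long enough.
  for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
    const hash = crypto.createHash("md5");
    hash.update(String(tier)); // unique sequential salt per tier
    hash.update(input);
    localIdentHash = (localIdentHash + hash.digest("hex"))
      .replace(/^\d+/, "") // a CSS class name must not start with a digit
      .slice(0, hashDigestLength); // trim once the target length is reached
  }
  return localIdentHash;
}

// E.g. a 20-char identifier built from 32-hex-char (128-bit) digests:
console.log(tieredHash("./src/button.css#root", 20));
```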
src/utils.js
Outdated
```js
for (let tier = 0; localIdentHash.length < hashDigestLength; tier++) {
  // eslint-disable-next-line no-underscore-dangle
  const hash = loaderContext._compiler.webpack.util.createHash(hashFunction);
  const { hashSalt } = options;
```
Let's move this out of the loop, because we don't change it
Good point! Will fix
Fixed in 725e4a5
src/utils.js
Outdated
```js
localIdentHash = (localIdentHash + hash.digest(hashDigest))
  .replace(/^\d+/, "")
  .replace(/\//g, "_")
  .replace(/\W+/g, "")
```
And I am worried about these changes; can you explain them?
- Drop all leading digits
- Then replace any `/` with `_`
- Then drop anything except `a-zA-Z0-9_` (including the `+` and `=` of base64)

I could change the last one to `/[^a-zA-Z0-9_]+/` for better readability.
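For illustration, here's that chain run on a made-up base64 digest (the input value is invented for the example):

```js
// Hypothetical base64 digest, chosen to exercise every rule:
const digest = "9fT+/aW=";
const cleaned = digest
  .replace(/^\d+/, "")  // drop leading digits -> "fT+/aW="
  .replace(/\//g, "_")  // "/" becomes "_"     -> "fT+_aW="
  .replace(/\W+/g, ""); // strip "+" and "="   -> "fT_aW"
console.log(cleaned); // "fT_aW"
```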
Changed in 725e4a5
Thanks
FFS, why is this in a patch release? Not even a minor release, a PATCH release. This kind of change deserves a major version bump since it breaks e.g. babel-plugin-react-css-modules.
@ThiefMaster it is a MINOR release: v6.3.0...v6.4.0
Indeed, I just realized that I'm still on v5 since I didn't do major version bumps, and I updated the babel plugin (which did not use a major version bump for this change). Sorry for ranting prematurely.
This PR contains a:
Motivation / Use-Case
The current implementation incorrectly uses the hash buffer as an entropy source: instead of tossing away unacceptable values, it replaces them with other ones. This results in a probability "skew" that is most noticeable for the first character. For base64, the alphabet (to pick the pseudorandom value from) becomes skewed: the underscore has significantly more chances to be picked. It raises the collision probability for the first character from 1.56% (1/64) up to 4.79%!
All the subsequent characters are affected, too. They are drawn from an alphabet that effectively contains two underscores, since both `/` and `+` map to `_`.
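A rough way to observe the skew empirically; this is a standalone sketch (md5 and the sampling setup are my stand-ins for the loader's actual code, and it ignores the extra leading-digit rule):

```js
const crypto = require("crypto");

// Old-style substitution: both "/" and "+" are rewritten to "_".
const counts = {};
for (let i = 0; i < 100000; i++) {
  const digest = crypto.createHash("md5").update(String(i)).digest("base64");
  const first = /[\/+]/.test(digest[0]) ? "_" : digest[0];
  counts[first] = (counts[first] || 0) + 1;
}
// "_" turns up roughly twice as often as any single base64 character,
// because two symbols ("/" and "+") collapse into it.
console.log(counts);
```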
The proposed solution is to toss unacceptable characters away instead of substituting them, and to map `/` → `_` and `+` → `-` (instead of mapping both `/` and `+` → `_`).
As parts of the digest are tossed away, there is a (small) chance a single digest won't be enough to generate a string of the desired length.
To overcome this, the string is now hashed several times until it's long enough. Each hashing tier has its own unique sequential salt. For MD4 / MD5 the maximum string length that can be generated this way is 68 719 476 736; that should be enough to absorb any amount of thrown-away characters.
This is essentially PBKDF / PBKDF2, but with 1 round and indefinite output length: the length grows on demand.
Most of the time only one hashing pass is required, so this change should not significantly impact performance.
Breaking Changes
`localIdentHash` may now contain dashes (`-`), including leading and trailing ones. That should be okay for class names.
Additional Info
I've made a synthetic test to show that this PR actually improves something:
https://gist.github.com/subzey/3f1a0efd1634bdf0b03ba2a2e744bc3e
Both implementations are called until the first collision. This is repeated 101 times, then the median value is picked; that value corresponds to a 50% collision chance.
The current implementation can generate ≈ 20200 hashes before hitting the 50% collision chance.
The proposed implementation can generate ≈ 35600 hashes before hitting the 50% collision chance.
The expected value for a true random source is ≈ 35440.
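For what it's worth, the ≈ 35440 figure matches the birthday-bound approximation if one assumes 5-character identifiers with 54 possible first characters (letters plus `_` and `-`) and 64 possibilities for each remaining character; those alphabet sizes are my assumption, not stated above:

```js
// Birthday bound: draws needed for a 50% collision chance ≈ sqrt(2 ln 2 * N),
// where N is the number of distinct possible identifiers.
const N = 54 * 64 ** 4; // assumed: 54 options for the first char, 64 for the rest
const median = Math.sqrt(2 * Math.LN2 * N); // ≈ 1.1774 * sqrt(N)
console.log(N); // 905969664
console.log(Math.round(median)); // ≈ 35440
```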