[Question] What does threshold mean in a method toMatchSnapshot #10219

KirProkopchik · 2021-11-10T15:44:28Z

Hi all,
Now I am using jest as a test runner with jest-playwright-preset and jest-image-snapshot for image comparisons.
Versions:
jest: 27.0.4
jest-playwright-preset: 1.7.0
jest-image-snapshot: 4.5.1
playwright: 1.16.2

Screenshot comparisons look like:

const customConfig = { threshold: 0.5 };
expect(image).toMatchImageSnapshot({
  customDiffConfig: customConfig,
});

Here, threshold is the ratio of differing pixels to the total number of pixels, or the maximum number of differing pixels if threshold is > 1.

When I started migrating to the native Playwright runner, I ran into an image comparison issue. The expected and actual screenshots are not equal, but the difference is only a few pixels in an image with a resolution of 1920x1080. I tried setting the threshold to 0.5 or even higher, without any positive result.

I started looking at the pixelmatch source and found out that threshold determines how much each pixel can differ in color, not the number of differing pixels.
https://github.com/mapbox/pixelmatch/blob/b9261a447515f5aff37a15cfab9f4a491868f720/index.js#L45

Further, the number of differing pixels is simply compared with zero in the Playwright matcher.

playwright/packages/playwright-test/src/matchers/golden.ts

Line 69 in 2a0a44b

return count > 0 ? { diff: PNG.sync.write(diff) } : null;

In my case, this approach leads to failures when even a few pixels differ, despite a fairly high threshold.
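
To illustrate why a high threshold does not help here, below is a minimal sketch of a per-pixel tolerance followed by a zero-count check. The per-pixel metric is a simplified stand-in (largest channel difference scaled to 0..1), not pixelmatch's YIQ-based metric, and all function names are hypothetical.

```typescript
// Simplified stand-in for a per-pixel color metric: the largest channel
// difference, scaled to 0..1. This is NOT pixelmatch's YIQ-based metric.
function pixelDelta(a: Uint8Array, b: Uint8Array, offset: number): number {
  let max = 0;
  for (let c = 0; c < 4; c++) {
    max = Math.max(max, Math.abs(a[offset + c] - b[offset + c]));
  }
  return max / 255;
}

// Counts pixels whose color difference exceeds the per-pixel threshold.
// Both buffers are assumed to be RGBA with the same dimensions.
function countDifferingPixels(a: Uint8Array, b: Uint8Array, threshold: number): number {
  let count = 0;
  for (let offset = 0; offset < a.length; offset += 4) {
    if (pixelDelta(a, b, offset) > threshold) count++;
  }
  return count;
}

// Mirrors the `count > 0` check quoted above: any differing pixel fails.
function imagesMatch(a: Uint8Array, b: Uint8Array, threshold: number): boolean {
  return countDifferingPixels(a, b, threshold) === 0;
}

// A 100x100 white image versus a copy with three black pixels.
const white = new Uint8Array(100 * 100 * 4).fill(255);
const changed = white.slice();
for (const pixel of [0, 1, 2]) {
  changed[pixel * 4] = 0;     // R
  changed[pixel * 4 + 1] = 0; // G
  changed[pixel * 4 + 2] = 0; // B
}

const differing = countDifferingPixels(white, changed, 0.5);
const matches = imagesMatch(white, changed, 0.5);
console.log(differing, matches); // 3 false
```

With this model, raising the threshold only changes which pixels count as differing. Three strongly differing pixels out of 10,000 still fail the comparison at any threshold below 1.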

aslushnikov · 2021-11-10T23:16:17Z

@KirProkopchik Thank you for the input! Pixelmatch threshold is not very well explained in their docs, so we could've been confused. Let me look into this.

GitHubby23 · 2021-11-23T13:45:01Z

I observe the same behavior. Subsequent runs can take new snapshots that differ by 1 pixel in resolution. toMatchSnapshot fails regardless of threshold value, even when bumped up to 1.

p01 · 2021-12-09T10:49:14Z

In pixelmatch, which toMatchSnapshot(...) currently uses, the threshold is the allowed color difference, expressed as a fraction, for each individual pixel.

If a single pixel exceeds that threshold, the images are considered different.
Images are also considered different if they have different resolutions.
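
The resolution rule can be sketched as a guard that runs before any pixel comparison. This is only an illustration under the assumption that sizes are compared exactly; `ImageSize` and `sizeMismatch` are hypothetical names, not Playwright's API.

```typescript
interface ImageSize {
  width: number;
  height: number;
}

// Returns a human-readable reason when two images cannot be diffed pixel by
// pixel, or null when their dimensions agree. Hypothetical helper.
function sizeMismatch(expected: ImageSize, actual: ImageSize): string | null {
  if (expected.width === actual.width && expected.height === actual.height) {
    return null;
  }
  return `Sizes differ; expected ${expected.width}px by ${expected.height}px, ` +
    `received ${actual.width}px by ${actual.height}px.`;
}

// A 1px difference in height is enough to skip the pixel diff entirely.
const reason = sizeMismatch({ width: 1920, height: 1080 }, { width: 1920, height: 1081 });
console.log(reason);
```

In this sketch no threshold value is consulted when the sizes differ, which matches the reports of failures even with the threshold set to 1.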

dimkin-eu · 2021-12-23T07:50:46Z

There is an older ticket, #9444, where pictures with big and small diffs are treated the same at the same threshold.

Maybe it's worth investigating the use of some alternative?

frkj600 · 2022-02-02T14:55:17Z

Hi Team,
I noticed the same behavior while running test cases in BrowserStack with the threshold set to 1:
"@playwright/test": "^1.17.2",
"playwright": "^1.17.2",

p01 · 2022-02-02T15:49:22Z

@dimkin-eu I developed a pixel-buffer-diff library that takes care of partial pixel differences to reduce false positives. Additionally, it is 4x faster than pixelmatch (the image comparison lib used in Playwright). You can see PR #10823, which tried to bump to an early version of the library. The current version, 1.3.2, addresses some issues found around that time.

Could you try it on your project and let us know if it helps?

@fanjum66 the screenshots in your example have different resolutions and therefore could not be diffed.

frkj600 · 2022-02-03T05:32:58Z

@p01 - Thanks for your quick response. I think that even if a different library is used, the line below will still throw an error, because the PNG library is used to compare the sizes: https://github.com/p01/playwright/blob/5148c1dbf62ae2bc3c36083fc9d9fa54c9405e0d/packages/playwright-test/src/matchers/golden.ts#L63

It would be better if a flag could be used to ignore the size difference and focus only on the data change.

Also, is there any Playwright version I can use to test the mentioned fix?

p01 · 2022-02-03T10:45:58Z

@fanjum66 In your example, one image is 1px taller than the other. How can the pixel buffer diffing library make sense of that? Is the whole page/UI really 1px taller? Is everything shifted 1px toward the bottom, or toward the top? Any way you look at it, these snapshots won't match and need to be reviewed. It could be a true positive where your page/UI really did shrink by 1px, but it is likely a real issue that needs to be investigated to figure out why the screenshots have different resolutions, so that the tests can be fixed and made deterministic.

To try pixel-buffer-diff in a recent version of Playwright, one could fork Playwright and cherry-pick or redo the commit #9f913d94791e3e11dff98461609c729069fffc15 with the latest version of pixel-buffer-diff. If you have access to GitHub Codespaces, this should be fairly quick. You could also pull PR #11838 and use the env variable PW_USE_PIXEL_BUFFER_DIFF.

p01 · 2022-02-03T10:46:04Z

@aslushnikov @pavelfeldman @dgozman, it seems that Playwright uses a threshold of 0.2, which is a 20% difference per pixel. Was this threshold inspired by other test runners?

playwright/packages/playwright-test/src/matchers/golden.ts

Line 72 in 1215057

const thresholdOptions = { threshold: 0.2, ...options };

If we look at the color black, #000000, this 0.2 threshold means that the furthest color Playwright will treat as the same is #252525. This is a very significant difference.

In our tests, we use 0.03, which brings the furthest possible color from black to #050505.

p01 · 2022-02-03T14:50:05Z

Let's open a PR: #11838 Add pixel-buffer-diff behind env variable opt-in, so anyone who wants to can try it without disrupting existing tests.

aslushnikov · 2022-02-17T01:51:39Z

@p01 Did our defaults result in some tests passing erroneously? With the newly suggested pixelCount option (with a non-zero default value), it might be possible to tighten the threshold back to some extent.

This patch adds additional options to the `toMatchSnapshot` method:
- `pixelCount` - acceptable number of pixels that differ to still consider images equal.
- `pixelRatio` - acceptable percentage of pixels that differ to still consider images equal.

Since some anti-aliasing artifacts can still creep in, we default `pixelCount` to some arbitrary small number - `17` - to improve tolerance.

Fixes microsoft#12167, microsoft#10219

…12169) This patch adds additional options to the `toMatchSnapshot` method:
- `pixelCount` - acceptable number of pixels that differ to still consider images equal. Unset by default.
- `pixelRatio` - acceptable ratio of all image pixels (from 0 to 1) that differ to still consider images equal. Unset by default.

Fixes #12167, #10219

aslushnikov · 2022-02-17T23:48:22Z

Everybody: We just landed two new options to toMatchSnapshot: pixelCount and pixelRatio.
With these, you can have the following options to configure snapshot matching:

threshold <[float]> an acceptable perceived color difference in the YIQ color space between pixels in compared images, between zero (strict) and one (lax). Defaults to 0.2.
pixelCount <[int]> an acceptable number of pixels that could be different, unset by default.
pixelRatio <[float]> an acceptable ratio of differing pixels to the total number of pixels, between 0 and 1, unset by default.

You can give them a try using npm i @playwright/test@next. Let us know how it goes!

turnkey-commerce · 2022-03-08T19:21:46Z

@aslushnikov I gave them a try on a scenario and they work well for me, thanks very much. I found the options to actually be named as:

maxDiffPixelRatio
maxDiffPixels
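
As a rough model of how these two limits could relax the earlier "any differing pixel fails" check, here is a minimal sketch. The option names come from this thread. How the two limits combine when both are set (the stricter one wins here) is an assumption, and `DiffLimits`, `allowedDiffPixels` and `passes` are hypothetical names rather than Playwright's API.

```typescript
// Option names as reported in the thread; how they combine is an assumption.
interface DiffLimits {
  maxDiffPixels?: number;     // absolute number of differing pixels allowed
  maxDiffPixelRatio?: number; // allowed fraction of all pixels, 0..1
}

// Translates the limits into a single pixel budget for an image of the given size.
function allowedDiffPixels(width: number, height: number, limits: DiffLimits): number {
  const byCount = limits.maxDiffPixels;
  const byRatio = limits.maxDiffPixelRatio !== undefined
    ? width * height * limits.maxDiffPixelRatio
    : undefined;
  if (byCount !== undefined && byRatio !== undefined) {
    return Math.min(byCount, byRatio); // assumption: the stricter limit applies
  }
  // With neither limit set, the budget is 0, i.e. the original strict behavior.
  return byCount ?? byRatio ?? 0;
}

// `diffCount` is the number of pixels already found to exceed the threshold.
function passes(diffCount: number, width: number, height: number, limits: DiffLimits): boolean {
  return diffCount <= allowedDiffPixels(width, height, limits);
}

// Ten differing pixels in a 1920x1080 screenshot (2,073,600 pixels in total).
console.log(passes(10, 1920, 1080, {}));                             // false
console.log(passes(10, 1920, 1080, { maxDiffPixels: 20 }));          // true
console.log(passes(10, 1920, 1080, { maxDiffPixelRatio: 0.00001 })); // true
```

In this model, threshold still decides which pixels count as differing, while maxDiffPixels and maxDiffPixelRatio decide how many such pixels are tolerated.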

aslushnikov self-assigned this Nov 10, 2021

aslushnikov added the v1.18 label Nov 10, 2021

dgozman added the feature-visual-regression-testing label Dec 6, 2021

pavelfeldman added v1.19 and removed v1.18 labels Jan 4, 2022

csouchet mentioned this issue Jan 18, 2022

[POC] Replace the Jest test runner by the Playwright test runner process-analytics/bpmn-visualization-js#1709

Closed

22 tasks

aslushnikov added v1.20 and removed v1.19 labels Feb 2, 2022

aslushnikov mentioned this issue Feb 9, 2022

[Question] Is there a way to ignore small difference between snapshots? #7346

Closed

aslushnikov mentioned this issue Feb 17, 2022

feat(test-runner): introduce pixelCount and pixelRatio options #12169

Merged

aslushnikov closed this as completed Feb 17, 2022
