Visual comparison with PW: worth it?!

mati-qa
Oct 24, 2023


Each time I hear about visual comparison, I think how awesome it is to check the UI graphically. If each button, frame, table, form, and so on is in place and has the shape it should have, the UI is visually correct. Before Playwright, almost every testing framework needed an additional library to validate the UI graphically. But then Playwright stepped in with a built-in mechanism.

How to start

Visual comparison in PW works almost the same as a regular expect(). The simplest comparison (full page, with the default threshold) looks like this:

test('example test', async ({ page }) => {
  await page.goto('https://medium.com');
  await expect(page).toHaveScreenshot();
});

This simple test opens the Medium main page and checks whether the layout matches the baseline (a reference PNG). If there is no baseline yet, the first run will create it. As you can see, this is almost the same as a regular expect(), just without any additional parameters. That means the name of the screenshot will be somewhat generic: it is derived from the spec file name and the browser used for the test run (e.g., Chromium, Firefox, etc.). Of course, you can name the screenshot yourself (and I encourage you to do so) by simply passing the name as an argument, like this: toHaveScreenshot('pw-main-page.png'). The baseline (the screenshot used as a reference) is stored next to the current spec file, in a newly created folder named after the spec file with -snapshots appended to the end.
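For example, a named full-page comparison could look like the sketch below. The test title and the file name pw-main-page.png are my own illustrative choices, not something Playwright requires:

test('medium main page looks as expected', async ({ page }) => {
  await page.goto('https://medium.com');
  // The baseline ends up as pw-main-page.png (plus a platform/browser suffix)
  // inside a <spec-file-name>-snapshots folder next to this spec file.
  await expect(page).toHaveScreenshot('pw-main-page.png');
});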

What happens in case of failure?

If the visual comparison fails, you will get a PNG file with the diff as well as the PNG that was actually captured during the run. Both land in the test-results folder, assuming you haven't changed the default config. Here is how it looks for the Medium main page:

[Image: diff of the Medium main page, with the differing areas highlighted in red]

As you can see, all the differences are marked in red, making them easy to find. How to deal with this if you have a lot of false-positive failures?

Playwright's visual comparison uses the pixelmatch library under the hood, which is great because it allows us to set a threshold or a maximum number of pixels that may differ in our test. If you want to use the pixel-diff approach, simply pass it as an option:

  await expect(page).toHaveScreenshot({ maxDiffPixels: 100 });

This allows a difference of up to 100 pixels between the captured screenshot and the reference image. Another option is maxDiffPixelRatio, which specifies the acceptable ratio of differing pixels to the total number of pixels, between 0 and 1 (think of it as a percentage). Finally, threshold specifies the acceptable perceived color difference in the YIQ color space between the same pixel in the compared images, between 0 (strict) and 1 (lax).
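Here is a quick sketch of how these options can be combined in a single assertion. The values are arbitrary examples, not recommendations:

// Allow at most 1% of all pixels to differ, and treat small per-pixel
// color deviations (YIQ threshold 0.3) as equal.
await expect(page).toHaveScreenshot('pw-main-page.png', {
  maxDiffPixelRatio: 0.01,
  threshold: 0.3,
});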

All of these options can be configured for each individual expect() call, or globally for the entire project. To configure them globally, simply define them in the project config file, like this:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: { maxDiffPixels: 100 },
  },
});

Of course, like any global setting, this one can also be overridden for a particular assertion. The toHaveScreenshot() method has many other configuration options, all described in the official Playwright documentation; I have only mentioned the most useful ones. Also, and this may be obvious to some but is worth mentioning anyway, this expect() call can be used not only for the whole page, but also with a locator pointing to a particular element, such as a form frame or a single graph (see the sketch below).
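A comparison scoped to a single element could look like this. The 'header.nav-bar' selector and the screenshot name are hypothetical, made up for illustration:

// Compare only the navigation bar instead of the whole page.
const navBar = page.locator('header.nav-bar');
await expect(navBar).toHaveScreenshot('nav-bar.png', { maxDiffPixels: 50 });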

One last thing: how do you update all the baselines? Simply run the tests with the additional --update-snapshots flag (e.g., npx playwright test --update-snapshots).

Summing up

As you can see, Playwright lets you easily configure, use, and maintain visual comparison. After some tuning of parameters like maxDiffPixels or threshold, it should be relatively stable. However, keep in mind that visual comparison depends heavily on rendering speed, network conditions, and even the hardware the tests run on. This is why I'm not always 100% sure it's worth doing, as it can turn into a time sink. So, have fun and experiment. ;)
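If flakiness becomes a problem, a few built-in toHaveScreenshot() options can help stabilize the comparison. A small sketch, assuming a dynamic region such as a rotating banner exists on the page (the .promo-banner selector is hypothetical):

// Disable CSS animations and mask a dynamic region so it does not
// cause false-positive diffs on every run.
await expect(page).toHaveScreenshot('stable-main-page.png', {
  animations: 'disabled',
  mask: [page.locator('.promo-banner')],
  maxDiffPixelRatio: 0.01,
});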
