Debug a Flaky Visual Regression Test

October 2, 2020

By Gleb Bahmutov

In our Cypress RealWorld App we use visual regression testing to catch unintended style and layout changes. We use the Percy.io service, and you can see the screenshots at https://percy.io/cypress-io/cypress-realworld-app. During tests, we periodically take image snapshots using the cy.percySnapshot command:

it("creates a new bank account", function () {
  cy.getBySel("sidenav-bankaccounts").click();

  cy.getBySel("bankaccount-new").click();
  cy.location("pathname").should("eq", "/bankaccounts/new");
  cy.percySnapshot("Display New Bank Account Form"); // 🖼

  cy.getBySelLike("bankName-input").type("The Best Bank");
  cy.getBySelLike("routingNumber-input").type("987654321");
  cy.getBySelLike("accountNumber-input").type("123456789");
  cy.percySnapshot("Fill out New Bank Account Form"); // 🖼
  cy.getBySelLike("submit").click();
  ...
})

The above test is typical - it goes through the user story and takes a visual snapshot whenever the page changes significantly.

We noticed that some visual tests were flaky - Percy reported an image difference on almost every run, even though the tests had not changed. For example, in the runs below the only change was a dependency bump, yet each one had a single screenshot that was visually different.

List of builds all showing a single changed image

Let's look at the image that has changed. Percy shows the original (baseline) image on the left, and the changed image on the right. Clicking highlights the changed pixels in red.

Percy highlights changed pixels

The balance in the baseline image is $1,681.37 while in the flaky test on the right the balance is $1,656.37. Let's dig in and find why.

Tip: you can find the code changes from this blog post in the pull request #571.

Duplicate screenshot names

The Percy run shows every screenshot under its name. In this case the name is "Transaction Payment Submitted Notification". Let's find the test that saves it. Hmm, we see a problem - there are two different tests that save a screenshot with this title:

Two different Percy snapshots with the same name

Every visual snapshot should have its own name. Even if two tests are at the same logical point and the DOM looks exactly the same, the snapshots should have separate titles. Otherwise the tests are forever linked together, and any change in one test shows up as a visual flake because of the screenshot from the other test.
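
One way to catch such collisions early is to check a list of snapshot names for repeats. The sketch below is only an illustration: findDuplicateSnapshotNames is a hypothetical helper, not part of Percy or Cypress, and it assumes you collect the names yourself (for example, from a custom command that wraps cy.percySnapshot).

```typescript
// Hypothetical helper, not part of Percy or Cypress:
// returns every snapshot name that appears more than once,
// in the order the repeats are first detected.
function findDuplicateSnapshotNames(names: string[]): string[] {
  const duplicates: string[] = [];
  names.forEach((name, index) => {
    // a name is a repeat if its first occurrence is at an earlier index
    const seenBefore = names.indexOf(name) !== index;
    if (seenBefore && duplicates.indexOf(name) === -1) {
      duplicates.push(name);
    }
  });
  return duplicates;
}

// Two tests saving a snapshot under the same title
const repeated = findDuplicateSnapshotNames([
  "Display New Bank Account Form",
  "Transaction Payment Submitted Notification",
  "Fill out New Bank Account Form",
  "Transaction Payment Submitted Notification",
]);
// repeated is ["Transaction Payment Submitted Notification"]
```

An empty result means every snapshot name in the list is unique.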

In this instance I simply renamed one of the screenshots. In general, I suggest the following trick to ensure good image names: use the full test title plus an optional suffix as the screenshot name. Here is a code snippet that creates a custom command to do exactly this:

// custom command to make taking snapshots with full name
// formed from the test title + suffix easier
Cypress.Commands.add('visualSnapshot', (maybeName) => {
  let snapshotTitle = cy.state('runnable').fullTitle()
  if (maybeName) {
    snapshotTitle = snapshotTitle + ' - ' + maybeName
  }
  cy.percySnapshot(snapshotTitle)
})
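
The naming rule inside this command can also be written as a plain function, which makes it easy to check without running Cypress. This is a minimal sketch: buildSnapshotTitle is a hypothetical name, and the full test title is passed in as a string instead of being read from cy.state('runnable').

```typescript
// Hypothetical pure version of the naming rule used by the custom command:
// full test title, plus " - suffix" when a suffix is given.
function buildSnapshotTitle(fullTestTitle: string, suffix?: string): string {
  return suffix ? `${fullTestTitle} - ${suffix}` : fullTestTitle;
}

const testTitle = "RealWorld App Bank Accounts creates a new bank account";

// Without a suffix the snapshot is named after the test itself
const defaultTitle = buildSnapshotTitle(testTitle);
// "RealWorld App Bank Accounts creates a new bank account"

// With a suffix the two parts are joined by " - "
const numberedTitle = buildSnapshotTitle(testTitle, "1 new account form");
// "RealWorld App Bank Accounts creates a new bank account - 1 new account form"
```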

For smaller tests with a single snapshot we can take the default snapshot, and it will use the full test title.

describe('RealWorld App', () => {
  context('Bank Accounts', () => {
    it('creates a new bank account', () => {
      // saves the screenshot under name
      // "RealWorld App Bank Accounts creates a new bank account"
      cy.visualSnapshot()
    })
  })
})

For a longer test with several screenshots I add suffixes with numbers to list the screenshots in Percy in order matching the test progression.

describe('RealWorld App', () => {
  context('Bank Accounts', () => {
    it('creates a new bank account', () => {
      ...
      cy.visualSnapshot('1 new account form')
      ...
      cy.visualSnapshot('2 fills the form')
      ...
      cy.visualSnapshot('3 shows account created')
    })
  })
})

The screenshots from the above test will be named:

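  • RealWorld App Bank Accounts creates a new bank account - 1 new account form
  • RealWorld App Bank Accounts creates a new bank account - 2 fills the form
  • RealWorld App Bank Accounts creates a new bank account - 3 shows account created

Named this way, the screenshots of one test stay together and in order, which makes them easy to browse in Percy's long list of screenshots.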

Flaky DOM State

Even after renaming the snapshot, it still shows the balance mismatch. Let's inspect this test locally using the cypress open mode. Even with a different database seed and different random user accounts and balances when running locally, we can see what the test is meant to do:

it("submits a transaction payment and verifies the deposit for the receiver", function () {
  cy.getBySel("nav-top-new-transaction").click();

  const transactionPayload = {
    transactionType: "payment",
    amount: 25,
    description: "Indian Food",
    sender: ctx.user,
    receiver: ctx.contact,
  };
  cy.createTransaction(transactionPayload);
  cy.wait("@createTransaction");
  cy.getBySel("new-transaction-create-another-transaction").should("be.visible");
  cy.percySnapshot("Transaction Payment Submitted Notification");

  cy.switchUser(ctx.contact!.username);
  ...
})

The test creates a transaction to pay $25, thus the original balance X changes to X - $25. It seems our visual snapshot gets confused: sometimes it takes a snapshot while the balance is still X, and sometimes it takes a snapshot after the balance has been updated to X - $25!

Let's confirm this: hover over the command preceding the cy.percySnapshot command in this fragment:

cy.wait("@createTransaction");
cy.getBySel("new-transaction-create-another-transaction").should("be.visible");
cy.percySnapshot("Transaction Payment Submitted Notification");

While running Cypress locally we typically do not actually send screenshots to Percy (we do it from CI only). The Percy command still shows up in the Command Log as "Percy health check", and we can see a curious thing:

The DOM changes between an assertion and Percy command

When we hover over the assertion immediately preceding the Percy command, the DOM shows the original balance X, and when we hover over the Percy command, it shows the updated balance X - $25. This means there might be a race condition - sometimes the DOM has updated the balance element before the Percy snapshot is taken, and sometimes it has not. Let's strengthen our test and assert the balance has been updated in the DOM before taking the Percy snapshot.

Since we do not know the balance we expect to have (this depends on the database seed), we can check if the balance displayed is NOT the same as before.

it("submits a transaction payment and verifies the deposit for the receiver", function () {
  cy.getBySel("nav-top-new-transaction").click();

  const transactionPayload = {
    transactionType: "payment",
    amount: 25,
    description: "Indian Food",
    sender: ctx.user,
    receiver: ctx.contact,
  };

  // first let's grab the current balance from the UI
  let startBalance: string;
  cy.get("[data-test=sidenav-user-balance]")
    .invoke("text")
    .then((x) => {
      startBalance = x; // something like "$1,484.81"
      expect(startBalance).to.match(/\$\d/);
    });

  cy.createTransaction(transactionPayload);
  cy.wait("@createTransaction");
  cy.getBySel("new-transaction-create-another-transaction").should("be.visible");

  // make sure the new balance is displayed
  cy.get("[data-test=sidenav-user-balance]").should(($el) => {
    // here we only make sure the text has changed
    // we could also convert the balance to actual number
    // and confirm the new balance is the start balance - amount
    expect($el.text()).to.not.equal(startBalance);
  });
  cy.percySnapshot("Transaction Payment Submitted Notification");

  cy.switchUser(ctx.contact!.username);
  ...
})
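
The comment in the assertion above mentions a stricter alternative: convert the balance to a number and confirm it dropped by the payment amount. Here is a minimal sketch of that conversion. It assumes the balance text always looks like "$1,484.81" (a dollar sign, optional thousands separators, optional two-digit cents); parseBalanceCents is a hypothetical helper, not part of the RealWorld App. It works in integer cents to avoid floating-point rounding.

```typescript
// Hypothetical helper: converts text like "$1,681.37" into integer cents.
// Assumes a non-negative US-style amount; anything else throws.
function parseBalanceCents(text: string): number {
  const match = text.trim().match(/^\$(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?$/);
  if (!match) {
    throw new Error(`Unrecognized balance text: ${text}`);
  }
  const dollars = parseInt(match[1].replace(/,/g, ""), 10);
  const cents = match[2] ? parseInt(match[2], 10) : 0;
  return dollars * 100 + cents;
}

// The two balances from the Percy diff differ by exactly the $25 payment
const balanceBefore = parseBalanceCents("$1,681.37"); // 168137
const balanceAfter = parseBalanceCents("$1,656.37"); // 165637
const paidCents = balanceBefore - balanceAfter; // 2500
```

Inside the should callback, the check could then become expect(parseBalanceCents($el.text())).to.equal(parseBalanceCents(startBalance) - 2500) instead of the "text has changed" comparison.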

Let's see the DOM behavior in the time-traveling debugger now. Notice in the screen recording below that the balance changes when we inspect the cy.get command, and does not change anymore when taking the visual snapshot. We have asserted the DOM has been updated before taking the snapshot.

The updated test ensures the DOM is stable before taking visual snapshot

Mobile Resolution

The Cypress RealWorld App runs the tests on CI using both Chrome and Firefox browsers. It also runs tests in desktop and mobile resolutions:

Separate CI jobs

After we modified the test to ensure the balance has been updated before taking the visual snapshot, the mobile test failed.

The updated test has failed when running in mobile resolution

The failure screenshot recorded in the Cypress Dashboard shows the browser, and it turns out we do not display the balance on the mobile screen!

Failed test in mobile resolution

Since there is no balance shown, its value will not affect the visual snapshot, and we can skip our balance element check in this case. Our code already has a utility function that returns true if the viewport matches the mobile resolution.

export const isMobile = () => {
  return Cypress.config("viewportWidth") < Cypress.env("mobileViewportWidthBreakpoint");
};

We can use isMobile in our test.

// first let's grab the current balance from the UI
let startBalance: string;
if (!isMobile()) {
  // only check the balance display in desktop resolution
  // as it is NOT shown on mobile screen
  cy.get("[data-test=sidenav-user-balance]")
    .invoke("text")
    .then((x) => {
      startBalance = x; // something like "$1,484.81"
      expect(startBalance).to.match(/\$\d/);
    });
}
cy.createTransaction(transactionPayload);
cy.wait("@createTransaction");
cy.getBySel("new-transaction-create-another-transaction").should("be.visible");

if (!isMobile()) {
  // make sure the new balance is displayed
  cy.get("[data-test=sidenav-user-balance]").should(($el) => {
    // here we only make sure the text has changed
    // we could also convert the balance to actual number
    // and confirm the new balance is the start balance - amount
    expect($el.text()).to.not.equal(startBalance);
  });
}
cy.percySnapshot("Transaction Payment Submitted Notification");

Now the tests pass consistently.

Bonus: the search results test

After a while, I noticed Percy periodically reporting visual differences in another test. The screenshot is shown below.

Using the screenshot title "User List for Search: ..." we can find the test. The test iterates over the list of search attributes, and performs a search with each attribute.

const targetUser = ctx.allUsers![2];
const searchAttrs: (keyof User)[] = [
  "firstName",
  "lastName",
  "username",
  "email",
  "phoneNumber",
];
...
searchAttrs.forEach((attr: keyof User) => {
  cy.getBySel("user-list-search-input").type(targetUser[attr] as string, { force: true });
  cy.wait("@usersSearch");

  cy.getBySelLike("user-list-item")
    .first()
    .contains(targetUser[attr] as string);
  cy.percySnapshot(`User List for Search: ${targetUser[attr]}`);

  cy.focused().clear();
  cy.getBySel("users-list").should("be.empty");
  cy.percySnapshot("User List Clear Search");
});

The test verifies that we can find a user by the first name, by the last name, etc. Every iteration of the loop types the search text, waits for the user search network call, then checks if the top result contains the search query text. After the visual snapshot command, the test clears the search field and confirms the search results list is empty.

Let's make this test stricter. First, I like to add a divider to the Command Log for each iteration of the loop:

searchAttrs.forEach((attr: keyof User) => {
  cy.log(`Searching by **${attr}**`);
  ...
})

Now it is clear what the test is doing by looking at the command log.

In the flaky visual test result, we see 1 search result on the left and 2 search results on the right, and that is what causes the difference. The search is based on autocomplete - the search results list refreshes as we type. As is common in autocomplete widgets, the results arrive dynamically. Look at the GIF below where I search by typing quickly - the search results refresh some time later.

Have you noticed the second user appearing very briefly and then disappearing? What if we happen to take a visual snapshot while the second user is present in the search result? We have a race condition!

Let's tighten the test. We assume that a single unique search result is returned by the API. Thus we can assert it from the test before taking a visual snapshot. While we are at it - let's use a better Percy snapshot title. Let's include the search attribute and its value.

cy.getBySelLike("user-list-item")
  // make sure the list of results is fully updated
  .should("have.length", 1)
  .first()
  .contains(targetUser[attr] as string);
cy.percySnapshot(`User List for Search: ${attr} = ${targetUser[attr]}`);

With this functional assertion before taking a visual snapshot the test stabilizes.
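
Why does one extra assertion help? Cypress retries the query and its assertions until they pass or time out, so .should("have.length", 1) effectively waits for the list to settle. The sketch below models this without Cypress. The renderedLists array is an assumed sequence of result lists shown over time (the user names are made up), and firstListMatching is a hypothetical stand-in for the retry loop.

```typescript
type ResultList = string[];

// Hypothetical stand-in for Cypress retry-ability: walk through the
// successive states of the results list and stop at the first one
// that satisfies the assertion.
function firstListMatching(
  lists: ResultList[],
  assertion: (list: ResultList) => boolean
): ResultList | undefined {
  for (const list of lists) {
    if (assertion(list)) {
      return list;
    }
  }
  return undefined; // corresponds to the assertion timing out
}

// Made-up states: a stale second user flashes before the final list
const renderedLists: ResultList[] = [
  [],
  ["Target User", "Other User"],
  ["Target User"],
];

// Weak check (top result matches): already passes on the stale list
const weakMatch = firstListMatching(renderedLists, (list) => list[0] === "Target User");

// Strict check (exactly one result, and it matches): passes only on the final list
const strictMatch = firstListMatching(
  renderedLists,
  (list) => list.length === 1 && list[0] === "Target User"
);
```

With the weak check a snapshot could be taken while two users are listed; with the strict check it is taken only after the list has settled on the single expected result.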

Updated test ensures the application shows the latest search results before taking snapshot

You can find the full change I made to this test in pull request #605.