A Day at the Races: Avoiding Random Failures in Selenium UI Tests

Posted by Mike Pennisi

Oct 15 2014

Selenium is an indispensable tool for developing web applications. It allows developers to write test scripts that control real browsers and ensure their applications behave in the way that users expect. Tests like these make software development much more pleasant–developers can have much greater certainty that their application is functioning correctly even after large refactoring operations.

There’s a dark side to UI testing, though. So-called “race conditions” can lead to unexpected and intermittent test failures. Such failures undermine developer confidence in their test suites at large, and this subverts the entire motivation for maintaining tests in the first place.

Programmers have been struggling with “races” in their logic for as long as they have been writing asynchronous code. The term generally applies to any behavior that is unintentionally dependent on the timing of uncontrollable events. It’s called a “race” because the outcome depends on the order of events–the “winning” event will determine the result.

Now, if I learned anything from the 1993 blockbuster film Cool Runnings, it’s that races are an exciting opportunity to prove yourself in the face of adversity. Very early on in my time debugging software race conditions, I began to doubt this lesson. There is nothing remotely inspiring about an intermittent ElementNotFoundError. Your hometown isn’t crowded around a television rooting you on, and John Candy isn’t waiting to give you a bear hug at the finish line. Usually, it’s late and you’re trying to explain to another developer why the continuous integration server is reporting a totally unrelated failure on their feature branch. Really, these races have more in common with the ones in 1987’s The Running Man.

It doesn’t have to be this way, though. In this post, I’ll explain where these race conditions originate, how to avoid them, and how to debug them in existing tests. I can’t promise that your controlling father will finally begin to respect your autonomy, but at least you won’t have to sweat intermittent failures.

Race Conditions in Selenium UI Tests

If you’ve never used Selenium before, you may be wondering how race conditions are relevant. “Where is the uncontrollable event in a UI test?” To answer this question, let’s take a quick detour into the Selenium architecture.

In the diagram below, you can see how the Selenium server is responsible for sending user commands (e.g. “click the mouse here”) to the browser.

A Selenium server interaction with a hidden race condition

When I first learned about this architecture, I was surprised to discover that the Selenium server, despite initiating events like this, does not actually know that the web application has responded. The web browser takes some additional time to respond and trigger your code to update the UI. See the delay introduced by the connection between your tests and the Selenium server? This delay is usually “long enough” so that the UI has time to update. In other words, the UI usually “wins the race”. Because this is so often the case, it’s easy to write tests that assume that user commands are handled immediately. But it’s always possible that the tests could “win the race”, and that the UI has not been updated yet. Here’s what that might look like:

The same Selenium server interaction when the race condition is expressed

One explanation for this is related to JavaScript’s single-threaded nature. The web browser can only process one thing at a time, so it uses an internal “scheduler” to manage dynamic events. If the browser is busy with another task when the Selenium event arrives, it will schedule the work to be done in the future. So once Selenium initiates the event, it reports back to your tests, “I’ve sent the click”. At some later time, the browser finishes responding to previously-queued events and handles that “click” event.
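The scheduling behavior described above can be imitated in plain JavaScript without a browser. The following is only a sketch: the `ui` object and the `sendClick` function are hypothetical stand-ins for the page and for Selenium's "click" command, and `setTimeout` plays the role of the browser's scheduler.

```javascript
// Simulated "UI" state; in a real page this would be the DOM.
const ui = { pauseVisible: false };

// The "browser" queues the click handler instead of running it right away,
// then immediately acknowledges that the event was sent.
function sendClick() {
  setTimeout(function() {
    ui.pauseVisible = true;
  }, 0);
  return 'click sent';
}

const ack = sendClick();

// An "eager" test checks the UI as soon as the acknowledgement arrives,
// which is before the queued handler has had a chance to run.
const seenImmediately = ui.pauseVisible;

setTimeout(function() {
  // prints: click sent false true
  console.log(ack, seenImmediately, ui.pauseVisible);
}, 10);
```

The acknowledgement and the UI update are two separate events, and nothing in the acknowledgement says whether the update has happened yet.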

Here’s a more complete diagram that describes how the scheduler’s state affects the race outcome:

A more detailed view of the same race condition

In short: the uncontrollable variable that makes race conditions possible is the state of the browser’s internal scheduler. (For a short-and-sweet overview of scheduling in JavaScript, check out Help, I’m stuck in an event loop by Philip Roberts.) When the UI “loses the race”, tests will likely fail in non-obvious ways.

Avoidance Techniques

Ideally, you should try to author your tests to be race-free from day one. To review, the basic problem occurs when tests are too “eager”–they interact with the UI and expect immediate results. Although this might work most of the time, those failures creep in during extenuating circumstances. There are a number of ways you might try to say, “be cool, tests.”

For each of these approaches, let’s use the Clock application from Mozilla’s Firefox OS as an example. That app has a “Stopwatch” feature which allows users to start and stop a digital stopwatch. Here it is in action:

The Stopwatch feature in Gaia's "Clock" application

I’ll be writing example tests with the “WebDriverJs” JavaScript binding for Selenium, but these approaches are relevant for tests written with any binding and any programming language. Let’s say we want to test that the “Pause” button is displayed after the user starts the timer. Here’s the naive approach:

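var assert = require('assert');
var By = require('selenium-webdriver').By;

// `driver` is assumed to be a WebDriver instance that has
// already loaded the Clock application.
driver.findElement(By.css('.start-btn'))
  .then(function(startBtn) {

    // Race! Will the UI respond to this "click" before
    // we check for the pause button?
    return startBtn.click();
  }).then(function() {
    return driver.isElementPresent(By.css('.pause-btn'));
  }).then(function(pauseIsPresent) {
    assert(pauseIsPresent);
  });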

So how can we avoid race conditions like that?

Avoidance Technique #1: Sleep your troubles away. Selenium provides a sleep method that pauses test execution for some specified amount of time. At first blush, this sounds like exactly what we need: just wait a little while after pushing that button, and even slow machines will run the tests without failure.

driver.findElement(By.css('.start-btn'))
  .then(function(startBtn) {
    return startBtn.click();
  }).then(function() {

    // If we remember to pause for five seconds
    // every time we interact with the application,
    // we can avoid problems... right?
    return driver.sleep(5000);
  }).then(function() {
    return driver.findElement(By.css('.pause-btn'));
  }).then(function(pauseBtn) {
    return pauseBtn.isDisplayed();
  }).then(function(pauseIsDisplayed) {
    assert(pauseIsDisplayed);
  });

There are a couple of problems with this approach. Foremost among them is the huge hassle of writing test code like this. Each interaction has to be followed by an explicit call to sleep just to help the UI “win the race”. In all likelihood, your UI tests are already the slowest tests in your project, and this approach won’t help. Because those sleep statements always execute, test runtime will suffer even in contexts that never expressed the race condition.
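A back-of-the-envelope calculation puts a rough number on that cost. The suite size below is a made-up figure for illustration, not a measurement; the five-second pause is the one used in the example above.

```javascript
// Hypothetical suite: the interaction count is an assumption.
const interactions = 120;
const sleepMs = 5000; // the pause used in the example above

// Time spent sleeping on every run, whether or not the UI needed it.
const addedMinutes = (interactions * sleepMs) / 1000 / 60;

// prints: 10 minutes of sleeping per run
console.log(addedMinutes + ' minutes of sleeping per run');
```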

If you need more convincing that sleeping is no way to respond to life’s problems, check out the classic American fairytale, Rip Van Winkle. The tremendous foresight demonstrated by Washington Irving in this allegory of user interface testing cannot be overstated.

Avoidance Technique #2: Be patient! The Selenium driver can be configured with an “implicit wait” duration. If set, the driver will not fail immediately if asked to retrieve an element that does not exist. Instead, it will “poll” the page repeatedly until either (1) the element is found (in which case the element is returned), or (2) the duration has expired (in which case an error is thrown). If we set this value, then we don’t need to litter the rest of our test code with calls to sleep, and the test script only waits for as long as necessary.

If the default behavior of the driver is “too eager”, this seems like a great way to teach it some patience. It’s still far from perfect.

// If the element isn't found immediately, try again
// for 5 seconds. Even in slow/busy browsers, surely
// the UI will be updated after 5 seconds have
// passed... right?
driver.manage().timeouts().implicitlyWait(5000);

driver.findElement(By.css('.start-btn'))
  .then(function(startBtn) {
    return startBtn.click();
  }).then(function() {
    return driver.findElement(By.css('.pause-btn'));
  }).then(function(pauseBtn) {
    return pauseBtn.isDisplayed();
  }).then(function(pauseIsDisplayed) {
    assert(pauseIsDisplayed);
  });
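To make the polling behind an implicit wait more concrete, here is a rough sketch of the idea in plain JavaScript. It is not Selenium's actual implementation: the `pollFor` helper, its `lookup` callback, and the `intervalMs` parameter are all hypothetical, and the "page" is simulated with a variable that gets set a little later.

```javascript
// Hypothetical helper: call `lookup` repeatedly until it returns a
// non-null value (resolve) or `timeoutMs` has elapsed (reject).
function pollFor(lookup, timeoutMs, intervalMs) {
  const deadline = Date.now() + timeoutMs;
  return new Promise(function(resolve, reject) {
    (function attempt() {
      const found = lookup();
      if (found !== null) {
        resolve(found);
      } else if (Date.now() >= deadline) {
        reject(new Error('Not found within ' + timeoutMs + 'ms'));
      } else {
        setTimeout(attempt, intervalMs);
      }
    }());
  });
}

// Simulated page: the element "appears" roughly 30ms from now.
let element = null;
setTimeout(function() { element = { text: 'Pause' }; }, 30);

const whenFound = pollFor(function() { return element; }, 500, 10);
whenFound.then(function(el) { console.log('found:', el.text); });
```

The wait ends as soon as the lookup succeeds, so a fast UI is not penalized. The upper bound passed as `timeoutMs` is still a number someone has to pick.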

Although we’re not forced to duplicate arbitrary sleep durations throughout the test logic, we’re still dependent on one of those much-maligned “magic numbers”. For any given duration we might choose for the “implicit wait”, it’s difficult to answer questions like, “Why not longer?” or “Why not shorter?”. These questions get at a more fundamental problem: neither sleep nor implicit waits actually address the race condition. They basically sweep it under the rug. For any value we choose, it’s always possible (although increasingly unlikely) that extenuating circumstances cause the UI to “lose the race”. All this means that, although a general solution would be nice, we need to avoid each race condition in context.

Avoidance Technique #3: Know what you’re waiting for. What do I mean by “in context”? Generally speaking, every time the user interacts with the UI, the UI responds in some way. It might be to transition to a new page, it might be to report the status of an operation, or it might just be to say, “Please wait”. The exact response is dependent on your application, but it is the only way for you to be sure that the “triggering” event has been handled. This means the only “safe” way to interact with the application is to delay execution until the UI has responded accordingly.

var initialTime;
driver.findElement(By.css('.timer-face'))
  .then(function(timerFace) {
    // Store the value originally displayed by the
    // timer:
    return timerFace.getText();
  }).then(function(currentTime) {
    initialTime = currentTime;

    return driver.findElement(By.css('.start-btn'));
  }).then(function(startBtn) {
    return startBtn.click();
  }).then(function() {

    // Now that a "click" has been requested,
    // wait until the value displayed on the
    // timer updates. At that point, we can be
    // sure the timer has started.
    return driver.wait(function() {
      return driver.findElement(By.css('.timer-face'))
        .then(function(timerFace) {
          return timerFace.getText();
        }).then(function(currentTime) {
          return currentTime !== initialTime;
        });
    });
  }).then(function() {
    return driver.findElement(By.css('.pause-btn'));
  }).then(function(pauseBtn) {
    return pauseBtn.isDisplayed();
  }).then(function(pauseIsDisplayed) {
    assert(pauseIsDisplayed);
  });

Above, the condition we’re explicitly waiting for is a change in the timer face. This approach, while safe against UI race conditions, is pretty ugly. We can make this test readable again by using a pattern known among Selenium veterans as a “page object”:

// Start the timer "safely". This method encapsulates the
// logic demonstrated above so that the returned promise
// does not resolve until the timer "face" advances.
timerPage.start()
  .then(function() {
    return timerPage.pauseBtn();
  }).then(function(pauseBtn) {
    return pauseBtn.isDisplayed();
  }).then(function(pauseIsDisplayed) {
    assert(pauseIsDisplayed);
  });

By encapsulating all that extra logic inside a domain-specific start method, we expose a single, simple API that can be re-used across the test suite (and, importantly, updated in a single place). I’ll admit: when you first start creating these objects, it can feel like you’ve just traded one kind of maintenance for another. Over time, I’ve found that they pay dividends in test readability and code reviews (the encapsulation means that test files change less frequently), so I’ve come to use page objects often. For more about this pattern, check out “Page Objects” on the Selenium project’s wiki and “Using a Page Object” on ElementalSelenium.com.
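For illustration, here is a minimal sketch of how a page object like `timerPage` might be put together. The article does not show its implementation, so the `TimerPage` constructor and the `readFace` helper are hypothetical names. The CSS selectors are carried over from the earlier examples, and `driver` and `By` are assumed to come from the `selenium-webdriver` package.

```javascript
// Hypothetical page object for the Stopwatch view.
function TimerPage(driver, By) {
  this.driver = driver;
  this.By = By;
}

// Read the text currently shown on the timer face.
TimerPage.prototype.readFace = function() {
  return this.driver.findElement(this.By.css('.timer-face'))
    .then(function(face) { return face.getText(); });
};

TimerPage.prototype.pauseBtn = function() {
  return this.driver.findElement(this.By.css('.pause-btn'));
};

// Click "Start", then resolve only after the timer face has changed,
// mirroring the longer test shown earlier.
TimerPage.prototype.start = function() {
  const page = this;
  let initialTime;

  return page.readFace()
    .then(function(text) {
      initialTime = text;
      return page.driver.findElement(page.By.css('.start-btn'));
    }).then(function(startBtn) {
      return startBtn.click();
    }).then(function() {
      return page.driver.wait(function() {
        return page.readFace().then(function(text) {
          return text !== initialTime;
        });
      });
    });
};
```

With something like `var timerPage = new TimerPage(driver, By);` in the test setup, the short test above would read exactly as written.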

Debugging

It would be disingenuous to suggest that you could completely avoid intermittent failures by taking the above advice. No one is perfect (no, no, not even me), and even the most vigilant of us are bound to miss a race condition or two. Debugging races in existing tests can be challenging because the failure may not occur immediately after the race is resolved. Error stack traces may point to a line of code that is completely unrelated to the bug. Because of this, it’s useful to have some techniques for debugging intermittent failures.

I’ll introduce three helpful commands implemented by Selenium’s JSON Wire protocol. The way you issue the commands will vary between programming languages (and even between client bindings), but since they are part of the protocol, any binding worth its salt will make it easy.

Capturing DOM state: Sometimes, resolving these issues is just a matter of learning what’s going on in the document. Lucky for us, Selenium defines the source command to easily retrieve the page source.

Capturing JavaScript state: Although it can be helpful to review the HTML of the page when failures occur, usually you will need to learn more about the internal state of your application when things go wrong. In these cases, you can reach for the execute command–just specify some JavaScript that returns the information you need. Remember that this data is being sent over HTTP, so it should be serializable (the docs have more context on that if you need it).

Capturing display state: There’s a lot of power in the ability to execute arbitrary JavaScript, but usually you need to know what you’re looking for first. In these cases, the screenshot command is just what the doctor ordered. This command returns a Base64-encoded PNG representation of the page and is often the first step in finding the source of the problem.
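In the WebDriverJs binding, these three commands are exposed as `driver.getPageSource()`, `driver.executeScript()`, and `driver.takeScreenshot()`. One way to use them together is a small helper that saves all three kinds of state when a test fails. The sketch below is only a suggestion: the `captureState` name, the file layout, and the choice of `document.readyState` as the JavaScript state to record are assumptions, and a real application would return something more specific to its own internals.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: save the page source, a piece of JavaScript state,
// and a screenshot so a failure can be inspected after the fact.
// Resolves to the path prefix shared by the three files.
function captureState(driver, dir, label) {
  const base = path.join(dir, label);

  return driver.getPageSource()
    .then(function(html) {
      fs.writeFileSync(base + '.html', html);
      // Stand-in script; return whatever serializable data describes
      // your application's state.
      return driver.executeScript('return document.readyState;');
    }).then(function(state) {
      fs.writeFileSync(base + '.json', JSON.stringify(state));
      return driver.takeScreenshot();
    }).then(function(png) {
      // The screenshot arrives as a Base64-encoded PNG.
      fs.writeFileSync(base + '.png', Buffer.from(png, 'base64'));
      return base;
    });
}
```

A helper like this could be called from a test runner's failure hook, for example `captureState(driver, 'artifacts', 'stopwatch-start')`, assuming the `artifacts` directory already exists.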

Finding New Problems

Although you may never have heard of this class of failure, much less experienced them in your tests, I hope I’ve been able to convince you of their significance. At the same time, I don’t mean to discourage anyone from writing UI tests in the first place. Although it took more than a few words to explain it all, I can say that dealing with these race conditions gets easier with practice, and the “page object” pattern is an excellent way to organize and maintain that code over time.

Many thanks to James Lal for his guidance while I was struggling with these problems for the first time and to David Burns for his careful and patient explanations of the Selenium architecture.
