Maybe you’re not a “people person.” It’s not that you dislike other humans, but you recognize certain realities of your work. Your day job is maintaining a web application, after all, not carousing with your users. You know that accessibility is an important topic, but you haven’t been able to find the time to learn more about it. Keeping the application running smoothly while your team adds features, fixes bugs, and re-designs is quite enough to worry about, thank you very much. That introduction to accessibility post has been open in a tab for the better part of a week, but the UI tests are failing again. Is it any wonder that “a11y” concerns take a back seat?

I can relate! When I’m contributing to a web application, a lot of my energy goes toward writing and (unfortunately) maintaining UI tests. I’ve been fortunate to work alongside the likes of Mat Marquis and Susan Robertson, folks who really earn the title “Inclusive Web Developer.” Their influence helped me develop a more nuanced understanding of UI testing and accessibility best practices. Far from being two separate responsibilities, these practices can actually reinforce each other when the right procedures are in place.

In this post, I’d like to talk about the never-ending struggle of writing maintainable tests and how accessibility best practices offer a solution that can rally your whole team.

What’s all the fuss about?

Generally speaking, automated testing is about verifying that a piece of code is operating as intended. More specifically, UI testing focuses on verifying that an application behaves as intended from a user’s perspective. The major difference is in the expectations: UI tests are much more humanistic (as in, “the blog post’s date is visible”) while unit tests focus on technical details (something like, “the bloom filter calculates an accurate probability”). But at the end of the day, UI tests still need to be expressed in terms that a machine can understand.

For web applications, this means that we need to write in terms of the underlying document structure. For example, consider the following bit of markup:

<div class="blog-post">
  <div>
    <h1>Using NVDA Screen Reader on Windows</h1>
    <span class="published-on">March 21, 2017</span>
  </div>
</div>

While we might say, “the blog post’s date,” in plain English, the equivalent reference for a machine might be a CSS selector like .blog-post h1 + .published-on. We use this reference as a “hook” into the UI’s state–the test script can read from that location whenever we want to express some expectation about the blog post’s publication date.

assert(page.findByCss('.blog-post h1 + .published-on').text() === 'March 21, 2017');

The trouble with this is that the “hook” we choose is subject to change. Your team’s designers might adopt a new convention like BEM and subsequently refactor CSS class names and IDs. Or a redesign may introduce a byline into the heading, changing the relationship between the referenced elements. When you fill a test suite with “hooks” like the one above, any of these changes will result in failing tests. Fixing them is not only more work (your team’s designers will not appreciate the opportunity to work with WebDriver), it is also an opportunity for bugs (since mistakes made while updating tests might mean that the tests no longer guarantee what they were meant to).

One popular response to this predicament is to introduce class names and/or attributes dedicated to use as test “hooks.” With additional class names like js-publication-date or element attributes like data-byline, developers create stable hooks for themselves without making it harder for designers to do their thing. This only addresses part of the problem, though. Sure, folks working on the application’s presentation can rename selectors to their heart’s content. But with this approach, they have the additional responsibility of maintaining that extra markup whenever the structure changes. (Your more surly co-workers might also point out that this markup really doesn’t add any direct user value.) I like my tests to be unobtrusive, so I’ve never bought into this practice.

Instead, I’ve tried to limit this effect with code structure. I’d store “hook definitions” in declarative JSON files. These would assign a stable and human-readable name to each of those unstable machine-readable locators, e.g.:

{
  "postDate": ".blog-post h1 + .published-on"
}

The test code could then be written in terms of the stable names.

assert(page.findByCss(lookUp('postDate')).text() === 'March 21, 2017');

This pattern reduced maintenance overhead (and the subsequent risk of error). Where before, updating a “hook” involved performing a “find and replace” across many test files, now only a single value in a “dictionary” had to be updated. Unfortunately, the tests were a little harder to understand since readers had to know what the lookUp function did, and that depended on another file. More concerning, though, was that valid changes to the source could still produce surprising test failures. I hadn’t actually solved the problem. I had just made it less burdensome.
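
A minimal sketch of how such a helper might look, assuming the dictionary has been loaded as a plain object. The inline `hooks` object stands in for the JSON file, and the error thrown for unknown names is an illustrative addition rather than a description of the original helper:

```javascript
// Stand-in for the contents of the JSON "dictionary" file shown above.
const hooks = {
  postDate: '.blog-post h1 + .published-on'
};

// Translate a stable, human-readable name into its current CSS selector.
// Throwing on unknown names is an assumption made for this sketch; it turns
// a typo in a test into an immediate, descriptive error.
function lookUp(name) {
  if (!Object.prototype.hasOwnProperty.call(hooks, name)) {
    throw new Error(`Unknown test hook: "${name}"`);
  }
  return hooks[name];
}
```

When the markup changes, only the value stored under postDate needs to be edited; every test that calls lookUp('postDate') picks up the new selector.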

A parallel

In web accessibility circles, they have a name for those “stable and human-readable names”: roles. Long before my naive but well-meaning exploits with JSON “dictionary” files, inclusive web developers were writing meaningful labels directly into their application markup itself. For example:

<div class="blog-post" role="main">
  <div role="heading">
    <h1>Using NVDA Screen Reader on Windows</h1>
    <span class="published-on" role="note">March 21, 2017</span>
  </div>
</div>

They weren’t doing this to make writing UI tests any easier, though. They included this extra structure so that assistive technologies like screen readers could aid their visitors with disabilities. This is just one small aspect of the ARIA standard for accessibility on the web.

…but that doesn’t mean our UI tests can’t hitch along for the ride.

Alignment

Due to their semantic value, roles can be expected to change far less frequently than other traits of the document like CSS class names and node structure. This is exactly the same property that my ad-hoc JSON “dictionary” files were intended to provide. But unlike those dictionary terms, roles are built into the application itself. This means that we can write test “hooks” in those terms:

assert(page.findByCss('[role="main"] [role="heading"] [role="note"]').text() === 'March 21, 2017');

Ignoring the verbosity for a moment, this version already feels like an improvement because the test code is in terms of the application code (i.e. no more “look up” indirection). And because the roles have value beyond making the tests pass, your whole team will be far more motivated to keep them accurate.
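
If the repeated attribute syntax gets tiresome, a small helper can assemble the selector. This is only a sketch: `byRoles` is a hypothetical name, not part of any testing library, and it assumes that a plain descendant relationship between the roles is all you need.

```javascript
// Build a descendant CSS selector from a list of ARIA roles, outermost first.
// `byRoles` is a hypothetical helper introduced for illustration.
function byRoles(...roles) {
  return roles.map((role) => `[role="${role}"]`).join(' ');
}

// Produces the same selector string used in the assertion above.
const postDateSelector = byRoles('main', 'heading', 'note');
```

The resulting string can be handed to page.findByCss just like the hand-written selector.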

On verbosity

If you’re anything like me, you may be tempted to take this great new thing and apply it consistently, everywhere. (You can imagine what my kitchen looked like the weekend I discovered Nutella.) This is a great intention, but keep in mind that ARIA isn’t always necessary. The HTML5 specification defines a handful of “semantic elements” that accurately describe the meaning of their content. This includes tags like <main>, <header>, <nav>, and <footer>. For these common elements, a dedicated role attribute is technically superfluous. “Technically superfluous” because in reality, assistive technologies took some time to include support for these elements. That’s why you’ll find some blog posts recommending the use of roles with semantic elements. Things have improved since those days, but you might still consider over-specifying. Repeating an element’s built-in role (as in <main role="main">) is redundant but harmless.

Anyway, using these tags, we can make the markup and tests more concise, e.g.

<main class="blog-post">
  <header>
    <h1>Using NVDA Screen Reader on Windows</h1>
    <time class="published-on">March 21, 2017</time>
  </header>
</main>

…and:

assert(page.findByCss('main header time').text() === 'March 21, 2017');

This is everything we’d hoped for: the application is accessible, the tests are robust, and both are concise!

To be sure, this simplification doesn’t apply to all elements. Today’s div is tomorrow’s span, and hard-coding either tag name into a test selector is still asking for trouble. So for less common document regions, the role attribute isn’t going anywhere.
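
One way to keep tests indifferent to which form the markup uses is to match either the semantic element or the explicit role. The sketch below is built on assumptions: `roleSelector` and `roleChain` are hypothetical names, the mapping covers only two well-known pairings (<main> with the main role and <nav> with the navigation role), and it relies on the CSS `:is()` pseudo-class, which your test driver's browser would need to support.

```javascript
// Roles that have a well-known semantic element equivalent.
// Deliberately tiny; extend it only with pairings you have verified.
const IMPLICIT_ELEMENTS = {
  main: 'main',
  navigation: 'nav'
};

// Match an element by explicit role attribute, or by the semantic tag
// that carries the same role implicitly (when one is listed above).
function roleSelector(role) {
  const attribute = `[role="${role}"]`;
  const tag = IMPLICIT_ELEMENTS[role];
  return tag ? `:is(${tag}, ${attribute})` : attribute;
}

// Chain several roles into one descendant selector, outermost first.
function roleChain(...roles) {
  return roles.map(roleSelector).join(' ');
}
```

With this, roleChain('main', 'note') keeps matching whether the container is written as <main> or as <div role="main">.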

A better definition of “stable”

On the surface, it may not be clear how ARIA roles like tablist or semantic elements like <main> are any more stable for testing purposes. The truth is, these elements are more than just a means to improve search engine ranking. For users with disabilities, they are in fact part of the UI. Changing a role attribute is no less jarring than re-locating the navigation menu. It’s not that it can’t be done, just that it shouldn’t be done lightly. When you structure your UI tests to rely on ARIA semantics, you are actually extending coverage: the tests now verify part of the experience that users of assistive technologies depend on.

Don’t forget that accessibility is a field of study unto itself, though! It’s one thing for a blog post to introduce an idea with a few simple examples. It’s another thing to discuss all the considerations that inform responsible implementation in a user-facing application. For instance, over-applying ARIA markup can actually degrade user experience. This isn’t meant to discourage; it’s just to say you should do your homework first. (You might start with this guide from the Web Platform Working Group.)

Even with the “homework” under your belt, I won’t claim this to be a silver bullet. More severe changes associated with site overhauls will still require extra work in test maintenance. In my mind, this extra work is a kind of “healthy friction” for change. The overhead discourages accidental UI modifications and encourages intentionality whenever a redesign takes place. In this way, careful integration of accessibility standards can transform a brittle test suite into a meaningful safety net–one that promotes stability for all of your users.