New Thoughts on Test Doubles

Richard Gabriel wrote this about generative advice:

When I play squash and mis-hit or misdirect the ball, I chant, “follow through, follow through, you idiot.” A mantra of simple things. […] The law of follow-through is hard to forget - it’s a law, though, that makes little sense: where is its enchantment? How can advice about what to do after you hit a ball make any difference to what happens when you hit it, or before?

When you follow through, your arm is aiming at a point well beyond where the racket will contact the ball, so there is no deceleration. “Don’t decelerate, don’t decelerate, you idiot!” It might make a mantra worth repeating, but it isn’t advice you can take.

[The follow-through] mantra generates the effect we want - just as a seed generates the flower which eventually blossoms. Just as the wind in the sand generates dune designs and sidewinding sinews and ripples of grains of sand.

For the last, oh, year or so, I’ve been telling people that my TDD strategy doesn’t involve test doubles, and that they’ll write better tests if they try to eschew doubles entirely. I still believe that’s good generative advice, and I still use it as a heuristic when testing, but it’s based on an informal, relatively unexamined view of the “physics” of TDD, so it falls apart under close analysis. “Avoid test doubles” is TDD’s equivalent of the follow-through mantra.

Now I want to pull back the veil of generative advice and discuss the physics of what’s actually going on when I test “without doubles”. The scare quotes are there because, really, I still use test doubles—just not the kind you may be used to seeing.

Definitions

Before we can begin to discuss the role of doubles in tests, we need to agree on what a test double even is.

In this article, I’m going to define the term test double very broadly, to include many entities that you probably wouldn’t normally think of as test doubles. Under this definition, a test double is any value controlled by the test code that interacts with the test subject (i.e. the thing we’re testing). The doubles used by a test are the parameters the test holds constant so it can reliably observe changes in the variable of interest—the behavior of the subject. Thus, doubles are completely known quantities to the test code. In fact, an understanding of the properties of the doubles involved is crucial to understanding what the test is even testing.

Based on this broad definition, almost any argument we pass to a test subject is a double. Suppose I use the string “hello, world” as data when testing a formatting module. From an OO perspective, which considers the string as an object with a set of behaviors, the string literal “hello, world” is just a convenient way of creating a stub with self-consistent behavior. This is because the observable properties of the object—that it responds to .concat("foo") with “hello, worldfoo” and to .length() with 12—could just as well be implemented by a stub, but it’s convenient for the person writing the test to just use a string literal instead.
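To make that equivalence concrete, here is a minimal sketch in JavaScript. The `exclaim` function and the hand-rolled `stub` object are hypothetical, invented purely for illustration. Note that in JavaScript `length` is a property rather than a method.

```javascript
// Hypothetical subject: appends "!" using only the `concat` behavior of its argument.
function exclaim(text) {
  return text.concat('!')
}

// A hand-rolled stub that mimics the two behaviors of the literal described above.
const stub = {
  length: 12,
  concat(suffix) { return 'hello, world' + suffix },
}

// The subject cannot tell the stub from the real string.
const fromLiteral = exclaim('hello, world')
const fromStub = exclaim(stub)
console.log(fromLiteral === fromStub) // true
```

The literal is simply the cheaper way to get an object with these behaviors.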

You might react to this idea, that a string can be a test double, in one of two ways:

1. Oh, weird! I guess a string can be a test double!
2. That’s crazy. Strings clearly aren’t test doubles.

If you said (1), great! You should keep reading the rest of this article. If you said (2), then I take back what I said in the introduction: I actually don’t use test doubles at all, and you can take my “generative advice” literally.

What’s Not a Test Double?

At this point, you may be wondering “if a string can be a test double, then what isn’t a test double?” Really, what I’d like you to take away from the previous section is not “strings are test doubles” but that a test double is a role that any type of object can play in a test.

To make this idea more concrete, here are some things that are not test doubles, because they are not directly under the control of the test code:

- the system clock
- the filesystem
- the test subject

And here is an example of a test that truly uses no test doubles at all. The test subject, foo, is a procedure that just sets a global variable.

describe('foo', () => {
  it('sets a global variable `bar`', () => {
    foo()
    expect(bar).toEqual(3)
  })
})

This test controls essentially nothing in the environment; it is at the mercy of foo() and its side effects.

Contrast that test with this one:

describe('foo', () => {
  it('sets a global variable `bar`', () => {
    foo('blah')
    expect(bar).toEqual('blah')
  })
})

Here, we’re telling foo what value to assign to the bar global, and that value, the string 'blah', is a kind of test double—a dummy.

So, do tests for code that takes parameters inevitably have to use test doubles? No, they can use production data instead!

describe('foo', () => {
  it('sets a global variable `bar`', () => {
    foo(APP_CONFIG)
    expect(bar).toEqual('blah')
  })
})

This test uses a global constant APP_CONFIG, which is a real production value that gets passed into foo when the production system runs. There are no test doubles here, because the test doesn’t control what data APP_CONFIG contains.

You Can’t Test (Well) Without Doubles

From these examples, it should be clear that test doubles are not just useful; they’re indispensable. You simply cannot isolate your test subjects from external confounding factors without using test doubles as I’ve defined them.
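As a sketch of what that isolation looks like, consider the system clock mentioned earlier. Both `greeting` functions below are hypothetical. The first reads the clock itself, so a test cannot control it. The second takes the time as a parameter, which turns the `Date` value into a double.

```javascript
// Hypothetical subject, version 1: reads the system clock directly.
// A test of this function passes or fails depending on when it runs.
function greetingNow() {
  return new Date().getHours() < 12 ? 'Good morning' : 'Good afternoon'
}

// Version 2: the time is a parameter, so the test decides what it is.
function greeting(now) {
  return now.getHours() < 12 ? 'Good morning' : 'Good afternoon'
}

// These Date values are doubles: fixed and fully known to the test.
const morning = greeting(new Date(2020, 0, 1, 9, 0))
const afternoon = greeting(new Date(2020, 0, 1, 15, 0))
```

Nothing about the second version requires a mocking library; a plain `Date` is enough.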

Test Doubles I Have Loved

I enthusiastically use test doubles to the degree that the following statements are true of the situation:

- The double implements an interface that is ubiquitous in the programming language. Often it’s a built-in type like an iterable, string, stream, function, or channel.
- The double is actually an instance of a built-in type defined by the language itself.
- The double naturally has properties that are important to the contract between the system under test and the collaborator the double replaces. Examples of such properties include idempotency, nilpotency, commutativity, and associativity.
- The double is structured in a way that allows me to make assertions in my test using the full vocabulary of available matchers. For instance, it’s more convenient and readable to assert that an array contains elements in a particular order than to assert that a mock received several method calls in a particular order.

Examples

At this point, you probably want to see code. I’m happy to oblige.

1. Observing state changes via a callback function (JavaScript)

let changeEvent = 'never sent'

let appState = AppState(function(event) {
  changeEvent = event
})

appState.update({foo: 'bar'})

expect(changeEvent).toEqual({
  type: 'update',
  data: {foo: 'bar'}
})

Here, AppState and its callback function implement the Observer Pattern: the contract between them is that the function will be called whenever AppState is updated. In the test, the function passed to AppState is a double—specifically, a spy.

The object {foo: 'bar'} is a dummy: its purpose is to pass through update’s digestive tract unchanged and show up in the data field of the event, where we can detect it. Its apparent unrealism is actually a boon: it’s clear to anyone reading this test that {foo: 'bar'} is not meaningful to the application, so there’s no illusion that update might have special logic related to this value.
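The article does not show AppState itself. A minimal sketch, assuming it is a factory function that calls its listener with an event on every update, might look like the following. It also has the spy push events onto an array, so that ordinary array assertions can check how many notifications arrived and in what order.

```javascript
// Assumed implementation: the article only shows how AppState is used.
function AppState(onChange) {
  let data = {}
  return {
    current() { return data },
    update(changes) {
      data = { ...data, ...changes }
      onChange({ type: 'update', data: changes })
    },
  }
}

// The spy is a plain function that records every event it receives.
const events = []
const appState = AppState(event => events.push(event))

appState.update({ foo: 'bar' })
appState.update({ baz: 'qux' })

// Asserting on the array checks both the count and the order of calls.
console.log(JSON.stringify(events))
```

Because the recorded events live in an array, no special "was called with" vocabulary is needed to inspect them.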

2. Reading data via a stream (Golang)

input := strings.NewReader("{}")
dom, err := json.Parse(input)
Expect(err).NotTo(HaveOccurred())
Expect(dom.Type()).To(Equal(json.Object))

Here, we’re testing a json.Parse function designed to read input from a File by feeding it a string reader instead. We could have designed Parse to simply take a string instead of a Reader, but that would mean we could only parse JSON files by reading the whole file into memory. That would be fine for small files but might not be acceptable for very large ones.
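The same design choice can be sketched in JavaScript, where the ubiquitous interface is the iterable rather than a Reader. The `countLines` function below is hypothetical. It accepts any iterable of string chunks, so a test can pass an array literal while production code might pass a generator that reads a file piece by piece.

```javascript
// Hypothetical subject: counts newline characters across chunks
// without ever joining them into one big string.
function countLines(chunks) {
  let count = 0
  for (const chunk of chunks) {
    for (const ch of chunk) {
      if (ch === '\n') count++
    }
  }
  return count
}

// In a test, an array literal is the double.
const fromArray = countLines(['a\nb', '\nc\n'])

// A generator works too, standing in for a lazily read source.
function* lazyChunks() {
  yield 'one\n'
  yield 'two\n'
}
const fromGenerator = countLines(lazyChunks())
```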

3. Processing data with a function (JavaScript)

In this example, we’re testing a curried map function. The concept of map comes from functional programming. map takes a function that operates on a single value and “upgrades” it to a function that performs that operation on every element of a list, returning a new list of the return values.

let add1 = function(x) { return x + 1 }
let add1toEach = map(add1)
expect(add1toEach([1, 2, 3])).toEqual([2, 3, 4])

Here, the add1 function is a test double. Our production code probably wouldn’t use such a trivial function with map—there’s rarely a need to increment every number in an array. However, for the purposes of illustrating the behavior of map in a test, a simple function like add1 works perfectly.
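The article does not show map itself. One possible implementation, offered only as a sketch, is a function that takes the operation first and returns a function awaiting the list:

```javascript
// Assumed implementation of a curried map: takes the function first,
// then the list, and returns a new list of results.
const map = fn => list => list.map(x => fn(x))

const add1 = function (x) { return x + 1 }
const add1toEach = map(add1)

const result = add1toEach([1, 2, 3])
console.log(result) // [ 2, 3, 4 ]
```

Wrapping the call as `x => fn(x)` keeps the extra index and array arguments of `Array.prototype.map` from leaking into the double.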

4. Collecting output in a string (Java)

Here, we’re testing that a web server generates the correct HTML to display a dashboard page.

Writer htmlOutput = new StringWriter();
View view = new DashboardView(data);
view.renderTo(htmlOutput);
assertThat(htmlOutput.toString(), equalTo("..."));

The StringWriter is a test double which stands in for the HttpResponseWriter that would be used in production to send the response HTML to the client. The abstract Writer class that StringWriter extends specifies a large set of methods for writing and appending characters, character arrays, and strings.

In this situation, we don’t care about the exact sequence of method calls received by the Writer—we just care that the resulting HTML is correct. For example, the following two snippets of production code should be equivalent from the perspective of our tests, in the sense that we should be free to refactor from one to the other without causing any test failures:

writer.write("<h1>");
writer.write("Dashboard");
writer.write("</h1>");

versus:

writer.write("<h1>Dashboard</h1>");

Because we don’t care about the specific sequence of calls received, using a mock or a spy to test the View is inappropriate.

Going a bit deeper: we can say that the calls to a writer are associative. That is:

write("ab"); write("c");

produces the same effect as

write("a"); write("bc");

We can use a string to collect the value on which we assert precisely because string concatenation is associative, just like the calls to write.
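The same idea can be sketched in JavaScript. The `createStringWriter` helper and the two render functions are hypothetical stand-ins for StringWriter and the view; the point is only that both renderings leave the same string behind.

```javascript
// Hypothetical stand-in for Java's StringWriter: collects writes in a string.
function createStringWriter() {
  let buffer = ''
  return {
    write(text) { buffer += text },
    toString() { return buffer },
  }
}

// Two renderings that differ only in how they split their writes.
function renderInPieces(writer) {
  writer.write('<h1>')
  writer.write('Dashboard')
  writer.write('</h1>')
}
function renderAtOnce(writer) {
  writer.write('<h1>Dashboard</h1>')
}

const piecewise = createStringWriter()
const atOnce = createStringWriter()
renderInPieces(piecewise)
renderAtOnce(atOnce)

// Same collected output, so a test asserting on the string passes for both.
console.log(piecewise.toString() === atOnce.toString()) // true
```

A mock that recorded individual `write` calls would distinguish these two renderings, which is exactly the coupling the string-collecting double avoids.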

These Tests Are Trivial; I Want Real Examples!

No, they’re not; your code is too complicated. These tests are totally realistic examples of what the tests for your actual production systems can and should look like if you appropriately decouple your units of code and depend on small, simple interfaces.

What “decoupled”, “small”, and “simple” mean is probably a topic for a whole different post, but I’ll sketch out a few general points of design advice here:

- Depend on few things. If you can change your design so the interesting logic (the part you really want to test) depends on fewer objects and methods, you should do it. Your tests will get much simpler as a result.
- Parameters are dependencies too. They’re more loosely coupled than dependencies on global variables or classes, but they are still dependencies. Analyze your code’s parameters carefully and try to prune and simplify them.
- Depend on ubiquitous things. Your code will be much more flexible, not to mention easier to test, if the interfaces it depends on have many diverse implementors.

For example, imagine we’re writing some code to parse data from a file. Which of these (pseudo-Java) interfaces for getting the data would you rather implement a test double for?
- A unique, snowflake-y class with a getData method that’s not implemented by any other type in the system?
```
class MyData {
  public String getData() { /* ... */ }
}
```
- An application-specific DataContainer interface that has a few different implementors?
```
interface DataContainer {
  String getData();
}
```
- A Readable interface (defined by the standard library, and implemented by many different, widespread types)?
```
interface Readable {
  int read(CharBuffer cb);
}
```
- A string?
```
class String {
  // ...
}
```
- Something with a toString method?
```
interface CharSequence {
  String toString();
}
```
In case you missed it: these are ordered from least flexible to most flexible; tightly coupled to loosely coupled. The ranking is pragmatic rather than formal: perhaps read(CharBuffer) can be considered a more general interface than toString(), but it’s far less common in a typical Java program.
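As a rough JavaScript illustration of the most flexible end of that list, here is a hypothetical `wordCount` function that depends only on its argument being convertible to a string. Both the function and the doubles passed to it are invented for this sketch.

```javascript
// Hypothetical subject: depends only on its argument being convertible to a string.
function wordCount(source) {
  const text = String(source).trim()
  return text === '' ? 0 : text.split(/\s+/).length
}

// A string literal is the simplest double.
const fromString = wordCount('to be or not to be')

// Any object with a toString method works as well.
const fromObject = wordCount({ toString() { return 'hello world' } })
```

Neither double needed a library to create, which is the feedback the list above is meant to surface.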

Conclusion

By now you can probably see how I arrived at the mantra “don’t use test doubles” based on an informal understanding of these concepts. I was thinking only of the kind of test doubles that are generated by mocking libraries. These libraries are a symptom, not the cause, of our testing woes: if you need a library to generate your doubles for you, your test subject’s collaborators are almost certainly of the inflexible, tightly coupled variety shown near the top of the list above.

Perhaps I should revise my advice from “don’t use test doubles” to “don’t use mocking libraries”. But I don’t think that would be adequate, because I received exactly that advice from a blog post by Bob Martin years ago, but didn’t understand it until recently.

These topics are complex, but I’ll try to condense this post into a few takeaways:

- A test double is any value controlled by the test.
- Thus defined, test doubles are indispensable for test isolation.
- The moment you create a test double, you get feedback about how flexible the design of your code is. Listen to that feedback.
- You know your code is flexible when your test doubles are types that are built into the language, or are part of the standard library: strings, streams, lists, functions, and channels.
