Beware of chicken testing! (or mocks overuse)

Posted Apr 17, 2020 Updated Jan 17, 2024

By Sebastian Buczyński

8 min read

Need for mocking

Dealing with problematic dependencies is an indispensable part of software testing. Often we cannot, or do not want to, rely on a 3rd party service, network communication, the hard drive and so on, especially in unit-tests. The reasons vary: external dependencies are usually slow, fallible and difficult to put into the expected state before the actual test. Consider the following code snippet (simplified: no logger, no saving to the database, etc.):

class FlightService:
    ...

    def book(self, params: BookingParams) -> None:
        try:
            payment_ref = self.payment_provider.authorize(params.confirmed_price)
        except AuthorizePaymentFailed:
            return

        try:
            self.flights_booking_provider.book(params.booking_token)
        except BookingFailed:
            self.payment_provider.cancel(payment_ref)
        else:
            self.payment_provider.capture(payment_ref)

...and think about how you would test it, assuming that using external services is not an option.

There is no other way but to use so-called test doubles - objects replacing real implementations in tests. Let's use Mocks from the standard library:

def test_let_me_write_about_test_names_later():
    payment_provider = Mock(spec_set=PaymentProvider)
    flights_booking_provider = Mock(spec_set=FlightsBookingProvider)
    service = FlightService(payment_provider, flights_booking_provider)

    service.book(...)

    # we can make some assertions!
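The snippets in this post leave out the supporting types, so they cannot be run as shown. Below is a minimal sketch of what they seem to assume. Money, BookingParams, the two exception classes, the provider bodies and the FlightService constructor are all hypothetical stand-ins written for illustration; only FlightService.book is taken from the post.

```python
from dataclasses import dataclass
from unittest.mock import Mock


# Everything down to FlightService.__init__ is a hypothetical stand-in;
# the post does not show these definitions.
@dataclass(frozen=True)
class Money:
    amount: str


@dataclass(frozen=True)
class BookingParams:
    booking_token: str
    confirmed_price: Money


class AuthorizePaymentFailed(Exception):
    pass


class BookingFailed(Exception):
    pass


class PaymentProvider:
    def authorize(self, amount: Money) -> str:
        raise NotImplementedError("would call the payment gateway")

    def capture(self, payment_ref: str) -> None:
        raise NotImplementedError("would call the payment gateway")

    def cancel(self, payment_ref: str) -> None:
        raise NotImplementedError("would call the payment gateway")


class FlightsBookingProvider:
    def book(self, token: str) -> None:
        raise NotImplementedError("would call the booking API")


class FlightService:
    def __init__(
        self,
        payment_provider: PaymentProvider,
        flights_booking_provider: FlightsBookingProvider,
    ) -> None:
        self.payment_provider = payment_provider
        self.flights_booking_provider = flights_booking_provider

    def book(self, params: BookingParams) -> None:
        try:
            payment_ref = self.payment_provider.authorize(params.confirmed_price)
        except AuthorizePaymentFailed:
            return

        try:
            self.flights_booking_provider.book(params.booking_token)
        except BookingFailed:
            self.payment_provider.cancel(payment_ref)
        else:
            self.payment_provider.capture(payment_ref)


def test_book_runs_without_external_services() -> None:
    payment_provider = Mock(spec_set=PaymentProvider)
    flights_booking_provider = Mock(spec_set=FlightsBookingProvider)
    service = FlightService(payment_provider, flights_booking_provider)

    # No network call happens: both collaborators are test doubles.
    service.book(BookingParams("booking_token", Money("$10.0")))

    assert flights_booking_provider.book.called
```

The real providers raise NotImplementedError in this sketch, so a passing test also shows that only the doubles were touched.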

If you do not feel comfortable with mocks yet, or spec_set surprises you, check out my (almost) definitive guide about mocking in Python (are you up to date with additions such as sealing, introduced in Python 3.7, or the unsafe parameter?).

So we should be good to go and safe, right? No.

How can Mock betray you?

The most basic problem with Mock-based tests is that they can pass even though the code won't work in a QA, staging or production environment. The reason is simple: a mock is not the same implementation as the object it replaces. Even if you use rigorous spec_set and sealing, Mocks can still give you a thrill of "excitement" by letting you fall into two specific problems.

Returning a value different from the mocked implementation

This is often the result of a human mistake in configuring mocks, but it is dangerous because the wrong value can propagate through the code under test and waste your time:

class PaymentProvider:
    def authorize(self, amount: Money) -> str:
        ...

mocked_payment_provider = Mock(
    spec_set=PaymentProvider, authorize=Mock(return_value=True)
)
mocked_payment_provider.authorize(Money('$10.0'))  # will return bool even though actual implementation will return string.
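To see how such a value propagates, here is a small sketch. The charge function is a hypothetical piece of code under test, and the amount is simplified to a plain string; neither comes from the original example.

```python
from unittest.mock import Mock


class PaymentProvider:
    def authorize(self, amount: str) -> str:
        ...

    def capture(self, payment_ref: str) -> None:
        ...


def charge(provider: PaymentProvider, amount: str) -> None:
    # Hypothetical code under test: passes the reference straight on.
    payment_ref = provider.authorize(amount)
    provider.capture(payment_ref)


mocked_payment_provider = Mock(
    spec_set=PaymentProvider, authorize=Mock(return_value=True)
)

charge(mocked_payment_provider, "$10.0")

# The misconfigured bool travelled through the code under test unnoticed,
# and this assertion passes even though the real capture expects a str.
mocked_payment_provider.capture.assert_called_once_with(True)
```

Nothing in the test run hints that the real provider would never hand out True as a payment reference.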

Calling methods with wrong parameters

That includes calling a method with no arguments when it needs some, with arguments of the wrong types, or with the wrong number of arguments (too many or too few).

class FlightsBookingProvider:
    def book(self, token: str) -> None:
        ...

mocked_flights_booking_service = Mock(spec_set=FlightsBookingProvider)

mocked_flights_booking_service.book()  # no arguments, but mock is fine with it
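The difference is easy to check at runtime. The sketch below compares the real class, a spec_set Mock, and a double built with the standard library's create_autospec, which copies method signatures from the class. The raises_type_error helper is hypothetical. Note that create_autospec is a complement to type checking rather than a replacement for it: it only catches the calls that a test actually executes.

```python
from unittest.mock import Mock, create_autospec


class FlightsBookingProvider:
    def book(self, token: str) -> None:
        ...


def raises_type_error(func) -> bool:
    """Return True when calling func() with no arguments raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


real = FlightsBookingProvider()
plain_mock = Mock(spec_set=FlightsBookingProvider)
autospecced = create_autospec(FlightsBookingProvider, spec_set=True, instance=True)

# The real method requires `token`, so the call is rejected.
real_rejects = raises_type_error(real.book)
# spec_set only checks that the attribute exists, not how it is called.
plain_mock_rejects = raises_type_error(plain_mock.book)
# create_autospec copies the signature, so the bad call is rejected again.
autospec_rejects = raises_type_error(autospecced.book)
```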

Type checking is the only reliable solution

The only reliable way (one that won't bite you in the future) is to use type annotations and mypy. If you wonder where to start, see my blog post How to use mypy in my project.
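As a rough illustration, assume the hypothetical caller below lives in a file named flight_service.py (the function and the file name are made up for this sketch). Running mypy flight_service.py is expected to report a missing-argument error on the marked line, without executing a single test.

```python
from unittest.mock import Mock


class FlightsBookingProvider:
    def book(self, token: str) -> None:
        ...


def rebook(provider: FlightsBookingProvider) -> None:
    # Hypothetical buggy caller: the required `token` argument is missing.
    # mypy is expected to flag this call because of the annotations above.
    provider.book()


# At runtime, a spec_set mock lets the very same bug through silently.
result = rebook(Mock(spec_set=FlightsBookingProvider))
```

The static check works on the production code itself, so it does not depend on how any particular test double was configured.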

However, a common but misguided remedy for the second problem is to make pedantic assertions about every call:

def test_not_so_good():
    params = BookingParams(booking_token='booking_token', confirmed_price=Money('$10.0'))
    dummy_payment_ref = 'PAYMENT_REF'

    payment_provider = Mock(
        spec_set=PaymentProvider,
        authorize=Mock(return_value=dummy_payment_ref)
    )
    flights_booking_provider = Mock(spec_set=FlightsBookingProvider)
    service = FlightService(payment_provider, flights_booking_provider)

    service.book(params)

    payment_provider.authorize.assert_called_once_with(params.confirmed_price)
    flights_booking_provider.book.assert_called_once_with(params.booking_token)
    payment_provider.capture.assert_called_once_with(dummy_payment_ref)

Don't get me wrong: making assertions about the way a mock was called is bad only in certain circumstances. After all, it is essential to interaction-based testing. But it is also a slippery slope. Luckily, there are many other red flags...

...when mocking goes too far

Overspecification

If tests make too many assertions about implementation details, we are dealing with classic overspecification. In other words, the code under test is literally duplicated in the test. It makes refactoring really hard, because whenever you make a slight change, dozens of tests start failing. The same goes for testing private methods.

Regarding the most recent example, when you make an assertion about the way a collaborator object (e.g. the payment provider or the flights booking service) was called, you introduce coupling. It is not overspecification only if the interface is stable (i.e. not subject to change in the foreseeable future).

Mocks returning mocks that return even more mocks

That should speak for itself. Something is very, very wrong with the code you're trying to test. Such multi-level mocking is just a hack.
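For readers who have not met the pattern yet, here is a hypothetical illustration. The fetch_price function, the client object and its attributes are invented for this sketch and do not come from any real library.

```python
from unittest.mock import Mock


def fetch_price(client) -> str:
    # Hypothetical code under test that reaches through three objects.
    return client.session().get("/price").json()["price"]


client = Mock()
# Each return_value is another mock, configured to return yet another one.
client.session.return_value.get.return_value.json.return_value = {"price": "$10.0"}

price = fetch_price(client)
```

The chain in the test setup mirrors the chain in the production code, and that chain is usually the thing to fix.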

Multiple mocks/patches at once

When you see or write code like this

@patch('mymodule.someothermodule.AyClass')
@patch('mymodule.someothermodule.BeClass')
@patch('othermodule.yetanothermodule.CeClass')
@patch('thirdpartymodule.DeClass')
@patch('thirdpartymodule.EeClass')
def test_mocks_nonsense(ay_class, be_class, ce_class, de_class, ee_class):
    ...

it may indicate that either the code is not unit-testable or it needs an integration test instead. There is nothing to gain from hacking your tests by mocking half of the codebase (including 3rd party libraries).

Multiple assertions against different mocks in the same test

    payment_provider.authorize.assert_called_once_with(params.confirmed_price)
    flights_booking_provider.book.assert_called_once_with(params.booking_token)
    payment_provider.capture.assert_called_once_with(dummy_payment_ref)

That's a classic example of mocking too hard. Naturally, a developer still needs to use a test-double instead of the real dependency, but they end up with tests that:

are very difficult to name precisely
can fail for various reasons

Ideally, each of your tests should fail for only one reason. When you have multiple assertions against different mocks, that is no longer the case.

Meet stubs

Luckily, there is a relatively easy way out. A mock is merely one of a few types of test-doubles. The one I want to bring to your attention is called a stub.

A stub is an object that, just like a mock, can be used instead of a real implementation. Stubs return hardcoded data, but one must not make any assertions about the way they were called.

class PaymentProvider:
    def authorize(self, amount: Money) -> str:
        ...


class PaymentProviderFailingStub(PaymentProvider):
    def authorize(self, _amount: Money) -> str:
        raise AuthorizePaymentFailed


def test__book_flight__payment_failed__booking_provider_not_called(): 
    payment_provider = PaymentProviderFailingStub()
    flights_booking_provider = Mock(spec_set=FlightsBookingProvider)
    service = FlightService(payment_provider, flights_booking_provider)

    params = BookingParams(booking_token='booking_token', confirmed_price=Money('$10.0'))
    service.book(params)

    flights_booking_provider.book.assert_not_called()

In our test, we once again mock FlightsBookingProvider (because we make an assertion about its book method), but instead of a PaymentProvider mock we pass a stub, PaymentProviderFailingStub. It is a simple class that always raises an exception. The stub is not to be used in assertions. It exists to make it easier to test our code (FlightService.book) UNDER THE CONDITION that payment authorization fails. This shift in thinking allows our tests to be more focused. As a result, they are much easier to name and also more stable, because each of them fails for only one reason.

If you are a proficient user of Python Mocks, you have probably already noticed that it is not actually necessary to write a class for each stub. You could just as well create a Mock and specify side_effect:

def test__book_flight__payment_failed__booking_provider_not_called(): 
    payment_provider = Mock(  # still a stub
        spec_set=PaymentProvider,
        authorize=Mock(side_effect=AuthorizePaymentFailed)
    )
    flights_booking_provider = Mock(spec_set=FlightsBookingProvider)
    service = FlightService(payment_provider, flights_booking_provider)

    params = BookingParams(booking_token='booking_token', confirmed_price=Money('$10.0'))
    service.book(params)

    flights_booking_provider.book.assert_not_called()

Even though we use Mock, the payment_provider test-double is technically still a stub, as long as we do not make any assertions about it. So what makes a test-double a stub or a mock is how we use it. Perhaps naming the class Mock was not the best choice, but that's just a random rant. :)

I believe that in the majority of cases using the Mock class to create stubs instead of a hand-written class is good enough, unless a class looks cleaner to you and your colleagues. A hand-written class has the advantage of being stricter about the arguments passed to function calls, but mypy gives you the same advantage. Plus, mypy would be required anyway to make sure your stub does not violate the Liskov substitution principle.

Again, given the capabilities of the standard library's Mock, stubbing is mostly a shift in thinking. One way to make sure it is not lost is to remember one rule.

Only one mock per test allowed (let the rest be stubs)

Do not verify more than one test-double using Mock in a single test. That enables creation of much more focused, much cleaner tests:

@pytest.fixture()
def booking_params() -> BookingParams:
    return BookingParams('xdxdxdxd', Money('$10.0'))


@pytest.fixture()
def payment_provider() -> PaymentProvider:
    return Mock(spec_set=PaymentProvider)


@pytest.fixture()
def flights_booking_provider() -> FlightsBookingProvider:
    return Mock(spec_set=FlightsBookingProvider)

def test_book_flight__authorize_payment_failed__booking_service_not_called(
    payment_provider: PaymentProvider,
    flights_booking_provider: FlightsBookingProvider,
    booking_params: BookingParams
) -> None:
    # dependency               | test double type
    # ---------------------------------------------
    # flights_booking_provider | mock
    # ---------------------------------------------
    # payment_provider         | stub
    service = FlightService(payment_provider, flights_booking_provider)
    payment_provider.authorize.side_effect = AuthorizePaymentFailed

    service.book(booking_params)

    flights_booking_provider.book.assert_not_called()


def test_book_flight__booking_failed__cancel_payment_called(
    payment_provider: PaymentProvider,
    flights_booking_provider: FlightsBookingProvider,
    booking_params: BookingParams
) -> None:
    # dependency               | test double type
    # ---------------------------------------------
    # flights_booking_provider | stub
    # ---------------------------------------------
    # payment_provider         | mock
    service = FlightService(payment_provider, flights_booking_provider)
    flights_booking_provider.book.side_effect = BookingFailed

    service.book(booking_params)

    payment_provider.cancel.assert_called()


def test_book_flight__booking_successful__capture_called(
    payment_provider: PaymentProvider,
    flights_booking_provider: FlightsBookingProvider,
    booking_params: BookingParams
) -> None:
    # dependency               | test double type
    # ---------------------------------------------
    # flights_booking_provider | stub
    # ---------------------------------------------
    # payment_provider         | mock
    service = FlightService(payment_provider, flights_booking_provider)

    service.book(booking_params)

    payment_provider.capture.assert_called()

requests stubbing

The idea of stubbing plays very nicely with external services that we use by calling their API. Making assertions about whether we called the right endpoint is like... meh. You can kill two birds with one stone: write a stub that responds only to the expected URL and focus on testing what your code does, given the stubbed response. This can be called a self-verifying mock, because it will fail if the code under test requests any other URL.

This can be achieved with either the responses or the requests-mock library. Also, see this recipe for aiohttp.
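The libraries above do this for real HTTP clients. To show just the idea without assuming any of their APIs, here is a hand-rolled sketch in plain Python. HttpGetStub, UnexpectedRequest, get_flight_status and the example.com URL are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict


class UnexpectedRequest(Exception):
    """Raised when the code under test requests a URL the stub does not know."""


class HttpGetStub:
    """Hypothetical stand-in for an HTTP client's GET call."""

    def __init__(self, responses: Dict[str, dict]) -> None:
        self._responses = responses

    def __call__(self, url: str) -> dict:
        # Responds only to the expected URLs; anything else fails loudly.
        if url not in self._responses:
            raise UnexpectedRequest(url)
        return self._responses[url]


def get_flight_status(http_get: Callable[[str], dict], flight_no: str) -> str:
    # Hypothetical code under test; the URL is made up for illustration.
    payload = http_get(f"https://api.example.com/flights/{flight_no}")
    return payload["status"].upper()


def test_get_flight_status__flight_delayed__returns_uppercased_status() -> None:
    http_get = HttpGetStub(
        {"https://api.example.com/flights/LO123": {"status": "delayed"}}
    )

    # No assertion about the call itself: a wrong URL would raise instead.
    assert get_flight_status(http_get, "LO123") == "DELAYED"
```

The test only states what the code does with the response, while the stub itself guards the endpoint.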

Summary

Stubs have yet another magic feature: when your starting point is the predictable behaviour of dependencies, you start thinking more about valuable testing. The focus shifts to the code under test. Your end goal is to verify a particular code fragment by checking WHAT it does under specific conditions, not HOW EXACTLY it does it.
