Python & the Clean Architecture in 2021

It's been almost 3 years since I first used the Clean Architecture in production. I spoke about it at quite a few conferences (e.g. see the Clean Architecture in Python talk from PyGotham 2018). I also wrote an article about the Clean Architecture that made it into the RealPython.com newsletter. ...but it's now 2021 and the world has moved forward. Let's see how Python's advancements and a variety of cool new libraries make implementing the Clean Architecture easier today.

First, let's revisit the whys.

Assumptions - good old friends

Independence from a framework

The goal here is to avoid leaking framework-specific stuff into the business logic. If the latter is complex on its own, the framework's stuff will only make it worse. The goal was never to abstract away the framework for art's sake, but rather to keep the flow of control unidirectional. Swapping frameworks ALMOST never happens. Just don't let the framework get too intimate with business details.

Testability

The idea is to be able to test the most valuable pieces without too much effort. Our tests can then be simpler to write, faster to run and cheaper to maintain.

Independence from UI/API

The most critical observation to make: your program is not just some Django/Flask/FastAPI/Starlette etc. app. It's a non-material thing that provides a certain set of services to a group or groups of users. It just happens to have an API on top of it. The more complex the underlying set of services is, the more sense this separation makes.

Independence from third party services

We use abstract classes (Ports) to write business logic against stable interfaces that wrap third parties. This is not only because switching payment providers is a likely scenario, but also because it keeps the business logic simpler and purer. It also helps with writing stubs for the external stuff we do not control when testing the system.

Assumptions - few new ones

Modularization is more critical than the Clean Architecture

The Clean Architecture is quite a huge investment, and sometimes the juice is not worth the squeeze - especially if a certain part of the system is:

  • a trivial proxy over some 3rd party service
  • a simple CRUD application

Using an overly simplistic or an overly complex approach throughout the entire system is not going to work well. We need to learn how to identify the different components of the app and then craft each of them carefully. Of course, I do not mean Big Design Up Front. Be ready to gradually refactor parts of the app out into separate components as your understanding of the system grows.

Here you can find an article about building a modular monolith in Python - don't put all your eggs in one basket.

Independence from databases

Although the Clean Architecture/Hexagonal Architecture/Onion Architecture etc. are often combined with the Repository Pattern, we should not expect a database switch to be trivial. The goal is to make sure your business logic shapes persistence (e.g. database models), not the other way around. For MongoDB, that's quite easy to achieve with Pydantic (or built-in dataclasses + marshmallow-dataclasses). It's not that simple for relational databases, though. Keeping our domain objects from being represented by database models still takes a lot of work. From my perspective, SQLAlchemy remains a powerful (and often sufficient) tool.

Bear in mind that it all becomes much trickier when multiple databases are involved. Also, transaction management is a cross-cutting concern that's not trivial to abstract away. I devoted quite a few pages of my book to this particular topic.

Direct access to the database is sometimes handy and harmless

Speaking of databases... When we model operations that modify data in the system, and they're complicated due to the nature of the system and its requirements, all those fancy layers don't get in the way. BUT when it comes to reading data, they become a burden - not only from the typing perspective, but also because repacking objects comes with a considerable performance hit.

So it's totally fine for read endpoints to take a shortcut and reach directly into the database. However, exposing your database schema directly may not be the best idea in the long run. Consider putting a layer in front of your tables that guarantees you a way out (e.g. views in an RDBMS, proxy models in Django etc.).
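
To make the read-side shortcut concrete, here is a minimal sketch that uses the standard library's sqlite3 with an in-memory database. The schema, the auction_listings view and get_auction_listings are hypothetical names made up for illustration. Amounts are stored as integer cents only to keep the example short. The read function never touches the tables directly, only the view, so the tables can be reshaped later as long as the view keeps its columns.

```python
import sqlite3
from typing import List, TypedDict


class AuctionListing(TypedDict):
    # Plain read model: no Entities, no repacking through the domain layer.
    auction_id: int
    title: str
    current_price_cents: int


# Hypothetical schema; amounts are integer cents to keep the sketch short.
SCHEMA = """
CREATE TABLE auctions (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    initial_price_cents INTEGER NOT NULL
);
CREATE TABLE bids (
    id INTEGER PRIMARY KEY,
    auction_id INTEGER NOT NULL REFERENCES auctions (id),
    amount_cents INTEGER NOT NULL
);
-- The view is the contract the read side depends on, not the raw tables.
CREATE VIEW auction_listings AS
SELECT a.id AS auction_id,
       a.title AS title,
       COALESCE(MAX(b.amount_cents), a.initial_price_cents) AS current_price_cents
FROM auctions AS a
LEFT JOIN bids AS b ON b.auction_id = a.id
GROUP BY a.id, a.title, a.initial_price_cents;
"""


def get_auction_listings(conn: sqlite3.Connection) -> List[AuctionListing]:
    # Read endpoint shortcut: query the view directly, skipping Entities.
    rows = conn.execute(
        "SELECT auction_id, title, current_price_cents "
        "FROM auction_listings ORDER BY auction_id"
    )
    return [
        AuctionListing(auction_id=row[0], title=row[1], current_price_cents=row[2])
        for row in rows
    ]


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO auctions VALUES (?, ?, ?)",
    [(1, "Vintage lamp", 1000), (2, "Old clock", 500)],
)
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?)",
    [(1, 1, 1200), (2, 1, 1500)],
)
listings = get_auction_listings(conn)
```

With a production RDBMS, the view would normally be created in a migration. The point is only that the read model depends on the view's columns, not on how the tables are laid out.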

The Clean Architecture is not "all-in" approach

I think the success of my talk can be partially attributed to the way the narrative was built - introducing building blocks one after another. I even got positive feedback from non-technical people who found the talk interesting. The point I am trying to make is that you can introduce elements of the Clean Architecture gradually - in the same order as they appear in the talk. Let's be honest - systems are not built the way books or talks present them. In reality, it's a bumpy road with many mistakes and refactorings along the way. Also, a part that's often conveniently passed over in silence is learning the rules of the game in the business we work with.

What was painful in implementing the Clean Architecture in Python in 2018 and is no longer a big deal?

Validation

The biggest issue I found was validation. Now, thanks to the aforementioned Pydantic / dataclasses + marshmallow-dataclasses + mypy, it is not nearly as bad. Additionally, arm yourself with the Value Object pattern and you are good to go into 2021 and beyond.

Overengineering

Discovering the possibility of modularization and adopting the Clean Architecture gradually make it much less of a risk. Still, do your homework - I recommend learning about strategic Domain-Driven Design and techniques like Event Storming to better estimate when it makes sense to invest more.

Non-idiomatic framework usage

Although Django's philosophy hasn't changed a lot since 2018, we now have a few new trending (micro)frameworks - like Starlette or FastAPI. They are as flexible as Flask, but more powerful and modern. Still, use them with caution!

For example, our dependency injection container should be universal enough that we can use it in any context (console application, task queue AND API), not only inside the web framework.

Enough talking - show me the code!

Use Cases / Interactors

The first step is to identify and introduce Use Cases (AKA Interactors). There will be one for each individual action/command of an actor. An actor is a person or another system that interacts with our application. Typically, it will be a regular user.

For a meetup.com clone, it could be:

  • Confirming attendance as an attendee
  • Cancelling attendance as an attendee
  • Drafting a new meeting as an organizer

For a Trello clone, it could be:

  • Assigning a task to a team member
  • Archiving a list
  • Inviting a colleague with their e-mail address

For an auctioning platform, we could identify the following Use Cases:

  • Placing a bid as a bidder
  • Withdrawing a bid as an administrator
  • Cancelling an auction as an item owner

Now that we have a basic understanding of what users can do with the system, we can represent these actions as first-class citizens in code:

class PlacingBid:
    def __call__(self) -> None:
        ...

class WithdrawingBid:
    def __call__(self) -> None:
        ...

class CancellingAuction:
    def __call__(self) -> None:
        ...

Use Cases / Interactors inherently have the semantics of a function - they are meant to be called from an API view / CLI command / background task. If so, why not just use a function? Because dependency injection is not quite possible with functions. Also, with classes we can easily control the lifetime of objects.
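
Here is a minimal sketch of that argument. It assumes a hypothetical Clock Port (not part of the original example) that PlacingBid uses to reject bids on auctions that have already ended. The collaborator is injected once through the constructor, and per-call data goes to __call__. Call sites therefore still use the object like a function. A plain function would instead need the clock passed in on every call, or pulled from global state.

```python
import abc
from datetime import datetime, timezone


class Clock(abc.ABC):
    # Hypothetical Port: the Use Case asks for "now" instead of calling datetime.now().
    @abc.abstractmethod
    def now(self) -> datetime:
        ...


class SystemClock(Clock):
    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FixedClock(Clock):
    # Test double that always reports the same moment.
    def __init__(self, moment: datetime) -> None:
        self._moment = moment

    def now(self) -> datetime:
        return self._moment


class PlacingBid:
    def __init__(self, clock: Clock) -> None:
        # Collaborators are injected once, when the object is assembled...
        self._clock = clock

    def __call__(self, auction_ends_at: datetime) -> None:
        # ...while per-call data arrives as arguments, just like a function call.
        if self._clock.now() >= auction_ends_at:
            raise ValueError("Auction has already ended")
        # Actually placing the bid is omitted in this sketch.


# Production wiring vs. test wiring: same class, different collaborator.
placing_bid_in_production = PlacingBid(SystemClock())
placing_bid_in_tests = PlacingBid(FixedClock(datetime(2021, 1, 1, tzinfo=timezone.utc)))
placing_bid_in_tests(auction_ends_at=datetime(2021, 1, 2, tzinfo=timezone.utc))
```

Whether one PlacingBid instance is shared or a new one is built per request then becomes a decision for whoever assembles the objects (for example, a DI container), not for the call site.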

Input DTOs (arguments of Use Cases)

Most Use Cases will require a bunch of arguments, e.g. the identity of the user that triggered the action, which resource they wish to modify etc. We pack those into immutable data structures using classes. An immutable @dataclass(frozen=True) would do, but Pydantic works just as nicely:

from decimal import Decimal
from pydantic import BaseModel, validator


class PlacingBidInputDto(BaseModel):
    auction_id: int
    bidder_id: int
    amount: Decimal

    @validator("amount")
    def amount_must_be_positive(cls, v: Decimal) -> Decimal:
        if v <= Decimal("0"):
            raise ValueError("Amount must be positive")
        return v

    class Config:
        allow_mutation = False


class PlacingBid:
    def __call__(self, dto: PlacingBidInputDto) -> None:
        ...

Input DTOs should be validated. They may do it on their own (Pydantic) or with the help of another object (marshmallow-dataclasses). Either way, inside Use Cases we should have every right to expect that arguments are valid in terms of type. We'll never know whether an auction with a given ID exists before calling the Use Case, but we must at least ensure that IDs look like they could belong to an existing object.
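
For comparison, here is a sketch of the same Input DTO as a standard-library frozen dataclass, with the positive-amount rule moved to __post_init__. Note that, unlike Pydantic, dataclasses do not coerce or check field types at runtime; that part would be left to mypy.

```python
from dataclasses import FrozenInstanceError, dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PlacingBidInputDto:
    auction_id: int
    bidder_id: int
    amount: Decimal

    def __post_init__(self) -> None:
        # Same rule as the Pydantic validator above: only positive amounts.
        if self.amount <= Decimal("0"):
            raise ValueError("Amount must be positive")


dto = PlacingBidInputDto(auction_id=1, bidder_id=2, amount=Decimal("10.00"))
try:
    dto.amount = Decimal("1000000")  # type: ignore[misc]
except FrozenInstanceError:
    print("Input DTOs cannot be modified once created")
```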

Value Objects

In the above example, you can see that I used built-in types for auction_id, bidder_id and amount. This is a code smell called Primitive Obsession. It's especially visible with the amount field. Instead of playing with Decimal, it would be better to create (or use) a dedicated Money type that is guaranteed to be valid and immutable. This pattern is called Value Object, and Python's standard library contains examples of it: datetime.datetime, uuid.UUID and decimal.Decimal.
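
Below is a minimal sketch of such a Money Value Object as a frozen dataclass. The currency field, the 3-letter code check and the rule against negative amounts are assumptions for illustration, not requirements from the post. The point is that an instance is validated once, on creation, and cannot change afterwards, and that equality is based on value.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    # Hypothetical Value Object: a non-negative amount in a given currency.
    amount: Decimal
    currency: str

    def __post_init__(self) -> None:
        # Validate once, on creation; frozen=True prevents later changes.
        if not isinstance(self.amount, Decimal):
            raise TypeError("amount must be a Decimal")
        if self.amount < Decimal("0"):
            raise ValueError("Money cannot be negative")
        code = self.currency
        if len(code) != 3 or not (code.isalpha() and code.isupper()):
            raise ValueError("currency must be a 3-letter uppercase code, e.g. USD")

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("Cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)


price = Money(Decimal("10.50"), "USD") + Money(Decimal("4.50"), "USD")
# Equality is by value, like for datetime or Decimal.
assert price == Money(Decimal("15.00"), "USD")
```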

Interfaces / Ports

Whenever we need to call a 3rd party API (or another component of our modular monolith) synchronously, we'll find it useful to wrap it, for two reasons:

  • prevent pollution of our business logic with irrelevant details and names from the outside
  • improve testability - having an abstraction in place allows for easier mocking/stubbing. If we do TDD, it also lets us design collaborations nicely by leveraging mocks

Under the hood, dragons may live. But when you look at a Use Case and how it uses an Interface / Port, it should look nice and easy!

In 2021, most of the time an abstract base class (abc.ABC) will be the way to go:

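from __future__ import annotations  # Money and ClosingAuctionInputDto are defined elsewhere

import abc


class Payments(abc.ABC):
    @abc.abstractmethod
    def init_payment(
        self, bidder_id: int, amount: Money
    ) -> None:
        pass


class ClosingAuction:
    def __init__(self, payments: Payments) -> None:
        self._payments = payments

    def __call__(self, dto: ClosingAuctionInputDto) -> None:
        # Loading the Auction Entity is out of scope here; load_auction is a
        # hypothetical placeholder (e.g. backed by a repository).
        auction = load_auction(dto.auction_id)
        self._payments.init_payment(
            bidder_id=auction.winner_id,
            amount=auction.price,
        )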

Bear in mind that your Port is not just a list of public methods. If there are any exceptions that you want to handle explicitly in a Use Case, they should be defined alongside the Port and, when necessary, repacked in the next building block.
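
Here is a sketch of what that could look like. PaymentFailed is declared next to the Port, and the Adapter translates the provider's own exception into it. ProviderDeclinedError, FakeProviderClient, ProviderPayments, the cus_ customer-id mapping and the charge limit are all hypothetical stand-ins for a real payment SDK. Amounts are plain Decimals to keep the snippet self-contained.

```python
import abc
from decimal import Decimal


class PaymentFailed(Exception):
    # Part of the Port: defined next to it, named in business terms.
    pass


class Payments(abc.ABC):
    @abc.abstractmethod
    def init_payment(self, bidder_id: int, amount: Decimal) -> None:
        """Raises PaymentFailed if the payment cannot be initiated."""


class ProviderDeclinedError(Exception):
    # Hypothetical exception raised by a third-party SDK.
    pass


class FakeProviderClient:
    # Hypothetical stand-in for an external SDK client.
    def charge(self, customer_id: str, cents: int) -> None:
        if cents > 100_000:  # made-up limit for the sake of the example
            raise ProviderDeclinedError(f"limit exceeded for {customer_id}")


class ProviderPayments(Payments):
    def __init__(self, client: FakeProviderClient) -> None:
        self._client = client

    def init_payment(self, bidder_id: int, amount: Decimal) -> None:
        customer_id = f"cus_{bidder_id}"  # hypothetical mapping to the external id
        try:
            self._client.charge(customer_id, int(amount * 100))
        except ProviderDeclinedError as exc:
            # Repack the third-party exception into the Port's own exception,
            # so Use Cases never import anything from the provider's SDK.
            raise PaymentFailed(str(exc)) from exc
```

A Use Case can then catch PaymentFailed without knowing which provider is behind the Port. `raise ... from exc` keeps the original error available as __cause__ for debugging.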

Interface Adapter / Adapter

Each abstract class will have at least one implementation, plus stubs/mocks modelled after it. An Adapter is just an implementation of a Port:

class MadeUpCompanyPayments(Payments):
    def init_payment(
        self, bidder_id: int, amount: Money
    ) -> None:
        # pull corresponding customer id for a given bidder id
        # customer id is an identifier in external system
        # we do not want to leak to auctioning platform
        ...
        # Then, we can try to charge customer with requested amount OR
        # request them to pay online


closing_auction = ClosingAuction(MadeUpCompanyPayments())

Remember that the Use Case class must not know which implementation of Payments it uses - hence the type annotation refers to the abstraction.
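
Because ClosingAuction only depends on the Payments abstraction, a test can hand it a stub modelled after the Port. Below is a minimal pytest-style sketch. To keep it self-contained, ClosingAuctionInputDto is simplified (hypothetically) to carry the winner and the price directly instead of loading an Auction Entity, and amounts are Decimals rather than Money.

```python
import abc
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


class Payments(abc.ABC):
    @abc.abstractmethod
    def init_payment(self, bidder_id: int, amount: Decimal) -> None:
        ...


@dataclass(frozen=True)
class ClosingAuctionInputDto:
    # Simplified, hypothetical DTO: winner and price are passed in directly.
    winner_id: int
    price: Decimal


class ClosingAuction:
    def __init__(self, payments: Payments) -> None:
        self._payments = payments

    def __call__(self, dto: ClosingAuctionInputDto) -> None:
        self._payments.init_payment(bidder_id=dto.winner_id, amount=dto.price)


class PaymentsStub(Payments):
    # Test double modelled after the Port: records calls instead of charging anyone.
    def __init__(self) -> None:
        self.calls: List[Tuple[int, Decimal]] = []

    def init_payment(self, bidder_id: int, amount: Decimal) -> None:
        self.calls.append((bidder_id, amount))


def test_closing_auction_charges_the_winner() -> None:
    payments = PaymentsStub()
    closing_auction = ClosingAuction(payments)

    closing_auction(ClosingAuctionInputDto(winner_id=7, price=Decimal("150.00")))

    assert payments.calls == [(7, Decimal("150.00"))]
```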

Dependency Injection

I hope the last line caught your eye - ClosingAuction(MadeUpCompanyPayments()) - assembling objects manually is not much fun. To make this painless, we can use injector - a nice library modelled after Java's Guice.

import injector


class Auctions(injector.Module):
    @injector.provider
    def closing_auction(self, payments: Payments) -> ClosingAuction:
        return ClosingAuction(payments)


class AuctionsInfrastructure(injector.Module):
    @injector.provider
    def payments(self) -> Payments:
        return MadeUpCompanyPayments()


container = injector.Injector([Auctions(), AuctionsInfrastructure()])
# in a view / CLI command / background task
closing_auction = container.get(ClosingAuction)

All you have to do is define recipes for all dependencies, and injector will assemble them for you.

In the past, I used the inject library. Its biggest downside is that it relies on global state, which makes you want to cry during testing.

Bear in mind that you should avoid using the container directly, as this can lead to the Service Locator antipattern. There are packages on PyPI that integrate injector with web frameworks, e.g. flask_injector. If there's none available for the framework you use, try to write your own integration and don't hesitate to share it with the community :)

Another sensible alternative to injector is dependencies. I haven't used it, but it looks rock-solid. BTW, pytest's fixture system is one large dependency injection container.

Entities

While Use Cases orchestrate control flow, collaborating with Interfaces AKA Ports, we still need domain objects - the ones that represent concepts we can talk about with stakeholders or users. In the case of an auctioning platform, we'll have Auction and Bid classes.

Now, the "all-in" approach would be to write pure or almost-pure Python classes:

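from __future__ import annotations  # Money is the Value Object discussed earlier

from dataclasses import dataclass
from typing import List


@dataclass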
class Bid:
    _id: int
    _bidder_id: int
    _amount: Money


@dataclass
class Auction:
    _id: int
    _initial_price: Money
    _bids: List[Bid]

    def place_bid(self, bidder_id: int, amount: Money) -> None:
        ...

Note that all fields are private. We focus on Entities' behaviour (methods!) when we model them, not data that is going to be shown to the user. If encapsulation doesn't sound familiar to you, read my article about it.
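
To show what focusing on behaviour could mean in practice, here is a sketch of Auction.place_bid guarding an invariant. The rule that a new bid must beat the current price, the BidTooLow exception and the read-only price / winner_id properties are assumptions for illustration. The properties mirror the auction.winner_id and auction.price used by ClosingAuction earlier. Amounts are Decimals instead of Money to keep the snippet self-contained.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


class BidTooLow(Exception):
    pass


@dataclass
class Bid:
    _bidder_id: int
    _amount: Decimal
    _id: Optional[int] = None  # assigned by persistence; None for new bids


@dataclass
class Auction:
    _id: int
    _initial_price: Decimal
    _bids: List[Bid] = field(default_factory=list)

    @property
    def price(self) -> Decimal:
        # Derived from the bids; there is deliberately no setter.
        return max((bid._amount for bid in self._bids), default=self._initial_price)

    @property
    def winner_id(self) -> Optional[int]:
        if not self._bids:
            return None
        return max(self._bids, key=lambda bid: bid._amount)._bidder_id

    def place_bid(self, bidder_id: int, amount: Decimal) -> None:
        # Hypothetical invariant: every new bid must beat the current price.
        if amount <= self.price:
            raise BidTooLow(f"Bid must be higher than {self.price}")
        self._bids.append(Bid(_bidder_id=bidder_id, _amount=amount))


auction = Auction(_id=1, _initial_price=Decimal("10.00"))
auction.place_bid(bidder_id=1, amount=Decimal("15.00"))
```

Callers are expected to change an Auction only through place_bid. The leading underscores signal that the fields are off-limits, but, this being Python, they do not enforce it.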

Now, we need a way to store and load Entities from the persistence mechanism (whichever it is - PostgreSQL/MongoDB - shouldn't matter). A popular way (one I used as well) is the Persistence-Oriented Repository Pattern:

class AuctionsRepository(abc.ABC):
    @abc.abstractmethod
    def get(self, auction_id: int) -> Auction:
        ...

    @abc.abstractmethod
    def save(self, auction: Auction) -> None:
        ...

We use repositories in the same way as Ports - they are collaborators of Use Cases. Concrete repositories will have to repack data from Entities into:

  • SQLAlchemy models, and then flush the changes
  • dictionaries for e.g. pymongo, and then save them
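
Here is a sketch of the second option. It assumes a plain dict stands in for a MongoDB collection, so the example runs without a database. DocumentAuctionsRepository and the document layout are hypothetical. Decimals are stored as strings here to avoid float rounding. The trimmed-down Bid and Auction dataclasses mirror the Entities above.

```python
import abc
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Any, Dict, List


@dataclass
class Bid:
    _bidder_id: int
    _amount: Decimal


@dataclass
class Auction:
    _id: int
    _initial_price: Decimal
    _bids: List[Bid] = field(default_factory=list)


class AuctionsRepository(abc.ABC):
    @abc.abstractmethod
    def get(self, auction_id: int) -> Auction:
        ...

    @abc.abstractmethod
    def save(self, auction: Auction) -> None:
        ...


class DocumentAuctionsRepository(AuctionsRepository):
    # Hypothetical adapter: a dict plays the role of a document collection.
    def __init__(self) -> None:
        self._collection: Dict[int, Dict[str, Any]] = {}

    def save(self, auction: Auction) -> None:
        # The repository, not the Entity, knows the storage format.
        self._collection[auction._id] = {
            "_id": auction._id,
            "initial_price": str(auction._initial_price),
            "bids": [
                {"bidder_id": bid._bidder_id, "amount": str(bid._amount)}
                for bid in auction._bids
            ],
        }

    def get(self, auction_id: int) -> Auction:
        document = self._collection[auction_id]  # KeyError if it doesn't exist
        return Auction(
            _id=document["_id"],
            _initial_price=Decimal(document["initial_price"]),
            _bids=[
                Bid(_bidder_id=raw["bidder_id"], _amount=Decimal(raw["amount"]))
                for raw in document["bids"]
            ],
        )


repository = DocumentAuctionsRepository()
original = Auction(_id=1, _initial_price=Decimal("10.00"), _bids=[Bid(5, Decimal("12.50"))])
repository.save(original)
loaded = repository.get(1)
```

With pymongo, save would roughly become collection.replace_one({"_id": auction._id}, document, upsert=True), and get would become collection.find_one({"_id": auction_id}).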

As you can imagine, implementing concrete repositories can be quite a tedious job. It starts to make more sense if you use Tactical Domain-Driven Design patterns and model Aggregates as small, self-contained pieces built around business invariants that always have to be consistent. This is the case for Auction - it has to be consistent with its Bids. Also, notice that some Use Cases might need to load all Bids (withdrawing a bid), while others (placing a bid) need only a subset of Bids (i.e. the winning ones) to perform the logic. The Repository Pattern helps to manage and contain these intricacies.

BUT what if your domain objects are simpler? Why not just use database models as domain objects? Previously, I was against such abominations. But in one of my projects, the Entities were simple. Although Use Cases and Ports / Adapters made perfect sense, the investment in Entities and Repositories didn't pay off as much. I feel SQLAlchemy's implementation of the Unit of Work pattern (Session) does a fine job of hiding the technical details.

Now, I don't encourage you to throw The Dependency Rule away and write SQL in your Use Cases, but consider whether the lightweight approach would work for you. It's best to read about Aggregates and Tactical Domain-Driven Design to see if you need them. I recommend Vaughn Vernon's Domain-Driven Design Distilled book for a quick and solid introduction.

The most important thing is not to let the structure (fields) of our domain objects leak outside Use Cases - this is what makes systems hard to change.

Conclusion

I'm still a big fan of the Clean Architecture and similar approaches. That said, it takes some time to understand why it works. Studying the principles of Object-Oriented Programming really helps with that. Even though sometimes a function is all we need, knowledge about design always helps.

I am really glad that the Python community is building a lot of awesome tools that make it easier to build enterprise-grade software in Python.

How about you? Which new additions to Python's ecosystem do you find the most valuable for implementing the Clean Architecture?

This post is licensed under CC BY 4.0 by the author.
