
Dive into Python's asyncio, part 2

All examples were tested under Python 3.6.

The only asyncio rule

After reading part 1 you should already know that the heart of asyncio is the event loop.

There is exactly one rule - do not block the event loop! Never ever. Fortunately, it's quite simple to avoid this.

  • Use only co-operative libraries for blocking I/O operations (or, to put it the other way round, do not use non-co-operative ones)
  • Don't do CPU-intensive operations in the same process, because they will effectively freeze the whole application during calculations

The first point has serious implications. Namely, it means that you have to replace many libraries in your stack. Forget requests, for instance. Embrace aiohttp instead. An exhaustive list of available libraries can be found on GitHub in the awesome-asyncio repository.
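
To illustrate the difference, below is a minimal sketch of a co-operative HTTP GET done with aiohttp (the feed URL serves only as an example). Putting a blocking requests.get() in its place would freeze the event loop for the entire duration of the request:

import asyncio

import aiohttp


async def main():
    # The request is made co-operatively: while waiting for the server,
    # this coroutine is suspended and the event loop can run other tasks.
    async with aiohttp.ClientSession() as session:
        async with session.get('https://breadcrumbscollector.tech/feed/') as response:
            print(response.status)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())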

To better understand why this is so important, let's consider the following example:

from aiohttp import web


async def handle(request): # 1
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)


app = web.Application()
app.router.add_get('/', handle)
app.router.add_get('/{name}', handle)

web.run_app(app, host='127.0.0.1', port=8001)

This is a very simple HTTP server that utilizes the aiohttp library. It greets everyone who bothers telling it their name. Getting http://127.0.0.1:8001/John shows Hello, John.

At #1 we see the async keyword at the beginning of the line. It means that the handle function is a coroutine. A coroutine is an independent execution unit that may be used in an asyncio flow. Coroutines represent some 'heavy' operation that will end eventually. For instance, we consider all I/O operations, like database queries or HTTP requests, heavy operations. The unique thing about coroutines is that they can explicitly suspend their execution when waiting for other coroutines to finish. This is not visible in the example above, so let's look at a slightly modified one:

import async_timeout
from aiohttp import web, ClientSession, TCPConnector

import xml.dom.minidom


async def fetch(url):
    # verify_ssl=False skips certificate verification - fine for a demo
    connector = TCPConnector(verify_ssl=False)
    async with ClientSession(connector=connector) as session:
        with async_timeout.timeout(10):  # give up if the request takes longer than 10 s
            async with session.get(url) as response:  #5
                return await response.text()


async def handle(request):  #1
    name = request.match_info.get('name', "Anonymous")

    url = 'https://breadcrumbscollector.tech/feed/'
    raw_xml = await fetch(url)  #2
    decoded = xml.dom.minidom.parseString(raw_xml)  #3
    last_build_date_el = decoded.getElementsByTagName('lastBuildDate')[0]  #4
    last_build_date = last_build_date_el.firstChild.nodeValue

    text = f'Hello, {name}. Recent feed was built on {last_build_date}'
    return web.Response(text=text)


app = web.Application()
app.router.add_get('/', handle)
app.router.add_get('/{name}', handle)

web.run_app(app, host='127.0.0.1', port=8001)

This is an extension of the previous example. Now, on every request (#1) we GET this blog's feed (#2), decode the XML (#3) and extract the build date (#4). The whole logic of the HTTP client is enclosed in a separate coroutine, fetch.

Here we see the usage of a new keyword: await. The thing to bear in mind is that asyncio's event loop can switch execution context on every await. Assuming that getting data from breadcrumbscollector.tech/feed/ takes a lot of time, #5 will probably be the place where switching between clients occurs most frequently. To illustrate this, refer to the image below. The red pointer shows asyncio's execution path:

Asyncio flow (missing image)

We see that while handling the first client's request, another client requested something. We are not able to serve them in parallel, so the second request waits for a convenient moment. That moment comes when the handling of the first client's request encounters await session.get. Since the event loop has nothing better to do right then, it starts executing the second client's request handling logic. This execution switch is marked in the picture as Client Switch #1. This way we are able to utilize the CPU better and handle more clients in the same period of time.
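
To observe this switching in isolation, here is a minimal, self-contained sketch (the names are made up; asyncio.sleep stands in for a slow I/O operation):

import asyncio


async def serve_client(name):
    print(f'{name}: request received')
    await asyncio.sleep(1)  # stands in for slow I/O; the loop may switch here
    print(f'{name}: response sent')


loop = asyncio.get_event_loop()
# Both coroutines make progress concurrently - their output interleaves
# at the await, just like Client Switch #1 in the picture.
loop.run_until_complete(asyncio.gather(serve_client('first'), serve_client('second')))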


What if we couldn't suspend execution on blocking operations and had no other means of handling clients concurrently? One can easily find themselves in such a situation when using non-co-operative libraries.

Asyncio flow (missing image)

What we see in the picture above is called serialization. The application is able to handle only one request at a time. Obviously, we can't afford that: clients of this app shouldn't wait in such a queue, no matter whether they are humans or other applications; this becomes a bottleneck.

This is the case we should avoid like the plague when doing asynchronous I/O. Always use co-operative libraries and you should be fine.
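
To contrast with the earlier sketch, here is the same scenario with a non-co-operative call (time.sleep stands in for any blocking library call):

import asyncio
import time


async def serve_client(name):
    print(f'{name}: request received')
    time.sleep(1)  # blocking call - the whole event loop is frozen for a second
    print(f'{name}: response sent')


loop = asyncio.get_event_loop()
# Despite gather, the requests are served strictly one after another -
# this is exactly the serialization shown in the picture above.
loop.run_until_complete(asyncio.gather(serve_client('first'), serve_client('second')))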

However, this explanation applies only to the first of the aforementioned problems. The second issue, CPU-intensive operations, cannot simply be addressed by using some magic 'co-operative libraries'.

The first approach to this is: don't do it. At least not in an asyncio-based application. Delegate it somewhere else; push it to a task queue.
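
For example, with Celery (just one of many task queues; the broker URL and the heavy_computation function below are made up for illustration), the asyncio application only enqueues a message and returns immediately:

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker URL


@app.task
def heavy_computation(x):  # hypothetical CPU-bound job
    return x ** 2

A handler would then call heavy_computation.delay(x), which merely sends a message to the broker, so the event loop never runs the computation itself.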

If you really have to do this in the same application, then at least make sure it can still handle I/O requests "in the background".

A slightly tricky way to do this is to manually invoke asyncio.sleep inside the computation loop, for example:

for x in range(10000):
    do_some_heavy_computation(x)
    # sleep(0) suspends this coroutine just long enough for the event
    # loop to serve pending I/O between the computation chunks
    await asyncio.sleep(0)

However, it will still block the event loop for the majority of the time the calculations are running.

A better way utilizes AbstractEventLoop.run_in_executor. Asyncio will then delegate our function to a pool of threads (by default) or processes and return the result just as coroutines do.

loop = asyncio.get_event_loop()
# Passing None as the executor causes asyncio to use the default thread pool
result = await loop.run_in_executor(None, heavy_calculus)
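
For truly CPU-bound work, the default thread pool won't help much because of the GIL, but a process pool will. Here is a minimal sketch, where heavy_calculus is a hypothetical stand-in for the real computation:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def heavy_calculus(n):  # hypothetical CPU-bound function
    return sum(i * i for i in range(n))


async def main():
    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor() as pool:
        # The function runs in a separate process, so this process's GIL
        # is not held and the event loop stays responsive.
        result = await loop.run_in_executor(pool, heavy_calculus, 10 ** 7)
    print(result)


if __name__ == '__main__':  # required for process pools on some platforms
    asyncio.get_event_loop().run_until_complete(main())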


