Python 3.x, and in particular Python 3.5, has brought asynchronous programming to Python. While async can make code harder to read, there is a good and easy to understand use case for async that I use often: execute a batch of HTTP requests in parallel.
Before we look at asynchronous requests, let us look at the sequential case. This will give us something to compare with later. The code listing below is an example of how to make twenty synchronous HTTP requests:
# Example 1: synchronous requests import requests num_requests = 20 responses = [ requests.get('http://example.org/') for i in range(num_requests) ]
How does the total completion time develop as a function of
num_requests? The chart below shows my measurements. The curve is unsurprisingly linear:
Using synchronous requests, I was able to execute just five requests per second in my experiment.
Next, let us see how we can make asynchronous HTTP requests with the help of asyncio. The code listing below is an example of how to make twenty asynchronous HTTP requests:
# Example 2: asynchronous requests import asyncio import requests async def main(): loop = asyncio.get_event_loop() futures = [ loop.run_in_executor( None, requests.get, 'http://example.org/' ) for i in range(20) ] for response in await asyncio.gather(*futures): pass loop = asyncio.get_event_loop() loop.run_until_complete(main())
The code is now more convoluted. But is it better? In the chart below, you will see the total completion time in seconds as a function of the number of asynchronous requests made.
The stepwise curve indicates that some requests are being executed in parallel. However, the curve is still asymptotically linear. Let’s find out why.
Increasing the number of threads
Notice that there is a step pattern in the chart? The completion time is 1x for 1-5 requests, 2x for 6-10 requests, 3x for 11-15 requests, and so on. The reason that we see this step pattern is that the default Executor has an internal pool of five threads that execute work. While five requests can be executed in parallel, any remaining requests will have to wait in line for a thread to become available.
So here is an idea. To minimize the total completion time, we could increase the size of the thread pool to match the number of requests we have to make. Luckily, this is easy to do as we will see next. The code listing below is an example of how to make twenty asynchronous HTTP requests with a thread pool of size twenty:
# Example 3: asynchronous requests with larger thread pool import asyncio import concurrent.futures import requests async def main(): with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor: loop = asyncio.get_event_loop() futures = [ loop.run_in_executor( executor, requests.get, 'http://example.org/' ) for i in range(20) ] for response in await asyncio.gather(*futures): pass loop = asyncio.get_event_loop() loop.run_until_complete(main())
Let us rerun the experiment from before.
Bingo. If the executor has a pool of 20 threads then 20 requests take roughly the same amount of time as 1 request.
The next obvious question to ask is: what is the limit to this approach? How does the total completion time increase for a more substantial amount of asynchronous requests and equal amount of threads in the executor pool, say 100 or 1000? Let’s find out.
We are back to linear – the flat kind. The chart shows that we can easily make 500+ asynchronous HTTP requests in the same time it takes to make 10 synchronous requests – or 300 requests per second. Surely a vast improvement. You might have noticed that the chart only goes to 720 requests, not 1000. The reason is that “something bad” happened when I reached 720 requests/threads and my timing program became very sad and stopped working. On a different computer than mine, this limit will likely be found somewhere else.
First, you have seen that with asynchronous requests you can execute more requests per second than with synchronous requests: 300 requests per second compared to just 5 requests per second (in my experiments).
Second, you have seen how to achieve this well-known benefit of async I/O in Python 3.5 using just slightly convoluted code.
I would like to apologize to the people who host example.org for hammering their website during my experiment. I like to think that it was for a good cause.