How many requests per second can I get out of Redis?

Warning: This is not a very interesting post. I'm toying around with the Redis benchmarking tool. What would be significantly more interesting would be to toy around with the Lua API in Redis, which I'll do in a subsequent post.

In this post, I'll try to squeeze as many get/set requests out of Redis as I can. I'll use the redis-benchmark tool to test just the set and get commands. This is not meant to be a benchmark, but a learning experience to see "what works".

I'm testing the current stable version of Redis: 2.6.15.

Basic testing approach

First, compile Redis from source (it should "just work") and place the binaries somewhere useful. Next, start Redis server (I use port 7777 for no specific reason):

redis-server --port 7777

To test (set and get):

redis-benchmark -p 7777 -t set,get -q

You should use the redis-benchmark tool to benchmark Redis, for exactly the reasons mentioned in the pitfalls and misconceptions section on the Redis benchmarking page. The primary reason is that the tool uses multiple connections, and easily enables commands to be pipelined.

The command above uses the -p switch to set the port, the -t switch to limit the commands we test, and finally the -q switch to limit the output to just the throughput.

Additional notes

Redis is a single-threaded server. Unfortunately it does not seem possible to use the benchmark tool to load-balance over several Redis instances, say running on different ports on the same machine. Nothing is keeping me from using consistent hashing (or another partitioning technique) with Redis, but the benchmarking tool does not seem to support any kind of partitioning.
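To sketch what such a partitioning scheme might look like (this is a hypothetical illustration on the client side, not something redis-benchmark supports), here is a minimal consistent-hashing ring in Python that maps keys to Redis instances; the node addresses are made up:

```python
import bisect
import hashlib

class ConsistentHash:
    """Minimal consistent-hashing ring: maps keys to nodes ("host:port").

    Each node is placed on the ring `replicas` times (virtual nodes) so
    that keys spread roughly evenly across nodes.
    """

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.keys = []      # sorted hash positions on the ring
        self.nodes_at = []  # node owning the position at the same index
        for node in nodes:
            self.add(node)

    def _hash(self, value):
        # Any stable hash works; md5 gives a well-spread 128-bit integer.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            h = self._hash("%s:%d" % (node, i))
            idx = bisect.bisect(self.keys, h)
            self.keys.insert(idx, h)
            self.nodes_at.insert(idx, node)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around the ring with the modulo.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.nodes_at[idx]

# Hypothetical set of local Redis instances on different ports.
nodes = ["127.0.0.1:7777", "127.0.0.1:7778", "127.0.0.1:7779"]
ring = ConsistentHash(nodes)
print(ring.node_for("user:42"))  # the same key always maps to the same node
```

A client would then open a connection per node and route each command to ring.node_for(key). Adding or removing a node only remaps a fraction of the keys, which is the point of consistent hashing over naive modulo partitioning.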

Antirez has a blog post about using a Redis proxy called Twemproxy for doing partitioning with Redis. It can potentially increase throughput. Unfortunately the proxy uses the epoll system call in Linux, which does not exist on Mac OS X (where kqueue is used instead), so I cannot try it.

All in all, I'll be evaluating Redis in a purely single-node setup, using a TCP loopback connection to the Redis server running on my laptop.

A further thing that is noted on the benchmarking page is that:

Finally, when very efficient servers are benchmarked (and stores like Redis or memcached definitely fall in this category), it may be difficult to saturate the server. Sometimes, the performance bottleneck is on client side, and not server-side. In that case, the client (i.e. the benchmark program itself) must be fixed, or perhaps scaled out, in order to reach the maximum throughput.

Another reason that Redis may not be saturated by the benchmark is that Redis throughput may be limited by the network well before being limited by the CPU. As I'm running on a local machine, I assume this is not the case, but I'm not entirely sure there are no other bottlenecks in the OS regarding communication between the benchmark process and the redis-server process. As noted on the benchmarking page: When client and server run on the same box, the CPU is the limiting factor with redis-benchmark.

Let's keep all that in mind.

1: Running Redis server on my slightly old Macbook Pro

This is the 100% lazy installation. I compiled Redis from source on my laptop, using all defaults, and simply started it.

Hardware: 2.66 GHz Intel Core 2 Duo, 4 GB 1067 Mhz DDR3
OS: Mac OS X 10.6.8 (Snow Leopard)

The result is roughly 37K requests per second for both set and get:

$ redis-benchmark -p 7777 -t set,get -q
SET: 37174.72 requests per second
GET: 37313.43 requests per second

The standard test uses just a single key. To increase the number of expected cache misses, I'll run the same test using a million random keys (the -r switch sets the size of the keyspace) to see if it makes a huge difference:

redis-benchmark -p 7777 -t set,get -r 1000000 -q

The difference is roughly 2.8% for set and 3% for get. Nothing dramatic. Overall, though, the performance is not great for this initial setup running unmodified on my laptop.

2: Using pipelining

Now I'll read the fucking manual. Maybe it helps. The Redis site has a page about benchmarking Redis. The first suggestion is to use pipelining. It is enabled with the -P switch, whose argument is the number of commands to bunch together in each request. I'll try 16 as suggested on the page.

$ redis-benchmark -p 7777 -t set,get -P 16 -q
SET: 222222.22 requests per second
GET: 256410.25 requests per second

Actually the throughput varies a lot between different runs of this test, much more than the non-pipelined test. With that in mind, it seems that using a pipeline level of 100 is better than 16, about 30% higher throughput:

$ redis-benchmark -p 7777 -t set,get -P 100 -q
SET: 312500.00 requests per second
GET: 333333.34 requests per second

But using a pipeline level of 1000 is worse. Again there is a lot of variance, so I'd need to do a proper statistical analysis. Here I'm doing a rough estimation: a pipeline level of 100 dominates 1000, and that is all I care about.
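One cheap way to reduce run-to-run noise is to make each run longer with the -n switch, which sets the total number of requests (the default is 100000). For example:

$ redis-benchmark -p 7777 -t set,get -P 100 -n 1000000 -q

This does not remove the variance, but averaging over ten times as many requests should make the numbers more stable between runs.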

Bottom line is that you can get 1 order of magnitude improvement to throughput by using pipelining, at least on my old Macbook Pro. Maybe it will be more or less on a "proper" server.

The question is, can we do better?

3: Using lua scripting

Redis supports Lua scripts that are evaluated server side. This can improve throughput in situations where a read is followed by a computation, followed by, say, a write. Without scripting, even when pipelining, there would be a roundtrip after the initial read in order to do the computation. The benefits of scripting are really application specific, and I'll not explore them further.
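Just to sketch the idea: a read-compute-write like the one described could be sent as a single EVAL call (the key name mykey is made up; the trailing "1 mykey" tells EVAL that one key follows):

redis-cli -p 7777 EVAL "local v = tonumber(redis.call('GET', KEYS[1]) or 0) redis.call('SET', KEYS[1], v + 1) return v + 1" 1 mykey

The whole read-modify-write then happens in one round trip, with the computation running inside the server.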

4: Various potential optimizations

  • Use another memory allocation library. Default is libc. Unlikely to have any dramatic effect on the test
  • Other things to consider?

I have not tried any of these optimizations.

5: Givin'er all she's got!

On the Redis page there are results posted for a high-end server, using TCP loopback (like I am) and without pipelining.

Here are the results for a 2 x Intel X5670 @ 2.93 GHz (without pipelining):

SET: 142653.36 requests per second
GET: 142450.14 requests per second

For Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining):

SET: 552028.75 requests per second
GET: 707463.75 requests per second

Note that these are not the same machines.

That is roughly a 3.8x increase in throughput (compared to my laptop) in the non-pipelining case (run on the high-end server), and roughly 2x in the pipelining case (run on the not-quite-as-high-end server). Again, take the numbers with a big grain of salt. They say nothing wildly interesting. The main conclusion is that pipelining, and perhaps Lua scripting, are good ideas. Partitioning may also improve throughput, in which case you could try the Twemproxy code if you're on Linux.

Conclusion and next steps

Using a single-node instance of Redis running on my laptop I managed to get 300K requests per second (both get and set). This is achieved only if using pipelining (100 commands at a time). On a high-end machine someone got 700K get requests per second using pipelining, i.e. a bit more than twice the throughput.

My goal is to squeeze 1 million get requests per second out of Redis, for a "realistic workload". For this I'll use a partitioning approach. The approach is to use Twemproxy running on a multi-core Linux machine with several Redis instances. The exact setup will take some experimenting to get right.

Out of the box, both pipelining and Lua scripting are good avenues for improving performance with the Redis server. I saw one order of magnitude improvement in throughput when using pipelining. Both approaches are quite application specific, perhaps Lua scripting more so than pipelining. I did not experiment with Lua scripting; that would also be very interesting to try.

Hello GNU profiling

The profiling tool in GNU is called gprof. Here is a short, boring example of how to use it.

1) Write hello world in C (hello.c)

#include <stdio.h>

int foo() {
  int b = 54324;
  int j;
  for (j = 0; j < 1000000; j++) {
    b = b ^ j;
  }
  return b;
}

int main() {
  int a = 321782;
  int i;
  for (i = 0; i < 1000; i++) {
    a = a ^ foo();
  }
  printf("Hello foo: %d\n", a);
  return 0;
}

2) Compile with -pg option

gcc -pg hello.c

3) Run the program to generate profiling information

./a.out # this generates gmon.out file

4) Run gprof on the program and read output in less:

gprof a.out gmon.out | less

Summary of memcached commands as used through telnet

The two main things to do with a memcached cluster are getting a key and setting a key to a value. Below are examples of both, with parameters explained (something most examples miss for some reason?).


Connect to memcached:
telnet [hostname] [port]

telnet localhost 11211

Getting a key:
get [key]

get foo # get the key "foo" from memcached

Setting a key:
set [key] [flags] [time-to-live-seconds] [bytes] (newline) [value]

set foo 0 86400 5 # set with a "flag" of 0, a TTL of 86400 seconds and 5 bytes of data
hello # value to set is entered on a separate line. Length matches the number 5 above

The "flags" is just a 32-bit integer that is stored (and returned) with the data.
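Putting it together, a complete session could look like this, assuming memcached is running locally on the default port (STORED, VALUE and END are the server's replies):

$ telnet localhost 11211
set foo 0 86400 5
hello
STORED
get foo
VALUE foo 0 5
hello
END

Note that get echoes back the flags (0) and the byte count (5) on the VALUE line, followed by the data itself and a terminating END.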


I find the memcached documentation a bit lacking when it comes to good summaries with examples of the different commands. The project uses a wiki for documentation, but I can't find the stuff that I'm looking for. The way my mind works is that if I see an example of using something, I can generalize from it (a one-liner example beats a thousand words). That is why the summary above spells out the parameters of each command.

Installing nginx on Mac with pcre library

To build nginx on Mac OS X Snow Leopard, I use the following options to configure

./configure --with-ld-opt="-L/usr/local/lib" --with-cc-opt="-I/usr/local/include"

If the ld and cc options are not given, it results in an error reported in a ticket on the nginx trac.

Notice that there is no space between -L and /usr/local/lib, and likewise for -I and /usr/local/include. This is different from what is written in the FreeBSD section of the InstallNotes on the nginx website.

Testing performance of HTTP server with Apache Bench

To benchmark an HTTP server, you can use the command-line utility Apache Bench (the ab command).

If you have Python installed, you can start a HTTP server, e.g. running on port 10000, in any directory of your system. It will serve the files in that directory and sub-directories. Start it like this:

$ python -m SimpleHTTPServer 10000
Serving HTTP on port 10000 ...

To test the number of requests it can handle per second, you can use Apache Bench, which is invoked with the command ab:

$ ab -n 2000 -c 10 http://localhost:10000/

This will shoot off 2000 requests at the server, with a concurrency level of 10 (up to 10 requests in flight at a time).

How to manage PATH in Mac OS X

I use two ways to manage paths in Mac OS X:

  • ~/.profile
  • /etc/paths.d

I use the first option when trying things out, and the second for managing the path more permanently.


An example of using /etc/paths.d to include the git executable in PATH:

  • Create the file /etc/paths.d/git
  • Add the line: /usr/local/git/bin

When you start a new Terminal session, the files under /etc/paths.d are read, and all paths found are appended to the PATH variable.
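For example, from a terminal (assuming git is installed under /usr/local/git):

$ sudo sh -c 'echo /usr/local/git/bin > /etc/paths.d/git'
$ /usr/libexec/path_helper -s

path_helper is the utility that reads /etc/paths and the files in /etc/paths.d when a shell starts; the -s switch prints the resulting PATH assignment, so you can check that your new directory was picked up without opening a new Terminal session.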

How to copy a file to “clip board” memory

You have a file sitting on your disk. You want to put a copy of its contents in your OS "clip board" so that you can paste it somewhere, e.g. on a website, without having to open the file in a text editor, select everything, and hit the COPY key combination, say Command-C on a Mac.

Copy a file to clip-board:

pbcopy < PATH_TO_FILE

Paste the contents you copied, either with Command-V in an application, or in the terminal:

pbpaste