A distribution algorithm is used to map keys to servers in a distributed key-value store. There are several different ones, implemented in different systems, and with different properties. In this blog post I'll briefly cover the best-known key hashing schemes, before I get to vbuckets.
Really simple idea. Keys are mapped to servers, by hashing the key and indexing into a list of servers (modulo the length of the list). This approach has two severe issues, in the case that servers can join and leave the cluster:
- Stale data: If a key is found on the server, we don't know if we are reading an old value. It could be that a key was written to server A, after which the configuration changed and the key mapped to server B, where it was updated, after which the configuration changed again, and the key was once again mapped to server A, where we read the old value.
- False negatives: If a key is not found on server A, the key could have been stored on server B in a previous configuration. We could ask all the servers for the key, but server B could have left the cluster. So we basically don't know if the key was ever written to (unless we find it, in which case it could be stale).
In any event, we are screwed!
Consistent hashing partly addresses the short-coming of the modulo scheme, but actually has all the same issues. We could still read stale data, and we could still get false negatives. The probability is simply smaller, because only few keys are mapped to a new server in the event of a server joining or leaving the cluster. The point is that the servers themselves don't know if they are currently responsible for a given key, and thus can not avoid completely returning false negatives or state data.
vbuckets address the fundamental problems of false negatives and stale data inherent in both the modulo scheme and consistent hashing. A nice introduction to memcached vbuckets can be found on Dustin Sallings github pages.
The core idea is that keys are not mapped directly to server (like in consistent hashing), but rather keys are mapped consistently to a fixed number of containers called vbuckets (virtual buckets). These vbuckets are in turn statically mapped to the servers. The number of vbuckets must remain constant for this scheme to work, while the number of servers can vary
The properties of vbuckets are:
- Never service a request on the wrong server.
- Allow scaling up and down at will.
- Servers refuse commands that they should not service, but
- Servers still do not know about each other.
- We can hand data sets from one server another atomically, but
- There are no temporal constraints.
- Consistency is guaranteed.
- Absolutely no network overhead is introduced in the normal case.
As I was reading the first of the two articles, I came across phrases like "To effect a transfer, you select a set of the vbuckets that you want the new server to own and set them all to the pending state on the receiving server". There seems to be a controlling entity that is simply refered to as "you".
This raises two questions about moving vbuckets around, that are not answered in the article.
- How exactly are vbuckets selected to move to a new server?
- How are vbuckets balanced across the servers over time?
These questions are arguably orthogonal to the concept of how vbuckets work.
Balancing vbuckets is talked about in an article on rebalancing on the couchbase website. The technical white paper published on the couchbase website (btw, membase = couchbase) talks about using the vbucket protocol in a memcached environment using either "client" proxies or "embedded" proxies.
Some quick observations about vbuckets:
- The number of vbuckets has to be decided upon when first launching the cluster, and can not ever be changed, e.g. in response to a change in overall load or amount of data stored. Is this bad?
- The mapping from keys to vbuckets can not ever be changed, and is a consequence of 1) the number of vbuckets and 2) the chosen hash-function. Neither of which can be changed. Namely a method for partitioning "hot" vbuckets is not described in the article. Is this bad?
- A given key can only be served a single server. Is this bad?
I don't know if these are actually issues or just observations that happen to be true. vbuckets do solve a number problems of the fundamental problems of consistent hashing (stale data, false negatives), but I'm wondering if there is a scalability issue in some cases.
Going off on a tangent: Potential add-on schemes
If having a single server per key is a problem, a scheme that maps a given first-level key to a distinct set of second-level keys, which are then mapped to vbuckets, would yield the possibility for load balancing across multiple servers, but would also introduce inconsistency. This again could be countered by only having a single write server (mapped to a distinguished second-level key), and several eventually consistent read servers (serving reads for any second-level key). And so we could go on :-) Truth is, I don't know the extent to which important points have been left out of the article.