The last few weeks I've been working on understanding vector clocks (or rather, version vectors). When working with distributed data stores like Riak and Voldemort which are using this to manage consistency, it is important to have a deep understanding of what kind of consistency they provide, and how to design systems that work well with these structures.
So, without further ado; here's my story. We'll start somewhere else though.
Versions of shared spreadsheets
Adam and Eve are working on a shared Excel spreadsheet, and they have devised a clever scheme for versioning it, so they can detect conflicting edits.
Their idea is that the document has two version numbers embedded in the file name. When Adam saves a new version, he increments his number; and likewise for Eve. After some exchanges and edits, the document is now called doc_adam3_eve4.xls (Adam in version 3, Eve in version 4). The rules of the game are thus:
- If Adam updates the document, he shall increment from
adam3toadam4, and so name the next versiondoc_adam4_eve4.xls, whereas - if Eve updates the same document she shall increment
eve4toeve5and call the resultdoc_adam3_eve5.xls.
Because both Adam and Eve have their own version scheme, they can make edits and when they share their updated documents they can easily detect if they have made concurrent edits. Eve will bring her maximum eve-numbered version, and Adam will bring his maximum adam-numbered version. Then a mechanism depicted in this diagram can be used to detect a conflict:
Given version {adam=3,eve=4} (represented by the white square in the middle), you can see which versions are logially "after" and "before" that version.
Versions colored yellow are in conflict with doc_adam3_eve4.xls, representing logically "concurrent" edits.
The great thing is that this scheme extends to any number of participants. It's just easier to depict in a two-dimensional way, but the same mechanism works for any number of vector dimensions. Missing participants are assumed to have version 0.
Vector Time. The numbering scheme above is an example of a vector clock which can be used to describe temporal relations between events in a distributed system. Generally speaking, a vector clock is a list of (place, version) pairs, in which each place occurs at most once.
Really, we should perhaps talk about version vectors because the time-notion does not exactly refer to our intuitive notion of time, but to some monotonous increasing series of "versions" issued at each "place".
version vectors and version vectors have a partial ordering, which captures the idea of before/after a given point in vector time. It's partial because there exists pairs of version vectors so that one is neither before nor after the other; they are "concurrent".
If you have a set of version vectors, there is thus a subset hereof which are "maximal" (most recent). This set contains only one element if there is no conflict.
If you use version vectors to give versions to documents, then it is obviously great if there is only one such "most recent" document. If there is more than one "most recent" then at least you have narrowed it down to the versions to take a closer look at.
Assembling the Jigsaw
Regardless of their cleverness, Adam and Eve are often in trouble, because often they have more than one "most recent" Excel spreadsheet when they get together (say, doc_adam4_eve4.xls and doc_adam3_eve5.xls mentioned before), and so they have to sit there and take their spreadsheet apart, resolve the conflicts, and put it back together. Once they have resolved their conflicts they can name the new version doc_adam4_eve5.xls which is "after" both of the aforementioned ones and be back on track.
Now Justin (he's a really smart programmer) comes along to help them write a new spreadsheet application called Schemix that solves their issues. Schemix represents spreadsheets such that each cell value in the spreadsheet has it's own version number using version vectors.
Here's a document in Schemix
And here is how Schemix represents this data after Adam and Eve have been working on it for a while.
A1, ["Anders"], { adam:3, eve:4 }
B1, ["anders@somewhere.com"], { adam:1 }
A2, ["Brandon"], { adam:2, eve:5 }
B2, ["brandon@elsewhere.com"], { eve:4 }
The right-most column is the version vector timestamp for the corresponding cell. So, in stead of giving the entire document a single version, each cell in the spreadsheet has its own independent version number. Whenever Adam saves his document, the cells he has modifies will have their version number bumped up one.
This versioning scheme allows any two such Schemix documents to be merged, by comparing the cells individually using the scheme Adam and Eve were using above. Assume the aforementioned document was merged with this:
A1, ["Andy"], { adam:4, eve:3 }
B1, ["anders@somewhere.com"], { adam:1 }
A2, ["Bill"], { adam:3, eve:5 }
B2, ["brandon@elsewhere.com"], { eve:4 }
Merging two Schemix documents is like assembling a jigsaw puzzle from two sets of jigsaws puzzle pieces:
- For some pieces, there will be two identically versioned ones (such as cells
B1andB2) and so you just choose any one and throw the other away. - For some pieces, one is "newer" than the other (for cell
A2); and so you just keep the most recent one and discard the other. - but, but, but for some there may be a conflict, and so the cell becomes multi-valued. Say, both Adam and Eve concurrently change the cell A1, but to distinct values. Then you're in trouble; but luckily much less so than before because the conflict only pertains to that single cell.
The merged document look like this:
A1, ["Andy", "Anders"], { adam:4, eve:4 }
B1, ["anders@somewhere.com"], { adam:1 }
A2, ["Bill"], { adam:3, eve:5 }
B2, ["brandon@elsewhere.com"], { eve:4 }
When cells are in conflict as a result of merging two Schemix documents, they are allowed to have multiple values. In Schemix, this is obviously just represented as a multi-values GUI element (a drop-down), so that once you have a merged document, you can easily edit it to resolve the conflict.
Inremental Updates
Version vectors are really nice, because you don't need entire documents to be able to do consistent merges. If Adam and Eve can see each other via a peer-to-peer channel, they can just as well send updates to each other by sending the cell values incrementally. The fact that each cell carries it's own consistency structure makes this scheme applicable to bulk- or incremental merges alike.
Relation to NoSQL
As mentioned above, this idea of version vectors is important because it is a fundamental notion underlying many of the new distributed NoSQL data stores, such as Voldemort, and Riak. The cell-names A1, B2, and so on are similar to the keys used in these systems, and the notion of version vectors described here is close to what they do. A Riak store is really just one giant version of the Schemix spreadsheet I described above.
The interesting thing is that even though they do provide automatic reconciliation, there is no guarantee about the referential integrity of the result. Two independent edits to two distinct key/values can always be merged; but they could easily break some external invariant which is not expressed explicitly.
What kinds of strategies should we then employ if we want to maintain integrity anyway, say for something simple like a one-to-may relationship?