Riak does not provide secondary indexes out of the box. This limits many applications of Riak, and implementing them correctly oneself is a bit tricky.
The module riak_link_index provides a mechanism that is applicable in some situations: it creates synthetic (secondary link index) objects, triggered by a Riak commit hook. The mechanism works by defining a function that names these synthetic objects as a function of the primary object body.
Basic Example
Assume the primary object stored under the name dogs/ABC01 has these contents:
{
  type: "Dog",
  name: "Fido",
  owner: "Peter"
  ...
}
And the indexing function looks like this:
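function indexDogs(metaData, contents) {
  // Parse the stored body; the type must match the sample object ("Dog").
  var body = JSON.parse(contents);
  if (body.type == "Dog" && body.owner) {
    // One [bucket, key] pair per index object to link to.
    return [ ["people", body.owner] ];
  } else {
    return [];
  }
}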
Then a secondary index object will be created named /riak/people/Peter, which contains a Riak link pointing back to dogs/ABC01.
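To check what an indexing function returns before installing it, it can be run outside of Riak, for instance in Node.js. The following is only a sketch: Riak itself calls the function from the commit hook, the empty metaData argument and the "Cat" object are made up for the test, and the type is compared against "Dog" as it is spelled in the sample object.

```javascript
// Indexing function from the example above; the type is compared against
// "Dog", the spelling used in the sample object.
function indexDogs(metaData, contents) {
  var body = JSON.parse(contents);
  if (body.type === "Dog" && body.owner) {
    return [["people", body.owner]];
  }
  return [];
}

// Sample primary object, as it would be stored under dogs/ABC01.
var fido = JSON.stringify({ type: "Dog", name: "Fido", owner: "Peter" });
// A made-up object of another type, which should not be indexed.
var cat = JSON.stringify({ type: "Cat", name: "Tom", owner: "Peter" });

// metaData is not used by this indexer, so an empty object is passed here.
var fidoTargets = indexDogs({}, fido);
var catTargets = indexDogs({}, cat);

console.log(JSON.stringify(fidoTargets)); // [["people","Peter"]]
console.log(JSON.stringify(catTargets));  // []
```

Each pair in the returned list is a [bucket, key] name of one index object, so an indexer can return several pairs to index the same primary object in more than one place.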
That's it, basically. There are a few extra details though.
Composability by Tagging
Indexes are virtual entities not tied to a particular bucket, but the indexing function needs to be "installed" in the bucket being written to. However, multiple buckets may have objects that get indexed the same way, and the "index buckets" (in the above case the /riak/people bucket) may contain different kinds of indexes. This kind of composability is controlled by assigning a unique tag to each indexer.
When you install an indexer, you also name its tag. If, in the above example, you choose the tag "owns_dog", you can use a link-walk query like this:
curl http://127.0.0.1:8091/riak/people/Peter/_,owns_dog,_
This returns all the dogs that are owned by Peter. But the people/Peter object can also contain links to objects that are indexed by another indexer with its own tag; say, the houses that people own.
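As an illustration of two indexers sharing one index bucket, a second indexing function for houses might look like the sketch below. The function name indexHouses, the "House" type, the sample house object, and the tag "owns_house" are all hypothetical; only the shape of the function follows the dog example.

```javascript
// Hypothetical second indexer: houses are indexed under their owner,
// in the same "people" bucket that the dog indexer uses.
function indexHouses(metaData, contents) {
  var body = JSON.parse(contents);
  if (body.type === "House" && body.owner) {
    return [["people", body.owner]];
  }
  return [];
}

// A made-up house object owned by the same person as the dog.
var house = JSON.stringify({ type: "House", address: "1 Main St", owner: "Peter" });

// Both indexers name the same index object, people/Peter.
var houseTargets = indexHouses({}, house);
console.log(JSON.stringify(houseTargets)); // [["people","Peter"]]
```

The function itself never mentions a tag. The tag is named when the indexer is installed, so installing this one with "owns_house" would keep its links in people/Peter apart from those tagged "owns_dog".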
Under the hood
What really happens in the above example is that the indexer function runs in the precommit phase and creates links attached to the primary object, pointing to the index objects, tagged with idx@owns_dog. So, the object dogs/ABC01 will get a link
Link: </riak/people/Peter>; riaktag="idx@owns_dog"
The post-commit phase of storing the dog object will then go to the targets of all the idx@-prefixed links (the index objects) and make sure that each of those has a link pointing back to the dog.
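The two phases can be modelled with a small in-memory simulation. This is a sketch of the idea only, not the module's implementation: the store object, the precommit and postcommit helpers, and the object layout are hypothetical, and the assumption that the back-link carries the tag without the idx@ prefix is inferred from the link-walk example. Removing stale links (for example when the owner changes) is left out.

```javascript
// Hypothetical in-memory stand-in for Riak: "bucket/key" -> { links: [...] }.
var store = {};

// Indexing function from the basic example.
function indexDogs(metaData, contents) {
  var body = JSON.parse(contents);
  if (body.type === "Dog" && body.owner) {
    return [["people", body.owner]];
  }
  return [];
}

// Precommit: run the indexer and attach one idx@<tag> link per index object.
function precommit(object, indexer, tag) {
  indexer(object.metaData, object.contents).forEach(function (target) {
    object.links.push({ bucket: target[0], key: target[1], tag: "idx@" + tag });
  });
  return object;
}

// Postcommit: visit every idx@-prefixed link target and make sure it has
// a link pointing back to the primary object.
function postcommit(object) {
  object.links.forEach(function (link) {
    if (link.tag.indexOf("idx@") !== 0) { return; } // not an index link
    var id = link.bucket + "/" + link.key;
    var index = store[id] || (store[id] = { links: [] });
    var backTag = link.tag.slice("idx@".length);
    var exists = index.links.some(function (l) {
      return l.bucket === object.bucket && l.key === object.key && l.tag === backTag;
    });
    if (!exists) {
      index.links.push({ bucket: object.bucket, key: object.key, tag: backTag });
    }
  });
}

var dog = {
  bucket: "dogs",
  key: "ABC01",
  metaData: {},
  contents: JSON.stringify({ type: "Dog", name: "Fido", owner: "Peter" }),
  links: []
};

precommit(dog, indexDogs, "owns_dog");
postcommit(dog);
postcommit(dog); // running it again does not add a duplicate back-link

console.log(JSON.stringify(dog.links));
// [{"bucket":"people","key":"Peter","tag":"idx@owns_dog"}]
console.log(JSON.stringify(store["people/Peter"].links));
// [{"bucket":"dogs","key":"ABC01","tag":"owns_dog"}]
```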
So, instead of using the indexer function, you can also just provide those links when you push the object in the first place; then the indexer function doesn't have to be installed.
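A client that supplies the links itself only has to produce Link header values in the form shown above. A minimal sketch, assuming a hypothetical helper named idxLinkHeader and the /riak/ URL prefix used throughout this document:

```javascript
// Hypothetical helper: build the Link header value that marks a primary
// object as indexed under bucket/key with the given indexer tag.
function idxLinkHeader(bucket, key, tag) {
  return "</riak/" + encodeURIComponent(bucket) + "/" + encodeURIComponent(key) +
    '>; riaktag="idx@' + tag + '"';
}

var header = idxLinkHeader("people", "Peter", "owns_dog");
console.log("Link: " + header);
// Link: </riak/people/Peter>; riaktag="idx@owns_dog"
```

The resulting value would be sent as the Link header of the PUT that stores dogs/ABC01.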
The index objects contain a data structure used to ensure the consistency of the resulting links. That is, if two different PUTs to primary objects mutate the same index object, the index object may end up with a write conflict (siblings), and the data structure makes sure that such conflicts are resolved properly, using its own set of vector clocks.
Because of that, you cannot use the index objects to store other data. The data structure contained therein is reserved for use by riak_link_index.