
Jumping into Riak, my first foray into NoSQL, was a bit daunting, but thanks to the guidance of some great authors I came out the other end of the chapter relatively unscathed, and maybe a bit wiser. Of all the NoSQL databases, the key-value store seemed the most useless to me. Well, I was wrong.
Impressions of the Book
Based on the PostgreSQL chapter, I was not really sure what to expect from this chapter. My knowledge of NoSQL has been conceptual at best, in that you don’t use SQL. Deep thoughts, right? Seeing as you kind of needed to know what was going on with RDBMSs for the PostgreSQL chapter, I was nervous.
Fortunately, the authors anticipated this problem. It seems their goal was to introduce the reader to a solid database that is basic in concept but deep in ability, so as to ease the reader into the NoSQL waters. I think Riak is a great database to do just that, and the authors did well at introducing and conveying the “how” of using a NoSQL database.
What I Learned
A LOT. If I wrote about everything I learned, you would be here for a while. I am still wrestling with some concepts, but overall I am better equipped to understand what is happening. So I will break down the most significant things “I” learned. First, though, here is a bulleted list of several things which were discussed.
- Link Walking
- Data Partitioning
- Horizontal Scaling
- MapReduce
- Eventual Consistency
- Extending Riak
Riak Reminds Me of S3
They even mention in the book that there are some similarities between Amazon S3 and Riak, which is a bit of an understatement. So with S3 in mind I progressed through the chapter, and with everything framed that way it made much more sense, much faster.
Everything is partitioned into buckets, and anything can live in those buckets, from text to binary objects, so the organization is very similar as well. However, one thing Riak does that S3 doesn’t is let you link key-value pairs, which brings us to the next thing.
Link Walking
One of my biggest barriers to entry into NoSQL databases has been how you retrieve data. Simply storing large amounts of data is necessary, but you also need to cross-reference that data in some way. Most of what I have read doesn’t say much about this, or about how to accomplish the same thing in other ways.
In Riak you can associate key-value pairs, but only in one direction. You can link n to x, but x will never know it is referenced at all. Knowing this, you have to be aware of how you are going to pull your data out if you choose to use link walking, which is traversing links from record to record.
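To make the one-way nature of links concrete, here is a minimal in-memory sketch. It is not Riak's API: the `ToyStore` class, its `put` and `walk` methods, and the bucket, key, and tag names are all made up for illustration.

```python
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a bucket/key store with one-way links.

    Hypothetical illustration only; this is not how you talk to Riak.
    """

    def __init__(self):
        self.values = {}                # (bucket, key) -> stored value
        self.links = defaultdict(list)  # (bucket, key) -> outgoing links

    def put(self, bucket, key, value, links=()):
        # Links are stored only on the record that points outward.
        self.values[(bucket, key)] = value
        self.links[(bucket, key)] = list(links)

    def walk(self, bucket, key, tag=None):
        """Follow outgoing links from one record, optionally filtered by tag."""
        return [
            self.values[(b, k)]
            for (b, k, t) in self.links.get((bucket, key), [])
            if tag is None or t == tag
        ]


store = ToyStore()
store.put("animals", "polly", {"name": "Polly"})
store.put("cages", "1", {"room": 101}, links=[("animals", "polly", "contains")])

contained = store.walk("cages", "1", tag="contains")  # cage -> animal works
back_refs = store.walk("animals", "polly")            # animal -> cage finds nothing
```

Walking from the cage reaches the animal, but walking from the animal comes back empty, because the animal record carries no link of its own. If you need to go both ways, you have to store a second link yourself.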
MapReduce
I have seen hints about MapReduce in a lot of different places, but never looked it up or had a need to use it. This chapter has an introduction to it which really drives home another way to get information out of a data store. It is a bit of a mind bender at first, but has started to make more sense, kind of. It also seems that it is going to be used a lot more in the book for other databases.
A quick idea of how it works is right there in its name. First you map over the data, pulling out what you care about from each record, then you reduce those results down to the single piece of data you need. The interesting thing is that it runs on the server, so your application itself is not handling the processing of all the data. Another cool perk is that you can save your MapReduce functions as text in a Riak bucket and call them later, kind of like a stored procedure.
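The map-then-reduce shape can be sketched in plain Python. This runs locally rather than on a Riak server, and the sample records and the `map_phase` and `reduce_phase` names are invented for the example, so treat it as an illustration of the idea only.

```python
from functools import reduce

# Made-up records standing in for values stored in a bucket.
records = [
    {"room": 101, "capacity": 2},
    {"room": 102, "capacity": 4},
    {"room": 201, "capacity": 1},
]


def map_phase(record):
    """Turn one record into a small {floor: capacity} result."""
    floor = record["room"] // 100
    return {floor: record["capacity"]}


def reduce_phase(left, right):
    """Merge two partial results by summing capacities per floor."""
    merged = dict(left)
    for floor, capacity in right.items():
        merged[floor] = merged.get(floor, 0) + capacity
    return merged


mapped = [map_phase(r) for r in records]  # one small result per record
totals = reduce(reduce_phase, mapped, {})  # combined into a single answer
```

Each record is mapped independently, which is what lets a cluster spread the map work across servers before the partial results are reduced.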
Scalability
Based on how Riak was designed, it is super easy to join multiple Riak servers together and have data spread across them all, so you definitely get some failover. The servers are joined in what is called the Riak Ring, and the data is divided across them in virtual nodes. When you write to one server, the data will eventually be written to the others. And yes, there are mechanisms in place for resolving conflicting writes.
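A rough sketch of how a ring of virtual nodes can spread keys across servers follows. The partition count, node names, hash choice, and replica count are all assumptions picked for readability, and the round-robin assignment is a simplification, not Riak's actual claim algorithm.

```python
import hashlib

NUM_PARTITIONS = 8                        # assumed tiny ring for readability
NODES = ["node_a", "node_b", "node_c"]    # hypothetical server names

# Hand each virtual node (partition) to a physical server, round-robin.
ring = {p: NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)}


def partition_for(bucket, key):
    """Hash a bucket/key pair onto one partition of the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def owners(bucket, key, replicas=3):
    """Servers owning the key's partition and the next ones around the ring."""
    start = partition_for(bucket, key)
    return [ring[(start + i) % NUM_PARTITIONS] for i in range(replicas)]


polly_partition = partition_for("animals", "polly")
polly_owners = owners("animals", "polly")
```

Because the same key always hashes to the same partition, any server can work out where a record lives. In a ring this small, the replica list can repeat a server, which a real cluster works to avoid.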
Overall
I now have a better understanding of how Riak works, and if I have simple data to store while needing flexibility in my clustering, it will be my choice. The book does not really explain when to use Riak, beyond anecdotal evidence, so it is still up to the reader to work that out on their own. The examples they give work, and work well, but I am not convinced a relational database wouldn’t have been a better choice.
All that being said, it was a solid chapter overall, and I took a step further in my understanding of NoSQL, so the future is looking bright. The next chapter is on HBase, which I have never heard of, so it will be interesting to see how things go.

