We are on the second NoSQL database of the book. I had never heard of HBase before this, but it is an impressive database. It also provides a good look at different programming techniques and stacks. Fortunately, HBase is fairly simple to get introduced to.
Impressions of the Book
The humor in the book is still going strong, and the authors use it well to make their points. My favorite from this chapter is, roughly, “If you have less than 5 nodes you’re doing it wrong”. This was another good chapter, and I was ecstatic about the example data they chose to store, because it is a good fit for a NoSQL database.
What I learned
This was a difficult chapter for me because I wasn’t able to get HBase installed to do the exercises. I was spending so much time on the installation that I wasn’t reading the book, so I stopped and made sure to read the chapter instead. That doesn’t mean I didn’t learn anything, so here are a few key things that stuck in my head.
What I have liked most over these last 2 chapters is how easy it really is to cluster servers together for horizontal scaling. I have never needed to do this, so I don’t know much about it. HBase has it well built in by default.
Java and JRuby
This has less to do with HBase than with other things, but I tend to avoid Java altogether. It seems, though, that Java is central to interacting with HBase. The cool thing was that I got to see some JRuby in action. Hopefully, if I need to write some Java in the future, I can use JRuby instead, because it really is well written and makes interacting with HBase that much simpler.
The final part of the chapter, Day 3, was about scaling out HBase. It discussed Apache Incubator software built to automatically scale out many things, including HBase. It even went so far as to show how to scale to many EC2 instances with very little work. Seeing how easy it is gives me a bit of confidence about doing it in the future if I ever need to.
This was a cool database built specifically for “Big Data”. It was a little hard to relate to because I haven’t needed to do anything with large amounts of data, especially in the high hundreds of gigabytes or the terabyte range. However, I am aware of it now, so I can keep it in mind if I ever deal with that amount of data.
I was also ecstatic that they used Wikipedia data, because it is a great fit for NoSQL: the relationships aren’t all that important. Each article is a single silo of data that is independent apart from arbitrary links to other articles. It is exciting to see how much better the book’s examples are getting, especially in the types of data being stored.
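To make the "silo" idea concrete, here is a minimal sketch in plain Python. It is not HBase's API, just a toy in-memory table where each article is one row keyed by its title and every cell lives under a "family:qualifier" column name. The helper names (`put`, `get_row`) and the "text" and "links" families are assumptions made for illustration.

```python
# Toy stand-in for a wide-column table: row key -> {"family:qualifier": value}.
# This is NOT the HBase client API; all names here are illustrative assumptions.
table = {}

def put(row_key, column, value):
    """Store one cell under a row key and a 'family:qualifier' column name."""
    table.setdefault(row_key, {})[column] = value

def get_row(row_key):
    """Return a copy of every cell in one row; no other row is consulted."""
    return dict(table.get(row_key, {}))

# Each article is its own row. A link is just another cell in that row,
# not a foreign key that has to be joined against another table.
put("HBase", "text:", "HBase is a column-oriented database ...")
put("HBase", "links:Hadoop", "built on")
put("Hadoop", "text:", "Hadoop is a framework ...")

row = get_row("HBase")
# Collect the titles this article links to by scanning its "links" family.
outgoing = sorted(col.split(":", 1)[1] for col in row if col.startswith("links:"))
print(outgoing)
```

Reading an article touches exactly one row, which is why this kind of data spreads across a cluster so comfortably: rows can live on different servers without any of them needing to know about the others.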
That wraps up the HBase review. Next week is MongoDB, which, from what I have seen, is the most popular NoSQL database in the Ruby community. It will be very interesting to see how it goes. The cool thing is that it’s already installed on my machine from previous attempts to learn it.