HBase – Week 3 of 7 Databases in 7 Weeks

by Buddy Lindsey on February 13, 2012


We are on the second NoSQL database of the book. I had personally never heard of HBase before this, but it is an impressive database. The chapter also provides a good look at different programming techniques and stacks. Fortunately, HBase is fairly simple to get introduced to.

Impressions of the Book

The humor in the book is still going strong and is used well to make the authors’ points. My favorite from this chapter is, roughly, “If you have less than 5 nodes you’re doing it wrong”. This was another good chapter, and I was ecstatic about the example data they chose to store because it worked well for a NoSQL database.

What I learned

This was a difficult chapter for me because I wasn’t able to get HBase installed to do the exercises. It got to the point where I was spending so much time trying to get it installed that I wasn’t reading the book, so I set the install aside and made sure to read the chapter. That doesn’t mean I didn’t learn anything, so here are a few key things that stuck in my head.

Clustering

The thing I have liked as I go through these last 2 chapters is how easy it really is to cluster servers together for horizontal scaling. It is one of those things I have never needed to do, so I don’t know much about it. Clustering is very well built into HBase by default.

Java and JRuby

This has less to do with HBase than with other things, but I tend to avoid Java altogether. It seems, though, that Java is a central part of interacting with HBase. The cool thing was that I got to see some JRuby in action. Hopefully, if I need to write some Java in the future, I can use JRuby instead, because it really is well written and makes interacting with HBase that much simpler.

Scaling

The final part of the chapter, Day 3, was about scaling out HBase. Several things were discussed regarding Apache Incubator software built to automatically scale out many things, including HBase. The chapter even went so far as to show how to scale to many EC2 instances with very little work. Seeing how easy it is gives me a bit of confidence about doing it in the future if I ever need to.

Overall

This was a cool database built specifically for “Big Data”. It was a little hard to relate to because I haven’t needed to do anything with large amounts of data, especially at the level of hundreds of gigabytes or terabytes. However, I am aware of it now, so I can keep it in mind if I ever deal with that amount of data.

I was also ecstatic that they used Wikipedia data, because it is a great fit for NoSQL: the relationships aren’t all that important. Each article is a single silo of data that is independent except for an arbitrary link to another article. It is exciting how much better the book’s examples are getting, especially in the types of data chosen to store.

That wraps up the HBase review. Next week is MongoDB, the most popular NoSQL database in the Ruby community from what I have seen. It shall be very interesting to see how it goes. The cool thing is that it’s already installed on my machine from previous attempts to learn it.


Rob Sullivan February 13, 2012 at 3:00 pm

I’m glad I’m not the only one who hasn’t heard of this one.


Buddy Lindsey February 13, 2012 at 3:01 pm

Phew, glad you hadn’t either. I have talked about it with a couple of people, and they had heard of it. One person used it while they worked at SourceForge.


Jim R. Wilson March 13, 2012 at 4:17 pm

Hi Buddy!

Thanks much for your detailed notes on each of these databases, your feedback is much appreciated! Could you explain what about HBase installation was tough? I’d be happy to help or possibly add more info to the book to explain better.

– Jim R. Wilson (co-author of Seven Databases)


Buddy Lindsey March 13, 2012 at 6:32 pm

It wasn’t really one thing. I tried to install it multiple ways and I kept running into different dependency problems. In all fairness at the time my development environment was pretty polluted so I probably crossed wires with several things that caused all the dependency problems. The most annoying part of it was that each way I tried I got different problems, never the same one. So I can’t really nail it down for you, sorry. And thank you for writing the book and taking the time to read my reviews.

