Tuesday 18 January 2011 15:50:59

A Simple Key Value Server With Cassandra 0.7.0 and Hector API

There's a lot of buzz around the Cassandra NoSQL database these days. It clearly has some nice features that make it interesting to try out, especially in combination with our software product.

Since our business is providing planning and scheduling applications, this is not obvious at first sight. But in most projects we run the application 100% in memory and use a database only to back up the object state, so that we can restore it at application startup. Thus Cassandra's fast writes, replication and distribution make it an interesting alternative to relational or object-oriented database management systems.

As usual when evaluating new products or technologies, we started with a small test project to gain some experience. Since we noticed that there is little documentation on how to start using the Cassandra database from within a Java program, we decided to publish the project in this blog, hoping that it will be helpful to other users as well, especially those who are looking for a starting point.

To test and learn, we started with a simple example of a single Cassandra node, either running on localhost from a Cassandra installation or running embedded in a process forked from the test class. Although Cassandra supports a rich data structure, we configured it to act as a simple key-value server that stores objects under keys, using Double objects for simplicity. Since the Thrift API turned out to be not very handy, we used the Hector API to connect to, write to and read from the database.
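Cassandra stores column values as raw bytes, so any object used as a value has to be turned into a byte array at some point. The following is a minimal sketch of how a Double could be encoded and decoded using only the JDK. The class DoubleCodec is a hypothetical helper introduced here for illustration; it is not part of the downloadable project or of the Hector API, which ships its own serializers.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not taken from the project or from Hector.
final class DoubleCodec {
    private DoubleCodec() {}

    /** Encodes a double as 8 bytes (IEEE 754 bit pattern, big-endian). */
    static byte[] toBytes(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    /** Decodes a double from exactly 8 bytes produced by toBytes. */
    static double fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != Double.BYTES) {
            throw new IllegalArgumentException("expected exactly 8 bytes");
        }
        return ByteBuffer.wrap(bytes).getDouble();
    }
}
```

A round trip such as DoubleCodec.fromBytes(DoubleCodec.toBytes(3.5)) returns the original value.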

For easy installation we bundled the test classes, the sources and all required libraries and configuration files into a single Eclipse project, ready to download from our server and to run from Eclipse version 3.6. We will not step through the sources here, since the comments in the source code explain the individual steps. Instead, we will give a short overview of what the classes do.

The project contains a single data access object (DAO) implementing the calls we frequently use from our planning and scheduling framework. These are writing and reading objects by key, as well as getting and removing all objects in a given keyspace.
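The four operations just listed can be sketched as a small interface. The names KeyValueDao and InMemoryKeyValueDao below are hypothetical and are not the names used in the project. The implementation is an in-memory stand-in backed by a map, so it makes no Hector or Cassandra calls; it only illustrates the shape of the contract.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical DAO contract; names are illustrative, not those of the project. */
interface KeyValueDao {
    void write(String key, Double value);   // store or overwrite the value under key
    Double read(String key);                // returns null if the key is unknown
    Map<String, Double> readAll();          // snapshot of all stored entries
    void removeAll();                       // delete every entry
}

/** In-memory stand-in that does not talk to any database. */
final class InMemoryKeyValueDao implements KeyValueDao {
    private final Map<String, Double> store = new ConcurrentHashMap<>();

    @Override
    public void write(String key, Double value) {
        store.put(Objects.requireNonNull(key, "key"),
                  Objects.requireNonNull(value, "value"));
    }

    @Override
    public Double read(String key) {
        return store.get(Objects.requireNonNull(key, "key"));
    }

    @Override
    public Map<String, Double> readAll() {
        // Return a copy so callers cannot modify the internal state.
        return new HashMap<>(store);
    }

    @Override
    public void removeAll() {
        store.clear();
    }
}
```

A Cassandra-backed implementation could sit behind the same interface, which keeps the calling code independent of the client library.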

Two test classes cover the DAO methods. The first, ClientExample, assumes an external Cassandra instance listening on port 9160 on localhost.

For those who don't want to install an external Cassandra database, we added the second test class, EmbeddedExample, and some helper classes that fork a Cassandra database in a separate process from the test class.
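Forking a second JVM from a running Java program can be done with the JDK's ProcessBuilder. The sketch below is a generic illustration of that technique and is not the helper code from the project. The class JvmForker and the main class name passed to it are hypothetical; the actual entry point and any system properties Cassandra needs must be taken from your installation.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the project's own helper classes may differ.
final class JvmForker {
    private JvmForker() {}

    /** Builds the command line for a child JVM that reuses the parent's classpath. */
    static List<String> buildCommand(String mainClass, String... jvmArgs) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        for (String arg : jvmArgs) {
            command.add(arg);               // JVM options go before -cp and the main class
        }
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        return command;
    }

    /** Starts the child JVM; its output is forwarded to the parent's console. */
    static Process fork(String mainClass, String... jvmArgs) throws IOException {
        return new ProcessBuilder(buildCommand(mainClass, jvmArgs))
                .inheritIO()
                .start();
    }
}
```

The caller keeps the returned Process and calls destroy() on it when the test is finished, so the database process does not outlive the test class.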

To run the example, import the project into a workspace and run the Java classes. If you use EmbeddedExample, you first need to configure the folders Cassandra writes its files to. To do this, open the file cassandra.yaml and set the three path entries to appropriate values. If you forget to configure them, Cassandra will create the files automatically at startup instead of throwing a FileNotFound exception, and this might not be what you really want.
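In a stock Cassandra 0.7 cassandra.yaml, the three path entries are the data directories, the commit log directory and the saved caches directory. The fragment below shows them with placeholder paths; the paths are assumptions for illustration and should be replaced with folders that exist on your machine.

```yaml
# Placeholder paths for illustration; adjust them to your environment.
data_file_directories:
    - /tmp/cassandra-test/data
commitlog_directory: /tmp/cassandra-test/commitlog
saved_caches_directory: /tmp/cassandra-test/saved_caches
```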

There is a known problem that keeps the Hector client from stopping when the Cassandra database had to start up with no data and commitlog files. This can be ignored during testing, because the well-known bug-fixing strategy of "let's do a restart and see if the error persists" works fine in this case. Apart from this problem, everything worked well when tested on my private Mac and on the Windows 7 PC in my office.

What did cause problems was the poor documentation. Apart from a very short tutorial, some blogs, mailing lists and sparse wikis, you'll hardly find quick help. The code is available but not well documented, at least in my opinion. So it took me a few days to get even this simple example up and running, mainly by trial and error, which reminded me of the times when I played Simon the Sorcerer, Floyd or other adventure games on my computer. This leaves us with the feeling that, although the code is working well, it might be far from the best solution.

To summarize: the Hector API and the Cassandra database turned out to be very handy for us. Based on the examples presented here, we implemented a plug-in that mirrors the in-memory object state of our planning and scheduling framework. During first tests we found that writing even complex object structures is very fast, a feature that is crucial for our applications. But testing is still going on, especially with more than a single database node.

If you check the source code, you'll find nothing new in there. Everything is already described, in a different context or application field, in one of the sources listed below, which are good starting points for further reading:

Our Tutorial at http://www.featherlite-framework.com/uploads/files/Cassandra-Test.zip
Cassandra wiki at http://wiki.apache.org/cassandra
Hector API wiki at https://github.com/rantav/hector/wiki
Hector tutorial at http://www.datastax.com/sites/default/files/hector-v2-client-doc.pdf


Comments:

On 6/23/11 Guest wrote:
Great example! It was easy to install and run, and the code was well documented. Thank you for sharing and for providing everything that was needed.