2010 will go down as a watershed year for NoSQL solutions. Apache Cassandra itself had a year that was, in the words of Jonathan Ellis, filled with Code, Community, and Controversy. Between all that activity, the brand new release of Cassandra 0.7, and our SXSW 2011 panel on accessing Cassandra, we at Carbon Five were excited to receive our copies of O’Reilly’s Cassandra: The Definitive Guide by Eben Hewitt and others. Reading it over the holidays, I found a book filled with interesting insights into Cassandra but one that falls just short of being “definitive” for developers.
The book walks through every level of Cassandra in solid detail. It not only describes Cassandra but also explains where it sits among its fellow NoSQL solutions, and even goes into depth on some of their architectures. The book lands at a very interesting time for Cassandra, with the transition from 0.6 to 0.7 introducing radical changes, and its author wisely focuses on the latter release, to the benefit of us all. Some sections are vague or refer to 0.6 functionality, which I assume is due to features being in a state of flux at the time of writing. It’s all done in a highly readable style that flows well and is only occasionally dry.
Hewitt does an excellent job of countering criticism of Cassandra’s “eventual consistency”, reframing it, as others have, as “tunable consistency” that can serve even those with “critical” needs. Also, good on him for pointing out to everyone, fans and foes of Cassandra alike, that NoSQL systems are not all-encompassing solutions and that a frenzy similar to the one over NoSQL has happened before … when SQL was introduced!
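For readers unfamiliar with the “tunable” part, here is a minimal sketch of the usual replica-overlap arithmetic. It is my own illustration, not code from the book, and it does not talk to a Cassandra cluster; the function names `quorum` and `read_overlaps_write` are hypothetical. The idea is that with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas (hypothetical helper)."""
    return replication_factor // 2 + 1


def read_overlaps_write(n: int, w: int, r: int) -> bool:
    """True when every read set of size r must share at least one replica
    with every write set of size w, out of n replicas in total."""
    return r + w > n


if __name__ == "__main__":
    n = 3  # replication factor, assumed for illustration
    levels = {"ONE": 1, "QUORUM": quorum(n), "ALL": n}
    for w_name, w in levels.items():
        for r_name, r in levels.items():
            verdict = "overlap" if read_overlaps_write(n, w, r) else "eventual"
            print(f"write={w_name:6} read={r_name:6} -> {verdict}")
```

Under these assumptions, writing and reading at quorum gives overlapping replica sets, while writing and reading at a single replica trades that guarantee for lower latency. That trade-off is what “tunable” refers to.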
Personally, I really enjoyed learning exactly how Cassandra works, and my appreciation only grew for features like fast writes, self-correcting reads, the ability to work across geographically separate data centers, and crash-readiness. Deployment and practical maintenance of Cassandra were always tasks I struggled with, and this book clears up my confusion on everything from simple installation to the minute details of how best to partition data. Its explanation of the data model is among the clearest I have read on the subject.
As a developer, I was mostly looking forward to thoughts on how to model an application’s data, and I was thrilled to see the book document patterns for doing so. Unfortunately, this is also where my biggest criticism lies: outside of some basic cases and brief suggestions, the book doesn’t fully walk us through applying these patterns, although it tries with an example of a hotel reservation system. There, the application’s requirements are listed and a diagram of what the design would look like in a “traditional” relational database is shown. But then, instead of working from the requirements to arrive at a design for a Cassandra keyspace, the book simply gives a diagram of the keyspace and proceeds to list the code to use it.
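To make the complaint concrete, here is a rough sketch of the kind of step I was hoping to see: start from a query the application needs, then shape a denormalized structure around it. This is my own toy illustration using plain Python dictionaries, not the book’s keyspace and not a real Cassandra client; the sample hotels, the query “list hotels in a city”, and the name `build_hotels_by_city` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data, standing in for rows in a relational "hotel" table.
HOTELS = [
    {"id": "h1", "name": "Harbor Inn", "city": "San Francisco"},
    {"id": "h2", "name": "Desert Rose", "city": "Phoenix"},
    {"id": "h3", "name": "Mission Suites", "city": "San Francisco"},
]


def build_hotels_by_city(hotels):
    """Shape the data around the query "list hotels in a city".

    The result mimics a wide row per city: the row key is the city,
    each column name is a hotel id, and each column value is the hotel name.
    """
    by_city = defaultdict(dict)
    for hotel in hotels:
        by_city[hotel["city"]][hotel["id"]] = hotel["name"]
    return dict(by_city)


# One lookup by row key answers the query; no join is needed.
hotels_by_city = build_hotels_by_city(HOTELS)
san_francisco_hotels = hotels_by_city["San Francisco"]
```

The point of the sketch is the direction of the reasoning: the query comes first and the structure follows from it, at the cost of duplicating hotel names wherever another query needs them.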
Besides this and some other minor quibbles, the book is an excellent guide for the “care and feeding” of your Cassandra nodes and clusters. Hopefully future editions or other works will provide more insight on data modeling with Cassandra. On an arbitrary grade school scale of quality, I would give it a B+.