I recently returned from the Apache Cassandra Summit in San Francisco, where I had the opportunity to take the first sitting of the Apache Cassandra Developer certification exam. After the exam, I attended the conference, where speakers presented fascinating solutions to data challenges built on Apache Cassandra. By the end of the conference, my brain was full and my imagination was overflowing with ideas about how to use Cassandra and build upon my newly minted developer certification. As the plane took off and I headed back to the East Coast, I reflected on how I got here: an Economics major with SQL skills and limited application programming experience - how did I get certified in one of the 'hottest' NoSQL tools in use today? I think there are three reasons that Cassandra, a leader in the NoSQL space, is easy to learn and to get certified in.
Cassandra's "newness" is one of the reasons it is so easy to learn. Unlike relational database systems, with their decades of history, Cassandra was first released in 2008. To explain why that is a pro and not a con for learning Cassandra, I have to describe what it was like to learn SQL basics after graduating from college as an Economics major with strong math skills but no direct programming experience. Because SQL and RDBMSs were so established in workplaces and in people's experience, there was a huge amount of 'tribal' knowledge that everyone but me seemed to know instinctively. When concept "B" was explained to me, it was simply assumed that I already knew concept "A" and could build on it. This was frustrating and discouraged me from really digging into questions about SQL and RDBMS basics - I felt awkward asking so many questions. For Cassandra, the opposite is true: relatively few people have years and years of experience with it, so the playing field is much more level and, I think, friendlier to asking questions of a wide range of users. Whether at conferences or on online forums, Cassandra users of all levels are eager to bring more people on board and are quick to answer both basic and complex questions.
DataStax - the company that develops and commercially supports an enterprise edition of Cassandra - is another big reason that Cassandra is accessible to everyone. From the beginning, DataStax recognized that in the "Cambrian explosion" of NoSQL and Big Data technologies, the companies with the most compelling and accessible training would grow the largest user base and, eventually, the largest market share. DataStax seems to have understood this better than most companies and has gone to great lengths to provide interesting, intuitive, and in-depth training, for free! The DataStax core concepts curriculum is organized efficiently: it is broken into small, manageable chunks and strikes the right balance between concepts and hands-on demonstrations. At times I found myself eager to log into DataStax Academy to learn more about Bloom filters or the key cache, because the material was presented in such an easy-to-digest way. I can't think of any RDBMS training that went to such lengths to make the material approachable. The newly instituted certifications (by O'Reilly) also provide a baseline for the technology and a way to recognize core competencies without necessarily being an expert in it.
The Cassandra community has also realized that for a new technology to be widely adopted, it helps to provide a baseline that everyone can understand. For Cassandra, that baseline consists of three valuable tools: the Cassandra Query Language (CQL), the Cassandra Command Line Interface (CLI), and the Cassandra Cluster Manager (CCM). CQL's syntax is very similar to SQL's, which makes initial querying and "playing around" easy for anyone with even a minimal RDBMS background. Once you take that first step of interacting with Cassandra through CQL, continued exploration and experimentation become much easier. I found it particularly useful to explore data with CQL and then view the same data through the CLI to understand how Cassandra actually stores a partition. CCM builds on this by letting users create a test cluster of multiple Cassandra instances on their local machine and interact with each one as if it were an independent node in a cluster. Together, these tools let you sample the full breadth of Cassandra, from drilling down to the timestamp of an individual cell to running operations across any number of nodes in a cluster.
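To make the CQL-versus-CLI comparison a little more concrete, here is a minimal Python sketch. It is a simplified toy model, not Cassandra code. It shows how rows from a CQL table with a compound primary key get grouped into one storage partition per partition key, roughly as cassandra-cli displayed them in pre-3.0 versions of Cassandra. Several names are hypothetical, invented purely for illustration: the `posts` table, its columns, the `writetime` field, and the `to_partitions`/`render` helpers. The printed format only loosely imitates the CLI's output. Details such as row markers, tombstones, and TTLs are deliberately left out, and nothing here talks to a real cluster.

```python
from collections import defaultdict

# Hypothetical table, used only for illustration (not from a real schema):
#   CREATE TABLE posts (user_id text, posted_at int, body text,
#                       PRIMARY KEY ((user_id), posted_at));
PARTITION_KEY = "user_id"
CLUSTERING_KEY = "posted_at"
REGULAR_COLUMNS = ["body"]


def to_partitions(rows):
    """Group CQL-style rows (dicts) into partitions of cells.

    Simplified model of the pre-3.0 layout: one storage row per partition
    key, one cell per regular column, with the clustering value acting as a
    prefix of the cell name. Row markers, tombstones and TTLs are omitted.
    Each cell is a tuple (clustering_value, column, value, writetime).
    """
    partitions = defaultdict(list)
    for row in rows:
        for column in REGULAR_COLUMNS:
            partitions[row[PARTITION_KEY]].append(
                (row[CLUSTERING_KEY], column, row[column], row["writetime"])
            )
    # Within a partition, cells are kept in clustering order.
    for cells in partitions.values():
        cells.sort(key=lambda cell: (cell[0], cell[1]))
    return dict(partitions)


def render(partitions):
    """Print partitions in a format loosely inspired by cassandra-cli."""
    for key, cells in partitions.items():
        print(f"RowKey: {key}")
        for clustering, column, value, ts in cells:
            print(f"=> (name={clustering}:{column}, value={value}, timestamp={ts})")


rows = [
    {"user_id": "alice", "posted_at": 2, "body": "again", "writetime": 1001},
    {"user_id": "alice", "posted_at": 1, "body": "hello", "writetime": 1000},
    {"user_id": "bob", "posted_at": 1, "body": "hi", "writetime": 1002},
]
render(to_partitions(rows))
# RowKey: alice
# => (name=1:body, value=hello, timestamp=1000)
# => (name=2:body, value=again, timestamp=1001)
# RowKey: bob
# => (name=1:body, value=hi, timestamp=1002)
```

The partition key decides *which* partition a row lands in. The clustering column only decides how cells are ordered and named *within* that partition. That is why both of alice's CQL rows end up side by side in a single storage row.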
Taken together, the "newness" of Cassandra, the support of DataStax, and the familiarity of CQL and its companion tools make it possible for anyone to start learning Cassandra. As more users come to understand Cassandra and its capabilities, solutions that benefit from distributed computing and scalability will become easier to build, and data professionals will have many more ways to derive insights from data.