Apache Cassandra: Iterate over all columns in a row

Posted on Apr 1, 2012

Recently I have been using Cassandra for one of my projects, and one of the requirements is to iterate over all the columns of a row. Each column holds an individual piece of data, of a type identified by the row id, and the columns keep changing, so I can’t simply use a set of known column names. Using the setRange call on a SliceQuery with a large count is also not an option, since Cassandra will try to load the entire set of columns into memory. Instead I’ve written this iterator, which takes a query on which the row key and column family have already been set, and loads columns as they are requested. By default it loads 100 columns at a time. You could make it take the count as a parameter, but this works for me for now.

The one ‘problem’ with this is that I have to remove the last column from each batch: that avoids duplicates, while still giving me a start point for the next query. This is because each column is independent, so you cannot ask a column who its next neighbour is and start the next query from there. If anybody has a tip to make it more elegant, I’d love to hear it.
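
To make the paging and the held-back last column concrete, here is a minimal sketch of the idea, not my actual Hector code. ColumnPageFetcher is a hypothetical stand-in for the slice query (it is assumed to return up to count columns whose names are at or after start, in column order), and InMemoryRow is a made-up fetcher backed by a TreeMap so the example runs without a cluster. Column names and values are assumed to be strings.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical stand-in for a slice query: up to `count` columns with name >= start (null = start of row). */
interface ColumnPageFetcher {
    List<Map.Entry<String, String>> fetch(String start, int count);
}

/** Iterates over all columns of a row, loading one page at a time. */
final class PagedColumnIterator implements Iterator<Map.Entry<String, String>> {
    private final ColumnPageFetcher fetcher;
    private final int pageSize;
    private final Deque<Map.Entry<String, String>> buffer = new ArrayDeque<>();
    private String nextStart = null;   // inclusive start name for the next page
    private boolean exhausted = false; // true once a short page has been seen

    PagedColumnIterator(ColumnPageFetcher fetcher, int pageSize) {
        // With a page size of 1 the held-back column would leave nothing to return.
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be at least 2");
        this.fetcher = fetcher;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) loadPage();
        return !buffer.isEmpty();
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.removeFirst();
    }

    private void loadPage() {
        List<Map.Entry<String, String>> page = fetcher.fetch(nextStart, pageSize);
        buffer.addAll(page);
        if (page.size() < pageSize) {
            exhausted = true; // short page: nothing left after this one
        } else {
            // Full page: hold back the last column. Its name becomes the inclusive
            // start of the next page, where it is fetched again and returned once.
            nextStart = buffer.removeLast().getKey();
        }
    }
}

/** Made-up in-memory row, for illustration only. */
final class InMemoryRow implements ColumnPageFetcher {
    private final TreeMap<String, String> columns;
    int queries = 0; // number of pages requested so far

    InMemoryRow(TreeMap<String, String> columns) {
        this.columns = columns;
    }

    @Override
    public List<Map.Entry<String, String>> fetch(String start, int count) {
        queries++;
        NavigableMap<String, String> tail = (start == null) ? columns : columns.tailMap(start, true);
        List<Map.Entry<String, String>> page = new ArrayList<>();
        for (Map.Entry<String, String> e : tail.entrySet()) {
            if (page.size() == count) break;
            page.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return page;
    }
}
```

With a real client, the fetcher would presumably wrap the SliceQuery and pass the start name and count to its setRange call. Because the held-back column is fetched again as the first column of the next page, each full page yields one column fewer than the page size.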