Tuesday, May 29, 2012

Using filters in HBase to match certain columns

HBase is a column oriented database which stores the content by column rather than by row. To limit the output of an scan you can use filters, so far so good.

But how it'll work when you want to filter more as one matching column, let's say 2 or more certain columns?
The trick here is to use an SingleColumnValueFilter (SCVF) in conjunction with a boolean arithmetic operation. The idea behind is to include all columns which have "X" and NOT the value DOESNOTEXIST; the filter would look like:

List list = new ArrayList<Filter>(2);
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("fam1"),
 Bytes.toBytes("VALUE1"), CompareOp.DOES_NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("fam2"),
 Bytes.toBytes("VALUE2"), CompareOp.DOES_NOT_EQUAL, Bytes.toBytes("DOESNOTEXIST"));
FilterList filterList = new FilterList(list);
Scan scan = new Scan();

Define a new filter list, add an family (fam1) and define the filter mechanism to match VALUE1 and compare them with NOT_EQUAL => DOESNOTEXIST. Means, the filter match all columns which have VALUE1 and returns only the rows who have NOT included DOESNOTEXIST. Now you can add more and more values to the filter list, start the scan and you should only get data back which match exactly your conditions.