Thursday, September 15, 2016

How to unit test HBase access in Spark Streaming (Scala)

I was trying to unit test doSomethingRdd, which needs to read some reference data from HBase inside an RDD transformation.

def doSomethingRdd(in: DStream[String]): DStream[String] = {
    in.map(i => {
        // Look up the reference value for every record, then append it
        val cell = HBaseUtil.getCell("myTable", "myRowKey", "myFamily", "myColumn")
        i + cell.getOrElse("")
    })
}

object HBaseUtil {
    def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] = {
        val HBaseConn = ConnectionPool.getConnection()
        //the rest of the code will use HBaseConn 
        //to get a HBase cell and convert to a string
    }
}

I read this Cloudera article, but I ran into problems with its recommended methods.

The first thing I tried was using ScalaMock to mock the HBaseUtil.getCell method so that I could bypass the HBase connection. I also applied the workaround suggested by this article for mocking a singleton object, and updated my code as shown below. However, doSomethingRdd failed because the mocked hbaseUtil is not serializable, which Paul Butcher also explains in his reply.

def doSomethingRdd(in: DStream[String], hbaseUtil: HBaseUtilBody = HBaseUtil): DStream[String] = {
    in.map(i => {
        // Use the injected hbaseUtil so that a test can pass in a substitute
        val cell = hbaseUtil.getCell("myTable", "myRowKey", "myFamily", "myColumn")
        i + cell.getOrElse("")
    })
}

trait HBaseUtilBody {
    def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] = {
        val HBaseConn = ConnectionPool.getConnection()
        //the rest of the code will use HBaseConn 
        //to get a HBase cell and convert to a string
    }
}

object HBaseUtil extends HBaseUtilBody

I think getting data from HBase in an RDD transformation is a very common pattern, but I'm not sure how to unit test it without connecting to a real HBase instance.
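
One direction that might be worth trying, given that the mock fails because it is not serializable, is to replace the mock with a small hand-written stub that is serializable. The sketch below is only an illustration under assumptions: CellLookup, InMemoryCellLookup and StubSketch are hypothetical names that do not appear in my code above, the trait is assumed to have an abstract getCell, and the round trip uses plain Java serialization as a rough stand-in for what Spark does when it ships a closure. It deliberately avoids Spark and HBase so it can run on its own.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical variant of HBaseUtilBody: getCell is abstract and the trait is
// Serializable, so an implementation can be captured inside a closure.
trait CellLookup extends Serializable {
  def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String]
}

// Hand-written test stub backed by an in-memory Map (no HBase connection).
// Case classes are serializable by default.
final case class InMemoryCellLookup(cells: Map[(String, String, String, String), String])
    extends CellLookup {
  override def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] =
    cells.get((tableName, rowKey, columnFamily, column))
}

object StubSketch {
  // The per-record logic of doSomethingRdd, pulled out so it can be tested without a DStream.
  def appendCell(lookup: CellLookup)(i: String): String =
    i + lookup.getCell("myTable", "myRowKey", "myFamily", "myColumn").getOrElse("")

  // Serialize and deserialize a value with plain Java serialization.
  // A non-serializable value makes writeObject throw NotSerializableException.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

In a test, a stub such as InMemoryCellLookup(Map(("myTable", "myRowKey", "myFamily", "myColumn") -> "someValue")) could then be passed in place of the mocked hbaseUtil, while the production object would implement the same trait on top of ConnectionPool. Whether this fits depends on how much of the real lookup logic the test is meant to cover.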
