I am trying to unit test doSomethingRdd,
which reads some reference data from HBase inside an RDD transformation.
def doSomethingRdd(in: DStream[String]): DStream[String] = {
in.map(i => {
val cell = HBaseUtil.getCell("myTable", "myRowKey", "myFamily", "myColumn")
i + cell.getOrElse("")
})
}
object HBaseUtil {
def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] = {
val HBaseConn = ConnectionPool.getConnection()
//the rest of the code will use HBaseConn
//to get a HBase cell and convert to a string
}
}
I read this Cloudera article, but I have some problems with its recommended methods.
The first thing I tried was using ScalaMock to mock the HBaseUtil.getCell
method so that I could bypass the HBase connection. I also used the workaround suggested by this article in order to mock a singleton object. I updated my code as shown below. However, doSomethingRdd
failed because the mocked hbaseUtil is not serializable, which Paul Butcher also explained in his reply.
def doSomethingRdd(in: DStream[String], hbaseUtil: HBaseUtilBody = HBaseUtil): DStream[String] = {
in.map(i => {
val cell = hbaseUtil.getCell("myTable", "myRowKey", "myFamily", "myColumn")
i + cell.getOrElse("")
})
}
trait HBaseUtilBody {
def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] = {
val HBaseConn = ConnectionPool.getConnection()
//the rest of the code will use HBaseConn
//to get a HBase cell and convert to a string
}
}
object HBaseUtil extends HBaseUtilBody
I think getting data from HBase inside an RDD transformation is a very common pattern, but I'm not sure how to unit test it without connecting to a real HBase instance.
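One direction I am considering, as a rough sketch rather than a confirmed solution, is to replace the ScalaMock mock with a hand-written stub that is itself serializable. In the sketch below, HBaseUtilBody is redeclared with getCell left abstract so that the block compiles on its own (that differs from my real trait), and StubHBaseUtil, StubDemo, roundTrip and appendCell are hypothetical names made up for illustration. The round trip through Java serialization stands in for what Spark does by default when it ships a closure to the executors.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Assumption: same shape as the trait in the question, but getCell is abstract
// here and the trait is marked Serializable so implementations can be captured
// in a closure.
trait HBaseUtilBody extends Serializable {
  def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String]
}

// Hypothetical test double: looks values up in an in-memory map instead of
// opening an HBase connection.
class StubHBaseUtil(cells: Map[(String, String, String, String), String]) extends HBaseUtilBody {
  override def getCell(tableName: String, rowKey: String, columnFamily: String, column: String): Option[String] =
    cells.get((tableName, rowKey, columnFamily, column))
}

object StubDemo {
  // Writes a value with Java serialization and reads it back, to check that
  // it would survive being shipped inside a closure.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // The per-record logic of doSomethingRdd, extracted so that it can be
  // exercised without a DStream.
  def appendCell(hbaseUtil: HBaseUtilBody)(i: String): String =
    i + hbaseUtil.getCell("myTable", "myRowKey", "myFamily", "myColumn").getOrElse("")
}
```

If this holds up, a test could pass a StubHBaseUtil as the hbaseUtil argument of doSomethingRdd instead of a mock. I have not verified this against a real streaming job.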