How do I run a Scala Spark unit test in cluster (Hortonworks/Cloudera) mode? Please help me with how to approach this.
Let's say I have a class with a method, and I need to test that method.
For the sake of learning, I put together the example below.
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
class wordcountlogic {
  def wc1(file: String, sc: SparkContext): RDD[(String, Int)] = {
    // Read the input file as an RDD of lines, then do the classic word count
    val lines = sc.textFile(file, 2)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  }
}
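For comparison, I also tried extracting the counting logic into a pure function over an ordinary collection, so it can be unit-tested with no SparkContext and no cluster at all. This is just a sketch with names I made up (WordCountPure, wordCounts):

```scala
// Sketch (assumed names): the word-count transformation as a pure function,
// testable with plain assertions and no Spark dependency.
object WordCountPure {
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))                          // tokenize on whitespace
      .filter(_.nonEmpty)                                // ignore empty tokens
      .groupBy(identity)                                 // group identical words
      .map { case (word, occs) => (word, occs.length) }  // count each group
}
```

With this split, the Spark method only wires the RDD plumbing around the same transformation, and the counting logic itself is covered by fast local tests.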
The test class is as follows.
import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.apache.spark.{SparkConf, SparkContext}

class WordCountTest extends FunSuite with BeforeAndAfterAll {
  var sparkConf: SparkConf = _
  var sc: SparkContext = _

  override def beforeAll(): Unit = {
    sparkConf = new SparkConf().setAppName("test wordCount")
    sc = new SparkContext(sparkConf)
  }

  val wordcount = new wordcountlogic

  test("get word count rdd") {
    val result = wordcount.wc1("file.txt", sc)
    // take(10) returns an Array, so comparing it to 10 always fails;
    // assert on the contents (or count) instead
    assert(result.count() > 0)
  }

  override def afterAll(): Unit = {
    sc.stop()
  }
}
How do I run this WordCountTest on a cluster? Once I know that, I can scale the approach to my production applications.
Also, if there is a better approach to unit testing Scala Spark applications, please provide suggestions/inputs.
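One pattern I came across, and would like confirmed, is to not run unit tests on the cluster at all: point the SparkContext at an in-process local master so `sbt test` runs self-contained, and reserve the cluster for integration runs submitted with spark-submit. A hedged sketch of that setup, assuming ScalaTest and Spark are on the test classpath:

```scala
import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: unit test against an in-process local master, so no cluster is needed.
class LocalWordCountTest extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")              // two local threads instead of a cluster
      .setAppName("local word count test")
    sc = new SparkContext(conf)
  }

  test("word count over an in-memory RDD") {
    // Build the input with parallelize instead of reading a file from HDFS
    val rdd = sc.parallelize(Seq("a b", "a"))
    val counts = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collectAsMap()
    assert(counts("a") == 2 && counts("b") == 1)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }
}
```

Is this the recommended way, or should the test still be packaged and executed on the cluster itself?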