Hi there
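import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and $"..." (spark is the active SparkSession)
val t1 = Seq(("a", "1"), ("b", "2")).toDF("id", "value")
val t2 = Seq(("a", "3"), ("c", "2")).toDF("id", "value2")
t1.join(t2, Seq("id"), "left").show()
t1.join(t2, Seq("id"), "left").na.drop().show()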
object Util {
  // Counts (overlapping) occurrences of pattern in s
  def countMatch(s: String, pattern: String): Int = {
    s.sliding(pattern.length).count(_ == pattern)
  }
}
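To make the helper's behavior concrete, here is a standalone plain-Scala check (no Spark involved; the object name CountMatchDemo is just for illustration). Because sliding produces overlapping windows, overlapping occurrences are each counted, and an input shorter than the pattern yields 0. Note that countMatch does not guard against a null s.

```scala
object CountMatchDemo {
  // Same logic as countMatch above: count windows of pattern.length equal to pattern
  def countMatch(s: String, pattern: String): Int =
    s.sliding(pattern.length).count(_ == pattern)

  def main(args: Array[String]): Unit = {
    println(countMatch("aaaa", "aa")) // 3: windows "aa", "aa", "aa" overlap
    println(countMatch("abab", "ab")) // 2: windows "ab", "ba", "ab"
    println(countMatch("a", "ab"))    // 0: the single window "a" never equals "ab"
  }
}
```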
// keywordSeq: Seq[String] is defined elsewhere
val counterWord = udf((s: String) => {
  keywordSeq.map(x => (x, Util.countMatch(s, x))).filter(_._2 != 0)
})
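If value2 can be null at the moment the UDF is invoked, one common workaround is to make the function itself null-safe. Below is a minimal plain-Scala sketch of the function the UDF wraps; the keyword list is a hypothetical stand-in for keywordSeq (which is not shown here), and CounterSketch is an illustrative name. A null input returns an empty Seq instead of throwing.

```scala
object CounterSketch {
  // Hypothetical keywords; the real keywordSeq is defined elsewhere
  val keywordSeq: Seq[String] = Seq("2", "3")

  def countMatch(s: String, pattern: String): Int =
    s.sliding(pattern.length).count(_ == pattern)

  // Null-safe counter body: a null input yields no matches instead of an exception
  def counter(s: String): Seq[(String, Int)] =
    if (s == null) Seq.empty
    else keywordSeq.map(k => (k, countMatch(s, k))).filter(_._2 != 0)

  def main(args: Array[String]): Unit = {
    println(counter("3"))  // List((3,1))
    println(counter(null)) // List()
  }
}
```

Wrapped as udf((s: String) => CounterSketch.counter(s)), a null value2 would produce an empty array rather than an exception, regardless of where the optimizer places the call.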
When I run
t1.join(t2, Seq("id"), "left").na.drop().withColumn("counter", counterWord(col("value2")))
  .withColumn("size", size($"counter")).filter($"size" === 1).show()
I get a java.lang.NullPointerException.
But when I run
t1.join(t2, Seq("id"), "left").na.drop().withColumn("counter", counterWord(col("value2")))
  .withColumn("size", size($"counter")).persist().filter($"size" === 1).show()
it works.
I would like to know why I get the right answer when I use persist(). It seems that without persist(), the steps run in the wrong order, as if the UDF were applied before na.drop() removes the rows with null values.
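The ordering hypothesis can be illustrated with plain Scala collections. This does not model Spark's optimizer; it only shows why order matters. As far as I know, Spark SQL does not guarantee the evaluation order of subexpressions or filters, so a UDF is not guaranteed to run only on rows that survive an earlier null check. The left join above yields one row with a null value2 (id "b"); dropping it before calling the counting function works, while calling the function first hits the null. OrderDemo and its methods are illustrative names.

```scala
object OrderDemo {
  def countMatch(s: String, pattern: String): Int =
    s.sliding(pattern.length).count(_ == pattern)

  // Rows of the left join (id, value, value2): value2 is null where t2 has no matching id
  val joined: Seq[(String, String, String)] =
    Seq(("a", "1", "3"), ("b", "2", null))

  // Drop rows with a null value2 first, then count: safe
  def dropThenCount(): Seq[Int] =
    joined.filter(_._3 != null).map(r => countMatch(r._3, "3"))

  // Count first, then drop: countMatch receives the null and throws
  def countThenDrop(): Seq[Int] =
    joined.map(r => (r._3, countMatch(r._3, "3"))).filter(_._1 != null).map(_._2)

  def main(args: Array[String]): Unit = {
    println(dropThenCount()) // List(1)
    val threw =
      try { countThenDrop(); false }
      catch { case _: NullPointerException => true }
    println(threw) // true
  }
}
```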