I have two DataFrames. The first has 4 columns, the second has 1 column.
I joined these two DataFrames (code at the end of this post).
The first DataFrame was built from three .json files.
The second DataFrame was built from an array containing the hashCode of each value in one of the first DataFrame's columns.
All of this works, and none of it uses SQL queries.
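The join in the code at the end of this post has no join condition. In Spark, a join without a condition pairs every row on the left with every row on the right (a Cartesian product), so nothing ties a hash code to the USER_id it came from. A minimal sketch with plain Scala collections (no Spark; the user ids are hypothetical) models that pairing:

```scala
// Plain-Scala model (no Spark) of joining two tables without a join key.
// The user ids below are hypothetical stand-ins for the real USER_id values.
object KeylessJoinModel {
  // A join with no condition pairs every left row with every right row
  // (a Cartesian product), so the result has left.size * right.size rows.
  def crossJoin[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] =
    for (l <- left; r <- right) yield (l, r)

  def main(args: Array[String]): Unit = {
    val userIds = Seq("alice", "bob", "carol")
    val hashCodes = userIds.map(_.hashCode)
    val joined = crossJoin(userIds, hashCodes)
    // 3 ids x 3 hash codes = 9 pairs
    println(joined.size)
    // Only 3 of those pairs carry the hash of their own id.
    println(joined.count { case (id, h) => id.hashCode == h })
    // Filtering on the id column keeps one id but every hash code.
    joined.filter(_._1 == "bob").foreach(println)
  }
}
```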
When I ran SQL queries on the “merged” DataFrame, the values in the added column (from the second DataFrame) remained the same. The queries took effect on the other columns, but not on this one.
Why?
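import spark.implicits._ // assumes Spark 2.x+ with a SparkSession named spark; needed for rdd.toDF
// Collect the USER_id values to the driver and hash each one
val ArrayA = mergeDataFrame.select("USER_id").rdd.map(r => r(0)).collect()
val xyz = ArrayA.map{_.hashCode }
// Turn the hash codes back into a one-column DataFrame
val rdd = sc.parallelize(xyz)
val HashCode = rdd.toDF("HashCode")
// join with no condition: every row of mergeDataFrame is paired with every hash code (Cartesian product)
val mergeDataFrameHashCode = mergeDataFrame.join(HashCode)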
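For comparison, the hash can be computed from each row's own USER_id, so it stays attached to that row through any later filter. A minimal sketch with plain Scala collections (no Spark; `UserRow`, `HashedRow`, `score` and `userHash` are hypothetical names, and USER_id is assumed to be a non-null string) models that approach:

```scala
// Plain-Scala model (no Spark) of keeping the hash attached to its own row.
// UserRow, HashedRow and their fields are hypothetical stand-ins for the real columns.
object PerRowHashModel {
  final case class UserRow(userId: String, score: Int)
  final case class HashedRow(userId: String, score: Int, userHash: Int)

  // Hash each row's own userId (assumed non-null) instead of building a
  // separate list of hashes and joining it back without a key.
  def withUserHash(rows: Seq[UserRow]): Seq[HashedRow] =
    rows.map(r => HashedRow(r.userId, r.score, r.userId.hashCode))

  def main(args: Array[String]): Unit = {
    val rows = Seq(UserRow("alice", 10), UserRow("bob", 20), UserRow("carol", 30))
    // A later filter (like a SQL WHERE clause) keeps each hash next to its id.
    withUserHash(rows).filter(_.score > 15).foreach(println)
  }
}
```

In Spark itself, the same idea would be a column expression such as `mergeDataFrame.withColumn("HashCode", hashUdf(col("USER_id")))`, where `hashUdf` is a hypothetical `udf` wrapping `.hashCode`. The built-in `org.apache.spark.sql.functions.hash` avoids the UDF, but it uses Murmur3 and so returns different numbers from Java's `hashCode`.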