Scala: how to flatten the first few rows of a DataFrame into a string and save it in another DataFrame

#1

Hi, I have a DataFrame with 3 rows and N columns. I want to select only the string columns and flatten their 3 rows into a single string, with a space between the values, and store the result in another DataFrame. Specifically, I want to loop through the columns (which can be of any data type), keep only the string columns, and flatten them. For example:
I have a DataFrame like
A(Int) B(String) C(String) D(Double) E(Long) etc.

1 MR M 51278 33231
2 CO F 55321 33421
3 MR M 32411 33411

What I want is
B(String) C(String)

1 [MR CO MR] [M F M]

Thanks in advance, SSM

#2

There is probably a better way, but with collect you can do this:

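import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Run in spark-shell, where `spark` and `sc` are already defined.
val data = List(Row(1, "MR", "M", 51278, 33231), Row(2, "CO", "F", 55321, 33421), Row(3, "MR", "M", 32411, 33411))
val rdd = sc.parallelize(data)
val struct =
  StructType(
    StructField("c1", IntegerType, false) ::
    StructField("c2", StringType, false) ::
    StructField("c3", StringType, false) ::
    StructField("c4", IntegerType, false) ::
    StructField("c5", IntegerType, false) :: Nil)
val df = spark.createDataFrame(rdd, struct)
// Collect each string column to the driver and join its values with a space.
val result = (df.select("c2").collect().map(_.getString(0)).mkString(" "),
              df.select("c3").collect().map(_.getString(0)).mkString(" "))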
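
If the result should stay a DataFrame and the string columns should be picked automatically, something along these lines might work. It is only a sketch: it uses the column names A to E from the question, builds a local SparkSession for illustration (in spark-shell `spark` already exists), and relies on `collect_list`, which does not formally guarantee element order, so treat the ordering as an assumption that holds for small, unshuffled data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}
import org.apache.spark.sql.types.StringType

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[1]").appName("flatten-strings").getOrCreate()
import spark.implicits._

// Sample data mirroring the question: A(Int) B(String) C(String) D(Double) E(Long).
val df = Seq(
  (1, "MR", "M", 51278.0, 33231L),
  (2, "CO", "F", 55321.0, 33421L),
  (3, "MR", "M", 32411.0, 33411L)
).toDF("A", "B", "C", "D", "E")

// Keep only the columns whose declared type is StringType.
val stringCols = df.schema.fields.filter(_.dataType == StringType).map(_.name)
require(stringCols.nonEmpty, "the DataFrame has no string columns")

// For each string column, gather its values into a list and join them with a space.
val aggs = stringCols.map(c => concat_ws(" ", collect_list(col(c))).as(c))

// Restrict to the first 3 rows, then aggregate them into a single row.
val flattened: DataFrame = df.limit(3).agg(aggs.head, aggs.tail: _*)

flattened.show()
```

Unlike the collect-based version, this keeps the work inside Spark and yields a one-row DataFrame with one column per string column of the input.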