Show keys and values in a Scala DataFrame without adding null

I have a Spark DataFrame:

Col1 col2

val1 result1
val1 result2
val1 result3
val1 result4
val1 result5
val2 string1
val2 string2
val2 string3

I need to convert this DataFrame to:

val1 val2

result1 string1
result1 string2
result1 string3
result1
result1

I don't want any null values printed in the DataFrame table (for any empty value). I tried, but "null" is printed for empty values. If I use dataframe.na, it removes the entire row that has a null column. I just want to remove null values wherever they occur and arrange the column values as in the result table above.

My second question: how do I convert an RDD[(String, String)] to an RDD[String] with a newline applied? Thank you.

Could you please make your first question more concise or reword it? I don't understand the idea or what you are trying to achieve.

If I understood the second question correctly, the following code should do the trick:

rdd.flatMap(x => Array(x._1, x._2)).map(string => string + "\n")

Hi spaszek,
thank you.
When I create the DataFrame, empty column values are shown as "null", as follows:

   val1      val2

result1    string1
result1    string2
result1     null
result1     null 
result1     string3

In the above, I want to remove the nulls in column val2.
If more columns contain nulls, I want to remove those specific null values as well.

So, for the above DataFrame, I want the result to be:

val1 val2

result1 string1
result1 string2
result1 string3
result1
result1

In the above, I need to remove the nulls in column val2.
I tried dataframe.na.isNotNull, but it removes rows wherever a null occurs.
I need to remove the null values in each column without removing the entire row.
Thank you…

Oh, okay.

Unfortunately, that is not really possible. There has to be some kind of indicator of whether a specific row in a specific column has a value or not. Think of a DataFrame as a 2D matrix of size N x M: every column has the same length.

Why would you want your data in this format, though? What is your use case? We can figure something out if you tell us more :slight_smile:

What would you like to see instead of null? An empty String?
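
If an empty string is acceptable, a minimal sketch could look like the following. It assumes the column names val1 and val2 from the post above and a local SparkSession; the sample rows are only an illustration. Note that this only changes what is stored in place of null. Rows stay where they are, so string3 does not move up in the column.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("fill-nulls").getOrCreate()
import spark.implicits._

// Sample data mirroring the table in the post; None becomes null in the DataFrame.
val df = Seq[(String, Option[String])](
  ("result1", Some("string1")),
  ("result1", Some("string2")),
  ("result1", None),
  ("result1", None),
  ("result1", Some("string3"))
).toDF("val1", "val2")

// Replace null with an empty string in every string column.
// To restrict it to one column, use df.na.fill("", Seq("val2")) instead.
val filled = df.na.fill("")

filled.show()
```

Unlike dropping nulls with `na.drop`, `na.fill` keeps all five rows and only swaps the null cells for the given replacement value.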

Hi,
I have two columns holding key and value pairs, such as:
Column A  Column B
A  a1
B  a2
A  a2
B  a2
C  a3
A  a2
B  a3

I need to collect the values for A, B and C as individual columns without deleting duplicate values. But when I use the pivot and agg methods, duplicate values are deleted and nulls appear.

I want the columns as below:
A   B   C

a1  a2  NA
a2  a2  a3
a2  a3  NA

How can I group this and remove the nulls?

Thank you…
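
One possible approach, offered as a sketch rather than a definitive answer: number the occurrences of each key, pivot on the key while grouping by that occurrence number, and then fill the remaining gaps with "NA". The column names key, value, ord and idx are made up for this example, and it assumes that the input order is what defines the first, second, third value of each key. `monotonically_increasing_id` is used here to capture that order, which holds for this small local dataset but should be replaced by a real ordering column if one is available.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{first, monotonically_increasing_id, row_number}

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("pivot-keep-duplicates").getOrCreate()
import spark.implicits._

// Sample data from the post; "key" and "value" are hypothetical column names.
val df = Seq(
  ("A", "a1"), ("B", "a2"), ("A", "a2"), ("B", "a2"),
  ("C", "a3"), ("A", "a2"), ("B", "a3")
).toDF("key", "value")

// Capture the input order, then number each key's occurrences 1, 2, 3, ...
val withOrder = df.withColumn("ord", monotonically_increasing_id())
val byKey = Window.partitionBy("key").orderBy("ord")
val indexed = withOrder.withColumn("idx", row_number().over(byKey))

// Grouping by the occurrence number keeps duplicates: each (idx, key) pair
// holds exactly one value, so first() has nothing to collapse.
val pivoted = indexed
  .groupBy("idx")
  .pivot("key")
  .agg(first("value"))
  .orderBy("idx")
  .drop("idx")
  .na.fill("NA") // keys with fewer occurrences get "NA" instead of null

pivoted.show()
```

With the sample data this gives the rows (a1, a2, a3), (a2, a2, NA) and (a2, a3, NA). Each column is filled from the top, so the single C value lands in the first row rather than the second row shown in the desired table above. A rectangular DataFrame still needs some placeholder in the shorter columns, which is what "NA" is doing here.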