Show key and value in the scala dataframe without adding null

kumarraj · December 9, 2017, 10:08pm

I have a Spark DataFrame:

Col1 col2

val1 result1
val1 result2
val1 result3
val1 result4
val1 result5
val2 string1
val2 string2
val2 string3

I need to convert it into a DataFrame like this:

val1 val2

result1 string1
result1 string2
result1 string3
result1
result1

I don’t want any null values printed in the DataFrame table. I tried, but every empty value prints as “null”. If I use dataframe.na, it removes the entire row that contains the null. I just want to remove the null values wherever they occur and arrange the column values as in the result table above.

My second question: how do I convert an RDD[(String, String)] to an RDD[String], with a newline appended to each element? Thank you.

spaszek · December 10, 2017, 11:19am

Could you please make your first question more concise or reword it? I don’t understand the idea or what you are trying to achieve.

If I got the second question correctly - the following code should do the trick:

rdd.flatMap(x => Array(x._1, x._2)).map(string => string + "\n")
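
To make the effect of that one-liner concrete, here is a minimal sketch that runs the same flatMap and map steps on a plain Scala Seq instead of an RDD. The sample pairs are made up for illustration; an RDD[(String, String)] offers the same two methods, so the shape of the result should be the same there.

```scala
// Hypothetical sample data standing in for an RDD[(String, String)].
val pairs: Seq[(String, String)] = Seq(("val1", "result1"), ("val2", "string1"))

// Flatten each (key, value) pair into two separate elements,
// then append a newline to every element.
val lines: Seq[String] =
  pairs.flatMap { case (k, v) => Seq(k, v) }.map(s => s + "\n")

// Each key and each value is now its own newline-terminated string.
lines.foreach(print)
```

Each pair contributes two elements, so the result has twice as many entries as the input.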

kumarraj · December 10, 2017, 5:21pm

Hi spaszek,
Thank you.
When I create the DataFrame, empty column values show up as “null”, as follows:

val1       val2

result1    string1
result1    string2
result1    null
result1    null
result1    string3

In the above, I want to remove the nulls in column val2.
If more columns contain nulls, I want to remove those specific null values as well.

So, for the above DataFrame, I want the result to be:

val1 val2

result1 string1
result1 string2
result1 string3
result1
result1

In the above, I need to remove the nulls in column val2.
I tried dataframe.na.isNotNull, but it removes every row in which a null occurs.
I need to remove the null values within each column without removing the entire row.
Thank you…

spaszek · December 10, 2017, 7:28pm

Oh, okay.

Unfortunately, that is not really possible. There has to be some kind of indicator of whether a specific row in a specific column has a value or not. Think of a DataFrame as a 2D N x M matrix: every column has the same length.

Why would you need your data in this format, though? What is your use case? We can figure something out if you tell us more.

curoli · December 11, 2017, 4:13pm

What would you like to see instead of null? An empty String?
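
If an empty String is acceptable, a minimal sketch of that idea could look like the following. It assumes the val1/val2 data shown earlier in the thread and a local SparkSession (both set up here only for illustration), and uses na.fill to replace nulls in string columns with "" while keeping every row.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("null-to-empty").getOrCreate()
import spark.implicits._

// Option[String] yields a nullable column: None is stored as null.
val df = Seq[(String, Option[String])](
  ("result1", Some("string1")),
  ("result1", Some("string2")),
  ("result1", None),
  ("result1", None),
  ("result1", Some("string3"))
).toDF("val1", "val2")

// Replace null with "" in every string column; no rows are dropped.
val cleaned = df.na.fill("")

cleaned.show()
```

This only changes what is displayed in the empty cells. It does not move string3 up to close the gaps; the rows stay where they are.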

kumarraj · December 12, 2017, 7:27am

Hi,
I have two columns with key and value pairs, such as:

Column A  Column B
A  a1
B  a2
A  a2
B  a2
C  a3
A  a2
B  a3

I need to collect the values for A, B and C into individual columns without deleting duplicate values. But when I use the pivot and agg methods, duplicate values are dropped and nulls appear.

I want the columns as below:

A   B   C

a1  a2  NA
a2  a2  a3
a2  a3  NA

How can I group this and remove the nulls?

Thank you…
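
One possible approach, sketched here under some assumptions, is to number the values within each key and pivot on that position, so that duplicates land in different rows instead of being collapsed. The column names key and value and the helper columns ord and pos are made up for this sketch. It also assumes the order in which the rows were created is the order that matters, since a DataFrame has no inherent row order.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{first, monotonically_increasing_id, row_number}

// Local session for illustration only; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._

// The key/value pairs from the post; the column names are assumptions.
val df = Seq(
  ("A", "a1"), ("B", "a2"), ("A", "a2"), ("B", "a2"),
  ("C", "a3"), ("A", "a2"), ("B", "a3")
).toDF("key", "value")

// Tag each row with an increasing id so the original order can be used below.
val withOrder = df.withColumn("ord", monotonically_increasing_id())

// Number the values 1, 2, 3, ... within each key.
val byKey = Window.partitionBy("key").orderBy("ord")
val indexed = withOrder.withColumn("pos", row_number().over(byKey))

// One row per position, one column per key; each (pos, key) cell holds
// at most one value, so duplicates are kept.
val pivoted = indexed
  .groupBy("pos")
  .pivot("key")
  .agg(first("value"))
  .orderBy("pos")
  .drop("pos")
  .na.fill("NA") // keys with fewer values get "NA" instead of null

pivoted.show()
```

With this numbering each key's values are packed from the top, so the single C value ends up in the first row rather than the second. The shorter columns are padded with "NA" because every column in a DataFrame must have the same number of rows.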