Hi,
I tried to merge two dataframes, but I am getting duplicate rows.
DATAFRAME-1 (df1)
+------+------+--------+
|Val_1 |RES_1 |OWNER_1 |
+------+------+--------+
|val-a |PASS  |OWN-1   |
|val-b |PASS  |OWN-2   |
|val-c |FAIL  |OWN-2   |
+------+------+--------+
DATAFRAME-2 (df2)
+------+------+--------+
|val_2 |RES_2 |OWNER_2 |
+------+------+--------+
|val-d |FAIL  |OWN-3   |
|val-e |PASS  |OWN-4   |
|val-f |FAIL  |OWN-5   |
+------+------+--------+
I need the final merged dataframe to look like this:
+------+------+--------+------+------+--------+
|Val_1 |RES_1 |OWNER_1 |val_2 |RES_2 |OWNER_2 |
+------+------+--------+------+------+--------+
|val-a |PASS  |OWN-1   |val-d |FAIL  |OWN-3   |
|val-b |PASS  |OWN-2   |val-e |PASS  |OWN-3   |
|val-c |FAIL  |OWN-2   |val-f |FAIL  |OWN-3   |
+------+------+--------+------+------+--------+
I tried:
val df3 = df1.join(df2, df1("Val_1") =!= df2("val_2"))
df3.show()
But this creates duplicate rows in the merged dataframe.
How can I remove the duplicate rows?
I tried df3.dropDuplicates(), but it had no effect.
Thank you.
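For reference, the `=!=` condition pairs every row of df1 with every row of df2 whose value differs. Since the two dataframes share no values, that gives all 3 x 3 = 9 combinations, and because each combination is a distinct row, dropDuplicates() has nothing to remove. One possible way to get a side-by-side merge is to add a position column to both dataframes and join on it. The sketch below is only an illustration under some assumptions: `withRowIndex` and `row_idx` are hypothetical names (not part of Spark), the sample data is rebuilt from the tables above, and it assumes the current row order of each dataframe is the order you want to pair by. In spark-shell the `spark` session already exists, so the builder line can be skipped.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Local session for a standalone run (assumption: not needed inside spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("positional-join").getOrCreate()
import spark.implicits._

// Sample data rebuilt from the tables in the question.
val df1 = Seq(
  ("val-a", "PASS", "OWN-1"),
  ("val-b", "PASS", "OWN-2"),
  ("val-c", "FAIL", "OWN-2")
).toDF("Val_1", "RES_1", "OWNER_1")

val df2 = Seq(
  ("val-d", "FAIL", "OWN-3"),
  ("val-e", "PASS", "OWN-4"),
  ("val-f", "FAIL", "OWN-5")
).toDF("val_2", "RES_2", "OWNER_2")

// Hypothetical helper: appends a consecutive 0-based position column.
// zipWithIndex numbers rows in their current partition order.
def withRowIndex(df: DataFrame, name: String = "row_idx"): DataFrame = {
  val indexed = df.rdd.zipWithIndex().map { case (row, idx) =>
    Row.fromSeq(row.toSeq :+ idx)
  }
  val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
  df.sparkSession.createDataFrame(indexed, schema)
}

// Join row N of df1 with row N of df2, then drop the helper column.
val merged = withRowIndex(df1)
  .join(withRowIndex(df2), Seq("row_idx"), "inner")
  .orderBy("row_idx")
  .drop("row_idx")

merged.show(false)
```

With this approach each df1 row is paired with exactly one df2 row, so the result has three rows and each row keeps its own OWNER_2 value from df2. Note that `monotonically_increasing_id()` alone is not a safe join key here, because its values are not guaranteed to be consecutive or to line up between two dataframes.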