How would I repeat rows in a DataFrame?

scala_user · August 21, 2020, 2:19pm

Hi how’s it going?

If I have a Spark Dataframe like this…

  val df = spark.read
    .format("csv")
    .option("sep",",")
    .option("inferSchema","true")
    .option("header","true")
    .load(dbPath+"data"+".csv")

and I want to repeat its rows so that each row appears 7 times in the result, how would I do that?

Here’s an example of the before and after I’m looking for, except done in Python.

Thank you!

mohitjaggi · August 21, 2020, 3:26pm

union the df with itself 6 times

scala_user · August 21, 2020, 7:37pm

This does not work. Is there anyone who can give me a response that works? I’ve been reading online about the spark.sql functions repeat and explode, could those be possible solutions? The rows that are repeated need to stay next to each other.

Thanks

mantovani · August 23, 2020, 11:08am

Hi,

That’s how you can do it.

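import spark.implicits._ // needed for the $"col" syntax
val df = ...
// 6 unions on top of the original df give 7 copies of each row; orderBy puts the copies next to each other
val f = (0 to 5).foldLeft(df)((d, _) => d.union(df)).orderBy($"col1", $"col2")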
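
Since explode came up earlier in the thread, here is a rough sketch of that route as well. It is only an illustration under some assumptions: the two-row `df` and the column names `col1`/`col2` are made-up stand-ins for the CSV data, the helper column `copy` is a name I picked, and the local `SparkSession` is only there so the snippet can be pasted into spark-shell or a notebook cell on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, explode, lit}

// Local session for the demo only; in spark-shell or a notebook, `spark` usually exists already.
val spark = SparkSession.builder().master("local[1]").appName("repeat-rows").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the CSV-backed DataFrame from the question.
val df = Seq(("a", 1), ("b", 2)).toDF("col1", "col2")

val copies = 7

// Attach a 7-element array to every row, then explode it:
// explode emits one output row per array element, so each input row appears 7 times.
val repeated = df
  .withColumn("copy", explode(array((1 to copies).map(lit(_)): _*)))
  .drop("copy") // the helper column is no longer needed

repeated.show()
```

With the two-row stand-in, `repeated.count()` is 14. Because explode does not shuffle, the copies of a row normally come out next to each other, but Spark does not promise any row order without an explicit orderBy, so add one if the adjacency really matters.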