Saving Spark SQL output as CSV

I am trying to save the output of SparkSQL to a path.

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._

// Define case classes for input data
case class Docword(docId: Int, vocabId: Int, count: Int)
case class VocabWord(vocabId: Int, word: String)

// Read the input data
val docwords = spark.read.
schema(Encoders.product[Docword].schema).
option("delimiter", " ").
csv("hdfs:///user/ashhall1616/bdc_data/assignment/t3/docword-small.txt").
as[Docword]
val vocab = spark.read.
schema(Encoders.product[VocabWord].schema).
option("delimiter", " ").
csv("hdfs:///user/ashhall1616/bdc_data/assignment/t3/vocab-small.txt").
as[VocabWord]

// Task 3a:
// TODO: *** Put your solution here ***
docwords.createOrReplaceTempView("docwords")
vocab.createOrReplaceTempView("vocab")

val writeDf = spark.sql("""SELECT vocab.word AS word1, SUM(count) AS count1 FROM
docwords INNER JOIN vocab
ON docwords.vocabId = vocab.vocabId
GROUP BY word
ORDER BY count1 DESC""").show(10)
writeDf.write.mode("overwrite").csv("file:///home/user204943816622/Task_3a-out")

But this shows an error:

error: value write is not a member of Unit
writeDf.write.mode("overwrite").csv("file:///home/user204943816622/Task_3a-out")
^
Can someone tell me how I can make this work?

Remove the .show(10) from the assignment. show() returns Unit, so writeDf ends up being a Unit instead of a DataFrame, and Unit has no write member.
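A minimal corrected sketch (keeping the same query, view names, and output path from the question; it assumes a SparkSession named spark is already in scope, as in the original code):

```scala
// Assign the DataFrame first, without calling .show() on it
val writeDf = spark.sql("""SELECT vocab.word AS word1, SUM(count) AS count1 FROM
docwords INNER JOIN vocab
ON docwords.vocabId = vocab.vocabId
GROUP BY word
ORDER BY count1 DESC""")

// Preview separately; show() returns Unit, so never assign its result
writeDf.show(10)

// Now writeDf is a DataFrame, so .write is available
writeDf.write.mode("overwrite").csv("file:///home/user204943816622/Task_3a-out")
```

If you also want a header row in the CSV output, you can add .option("header", "true") before .csv(...).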

PS: This shouldn't have been that hard to fix yourself if you had taken the time to understand the error message and read the docs.
