Need to read a CSV file with a nested structure in one column using Scala

The sample data is in CSV, and I have to read it and write it out as Parquet.

id_a,id_b,temp,TRF
1000391,10003421,{"decider":false,"Vectrums":[0,0,0,1,1,1,1,0]},FALSE
100090441,1000091555,{"decider":false,"Vectrums":[0,0,0,1,1,1,0,0]},FALSE

schema:


id_a: string
id_b: string
temp: struct
    decider: boolean
    Vectrums: array
        element: integer
trf: boolean



I am trying the following schema:

val structSchema=StructType(StructField("temp",StructType)
(StructField("decider",BooleanType(),True),
(StructField("Vectrums",ArrayType(IntegerType()),True)
)),StructField("id_a",StringType(),True),
StructField("id_b",StringType(),True),
StructField("TRF",StringType(),True))


error:
not found: value BooleanType
not found: value ArrayType


I tried to import the Scala data types.

Expectation: I need to read the data, which is in CSV, and write it to Parquet. How can I apply the schema when reading? Please suggest any sample code or an outline for this kind of scenario.

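The "not found: value BooleanType" and "not found: value ArrayType" errors mean the Spark SQL types are not in scope; they live in org.apache.spark.sql.types. Note also that in Scala these types are objects, so you write BooleanType rather than BooleanType(), and the boolean literal is true, not True.

Here is a working example using Scala 3.2.0 and org.apache.spark:spark-sql_2.13:3.3.1: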

import org.apache.spark.sql.*
import org.apache.spark.sql.types.*
import org.apache.spark.sql.functions.*

val tempStructSchema =
  StructType(
    List(
      StructField("decider", BooleanType, true),
      StructField("Vectrums", ArrayType(IntegerType), true)
    )
  )

val structSchema =
  StructType(
    List(
      StructField("id_a", StringType, true),
      StructField("id_b", StringType, true),
      StructField("temp", tempStructSchema, true),
      StructField("TRF", StringType, true)
    )
  )

@main def readCsv(path: String, out: String) =
  val spark = SparkSession.builder
    .config("spark.master", "local")
    .getOrCreate()
  import spark.sqlContext.implicits.given
  val logData = spark.read
    .option("header", true) // reading the header
    .csv(path)

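  // temp is read as a plain string column; parse its JSON into the nested struct
  val logData2 = logData.withColumn("temp", from_json($"temp", tempStructSchema))
  // sanity check: the resulting schema matches the expected one
  assert(logData2.schema == structSchema)
  logData2.write.parquet(out)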
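
One thing worth checking before running this on the real file: in the sample as posted, the JSON in temp contains commas and is not quoted, so a CSV parser would split it across several columns, and from_json would then most likely give null for temp. Below is a minimal sketch of one way around that, assuming the file can be produced with the temp column quoted RFC 4180-style (inner quotes doubled). The names tempSchema, quotedSample, parseNested and quotedCsvDemo are made up for illustration, and the quoted rows are my re-quoting of the question's sample, not the original file. It also casts TRF to boolean, since the desired schema lists trf as boolean.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.*

// Shape of the JSON stored in the temp column (mirrors tempStructSchema above).
val tempSchema = StructType(List(
  StructField("decider", BooleanType, true),
  StructField("Vectrums", ArrayType(IntegerType), true)
))

// Assumption: the question's rows, re-quoted so that temp stays a single CSV field.
val quotedSample = List(
  "id_a,id_b,temp,TRF",
  """1000391,10003421,"{""decider"":false,""Vectrums"":[0,0,0,1,1,1,1,0]}",FALSE""",
  """100090441,1000091555,"{""decider"":false,""Vectrums"":[0,0,0,1,1,1,0,0]}",FALSE"""
)

// Parses CSV lines (header first) and turns temp into a struct and TRF into a boolean.
def parseNested(spark: SparkSession, lines: Dataset[String]): DataFrame =
  spark.read
    .option("header", true)
    .option("escape", "\"") // treat a doubled quote inside a quoted field as a literal quote
    .csv(lines)
    .withColumn("temp", from_json(col("temp"), tempSchema))
    .withColumn("TRF", col("TRF").cast(BooleanType))

@main def quotedCsvDemo(): Unit =
  val spark = SparkSession.builder.master("local[1]").getOrCreate()
  // Encoders.STRING is passed explicitly, so no implicits import is needed here
  val lines = spark.createDataset(quotedSample)(Encoders.STRING)
  parseNested(spark, lines).printSchema()
  spark.stop()
```

To use it on a real file, the same options should work with .csv(path) in place of .csv(lines), followed by .write.parquet(out) as in the example above.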

So tempStructSchema defines the temp struct on its own, and it is then used as the type of a StructField.
Thank you, this helped me.