Parse JSON array into multiple fields

Hi, I’m calling an API GET method that returns a JSON payload, which I need to parse into multiple columns.

Schema:

root
 |-- Payload: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ActiveDate: string (nullable = true)
 |    |    |-- BusinessId: string (nullable = true)
 |    |    |-- BusinessName: string (nullable = true)

JSON:

 {
     "Payload": [
         {
             "ActiveDate": "2008-11-25",
             "BusinessId": "5678",
             "BusinessName": "ACL"
         },
         {
             "ActiveDate": "2009-03-22",
             "BusinessId": "6789",
             "BusinessName": "BCL"
         }
     ]
 }

CODE (PySpark):

  import json
  from pyspark.sql.functions import explode

  # response.json() returns a Python dict; re-serialize it, since
  # spark.read.json expects each RDD element to be a JSON string.
  data = json.dumps(response.json())
  df = spark.read.json(sc.parallelize([data]))

  df.select(explode("Payload").alias("x")) \
    .select("x.ActiveDate", "x.BusinessId", "x.BusinessName") \
    .show()

This is not splitting the JSON array into multiple columns. I’m trying to accomplish this in Scala. Could you please guide me?
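
For reference, here is a minimal sketch of what I think the Scala equivalent should look like (assuming Spark 2.2+, where spark.read.json accepts a Dataset[String]; the HTTP call is elided and the response body is inlined for illustration):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.explode

  val spark = SparkSession.builder.getOrCreate()
  import spark.implicits._

  // Response body inlined for illustration; in practice this string
  // would come from the HTTP client call.
  val jsonString =
    """{"Payload": [
      |  {"ActiveDate": "2008-11-25", "BusinessId": "5678", "BusinessName": "ACL"},
      |  {"ActiveDate": "2009-03-22", "BusinessId": "6789", "BusinessName": "BCL"}
      |]}""".stripMargin

  // Each element of the Dataset[String] is parsed as one JSON document,
  // so the multi-line string parses without a JSON Lines file on disk.
  val df = spark.read.json(Seq(jsonString).toDS)

  df.select(explode($"Payload").alias("x"))
    .select("x.ActiveDate", "x.BusinessId", "x.BusinessName")
    .show()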

Thank you

Hi @venkatb,

your question is very much Spark-specific, and this is a general discussion forum for Scala. I think you’re better off asking your question in a Spark forum or on Stack Overflow.

Hi @cbley, sure, I will. Thank you for your response.

Regards

Spark’s JSON reader expects input in JSON Lines format, i.e. one complete JSON document per line. jq may help with the conversion if you can pre-process the payload before loading it into Spark.
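
For illustration, the Payload elements above rewritten as JSON Lines would look like this (a jq filter along the lines of jq -c '.Payload[]' should produce it, though that is untested against your exact payload):

  {"ActiveDate": "2008-11-25", "BusinessId": "5678", "BusinessName": "ACL"}
  {"ActiveDate": "2009-03-22", "BusinessId": "6789", "BusinessName": "BCL"}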