Trying to parse JSON

I’m trying to parse a JSON file. I have written a really ugly expression which parses it correctly, and I have an extractor-based solution which silently parses to an empty list.
I found the extractor-based approach in a discussion on Stack Overflow, which at first looked straightforward until I actually tried to use it on a real example.

I have several issues.

  1. I’m importing scala.util.parsing.json.JSON, which works for me, but when I try to copy the code to Scastie I get an error that object parsing is not a member of package util. Should I avoid using scala.util.parsing.json?

  2. The ugly version of my code, filled with lots of .asInstanceOf[List[List[List[Double]]]], works. But the version using a for comprehension silently fails. Unfortunately, inserting a _ = println(…) step does nothing. Is this related to this previous issue, where the for comprehension is not optimized for debugging? How can I convince the println to actually print the value rather than being lazily ignored?

  3. Using the style of JSON parsing suggested by shahjapan, how can I iterate in two different ways depending on a condition? E.g., depending on the value of geometryType I need to interpret the type of coordinates differently, so I still fill out the for comprehension with lots of .asInstanceOf[…]

    val try1 = for {
      // L(features) <- geo.asInstanceOf[Map[String,Any]]("features")
      M(countryFeature) <- features.asInstanceOf[List[Any]]
      _ = println(countryFeature)
      S(countryName) = countryFeature("name")
      M(geometry) = countryFeature("geometry")
      S(geometryType) = geometry("type")
      L(coordinates) = geometry("coordinates")
      perimeters <- geometryType match {
        case "Polygon" => extractPolygons(coordinates.asInstanceOf[List[List[List[Double]]]])
        case "MultiPolygon" => coordinates.asInstanceOf[List[List[List[List[Double]]]]].flatMap(extractPolygons)
        case _ => sys.error("not handled type="+geometry("type"))
      }
    } yield (countryName,perimeters)
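Regarding point 2, the println is most likely not being lazily ignored: a pattern on the left-hand side of `<-` in a Scala for comprehension desugars to a `withFilter` call, so any element that fails the pattern (e.g. an `M(...)` or `S(...)` extractor that doesn’t match) is silently dropped, and the later steps never run for it. A minimal stdlib sketch, with no JSON involved (Scala 2 syntax; Scala 3 would require a `case` prefix on the refutable pattern):

```scala
object SilentFilterDemo extends App {
  val xs: List[Any] = List(("a", 1), "oops", ("b", 2))

  // The tuple pattern acts as a filter: "oops" fails the match and is
  // dropped without any error, so the println never sees it.
  val keys = for {
    (k, v) <- xs
    _ = println(s"matched $k -> $v") // runs only for elements that match
  } yield k

  assert(keys == List("a", "b"))
}
```

If every element fails the pattern, the whole comprehension quietly yields an empty list, which matches the symptom described above.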

Well, curiously enough, when I type in the example from the Stack Overflow post, it also simply parses to an empty list. So perhaps the problem isn’t my interpretation of the example; the example itself seems wrong.

Don’t reinvent the wheel. Use spray-json, play-json, µPickle,…

2 Likes

Not really reinventing. I searched for JSON in Scala and got lots and lots of different solutions, so I picked one from the Scala standard library: scala.util.parsing.json.

I took a look just now at µPickle; the documentation talks a lot about how to write JSON, but not so much about reading it.

Thanks for the recommendations.

In current versions of Scala, the scala.util.parsing package was moved to a separately developed and versioned module, so it can still be used by adding a dependency. The JSON part of it, however, was deprecated and then removed.
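For completeness, the externalized module can be pulled in with a single sbt line; the coordinates below are the module’s actual ones, but the version number is illustrative, so check Maven Central for the latest release:

```scala
// build.sbt -- scala.util.parsing now lives in the external
// scala-parser-combinators module (version shown is illustrative).
libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "2.3.0"
```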

I personally use circe for JSON handling, with which parsing one of your feature objects would have a structure similar to your current code. I don’t think you can reduce the type annotations much, as the types in JSON are not known until runtime, and resolving the extractPolygons method requires knowing that it is passed a 3-dimensional list.

case class Feature(id: String, properties: Map[String, Any], perimeters: List[List[Location]])

import io.circe._
import io.circe.parser._

val json = parse("""
      {"type":"Feature","id":"ARE",
       "properties":{"name":"United Arab Emirates"},
       "geometry":{"type":"Polygon",
       "coordinates":[[[51.579519,24.245497],[51.757441,24.294073],[51.579519,24.245497]]]}}""")

val featureDecoder = new Decoder[Feature] {
  override def apply(c: HCursor): Decoder.Result[Feature] =
    for {
      id <- c.downField("id").as[String]
      properties <- c.downField("properties").as[Map[String, String]]
      geometryType <- c.downField("geometry").downField("type").as[String]
      coords = c.downField("geometry").downField("coordinates")
      perimeters <- geometryType match {
        case "Polygon" => coords.as[List[List[List[Double]]]].map(extractPolygons)
        case "MultiPolygon" => coords.as[List[List[List[List[Double]]]]].map(_.flatMap(extractPolygons))
        case _ => sys.error("not handled type=" + geometryType)
      }
    } yield Feature(id, properties, perimeters)
}

println(json.map(featureDecoder.decodeJson))

I’m not sure how you would decode properties to a Map[String, Any]; you’d probably have to specify some implicits explicitly.

2 Likes

To put a sharper point on it: do not use scala.util.parsing.json – it’s basically abandonware, and is way behind the times.

Instead, I would strongly recommend one of:

  • uPickle
  • Circe
  • Possibly play-json, particularly if you are already using Play

(There are other valid alternatives, but AFAIK these are the most popular.)

They’re all conceptually similar: for a given type, you declare typeclass instances of a Reader typeclass and a Writer typeclass. In all cases, there is a macro that makes this into a one-liner for regular data structures. (Tuples, case classes and ADTs.)

You may be expecting it to be more complicated than it really is. It’s nothing more than declaring the typeclass instance (which, like I said, is a one-liner call to macroRW if you have a straightforward case class), and then calling .read[MyClass]. There’s not all that much to say.
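To make that concrete, here is a minimal uPickle sketch; the `Country` case class and its JSON are invented for illustration, while `macroRW`, `read`, and `write` are uPickle’s standard API:

```scala
import upickle.default.{macroRW, read, write, ReadWriter}

object UPickleDemo extends App {
  case class Country(name: String, code: String)
  object Country {
    // the one-liner typeclass instance via the macro
    implicit val rw: ReadWriter[Country] = macroRW
  }

  // reading: JSON string -> case class
  val c = read[Country]("""{"name":"Albania","code":"ALB"}""")
  assert(c == Country("Albania", "ALB"))

  // writing: case class -> JSON string
  val json = write(c)
  println(json)
}
```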

1 Like

Yes, that’s not so different from what I already have. Thanks.
One problem with the extractor method from Stack Overflow is that if it fails, it just fails, with no clue, and inserting println into the for comprehension apparently has no effect. Does circe do a better job of helping you figure out what’s going wrong?

I’ve been using lift-json for 9 years and am quite happy with it.

Circe’s parse method returns an Either with a message describing the position (line, char, and if applicable field name) and error description.
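For example, a small sketch using circe’s standard parser API (the exact message text may vary between versions):

```scala
import io.circe.parser.parse

// A deliberately malformed document: parse returns a Left whose
// ParsingFailure message describes the position of the problem.
parse("""{"name": }""") match {
  case Left(failure) => println(failure.message)
  case Right(json)   => println(json)
}
```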

2 Likes

I’m starting to look into this topic again. Thanks for the sample circe code.
My old project which is using Scala 2.11 also uses scala.util.parsing.json, so I’d like to convert this to something that is Scala 2.13 compatible.

It’s not clear to me how to generalize this to parse the actual json string that I have.

val jsonStr = """{"type":"FeatureCollection",
                     "features":[
                                 {"type":"Feature","id":"AFG",
                                  "properties":{"name":"Afghanistan"},
                                  "geometry":{"type":"Polygon",
                                              "coordinates":[[[61.210817,35.650072],[62.230651,35.270664],[60.803193,34.404102],[61.210817,35.650072]]]}},
                                 {"type":"Feature","id":"AGO",
                                  "properties":{"name":"Angola"},
                                  "geometry":{"type":"MultiPolygon",
                                              "coordinates":[[[[16.326528,-5.87747],[16.57318,-6.622645]]],
                                                             [[[12.436688,-5.684304],[12.182337,-5.789931],[11.914963,-5.037987],[12.436688,-5.684304]]]]}},
                                 {"type":"Feature","id":"ALB",
                                  "properties":{"name":"Albania"},
                                  "geometry":{"type":"Polygon",
                                              "coordinates":[[[20.590247,41.855404],[20.463175,41.515089],[20.605182,41.086226],[20.590247,41.855404]]]}},
                                 {"type":"Feature","id":"ARE",
                                  "properties":{"name":"United Arab Emirates"},
                                  "geometry":{"type":"Polygon",
                                              "coordinates":[[[51.579519,24.245497],[51.757441,24.294073],[51.579519,24.245497]]]}}
                                ]}"""

It seems to me that I need to call c.downField("features") to get a List and then iterate over the list. What I don’t understand is how to iterate over each item in the List.

This is what I tried. Is feature <- ... the correct way to iterate over a list of Maps?
Is this binding feature to a Map[String,Any], or is it binding feature to some sort of cursor object on which I need to call downField?

    val featureDecoder = new Decoder[Feature] {
      override def apply(c: HCursor): Decoder.Result[Feature] =
        for {
          feature <- c.downField("features").as[List[Map[String,Any]]]
          properties = feature("properties").asInstanceOf[Map[String, String]]
          nationName = properties("name").asInstanceOf[String]
          geometryType <- c.downField("geometry").downField("type").as[String]
          coords = c.downField("geometry").downField("coordinates")
          perimeters <- geometryType match {
            case "Polygon" => coords.as[List[List[List[Double]]]].map(extractPolygons)
            case "MultiPolygon" => coords.as[List[List[List[List[Double]]]]].map(_.flatMap(extractPolygons))
            case _ => sys.error("not handled type=" + geometryType)
          }
        } yield Feature(nationName, properties, perimeters)
    }

Here’s the well-intentioned but opaque error I get

Error:(81, 48) diverging implicit expansion for type io.circe.Decoder[A]
starting with lazy value decodeZoneOffset in object Decoder
          feature <- c.downField("features").as[List[Map[String,Any]]]

BTW, I didn’t really see in the circe documentation how to parse JSON lists.

I see you want to parse JSON into List[Map[String, Any]]. I don’t know what it means to parse JSON to Any; could you explain what you expect it to do?

If you want to be able to handle arbitrary JSON, tell Circe you want io.circe.Json, as in List[Map[String, io.circe.Json]].

@curoli, as you see in the example above, “features” should somehow map to a list of maps, and each key thereof maps to a different type depending on its name. So how should I iterate over these features?

      override def apply(c: HCursor): Decoder.Result[Feature] =
        for {
          features <- c.downField("features").as[List[Map[String, io.circe.Json]]]
        } ...

IntelliJ tells me that features has type Seq[Map[String, Json]]. That is confusing to me.

I found a magical incantation, thanks to Djoe Pramono. There is an example in the section “Traversing JSON with Cursor” where he uses the magical focus method.

Apparently I have to create a throwaway case class to use as a type parameter for an implicit value; I called it Feature.


    case class Feature(name:String, perimeters:List[List[Location]])

    implicit val memberDecoder: Decoder[Feature] =
      (hCursor: HCursor) => {
        for {
          name <- hCursor.downField("properties").downField("name").as[String]
          geometryType <- hCursor.downField("geometry").downField("type").as[String]
          coords = hCursor.downField("geometry").downField("coordinates")
          perimeters <- geometryType match {
            case "Polygon" => coords.as[List[List[List[Double]]]].map(extractPolygons)
            case "MultiPolygon" => coords.as[List[List[List[List[Double]]]]].map(_.flatMap(extractPolygons))
            case _ => sys.error("not handled type=" + geometryType)
          }
        } yield Feature(name,perimeters)
      }

    val features: Option[Json] = json.hcursor.downField("features").focus

    features match {
      case None => sys.error("cannot find members in the json")
      case Some(features) => {
        val maybeFeatureList = features.hcursor.as[List[Feature]]
        maybeFeatureList match {
          case Right(features) => features.map{f:Feature => (f.name, f.perimeters)}.toMap
          case Left(error) => sys.error(error.getMessage)
        }
      }
    }

This evaluates to

Map(Afghanistan -> List(List([35.7;61.2], [35.3;62.2], [34.4;60.8], [35.7;61.2])), Angola -> List(List([-5.9;16.3], [-6.6;16.6]), List([-5.7;12.4], [-5.0;11.9], [-5.7;12.4])), Albania -> List(List([41.9;20.6], [41.1;20.6], [41.9;20.6])), United Arab Emirates -> List(List([24.2;51.6])))

This is very different from my intuition, which was to use a for comprehension.
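An untested aside: with the implicit Decoder[Feature] already in scope, I believe circe can decode the list directly from the cursor, folding away the focus/match dance. This sketch assumes the `json`, `Feature`, and `Location` definitions from the surrounding code and is a hypothetical condensation, not a verified drop-in:

```scala
// Assumes json: io.circe.Json and the implicit Decoder[Feature] above.
val perimeterMap: Map[String, List[List[Location]]] =
  json.hcursor
    .downField("features")
    .as[List[Feature]]                       // uses the implicit decoder
    .fold(
      error => sys.error(error.getMessage),  // failure carries position info
      features => features.map(f => f.name -> f.perimeters).toMap
    )
```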

I don’t know circe, but I’d be somewhat surprised if this was really required.

The most common mode for all these JSON libs is automagic mapping to Scala types through implicits/macros. At some lower level, they provide an API for manipulating their JSON object/array/value representations directly. Sometimes it may be convenient to mix, i.e. apply some custom low-level operations on the outer structure and defer handling of nested objects to the auto mapping level. (This seems to be going on in the blog post you refer to.) But you shouldn’t have to.

Here’s how I would approach your example using spray-json with json-lenses, sans error handling and the coordinate transmogrification you seem to apply.

def parsePerimeters(jstr: String): Map[String, List[List[Location]]] = {
  def parseLocation(coords: List[Double]): Location =
    coords match {
      case List(lat, lon) => Location(lat, lon)
    }
  def parseFeature(jso: JsObject): (String, List[List[Location]]) = {
    val name = jso.extract[String]("properties" / "name")
    val geometry = jso.extract[JsObject]("geometry")
    val coords =
      geometry.extract[String]("type") match {
        case "Polygon" => geometry.extract[List[List[List[Double]]]]("coordinates")
        case "MultiPolygon" => geometry.extract[List[List[List[List[Double]]]]]("coordinates").map(_.flatten)
      }
    name -> coords.map(_.map(parseLocation))
  }
  jstr.parseJson.extract[JsObject]("features" / *).map(parseFeature).toMap
}

Result on your sample data:

Map(
  Afghanistan -> List(List(Location(61.210817,35.650072), Location(62.230651,35.270664), Location(60.803193,34.404102), Location(61.210817,35.650072))),
  Angola -> List(List(Location(16.326528,-5.87747), Location(16.57318,-6.622645)), List(Location(12.436688,-5.684304), Location(12.182337,-5.789931), Location(11.914963,-5.037987), Location(12.436688,-5.684304))),
  Albania -> List(List(Location(20.590247,41.855404), Location(20.463175,41.515089), Location(20.605182,41.086226), Location(20.590247,41.855404))),
  United Arab Emirates -> List(List(Location(51.579519,24.245497), Location(51.757441,24.294073), Location(51.579519,24.245497)))
)

I’d expect something similar to be possible with the circe low-level API, without having to introduce an intermediate type. (The type might still be a good idea for other reasons, but then it wouldn’t be “throwaway”.)

1 Like

hi Patrick, thanks for the suggestion. Could you make a few comments about your code for anyone not familiar with spray-json and json-lenses?
What is (“features” / *)?
Which imports were necessary?
I notice you use .map(_.flatten) in the MultiPolygon case where I was using .flatMap. Is that just a style difference or a semantic difference? Could I have also used .map(extractPolygons) and .map(_.flatMap(extractPolygons)) as well?
Perhaps you rewrote it simply because the code for extractPolygons was missing from the posting?

A combination of lenses.

import spray.json._
import spray.json.DefaultJsonProtocol._
import spray.json.lenses.JsonLenses._

Probably the former. Basically it’s just dropping the third, redundant, list nesting level, in order to arrive at the same shape for the coordinates structure as for the polygon case. (In the polygon case, the outer level seems redundant at the JSON level, too, but it serves as the “single element multi-polygon” bracket wanted in the output.)
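The difference between the two is easy to see on a small stdlib example: `flatten` (equivalently `flatMap(identity)`) removes the outer nesting level, while `.map(_.flatten)` removes one level inside each element, preserving the outer grouping:

```scala
val nested = List(List(List("a"), List("b")), List(List("c")))

// flatten / flatMap collapses the *outer* level:
val outer = nested.flatten         // List(List("a"), List("b"), List("c"))

// map(_.flatten) collapses a level *inside* each element:
val inner = nested.map(_.flatten)  // List(List("a", "b"), List("c"))

assert(outer == List(List("a"), List("b"), List("c")))
assert(inner == List(List("a", "b"), List("c")))
```

Both drop one level of nesting, but they group the remaining elements differently.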

@jimka, @crater2150, @sangamon, @curoli, and all, I read the above comments, so in the hope of getting help, please help me with this question; here you can see the input JSON and the expected output JSON.

I have tried this code to find the relation tree for each node, but it is not producing output for lists of maps. I want the relation tree for lists of maps too, with an incrementing index for each group of similar sibling nodes. An output example is below:

val jsonFile = "C:/Users/Meenu/Desktop/Test.json"
val jsonData = scala.io.Source.fromFile(jsonFile).mkString

import org.json4s.jackson.JsonMethods.parse // json4s parse used below

def convertJsonToMap(jsonData: String): Map[String, Any] = {
  implicit val formats = org.json4s.DefaultFormats
  parse(jsonData).extract[Map[String, Any]]
}
val flattenJson = convertJsonToMap(jsonData)
val flattenJson = convertJsonToMap(jsonData)

def traverse(el: Any, acc: List[String] = List.empty[String]): Map[String, Any] = el match {
  case leaf: Int    => Map(acc.reverse.mkString("~>") -> leaf)
  case leaf: BigInt => Map(acc.reverse.mkString("~>") -> leaf)
  case leaf: Double => Map(acc.reverse.mkString("~>") -> leaf)
  case leaf: String => Map(acc.reverse.mkString("~>") -> leaf)
  // Due to type erasure, List[JString], List[List[...]], List[JArray], etc.
  // are all the same case at runtime: a single List[_] branch matches them.
  case leaf: List[_] => Map(acc.reverse.mkString("~>") -> leaf)
  case data: Map[_, _] => data.asInstanceOf[Map[String, Any]] flatMap {
    case (k, v) => traverse(v, k :: acc)
  }
}

println(traverse(flattenJson))

OUTPUT is: for Disk and Partition I am not getting the relation tree, which I need with an incrementing number. Please help.

Test~>Hardware~>ThrottlePct -> 100, Test~>Hardware~>Disk -> List(Map(Name -> DISK0, Bytes_Write_MB -> 694, HardwareID -> 650FEC74, Write_Pct -> 4, Idle_Pct -> 91, Model -> TOSHIBA MQ01, Bytes -> 263, Size -> 476937, DiskID -> F91B1F36, Partition -> List(Map(Name -> Not Used, HardwareID -> 650FEC74, Size_MB -> 500, DiskGUID -> F91B1F36, Index -> 0))

I want something like: Test~>Hardware~>ThrottlePct -> 100, Test~>Hardware~>Disk~>Disk-1~>Partition~>Partition-1 // because Disk has multiple instances, with multiple Partitions in each Disk.

Finally, I have to split the JSON for each child along with its relation tree. Looking for help :slight_smile:
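Not a full answer, but for the Disk/Partition case one possible direction is to add a List branch to traverse that numbers siblings by suffixing the parent key with a 1-based index. This is a hypothetical, stdlib-only sketch (not tested against your actual JSON; the "item" fallback key for a top-level list is invented):

```scala
// Hypothetical sketch: recurse into lists, numbering siblings by suffixing
// the parent key with a 1-based index (Disk-1, Disk-2, ...).
def traverse(el: Any, acc: List[String] = Nil): Map[String, Any] = el match {
  case m: Map[_, _] =>
    m.asInstanceOf[Map[String, Any]].flatMap { case (k, v) => traverse(v, k :: acc) }
  case xs: List[_] =>
    xs.zipWithIndex.flatMap { case (v, i) =>
      // "Disk" contributes "Disk~>Disk-1", "Disk~>Disk-2", ... to the path
      traverse(v, s"${acc.headOption.getOrElse("item")}-${i + 1}" :: acc)
    }.toMap
  case leaf =>
    Map(acc.reverse.mkString("~>") -> leaf)
}
```

For example, `traverse(Map("Disk" -> List(Map("Name" -> "DISK0"))))` yields the single entry `Disk~>Disk-1~>Name -> DISK0`.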