Convert a List of string to any case class

yummydum · April 28, 2021, 7:21am

I have a csv file and a case class that models each row of the csv file.
However, it is cumbersome to initialize a case class with many attributes.

For example, given a case class such as the one below,

case class Airline(
    airline: String,
    avail_seat_km_per_week: Long,
    incidents_85_99: Int,
    fatal_accidents_85_99: Int,
    fatalities_85_99: Int,
    incidents_00_14: Int,
    fatal_accidents_00_14: Int,
    fatalities_00_14: Int
)

I have to write a function like this to initialize the case class

def parse(row: Seq[String]) = {
  Airline(
    row(0),
    row(1).toLong,
    row(2).toInt,
    row(3).toInt,
    row(4).toInt,
    row(5).toInt,
    row(6).toInt,
    row(7).toInt
  )
}

Since this kind of situation frequently occurs for my project, I thought it would be nice to have a function to convert a List of String to a case class with the correct type in a type-safe manner.

I noticed that what I’m trying to do is to make an inverse function of the function described here, so I followed this approach.

Here is what I made so far.


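// Row is used below but never defined in this post; List[String] is assumed,
// since the test passes a List of String and the code calls head and tail.
type Row = List[String]

trait FieldDecoder[A]:
  def decodeField(a: String): A

trait RowDecoder[A <: Tuple]:
  def decodeRow(a: Row): A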

given FieldDecoder[Int] with
  def decodeField(x: String) = x.toInt

given FieldDecoder[Boolean] with
  def decodeField(x: String) = x.toBoolean

given FieldDecoder[Long] with
  def decodeField(x: String) = x.toLong

given FieldDecoder[String] with
  def decodeField(x: String) = x

given RowDecoder[EmptyTuple] with
  def decodeRow(row: Row) = EmptyTuple

given [H: FieldDecoder, T <: Tuple: RowDecoder]: RowDecoder[H *: T] with
  def decodeRow(row: Row) =
    summon[FieldDecoder[H]].decodeField(row.head) *: summon[RowDecoder[T]]
      .decodeRow(row.tail)

def csvToTuple[X <: Tuple: RowDecoder](row: Row): X =
  summon[RowDecoder[X]].decodeRow(row)

The code compiles, and I wrote a test

  @Test def testcsvToTuple(): Unit = {
    assertEquals(
      (42, true, "Hello"),
      csvToTuple(List("42", "true", "Hello"))
    )
  }

However, this fails; csvToTuple produces an empty tuple.

java.lang.AssertionError: 
Expected :(42,true,Hello)
Actual   :()

I wonder where I went wrong?

The current code I implemented is here; it also includes conversion from a tuple to a List of String (I got nice help in this topic).

Jasper-M · April 28, 2021, 7:41am

The csvToTuple method can’t know to which type it has to convert your List. It works when you explicitly provide a type argument, or when the expected type can be inferred from the call site.

csvToTuple[(Int, Boolean, String)](List("42", "true", "Hello"))

val tup: (Int, Boolean, String) = csvToTuple(List("42", "true", "Hello"))

Jasper-M · April 28, 2021, 8:33am

You can make this a bit less cumbersome by converting to a case class:

import scala.deriving.Mirror.ProductOf

def csvToProduct[P](row: Row)(using p: ProductOf[P], d: RowDecoder[p.MirroredElemTypes]): P = 
  p.fromProduct(d.decodeRow(row))

case class Foo(i: Int, b: Boolean, s: String)
val foo = csvToProduct[Foo](List("42", "true", "Hello"))
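
Applied to the Airline class from the original question, this could look roughly like the sketch below. It is self-contained so it can be pasted into a single Scala 3 file. Two things in it are assumptions rather than something stated in the thread: Row is taken to be List[String], and the sample row values are made up for illustration.

```scala
import scala.deriving.Mirror.ProductOf

// Assumption: the thread never shows the definition of Row.
type Row = List[String]

trait FieldDecoder[A]:
  def decodeField(a: String): A

trait RowDecoder[A <: Tuple]:
  def decodeRow(row: Row): A

given FieldDecoder[Int] with
  def decodeField(x: String): Int = x.toInt

given FieldDecoder[Long] with
  def decodeField(x: String): Long = x.toLong

given FieldDecoder[String] with
  def decodeField(x: String): String = x

// Base case: as in the thread, any remaining row decodes to the empty tuple.
given RowDecoder[EmptyTuple] with
  def decodeRow(row: Row): EmptyTuple = EmptyTuple

// Inductive case: decode the head field, then the rest of the row.
given [H: FieldDecoder, T <: Tuple: RowDecoder]: RowDecoder[H *: T] with
  def decodeRow(row: Row): H *: T =
    summon[FieldDecoder[H]].decodeField(row.head) *:
      summon[RowDecoder[T]].decodeRow(row.tail)

// The Mirror supplies the tuple of field types, so no tuple type is written by hand.
def csvToProduct[P](row: Row)(using p: ProductOf[P], d: RowDecoder[p.MirroredElemTypes]): P =
  p.fromProduct(d.decodeRow(row))

case class Airline(
    airline: String,
    avail_seat_km_per_week: Long,
    incidents_85_99: Int,
    fatal_accidents_85_99: Int,
    fatalities_85_99: Int,
    incidents_00_14: Int,
    fatal_accidents_00_14: Int,
    fatalities_00_14: Int
)

// Made-up sample row, for illustration only.
val sampleRow: Row = List("Example Air", "1000", "2", "0", "0", "1", "0", "0")
val airline: Airline = csvToProduct[Airline](sampleRow)
```

The hand-written parse function is no longer needed; adding or reordering fields in Airline changes the expected column types without touching the decoding code.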

sangamon · April 28, 2021, 10:57am

Just an aside… Note that the encoding and decoding scenarios are not as symmetrical as they may seem at first glance.

Here you declare that the empty tuple should always be encoded as an empty list, which matches intuition and doesn’t leave any ambiguity.

Here, on the other hand, you claim that any row can be decoded as an empty tuple (which is what’s triggered in your failing test case), whereas intuitively you’d probably want to constrain this decoding to the empty row.

I have no immediate suggestion for how to “fix” this. I’m a Scala 3 beginner as well, and I don’t have much experience with tuple/HList magic so far. (And I fear it’d make things considerably more complicated.) Plus, it’s probably fine for now: you don’t have actual type safety here anyway, since you need to explicitly declare your desired result type, as demonstrated by @Jasper-M. But this is the kind of type-level issue that Scala encourages you to think about more than other languages do.

Another asymmetry is that field encoding will always succeed (there’s a #toString for every type), whereas field decoding can fail - not every string can be parsed as an int, etc. Again, this is fine for now, failures will raise exceptions (which you’ll likely want to handle at the call site or above). At some point you may (or may not) want to make this failure mode explicit in the types, though, e.g. by using Either[Throwable, A] as a return type rather than plain A and catching/wrapping exceptions accordingly at the implementation site.
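
A minimal sketch of what making the failure explicit might look like for single fields. The names SafeFieldDecoder, fromThrowing and decodeSafely are hypothetical and only serve this illustration; the idea is simply to wrap the throwing conversions in scala.util.Try and expose the result as an Either.

```scala
import scala.util.Try

// Hypothetical variant of FieldDecoder whose result type records failure.
trait SafeFieldDecoder[A]:
  def decodeField(s: String): Either[Throwable, A]

object SafeFieldDecoder:
  // Wrap a throwing parser so that exceptions become Left values.
  def fromThrowing[A](parse: String => A): SafeFieldDecoder[A] =
    new SafeFieldDecoder[A]:
      def decodeField(s: String): Either[Throwable, A] =
        Try(parse(s)).toEither

  given SafeFieldDecoder[Int] = fromThrowing[Int](_.toInt)
  given SafeFieldDecoder[Long] = fromThrowing[Long](_.toLong)
  given SafeFieldDecoder[Boolean] = fromThrowing[Boolean](_.toBoolean)
  given SafeFieldDecoder[String] = fromThrowing[String](s => s)

// Small helper so the target type can be passed as a type argument.
def decodeSafely[A](s: String)(using d: SafeFieldDecoder[A]): Either[Throwable, A] =
  d.decodeField(s)
```

With this, decodeSafely[Int]("42") yields a Right, while decodeSafely[Int]("abc") yields a Left holding the NumberFormatException instead of throwing it. A row decoder built on top would then have to combine the per-field Either values, which is not shown here.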

yummydum · April 28, 2021, 11:41pm

@Jasper-M @sangamon
Thank you so much for the suggestions!
As @Jasper-M mentioned, csvToTuple has a type parameter but I did not supply the type argument; that was the problem.
Also, the method for the case class is much better; this is what I wanted!
Now I’m playing with Mirror to try to implement the inverse-ish function productTocsv as practice.

The asymmetry between the encoder and the decoder that @sangamon pointed out is real, and I think this is a nice opportunity to practice functional error handling. I’ll play around with it. Thanks.

Ideally, I wanted to capture this error at compile-time, but this time I couldn’t.
Points I couldn’t figure out are

Why did csvToTuple use EmptyTuple as the type argument when I forgot to pass any type argument? Where did it find it?
How can I represent an empty List at the type level? I expected there to be a List equivalent of EmptyTuple, but that doesn’t seem to exist. I found a compact library for empty lists at the type level, so I will try to understand it.

I will take a further look at these issues. Maybe they are worth a separate topic, so I may post another one.

sangamon · April 29, 2021, 1:25pm

I don’t have a formal, spec-based explanation, but informally: The compiler is looking for some type to fill in for X in csvToTuple that it can conjure up a RowDecoder for. There are two givens for RowDecoder. The one for EmptyTuple is applicable to any Row (as discussed) and up for the taking. For the other one it would have to “invent” some tuple type, triggering a recursive search for one or more given decoders (for the “tail” component(s)) that ultimately would need to end with the base case (i.e. EmptyTuple). Since you don’t provide any further constraints, EmptyTuple is a straightforward match.

There is, it’s Nil (and the equivalent to *: is ::). The problem is that the compiler would need to unravel the tuple and list types in lockstep in order to “unify” the empty list with the empty tuple, i.e. (again, informally)

String :: String :: Nil => Int *: Boolean *: EmptyTuple
String :: Nil => Boolean *: EmptyTuple
Nil => EmptyTuple

The structure (i.e. size ~ “nesting depth”) of a List is not known at compile time though. It may be possible to get something like this to work with some advanced type system magic, but I definitely wouldn’t want to go there.

Options:

Just leave as is. This means “trailing” entries in the row will silently be ignored, e.g. csvToTuple[(Int, Boolean)](List("42", "true", "a")) == (42, true).
Raise a failure at runtime(!) if the passed row is not empty in the empty tuple decoder.
Use fixed size string tuples as input, convert from list in earlier stage.
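
The second option could be sketched roughly as follows. This is only an illustration under a few assumptions: Row is taken to be List[String], IllegalArgumentException is an arbitrary choice of exception, and the check for a row that is too short is an addition that goes slightly beyond the option as stated.

```scala
// Assumption: the thread never shows the definition of Row.
type Row = List[String]

trait FieldDecoder[A]:
  def decodeField(a: String): A

trait RowDecoder[A <: Tuple]:
  def decodeRow(row: Row): A

given FieldDecoder[Int] with
  def decodeField(x: String): Int = x.toInt

given FieldDecoder[Boolean] with
  def decodeField(x: String): Boolean = x.toBoolean

// Base case: only an exhausted row decodes to the empty tuple.
given RowDecoder[EmptyTuple] with
  def decodeRow(row: Row): EmptyTuple =
    if row.isEmpty then EmptyTuple
    else throw new IllegalArgumentException(s"unexpected trailing fields: $row")

// Inductive case: a missing field is reported instead of failing inside row.head.
given [H: FieldDecoder, T <: Tuple: RowDecoder]: RowDecoder[H *: T] with
  def decodeRow(row: Row): H *: T = row match
    case head :: tail =>
      summon[FieldDecoder[H]].decodeField(head) *:
        summon[RowDecoder[T]].decodeRow(tail)
    case Nil =>
      throw new IllegalArgumentException("row has fewer fields than the target tuple")

def csvToTuple[X <: Tuple: RowDecoder](row: Row): X =
  summon[RowDecoder[X]].decodeRow(row)
```

With this variant, csvToTuple[(Int, Boolean)](List("42", "true")) still returns (42, true), while the same call with a trailing "a" throws at runtime instead of silently dropping it.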

Perhaps somebody has other/better suggestions…

hmf · February 1, 2022, 2:25pm

Could not get this to compile.

For the record here is a minimal working example

HTH anyone in the future.