Creating a DataFrame from a text file by column position


where columns 1-3 are the Serial No (XXX)
and columns 4-7 are the Day (1234 or 2345)


I would start with a Dataset[String] or even an RDD[String] and map it to the form that you want. If you really want a DataFrame, you could map each String to a Row with the data that you want in it. Personally, I’d make a case class for your data and map to that. It might look like this.

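case class Part(serial: String, day: String, ...)

// `lines` is assumed to be the Dataset[String] (or RDD[String]) read from the file
val parts = lines.map { line =>
  val serial = line.substring(0, 3) // columns 1-3
  val day = line.substring(3, 7)    // columns 4-7
  Part(serial, day, ...)
}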

This gives you a Dataset[Part] (or an RDD[Part], if you started from an RDD) that you can then do whatever you want with, referring to the columns by the field names of the case class.
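
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes Spark 2.x or later running in local mode. The Part case class is cut down to the two fields from the question, and the sample lines, the parseLine helper and the FixedWidthExample object are made up for illustration. In practice the lines would come from something like spark.read.textFile on your own file.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record: only the two fields described in the question.
case class Part(serial: String, day: String)

object FixedWidthExample {
  // Slice one fixed-width line into its fields.
  // substring offsets are 0-based and end-exclusive, so columns 1-3 are (0, 3).
  def parseLine(line: String): Part =
    Part(line.substring(0, 3), line.substring(3, 7))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fixed-width-example")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()
    import spark.implicits._ // encoders for String and for case classes

    // Made-up sample data standing in for the lines of the text file.
    val lines: Dataset[String] = Seq("0011234abc", "0022345def").toDS()

    // Drop lines that are too short, since substring would throw on them.
    val parts: Dataset[Part] =
      lines.filter(line => line.length >= 7).map(line => parseLine(line))

    // toDF() gives a DataFrame with columns named after the case class fields.
    parts.toDF().show()

    spark.stop()
  }
}
```

Keeping the slicing in a plain function like parseLine means it can be checked without starting Spark at all.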


Thanks Mark.
I was thinking the same, but I wondered whether there is some other way of doing it.