Multiline string parsing

I have a multi-line String (an email) that I need to parse. I would like to extract the information below from the string using Scala. Sometimes a few lines could be missing, such as the "To:" line, and the code should be able to handle that.

MessageId = 26025617.9085860263913.JavaMail.gia@basi
MessageDate = Mon, 20 Aug 2012 06:02:09 -0800 (PST)
MessageFrom = [email protected]
MessageTo = [email protected]
MessageSubject = Various Offers

Sample Email

val strMail = """Message-ID: 26025617.9085860263913.JavaMail.gia@basi
Date: Mon, 20 Aug 2012 06:02:09 -0800 (PST)
From: [email protected]
To: [email protected]
Subject: Various Offers
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Matt, Spartz </O=GMAIL/OU=NA/CN=RECIPIENTS/CN=MSPARTZ2>
X-To: Dave, Edgar </O=YAHOO/OU=NA/CN=RECIPIENTS/CN=Dedgar>
X-cc:
X-bcc:
X-Folder: \Dedgar (Non-Privileged)\Edgar, Dave\Inbox
X-Origin: Edgar-D
X-FileName: DEDGAR (Non-Privileged).pst

Dave:

Do you think you’ll have a chance to work on the various offers today? If not, could you give me an idea of when this week you think you’ll have some time.

Thanks, Matt"""

Any help? Thanks.

You could use a parser library like atto or fastparse, or you could do it ad hoc.

For the libraries, see their respective documentation. For the ad hoc approach, split the string at the first empty line (\n\n): the second part is the body, and the first part holds one header per line. For each line in linesIterator, split at the first ":"; the part before it is the key, the part after it is the value, and Bob is your proverbial uncle.
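A minimal sketch of that ad hoc approach, assuming one header per line (no folded headers) and case-sensitive header names. The names AdHocEmail, splitMessage and headers are made up for illustration. It uses split and indexOf rather than any 2.13-only feature, so it should also compile on 2.11 and 2.12.

```scala
object AdHocEmail {
  // Split the raw message into (header block, body) at the first empty line.
  def splitMessage(raw: String): (String, String) = {
    val normalized = raw.replace("\r\n", "\n")
    normalized.split("\n\n", 2) match {
      case Array(head, body) => (head, body)
      case Array(head)       => (head, "") // no empty line: headers only
    }
  }

  // Assumes one header per line: key before the first ':', value after it.
  // Lines without a ':' are skipped; folded headers are NOT handled here.
  def headers(raw: String): Map[String, String] =
    splitMessage(raw)._1.split("\n").toList.flatMap { line =>
      val i = line.indexOf(':')
      if (i > 0) Some(line.substring(0, i).trim -> line.substring(i + 1).trim)
      else None
    }.toMap
}
```

Looking fields up with get returns an Option, which covers a missing "To:" line: AdHocEmail.headers(strMail).get("To") is None when that header is absent, and the same call with "Message-ID", "Date", "From" or "Subject" gives the other requested values.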


There is a joke here somewhere about the lack of types in splitting strings, and that your uncle Bob wouldn’t mind, because you have to write tests anyway, but I can’t find the zinger to make it work.

As much as I’m a fan of parser combinators I think I might just use JavaMail to do this.

Actually, it’s not (always) one header field per line. Don’t forget that a header field can be folded across multiple lines.
Daniel


I didn’t realize that. Somewhat less appealing then.

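// Header lines are everything up to the first empty line.
val headerBlock = strMail.linesIterator.takeWhile(line => !line.isEmpty)
val headers = headerBlock.foldLeft(List.empty[(String, String)]) {
  // A line starting with a space or tab continues (is folded onto) the previous header.
  case ((k, v) :: rest, line) if line.startsWith(" ") || line.startsWith("\t") => (k -> (v + line)) :: rest
  // Otherwise split into key and value around a colon (s"..." patterns need Scala 2.13+).
  case (l, s"$key:$value") => (key -> value) :: l
}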

headers.toMap

will still be enough to do the trick for this specific email (at least on 2.13), but using JavaMail will indeed probably be better.

I get the below error when I run the code in a notebook.

command-1082574194325029:5: error: method s is not a case class, nor does it have an unapply/unapplySeq member
case (l, s"$key:$value") => (key -> value) :: l

Thanks

Support for using the s interpolator in pattern matching like that wasn’t added to the Scala standard library until Scala 2.13 (and Spark doesn’t have support for 2.13 yet, just 2.11 and 2.12).
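On 2.11 or 2.12 the same fold can be written without the interpolator pattern by splitting at the first colon with indexOf. The sketch below is one way to do that, not a full header parser: the name UnfoldHeaders is made up, continuation lines are joined with a single space (a simplification), and lines without a colon are skipped instead of failing the match.

```scala
object UnfoldHeaders {
  def parse(raw: String): Map[String, String] = {
    // Header lines are everything up to the first empty line.
    val headerLines = raw.split("\r?\n").toList.takeWhile(_.nonEmpty)
    val collected = headerLines.foldLeft(List.empty[(String, String)]) {
      // A line starting with a space or tab continues the previous header.
      case ((k, v) :: rest, line) if line.startsWith(" ") || line.startsWith("\t") =>
        (k -> (v + " " + line.trim)) :: rest
      // Otherwise split at the first ':' (no s"..." pattern needed).
      case (acc, line) =>
        val i = line.indexOf(':')
        if (i > 0) (line.substring(0, i).trim -> line.substring(i + 1).trim) :: acc
        else acc
    }
    collected.toMap
  }
}
```

UnfoldHeaders.parse(strMail).get("To") then gives an Option, so a missing "To:" line comes back as None.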
