I have a multi-line String (an email) that I need to parse. I would like to extract the information below from the string using Scala. Sometimes a few lines may be missing, such as the “To:” line, and the code should be able to handle that.
val strMail = """Message-ID: 26025617.9085860263913.JavaMail.gia@basi
Date: Mon, 20 Aug 2012 06:02:09 -0800 (PST)
From: [email protected]
To: [email protected]
Subject: Various Offers
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Matt, Spartz </O=GMAIL/OU=NA/CN=RECIPIENTS/CN=MSPARTZ2>
X-To: Dave, Edgar </O=YAHOO/OU=NA/CN=RECIPIENTS/CN=Dedgar>
X-cc:
X-bcc:
X-Folder: \Dedgar (Non-Privileged)\Edgar, Dave\Inbox
X-Origin: Edgar-D
X-FileName: DEDGAR (Non-Privileged).pst
Dave:
Do you think you’ll have a chance to work on the various offers today? If not, could you give me an idea of when this week you think you’ll have some time.
You could use a parser library like atto or fastparse, or you could do it ad hoc.
For the libraries, see their respective documentation. For the ad hoc approach, split the string at the first empty line (\n\n): the first part is the header block and the second part is the body. The header block holds one header per line, so for each line (e.g. via linesIterator), split at the first ":"; the part before it is the key and the part after it is the value, and Bob is your proverbial uncle.
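A minimal sketch of that ad hoc approach, assuming the headers are separated from the body by a blank line, that line endings are plain \n, and that no header is folded across several lines. The names MailParser, ParsedMail and parse are made up for illustration. It only uses String.split, so it should behave the same on Scala 2.11, 2.12 and 2.13.

```scala
object MailParser {
  // Hypothetical result type; the names are illustrative only.
  final case class ParsedMail(headers: Map[String, String], body: String)

  def parse(raw: String): ParsedMail = {
    // Split at the first blank line only; the limit of 2 keeps
    // any later blank lines inside the body.
    val parts = raw.split("\n\n", 2)
    val headerBlock = parts(0)
    val body = if (parts.length > 1) parts(1) else ""

    val headers = headerBlock.split("\n").iterator
      // Split at the first ':' only, so values may contain colons (e.g. times).
      .map(_.split(":", 2))
      // Lines without a colon yield a one-element array and are skipped.
      .collect { case Array(key, value) => key.trim -> value.trim }
      .toMap

    ParsedMail(headers, body)
  }
}
```

Because the headers end up in a Map, a missing line needs no special handling: MailParser.parse(strMail).headers.get("To") is None when there is no “To:” line, and Some(...) otherwise.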
There is a joke here somewhere about the lack of types in splitting strings, and that your uncle Bob wouldn’t mind, because you have to write tests anyway, but I can’t find the zinger to make it work.
I get the below error when I run the code in a notebook.
command-1082574194325029:5: error: method s is not a case class, nor does it have an unapply/unapplySeq member
case (l, s"$key:$value") => (key -> value) :: l
Support for using the s interpolator in pattern matching like that wasn’t added to the Scala standard library until Scala 2.13 (and Spark doesn’t support 2.13 yet, just 2.11 and 2.12).
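On 2.11 or 2.12, one possible workaround is to match with a regex extractor instead of the s interpolator. The sketch below assumes the failing case sat inside a foldLeft over the header lines, which is a guess, since the surrounding code isn't shown in the thread; HeaderFold, Header and headers are made-up names.

```scala
object HeaderFold {
  // The key is everything before the first colon; the value is the rest of the line.
  private val Header = """([^:]+):(.*)""".r

  def headers(lines: List[String]): List[(String, String)] =
    lines.foldLeft(List.empty[(String, String)]) {
      // Regex extractors work in pattern matches on 2.11, 2.12 and 2.13.
      case (l, Header(key, value)) => (key.trim -> value.trim) :: l
      // Lines without a colon are skipped.
      case (l, _)                  => l
    }.reverse // prepending built the list backwards, so restore the input order
}
```

The fallback case keeps the fold from throwing a MatchError on lines that are not headers.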