case class MailRecord(MessageID: String, MailDate: String, MailFrom: String, MailTo: String, MailMessage: String)
And I have an RDD[(String, String)] output from wholeTextFiles. How do I flatMap the RDD[(String, String)] to an RDD[MailRecord]? The second string in each tuple holds the data for the case class, delimited by '\n'.
Yes, every tuple in the RDD is a single message. The first string in the tuple is the path of the message file in a folder, and the second string is the entire message itself. How do I use the map function to get the data from the message?
import org.apache.spark.rdd.RDD

val rddIn: RDD[(String, String)] = ...
val rddOut = rddIn.map { case (messageFileLoc, message) =>
  // Here you use messageFileLoc and message to retrieve any data you need to create a YourCaseClass instance.
  YourCaseClass(/* data gathered immediately above as input parameters */)
}
Thanks Brian.
When I retrieve data from the 'message' string, can I assign it to a variable within the block before passing it to the case class instance? Can you please provide an example?
Example: I want to retrieve the line that contains the string "Message-ID: " in the message string within the tuple and assign it to MessageID in my case class.
Try it out before asking. You are indeed allowed to create local variables within a function. I suggest you try things out, and if you get surprising results, post your full example and the errors that are confounding you to the list.
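For anyone finding this thread later, here is a minimal sketch of the local-variable approach in plain Scala, so it can be tried without a cluster. Only the "Message-ID: " prefix comes from the question. The "Date: ", "From: " and "To: " prefixes, the blank line separating headers from the body, and the names MailParser, headerValue and parse are assumptions made for illustration. It matches lines that start with the prefix and keeps only the text after it, which may or may not be what you want.

```scala
// Mirrors the case class from the original question.
case class MailRecord(MessageID: String, MailDate: String, MailFrom: String, MailTo: String, MailMessage: String)

// Hypothetical helper object; the name and layout are illustrative only.
object MailParser {
  // Returns the text after `prefix` on the first line that starts with it,
  // or an empty string when no such line exists.
  def headerValue(lines: Seq[String], prefix: String): String =
    lines.find(_.startsWith(prefix)).map(_.substring(prefix.length).trim).getOrElse("")

  // Builds a MailRecord from the raw text of one message file.
  def parse(message: String): MailRecord = {
    // Local variables are fine inside the function body.
    val lines = message.split("\n").toSeq
    // Assumption: headers run up to the first blank line, the body follows it.
    val (headers, rest) = lines.span(_.trim.nonEmpty)
    val messageId = headerValue(headers, "Message-ID: ")
    val date = headerValue(headers, "Date: ")   // assumed header name
    val from = headerValue(headers, "From: ")   // assumed header name
    val to = headerValue(headers, "To: ")       // assumed header name
    val body = rest.drop(1).mkString("\n")      // drop the blank separator line
    MailRecord(messageId, date, from, to, body)
  }
}
```

With the RDD from wholeTextFiles this would plug in as rddIn.map { case (_, message) => MailParser.parse(message) }. Since each file holds a single message, map is enough here and flatMap is not needed.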