Punctuation removal in file


#1

I am new to programming and am learning Scala.
I need to remove the punctuation from a text file and write the output to another file. Below is the code I am using; I am stuck on an error. Can anyone please help?

scala> import scala.io.Source
import scala.io.Source

scala> import java.io.File
import java.io.File

scala> import java.io.FileWriter
import java.io.FileWriter

scala> import java.io.BufferedWriter
import java.io.BufferedWriter

scala>
| object FileReader {
|
| def main(args: Array[String]): Unit = {
| val file = Source.fromFile("/home/hadoop/Desktop/TheCompleteSherlockHolmes.txt")
| val outputFile = new File("/home/hadoop/Desktop/TheCompleteSherlockHolmesStripped.txt")
| val writer = new BufferedWriter(new FileWriter(outputFile))
|
| // use curly brackets {} to tell Scala that it’s now a multi-line statement!
| file.getLines().foreach{ line =>
| file.replaceAllIn("""[\p{Punct}&&[^.]]""", "")
| println(line)
| writer.write("***" + line + "*")
| writer.newLine()
| }
| writer.flush()
| writer.close()
| }
| }
<console>:36: error: value replaceAllIn is not a member of scala.io.BufferedSource
file.replaceAllIn("""[\p{Punct}&&[^.]]""", "")


#2

The error message is telling you: the file doesn’t have a method called replaceAllIn. You want to use a string, not a file.
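
To make the distinction concrete, here is a minimal sketch of the loop with the pattern applied to each `line` (a String) rather than to `file` (a BufferedSource). The object name `StripPunctuation` and the helper `stripFile` are hypothetical names chosen for illustration, and the sketch drops the `***` markers from the original code. It calls `replaceAllIn` on a `Regex` built with `.r`, which is one way to do it, not the only one.

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.io.Source

object StripPunctuation {
  // Same pattern as in the question: ASCII punctuation except the period.
  private val punctuation = """[\p{Punct}&&[^.]]""".r

  // Hypothetical helper: copies inputPath to outputPath with punctuation removed.
  def stripFile(inputPath: String, outputPath: String): Unit = {
    val source = Source.fromFile(inputPath)
    val writer = new BufferedWriter(new FileWriter(new File(outputPath)))
    try {
      source.getLines().foreach { line =>
        // `line` is a String; the regex is applied to it, not to `source`.
        writer.write(punctuation.replaceAllIn(line, ""))
        writer.newLine()
      }
    } finally {
      // Close both resources even if reading or writing fails.
      writer.close()
      source.close()
    }
  }

  def main(args: Array[String]): Unit =
    stripFile(args(0), args(1))
}
```

Here `args(0)` and `args(1)` are assumed to be the input and output paths.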


#3

Thanks for your reply. Could you please show me where I need to make the change?


#4

Just change the line reading "file.replaceAllIn("""[\p{Punct}&&[^.]]""", "")" to:

val strippedLine = line.replaceAllIn("""[\p{Punct}&&[^.]]""", "")

and change the next couple of lines like so:

println(strippedLine)
writer.write("***" + strippedLine + "***")



#5

Thanks, bmaso. After changing the lines I got the error below:
<console>:36: error: value replaceAllIn is not a member of String
val strippedLine = line.replaceAllIn("""[\p{Punct}&&[^.]]""", "")

So I changed the lines as follows:
val s = Source.fromFile("/home/hadoop/Desktop/TheCompleteSherlockHolmes.txt").mkString
s.replace("""[\p{Punct}&&[^.]]""", "")

and was able to compile the code.


#6

replaceAllIn is a member of Regex (scala.util.matching.Regex), not String, so you can use code of the form:

val expr = """[\p{Punct}&&[^.]]""".r // defines a regular expression
val strippedLine = expr.replaceAllIn(line, "")
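
As a side note on why the `s.replace(...)` version compiles but does not strip anything: `String.replace` treats its first argument as literal text, whereas `String.replaceAll` and `Regex.replaceAllIn` interpret it as a regular expression. Strings are also immutable, so the returned value has to be used. A small sketch comparing the three (the object and method names are hypothetical, for illustration only):

```scala
object ReplaceVariants {
  // Same pattern as in the thread, kept as a plain String here.
  val pattern: String = """[\p{Punct}&&[^.]]"""

  // String.replace looks for the pattern text literally, so ordinary input is unchanged.
  def literalReplace(s: String): String = s.replace(pattern, "")

  // String.replaceAll (from java.lang.String) compiles its first argument as a regex.
  def regexReplaceAll(s: String): String = s.replaceAll(pattern, "")

  // Regex.replaceAllIn: build a scala.util.matching.Regex with .r, then apply it.
  def regexReplaceAllIn(s: String): String = pattern.r.replaceAllIn(s, "")
}
```

With the input "Hi, there!", `literalReplace` returns the string unchanged, while the other two return "Hi there".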

#7

Thanks. I can see that some characters are removed, but the character " is still there in the output file. Could you please tell me how to extend your expression?


#8

Please look up the documentation for regular expressions and adjust the regular expression to include exactly the characters you want (be warned that some of them may need to be escaped).
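
One possible adjustment, offered as a guess rather than a diagnosis: in Java regular expressions `\p{Punct}` covers only ASCII punctuation (which already includes the straight `"`), so the quotes left in the output may be typographic ones such as “ and ”. Assuming that is the case, adding the Unicode punctuation category `\p{P}` to the class would catch them. The object name below is hypothetical.

```scala
object UnicodePunctuation {
  // \p{Punct}: ASCII punctuation; \p{P}: Unicode punctuation (includes curly quotes).
  // The intersection with [^.] still keeps the period, as in the original pattern.
  private val punctuation = """[\p{Punct}\p{P}&&[^.]]""".r

  def strip(line: String): String = punctuation.replaceAllIn(line, "")
}
```

For example, `UnicodePunctuation.strip("“Hello,” he said.")` gives "Hello he said.". If the leftover characters turn out to be something else, the class would need to list them explicitly.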

regards,
Siddhartha