@cbley Many thanks for your suggestions.
I tested your code and it is faster, but not by much.
Please see below:
$ scalac -Xscript Hackwords words-parse3.scala
$ time scala Hackwords > words3.txt
real 0m29.338s
user 0m17.411s
sys 0m13.278s
$ cat words-parse3.scala
import scala.io.Source
import scala.io.Codec
import java.nio.charset.CodingErrorAction
implicit val codec = Codec("UTF-8")
codec.onMalformedInput(CodingErrorAction.REPLACE)
codec.onUnmappableCharacter(CodingErrorAction.REPLACE)
val patt1 = """[^0-9a-zA-Z\s].*""".r
val patt2 = """[a-z0-9]+""".r
val ws = """\s+""".r
val lines = Source.fromFile("msg.txt").getLines()
for {
  line <- lines
  if !patt1.matches(line)
  word <- ws.split(line)
  if patt2.matches(word.toLowerCase) && word.size < 30
} {
  println(word)
}
Any further suggestions?
Thanks!