Without regard to efficiency, which style should I take?

pengyh · February 10, 2022, 12:20pm

Sorry, I am a real newbie to Scala. I found that I can write Scala in different styles; the code can look like Java, Ruby, or something else.

The two pieces of code below do the same job. Efficiency aside, which style is better?

code 1:

import scala.io.Source

val patt1 = """[^0-9a-zA-Z\s].*"""
val patt2 = """[a-z0-9]+"""

val lines = Source.fromFile("msg.txt").getLines().filter(! _.matches(patt1))

for (x <- lines) {
  x.split("""\s+""").map(_.toLowerCase).filter(_.matches(patt2)).filter(_.size < 30).foreach {println}
}

code 2:

import scala.io.Source

val patt1 = """[^0-9a-zA-Z\s].*""".r
val patt2 = """[a-z0-9]+""".r

val lines = Source.fromFile("msg.txt").getLines()

for {
  line <- lines
  if ! patt1.matches(line)
  word <- line.split("""\s+""").map(_.toLowerCase)
  if patt2.matches(word) && word.size < 30
} {
  println(word)
}

I want to force myself to write Scala in one consistent style, and it should be the more "Scalish" one.

Thanks.

jducoeur · February 10, 2022, 1:01pm

Honestly, the question is kind of a non-sequitur – both versions are reasonable, more or less equally “Scalish”, and pretty common. Some folks prefer one style, some the other.

(That side-effecting println in the middle of both versions is the only part that I would say isn’t standard Scala idiom – folks tend to avoid side-effects like that, so any time you have a foreach() or a for-comprehension without a yield it’s kind of automatically a code smell. But for this tiny program it’s probably the appropriate way to do it.)
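To illustrate the point about side effects, here is a minimal sketch of how the same logic could be split into a pure part that yields the words and a single println at the edge of the program. The names `extractWords`, `skipLine`, `wordPattern`, and the sample input are hypothetical and not from the posts above. It assumes Scala 2.13 or later, where `Regex.matches` is available.

```scala
// Hypothetical names; the patterns are the same as patt1 and patt2 above.
val skipLine = """[^0-9a-zA-Z\s].*""".r
val wordPattern = """[a-z0-9]+""".r

// Pure function: takes lines, returns the selected words, prints nothing.
def extractWords(lines: Iterator[String]): Iterator[String] =
  for {
    line <- lines
    if !skipLine.matches(line)
    // .iterator keeps every generator in the comprehension an Iterator
    word <- line.split("""\s+""").iterator.map(_.toLowerCase)
    if wordPattern.matches(word) && word.length < 30
  } yield word

// The side effect happens once, at the very end.
// In-memory sample input stands in for Source.fromFile("msg.txt").getLines().
val sample = Iterator("Hello World 42", "# a comment line", "foo-bar baz")
extractWords(sample).foreach(println)
```

Because `extractWords` returns its result instead of printing it, the filtering logic can be tested or reused without capturing console output. For a program this small, printing directly as in the original versions is still reasonable.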

markehammons · February 11, 2022, 2:58pm

Personally I’d prefer code 2. It’s more readable, and your intention is clearer. That being said, code 1 can be cleaned up by putting each of the function calls on a separate line:

x.split("""\s+""")
  .map(_.toLowerCase)
  .filter(_.matches(patt2))
  .filter(_.size < 30)
  .foreach(println)

Looks nicer, don’t you think?

pengyh · February 12, 2022, 12:59am

Because I am pretty familiar with Spark RDD operations (I have many years of ops experience with it), I personally prefer the first one.