I think I found a bug in split:

scala> "1|12|13".split("|").foreach(println)
1
|
1
2
|
1
3

scala> "1,12,13".split(",").foreach(println)
1
12
13

As you can see, the first split, with "|" as the separator, doesn't work. Why?

Thanks.

Use single quotes ('|', a Char), or escape the pipe. The problem is that a String argument to split is treated as a regex.


To be more explicit, split is actually a method in Java and the argument for the delimiter is interpreted as a regular expression. The pipe, |, is a special character for regular expressions. So if you want to split with that as the delimiter you either need to use "\\|" or """\|""" as the argument.
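To make the options concrete, here is a minimal sketch of four ways to split on a literal pipe. The object and value names are only illustrative. Pattern.quote is an extra option from java.util.regex that was not mentioned above; it quotes the whole delimiter so the regex engine takes it literally.

```scala
import java.util.regex.Pattern

object SplitOnPipe {
  val input = "1|12|13"

  // A Char separator is taken literally, so no regex is involved
  val byChar: Array[String] = input.split('|')

  // The regex \| written as an ordinary string literal (backslash doubled)
  val byEscaped: Array[String] = input.split("\\|")

  // The same regex in a triple-quoted string, where backslashes are not escapes
  val byTripleQuoted: Array[String] = input.split("""\|""")

  // Pattern.quote wraps the delimiter so every character in it is literal
  val byQuoted: Array[String] = input.split(Pattern.quote("|"))

  def main(args: Array[String]): Unit =
    List(byChar, byEscaped, byTripleQuoted, byQuoted)
      .foreach(parts => println(parts.mkString(", ")))
}
```

All four variants should yield the same three elements: 1, 12 and 13.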


Thanks for the explanation.
Before I knew this, I had worked around the problem by using a regex directly:

  import scala.io.Source
  // Capture everything before the first pipe on each line
  val regex = """^(.+?)\|.*""".r.unanchored
  val names = Source.fromFile("people.txt").getLines().map { case regex(z) => z }.toList

"1|12|13".split("\|").foreach(println)

That doesn't compile, because \| is not a valid escape in an ordinary string literal. I like to use the raw interpolator for regexes:

scala> "1|12|13".split("\|").foreach(println)
                         ^
       error: invalid escape character

scala> "1|12|13".split(raw"\|").foreach(println)
1
12
13

Also no one has suggested

scala> "1|12|13".replace('|','\n').lines.forEach(println)
1
12
13

as a way to avoid split and regex.


I've found that a case match on a regex covers many string-processing needs.

In ETL work, for example, a case match makes it easy to extract fields:

scala> val regex = """^(.+?)\|.*fax:(.*?)\|""".r.unanchored

scala> val li = List("Luca X.G.|1981-11-10|1,m|tel:55684,fax:98965|Glazier,Architect","Sullivan O.H.|1989-6-14|1,m|tel:28415,fax:27232|Cashier,Database Administrator")

scala> li.map{ case regex(name,fax) => (name,fax) }
val res0: List[(String, String)] = List((Luca X.G.,98965), (Sullivan O.H.,27232))
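
One caveat with the map version: a line that does not match the pattern throws a MatchError. A minimal sketch of a more forgiving variant, assuming it is acceptable to drop malformed records silently, uses collect instead. The object name and the third input line are hypothetical and only there for illustration.

```scala
object ExtractFields {
  // Same pattern as above: the name before the first pipe, the number after "fax:"
  val regex = """^(.+?)\|.*fax:(.*?)\|""".r.unanchored

  // The two records from the example plus one hypothetical malformed line
  val lines = List(
    "Luca X.G.|1981-11-10|1,m|tel:55684,fax:98965|Glazier,Architect",
    "Sullivan O.H.|1989-6-14|1,m|tel:28415,fax:27232|Cashier,Database Administrator",
    "malformed line without any fields"
  )

  // collect keeps only the lines the pattern matches and skips the rest
  val extracted: List[(String, String)] =
    lines.collect { case regex(name, fax) => (name, fax) }

  def main(args: Array[String]): Unit = extracted.foreach(println)
}
```

If bad records should be reported rather than skipped, a match with an explicit fallback case would be the alternative.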

Thanks