Unicode escapes in triple quoted strings are deprecated - perhaps reverse that decision?

I’m upgrading a large 2.12 codebase to 2.13 and ran into the following:

Unicode escapes in triple quoted strings are deprecated; use the literal character instead
[warn]   def nonascii: Parser[String] = """[^\u0000-\u0177]""".r

I don’t know precisely what glyphs those particular unicode characters are represented by, but I’m 100% certain I want to see \u0000-\u0177 instead of single characters that I’ll have no idea what they are.

While I can of course write "[^\u0000-\u0177]".r in this particular case, I really like having regexes in triple quotes (it’s very useful) and it would be a shame to not be able to mix unicode escapes & regexes.

Any chance we could revert this deprecation?

(I recognize that for “normal” strings there’s a case for just inserting the character)

The back story is that Unicode escapes used to be a preprocessing step when reading source, as a legacy of Java.

They got rid of that, but for migration, they kept Unicode escapes where you don’t normally see escape processing, in triple quotes and the raw interpolator.

For regex, you want the new behavior where the escapes are ignored, which is how Scala 3 works or Scala 2 under -Xsource:3. That leaves the escape as regex syntax, which is why the raw interpolator is so handy.
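To illustrate why leaving the escape to the regex engine is harmless, here is a minimal sketch (the object and value names are made up for this example). It uses ordinary single-quoted strings so that it should compile the same way on 2.13 and 3: a doubled backslash hands the escape text to the regex engine, while a single backslash lets the compiler insert the characters.

```scala
object RegexUnicodeEscapes {
  // Doubled backslash: the string contains a literal backslash followed by u0061,
  // so the regex engine (java.util.regex) interprets the escape itself.
  val escapedByRegex = "[\\u0061-\\u0062]".r

  // Single backslash in a normal string: the compiler inserts 'a' and 'b'
  // before the regex engine ever sees the pattern.
  val escapedByCompiler = "[\u0061-\u0062]".r

  def main(args: Array[String]): Unit = {
    // Show the pattern text each regex was built from.
    println(escapedByRegex.regex)
    println(escapedByCompiler.regex)
    // Both patterns accept the same input.
    println(escapedByRegex.matches("b"))
    println(escapedByCompiler.matches("b"))
  }
}
```

The two pattern texts differ, but both regexes match the same strings, which is the sense in which ignoring the escape in the source is fine for regexes.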

The switches for Scala 2 are currently baroque:

$ scala -Xsource:3 -Xmigration -Wconf:cat=scala3-migration:s
Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 21).
Type in expressions for evaluation. Or try :help.

scala> raw"[^\u0000-\u0177]".r
val res0: scala.util.matching.Regex = [^\u0000-\u0177]

In 2.13.13, I think it’s -Xsource:3cross to get the behavior (“ignore Unicode escapes”) without warnings. That cross-compiles with Scala 3. There will be a different option -Xsource:3migration for warnings without behavior.

A FAQ entry on the (single- / triple quote) x (interpolator types) x (Scala 2 / 3) behavior matrix would be good to have. (Not saying you need it, @jxtps.)

I think the best cross-building option right now is pulling the unicode characters out into a local in cases where the rest of the regex benefits from triple quotes / raw strings:

def nonascii = {
  val range = "\u0000-\u0177"
  raw"""[^$range]""".r
}

(Using raw requires writing a literal $ as $$, but an existing $ most likely triggers a compile-time error when the raw interpolator is added.)
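A small sketch of the $$ point, with hypothetical names and a shorter range so the result is easy to read. The range lives in a normal string, where the escapes are processed without a warning, and the rest of the regex stays in a raw triple-quoted string:

```scala
object RawDollar {
  // Escapes in a normal string are processed by the compiler: this is "a-b".
  val range = "\u0061-\u0062"

  // $range splices the local in, \d is left alone by raw,
  // and $$ produces the single $ that anchors the regex at end of input.
  val letterThenDigits = raw"""[$range]\d+$$""".r

  def main(args: Array[String]): Unit = {
    println(letterThenDigits.regex)
    println(letterThenDigits.matches("a42"))
    println(letterThenDigits.matches("c42"))
  }
}
```

Writing a lone $ followed by something that is not an identifier or a brace would not compile, which is why forgetting the doubling tends to show up immediately.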

Going back to interpreting unicode escapes in triple quoted strings would be a Scala 3 language change. The goal is to treat unicode escapes the same as all other escapes, which I think makes sense. The difference to Scala 2 doesn’t show up in the Incompatibility Table though.

Oh, and I finally got what @som-snytt is saying, that regexes support \uNNNN escapes, so another option is using @nowarn.

Scala 2.13:

scala> @annotation.nowarn("msg=Unicode escapes") val ab = """[\u0061-\u0062]""".r
val ab: scala.util.matching.Regex = [a-b]

scala> ab.matches("b")
val res3: Boolean = true

Scala 3:

scala> val ab = """[\u0061-\u0062]""".r
val ab: scala.util.matching.Regex = [\u0061-\u0062]

scala> ab.matches("b")
val res0: Boolean = true
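If it is unclear which behavior a given build applies, something like the following probe might help. It is only a sketch with made-up names: it checks the length of a triple-quoted string containing one escape, on the assumption that the compiler either replaces the escape with a single character or leaves the six characters of escape text in place.

```scala
import scala.annotation.nowarn

object EscapeProbe {
  // On Scala 2.13 with default settings this literal is expected to trigger the
  // deprecation warning; the annotation silences it. On Scala 3 there should be
  // nothing to silence.
  @nowarn("msg=Unicode escapes")
  val probe: String = """\u0061"""

  // Length 1: the compiler replaced the escape with 'a'.
  // Length 6: the escape text survived into the string untouched.
  val compilerProcessedEscape: Boolean = probe.length == 1

  def main(args: Array[String]): Unit = {
    println(s"length = ${probe.length}, processed by compiler = $compilerProcessedEscape")
    // The regex matches "a" in both cases, since the regex engine
    // understands the escape text on its own.
    println(probe.r.matches("a"))
  }
}
```

The length differs between the two behaviors, but the regex built from the string behaves the same, which matches the two REPL sessions above.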

After all these years! I already feel like I’ve broken through in 2024.

An FAQ would be fantastic - I definitely do need it :wink:

For now I’m just ignoring these warnings. My aspiration is to upgrade all the way to Scala 3 where I take it they will go away by themselves.

I submitted a PR, per lrytz, putting my money where my mouth is, as the saying goes.

A different saying is, In for a penny, in for a pound.

There is some sort of CI failure which requires further doc process expertise.
