A little disappointed with unions in Scala

kiuhnm · July 6, 2024, 6:01am

I’m glad that Scala 3 introduced unions (and intersections), but I’ve just discovered there’s no narrowing and no support for structural typing.

The equivalent of the following would work in both Python (pyright / mypy) and TypeScript:

def f1(x: Int | String) = x match
  case _: String => "str"
  case i => (i + 3).toString  // no narrowing: i is still Int | String

def f2(x: Int | String) =
  if x.isInstanceOf[String] then
    "str"
  else
    (x + 3).toString  // no narrowing

class A(val x: Int)

class B(val x: Int)

def f3(x: A | B): Int =
  x.x            // no structural typing

Narrowing is not that useful in the simple cases above, as it actually interferes with exhaustiveness checking, but it’s a good feature to have in more complex scenarios involving type guards and nested logic.

Structural typing is also kind of nice, especially when working with unions.

Of course, Scala has tons of features Python doesn’t have or I wouldn’t even consider leaving Python for Scala.

spamegg1 · July 6, 2024, 7:19am

I think structural typing might have some soundness issues that need to be worked out (obviously Python wouldn’t care about that). It’s similar to type projections, isn’t it? (General type projections were removed in Scala 3 due to soundness issues.) I could be wrong.

Scala 3 has path-dependent types instead. For narrowing-style checks, people use the Refined or Iron libraries. Interestingly, “refined types” are part of the Scala 3 spec.

The way to get around the common-member issue is extension methods. Not very nice, but that’s how it is.
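A minimal sketch of the extension-method workaround, assuming Scala 3: the dispatch over the union is written out once by hand, and call sites can then write `v.x` as if `x` were a common member. Since `A | B` has no member `x` of its own, the selection falls back to the extension. The extension name `x` and the function `f3` mirror the example at the top of the thread; nothing here is a library API.

```scala
class A(val x: Int)
class B(val x: Int)

// Hand-written dispatch, defined once as an extension on the union type.
// Inside the body, a.x and b.x resolve to the real class members.
extension (v: A | B)
  def x: Int = v match
    case a: A => a.x
    case b: B => b.x

// A | B has no member `x`, so this call uses the extension above.
def f3(v: A | B): Int = v.x
```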

It looks like they decided not to make this into a SIP.
Maybe someone could write a SIP? I’m not sure how much work it is; since it was discussed and not done, probably a decent amount.

Not related to unions, but somewhat relevant: Structural types in Scala 3

Sporarum · July 6, 2024, 11:42am

Note that the refinements from the spec are structural refinements “This type with this field (of this type)” (aka structural types, …), whereas the refinements from the library are predicate refinements “This type satisfies this (boolean) property” (aka refinement types, qualified types, …).
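To make the distinction concrete, here is a small sketch assuming Scala 3. `HasX` is a structural refinement (“any value with a member `val x: Int`”), while `Positive` is a hand-rolled stand-in for a predicate refinement (“an Int satisfying n > 0”), built from an opaque type and a smart constructor. It is not how Refined or Iron work or are used; all names (`HasX`, `Point`, `readX`, `Positives`, `Positive.from`, `value`) are hypothetical.

```scala
import scala.reflect.Selectable.reflectiveSelectable

// Structural refinement: accepted based on members, not on the class name.
type HasX = { val x: Int }

class Point(val x: Int, val y: Int)

// Member access on a structural type goes through reflection here.
def readX(v: HasX): Int = v.x

// Predicate refinement, emulated by hand: the only way to obtain a
// Positive is through `from`, which checks the predicate at runtime.
object Positives:
  opaque type Positive = Int

  object Positive:
    def from(n: Int): Option[Positive] = Option.when(n > 0)(n)

  extension (p: Positive) def value: Int = p

import Positives.*
```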

There is a project to bring predicate refinements to Scala 3, but it is still being researched.
For more info, even if slightly out of date, feel free to consult the slides I made about proposed syntaxes for it: go.epfl.ch/QT-slides

Sporarum · July 6, 2024, 11:57am

The reason f1 doesn’t narrow is that there is no notion of type difference in the Scala type system, which you would need here:

(Int | String) - String  =:=  Int

(This case is obvious, but a feature like this would need to work for arbitrarily complex types)

Note however that the following works:

def f1(x: Int | String) = x match // compiler sees this match as exhaustive
  case _: String => "str"
  case i: Int => (i + 3).toString

This is because the type system can reason about unions:

(Int | String) =:= (String) | (Int)

The same applies to f2.
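A small sketch of both points, assuming Scala 3. The `summon` line compiles only because the compiler treats `Int | String` and `String | Int` as the same type. `f2` is rewritten with typed patterns, so each branch binds a value of the narrower type instead of relying on `isInstanceOf`. The name `unionsCommute` is just for illustration.

```scala
// Union types are commutative, so this evidence is found by implicit search.
val unionsCommute = summon[(Int | String) =:= (String | Int)]

// f2 with typed patterns: `i` has type Int in the second branch,
// and the match over Int | String is exhaustive.
def f2(x: Int | String): String = x match
  case _: String => "str"
  case i: Int    => (i + 3).toString
```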

The reason f3 does not compile is that a.x and b.x do not mean the same thing at the binary level; this is one of Scala’s (inevitable) leaky abstractions.

a.x might be at offset 0 from the start of the object, while b.x might be at offset 4; this is particularly likely if there are many other fields.

This is not the case if x is defined in a parent type of A and B, as then the offset is guaranteed to be the same.

trait Parent:
  val x: Int

class A(val x: Int) extends Parent

class B(val x: Int) extends Parent

def f3(x: A | B): Int =
  x.x            // works: x is a member of the common parent

The other way to solve this, as mentioned, is with structural (refinement) types:

import scala.reflect.Selectable.reflectiveSelectable

class A(val x: Int)

class B(val x: Int)

def f3(x: {val x: Int}): Int =
  x.x            // works

This, however, is discouraged, as the reflection used to make it work carries a serious performance penalty.
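For comparison, Scala 3 also lets a class implement `Selectable` itself, so that structural member access calls a method you define instead of Java reflection. The sketch below follows the `Record` pattern from the Scala 3 reference: field access becomes a map lookup plus a cast, which avoids reflection but is not free either. It also does not make plain classes like A and B structurally compatible; the values have to be built as `Record`s. `HasX` and `f3` are illustrative names.

```scala
// A Selectable backed by a Map: `r.x` is rewritten by the compiler to
// r.selectDynamic("x").asInstanceOf[Int], with no Java reflection involved.
class Record(elems: (String, Any)*) extends Selectable:
  private val fields = elems.toMap
  def selectDynamic(name: String): Any = fields(name)

type HasX = Record { val x: Int }

def f3(r: HasX): Int = r.x
```

Usage: `f3(Record("x" -> 1).asInstanceOf[HasX])` returns 1; the cast is how the reference documentation builds such records.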

Incidentally, the reason this works in both TypeScript and Python is that in both languages all member access essentially goes through reflection.

Bersier · July 6, 2024, 3:32pm

In Scala (and similar compiled languages), member names only serve as ways to identify a definition associated with a type. They are “transparent”, in the sense that the name itself is irrelevant, and is only meant to refer to a specific member. It’s similar to how two variables with the same name but in different scopes are completely unrelated. It is by design and has advantages beyond performance. I think calling that a leaky abstraction is misleading.

posco · July 6, 2024, 4:37pm

Well, this could be handled. For instance, with x: A | B, x.y could compile to:

x match {
  case a: A => a.y
  case b: B => b.y
}

That seems like the natural meaning of x.y, and it’s annoying that you have to write it out manually.
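Written out as a complete Scala 3 function, the proposed desugaring looks like the sketch below. It relies on type tests against `A` and `B`, so it only works because the two classes can be distinguished at runtime. The function name `selectY` is hypothetical.

```scala
class A(val y: Int)
class B(val y: Int)

// What `x.y` could desugar to for x: A | B under the proposal above.
// The match is exhaustive because the union has exactly these two alternatives.
def selectY(x: A | B): Int = x match
  case a: A => a.y
  case b: B => b.y
```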

Sporarum · July 6, 2024, 4:56pm

This only works if A and B can be told apart at runtime, which is not always the case. For example, List[A] and List[B] both erase to List, so they cannot be told apart (without a TypeTest instance).

I called it that because this comes up again and again, and highlights a difference between the syntax “two class members of the same name” and the underlying representation “two bitsets at potentially different offsets”.

This feels very similar to 0.1 + 0.2 != 0.3, and I’d call floating-point arithmetic a leaky abstraction too, even though it works perfectly as designed.
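For reference, the floating-point example in Scala: `Double` is an IEEE 754 binary64 value, which cannot represent 0.1, 0.2 or 0.3 exactly, so the rounded sum differs from the rounded literal `0.3`.

```scala
// The sum rounds to 0.30000000000000004, not to the Double closest to 0.3.
val sum: Double = 0.1 + 0.2
val isEqual: Boolean = sum == 0.3
```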

It’s not impossible that I’m using the term “leaky abstraction” wrong, however ^^’

bjornregnell · July 7, 2024, 1:13pm

Here are my slides prepared from the SIP-meeting discussion:

Bersier · July 7, 2024, 1:57pm

Some advantages of having it written out:

Performance is clear.
Changing the name of y in A or B won’t break the code.
.y has a clear meaning, rather than sometimes being syntactic sugar for something else.

A little annoyance when writing, in exchange for clearer code, is almost always the right trade-off. And in well-written code, such cases should be rare.

Moreover, this syntactic sugar has very limited application, as it only works when both members of different types just so happen to have the same name. One could even argue that the applicability of syntactic sugar should never depend on whether two names coincide, as these are supposed to be transparent.

kiuhnm · July 9, 2024, 2:53am

You say that “type difference” would need to work for arbitrarily complex types, but maybe a specialized version would still be useful in most cases.

Efficient structural typing would require vtable-like data structures, but associated with the unions in the code instead of with the individual types. A JIT would probably help keep the vtables small in most cases, but I think it’s better to leave this stuff to dynamic languages, where structural typing comes at no extra cost (emphasis on “extra”).
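A rough illustration of the “vtable per union” idea, written by hand in Scala 3. This is a hypothetical sketch, not something the Scala compiler generates: one accessor for member `x` is stored per runtime class, built once for the union `A | B`. Lookups use the exact runtime class, so subclasses of A or B are not handled. `xTable` and `getX` are illustrative names.

```scala
class A(val x: Int)
class B(val x: Int)

// Hypothetical per-union accessor table for member `x`, keyed by runtime class.
val xTable: Map[Class[?], AnyRef => Int] = Map(
  classOf[A] -> ((o: AnyRef) => o.asInstanceOf[A].x),
  classOf[B] -> ((o: AnyRef) => o.asInstanceOf[B].x)
)

// Dispatch through the table instead of a chain of type tests.
def getX(v: A | B): Int = xTable(v.getClass)(v)
```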

kiuhnm · July 9, 2024, 3:08am

… different types that most of the time we distinguish just by their having different names

What I’m saying is that names are important, and your “just so happen to have the same name” is a little unfair.

EDIT: Well, technically the names of types aren’t important either. What matters is their identity. Maybe “nominal typing” is a misnomer; it should be “identity typing”.