What does "type-safe" mean in Scala?

eggo · October 30, 2024, 5:41am

I’m new here and I’m not sure if I understand “type-safe” in Scala 3 correctly. To me, “type-safe” means, simply put, that I am prevented from using apples when I would need pears. This should be detected by the compiler at compile time, so that no runtime errors or crashes occur.

What does Scala 3.4 (standard library 2.13.12) actually do? Here are three examples.

"a" == 1 gives a type error, as expected.

Set("a") == Set(1) results in false at runtime, but no type error. Why not?

The same behavior is shown by, e.g.,
Set(1,2,3).contains(None) or
Set(1,2).subsetOf(Set("a", "b", "c")), etc.,
and similarly for lists: they return false and no type error.

The documentation of Scala’s scala.collection.immutable.Set:

trait Set[A] extends Iterable[A], Set[A], SetOps[A, Set, Set[A]], IterableFactoryDefaults[A, Set]
with the members
def contains(elem: A): Boolean
and
def subsetOf(that: Set[A]): Boolean

So, according to the documentation of Set[A], the parameter of contains should be of type A, and the parameter of subsetOf of type Set[A], shouldn’t it?

If the described behavior is not an error in the compiler or library implementation, but corresponds to the language design, then I’ll quickly say goodbye to Scala and stick to completely type-free languages like Python. There it is at least clear that the programmer has to find such nonsense himself.

So, to get to the heart of the matter: is it really by design of the Scala language or its standard library that certain operations can be used with arbitrary types, even though the documentation gives the impression of being “type-safe”?

Thanks for any comments or feedback.

spamegg1 · October 30, 2024, 8:04am

Welcome to the Scala community @eggo

My guess is that the two sets are inferred as a union type, like Set[String | Int].

If you manually provide type annotations, then you get the desired compiler error.

Also, if you define the two expressions as separate values, you get the compiler error as well, because you have now forced the two collections’ types to be inferred separately.

The solution that always works is to provide the type annotations yourself.

scala> Set[Int](1,2,3).contains(None)
-- [E007] Type Mismatch Error: --------------------------------------------------------
1 |Set[Int](1,2,3).contains(None)
  |                         ^^^^
  |                         Found:    None.type
  |                         Required: Int
  |
  | longer explanation available when compiling with `-explain`
1 error found

Now I have forced A to be Int, so contains cannot be applied to None, whose type is None.type.

But without explicit types, what is A? The compiler has to pick one. We all have our expectations: some of us expect the types to be inferred separately, some of us expect them to be inferred together. Type inference can make the language feel dynamic, like Python or Ruby; at times it feels like magic, so we expect it to read our minds, but it can’t. (And some of us will always be unhappy with the default choice.)

But if you mix them in a single expression, type inference works differently. Try to think from the type inference algorithm’s perspective: I am given a single expression that involves two sets, one of which holds a string and the other a number, and I need to figure out the type A in Set[A] (and hence in contains(elem: A)). I need to use the smallest type that contains both of them, and that is exactly what a union type is.
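To make this concrete, here is a minimal sketch of the three cases, assuming Scala 3 run as a script (for example with scala-cli) or pasted into the REPL. It uses scala.compiletime.testing.typeChecks, which reports at compile time whether a code string would type-check, so the rejected variants can be shown without breaking the build; the value names are illustrative only.

```scala
import scala.compiletime.testing.typeChecks

// One expression: Set's element type A is solved together with the argument
// of contains, so the compiler picks a type wide enough for both Int and None.
val fused: Boolean = Set(1, 2, 3).contains(None)

// Separate value: A is fixed to Int before contains is ever called.
val ints = Set(1, 2, 3)
val separateCompiles: Boolean = typeChecks("ints.contains(None)")

// Explicit type argument: same effect, A is pinned to Int up front.
val annotatedCompiles: Boolean = typeChecks("Set[Int](1, 2, 3).contains(None)")

println(s"fused result: $fused")
println(s"separate value type-checks: $separateCompiles")
println(s"explicit type argument type-checks: $annotatedCompiles")
```

In line with the behavior reported in this thread, all three values should come out false, but for different reasons: the fused expression compiles and simply returns false at runtime, while the other two are rejected by the type checker.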

Type inference does not always do what you think it will! It takes some time to get used to it, and to know when you can rely on it and when you can’t. Always tell yourself: “type inference CANNOT read my mind.” Type inference is an undecidable problem (at least in Scala’s type system), so in some situations it’s a best-guess effort. Type inference expectations are a big point of discussion and debate in the Scala community.

Not about type inference, but related: equality is a tricky concept in object-oriented languages (in Java we normally have to provide the equals method ourselves), and in Scala 3 it got a bit trickier. I highly recommend reading Programming in Scala, 5th Edition, especially Chapter 23.4 “Multiversal Equality”, which discusses how you can use strict equality with CanEqual.
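For reference, a minimal sketch of strict equality in practice, assuming Scala 3 run as a script. Apple, Pear and Crate are made-up types: derives CanEqual gives a type a CanEqual instance with itself, and scala.compiletime.testing.typeChecks is used only to show which comparisons the compiler rejects without breaking the build.

```scala
import scala.language.strictEquality
import scala.compiletime.testing.typeChecks

// Opt in to == for these types: derives CanEqual[Apple, Apple], CanEqual[Pear, Pear].
final case class Apple(grams: Int) derives CanEqual
final case class Pear(grams: Int) derives CanEqual

// No CanEqual instance at all.
final case class Crate(size: Int)

// Allowed: there is a CanEqual[Apple, Apple].
val sameApples: Boolean = Apple(150) == Apple(150)

// Rejected at compile time: there is no CanEqual[Apple, Pear].
val applesVsPears: Boolean = typeChecks("Apple(150) == Pear(150)")

// Under strictEquality even Crate == Crate is rejected until Crate derives CanEqual.
val cratesVsCrates: Boolean = typeChecks("Crate(1) == Crate(1)")
```

The last case is the part that tends to cause friction when strictEquality is switched on in an existing codebase: every type compared with == needs an instance.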

As for saying goodbye to Scala, I think that’s a bit harsh and hasty, but of course it’s your choice and I respect it. If you give it a chance, you might like it! I also came from Python (and I like both languages). I’d say that Scala is a language that takes quite a while to learn and appreciate. It took me 2-3 years to really fall in love with it (and of course I still have my issues with it! Gotta have realistic expectations.)

Make sure to learn Scala from official resources rather than from random online tutorials or web searches.

Best of luck!

Quafadas · October 30, 2024, 8:35am

To add concretely to the link added by spamegg1, this snippet may be of interest. I think it is the behaviour you want by default, and it is just one import away.

The error message here is the compiler telling you that it’s willing to compare two sets, but not willing to compare a String with a Double, unless you ask nicely by putting a CanEqual instance for String and Double in context, in which case it will compare strings to doubles. To be clear, you should not do this unless you understand the (potentially painful) implications.
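The original snippet isn’t reproduced here, but a sketch of the same idea might look like this, assuming Scala 3 run as a script. The value names are made up; CanEqual.derived is the standard library’s catch-all CanEqual[Any, Any] instance, which can stand in for CanEqual[String, Double] because CanEqual is contravariant in both type parameters.

```scala
import scala.language.strictEquality
import scala.compiletime.testing.typeChecks

val strings: Set[String] = Set("a")
val doubles: Set[Double] = Set(1.0)

// Comparing two Sets needs a CanEqual between their element types,
// and there is no CanEqual[String, Double] by default.
val rejected: Boolean = typeChecks("strings == doubles")

// Sets with comparable element types are fine.
val sameSets: Boolean = strings == Set("a")

// "Asking nicely": a local CanEqual[String, Double] makes the comparison compile.
// Only do this if you really mean it; at runtime it simply returns false here.
val askedNicely: Boolean =
  given CanEqual[String, Double] = CanEqual.derived
  strings == doubles
```

Keeping the given local to one block limits the opt-in to that single comparison instead of the whole file.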

Yes, this provides an astonishing degree of flexibility around what looks like a simple operation, and if you sit back and think, it’ll take you all the way back to the nature of equality and what being “equal” actually means.

Sane by default, hackable by skill.

When writing Scala, I regularly need to sit back and actually think. It is a different, more frustrating, and (for me personally) infinitely more rewarding experience than hacking away at a Python REPL, but YMMV…

alexelcu · October 30, 2024, 10:10am

To add to what the others have said, and what you’re probably thinking as well, regarding this…

To me, “type-safe” means, simply put, that I am prevented from using apples when I would need pears. This should be detected by the compiler at compile time, so that no runtime errors or crashes occur.

Scala is a statically typed language. This means that, indeed, the errors happen at compile time, a distinction that changes the nature of those errors dramatically. When Scala code compiles successfully, it means that the compiler proved certain properties about your code. Compilers of static languages are essentially theorem provers that can eliminate entire classes of common errors. There is fine print here, obviously: for pragmatic reasons, such languages also have features that can make the type system less expressive or unsound when used, e.g. nullness, runtime reflection, and OOP (implicit upcasting and forced downcasting), although Scala has been evolving solutions for common gotchas (e.g. explicit nulls, or compile-time reflection / reification).
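Two of those escape hatches can be sketched in a few lines, assuming Scala 3 compiled without the -Yexplicit-nulls option. Both programs type-check, and the failure only shows up at runtime (wrapped in scala.util.Try here so the script keeps running).

```scala
import scala.util.Try

// Upcasting to Any is implicit and always allowed...
val value: Any = "a"
// ...but a forced downcast is only checked at runtime.
val downcast: Try[Int] = Try(value.asInstanceOf[Int])

// Without explicit nulls, null is a valid String, so this compiles too.
val name: String = null
val nameLength: Try[Int] = Try(name.length)

println(downcast)   // a Failure wrapping a ClassCastException
println(nameLength) // a Failure wrapping a NullPointerException
```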

Set("a") == Set(1)

Universal equality is unfortunately a legacy of Java, an OOP language that started out without type parameters (generics), so in versions older than Java 5 there was no static difference between the two sets: both were just a Set. Note, however, that OOP subtyping makes things complicated. For instance, should these be equal?

List(1, 2) == Vector(1, 2)

Set(1, 2) == List(1, 2)

Obviously, the former example shows two different types with different performance characteristics; semantically, however, they are mostly the same, being sequences. For the latter, sets have no ordering and no duplicate elements, so a set and a list can’t be equal, even though a set can be iterated in some order.
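What the current collections library actually answers for these comparisons can be checked with a small Scala 3 script (a sketch with made-up value names). The Set-versus-List comparison goes through values typed as Any, so that only the runtime equals method is exercised, whatever the compiler thinks of the static types.

```scala
// Seqs compare element by element, whatever the concrete class.
val listVsVector: Boolean = List(1, 2) == Vector(1, 2)

// Sets ignore order and duplicates; lists do not.
val setsIgnoreOrder: Boolean = Set(1, 1, 2) == Set(2, 1)
val listsKeepOrder: Boolean = List(1, 1, 2) == List(2, 1)

// A Set is never equal to a Seq, even with the same elements.
val setAsAny: Any = Set(1, 2)
val listAsAny: Any = List(1, 2)
val setVsList: Boolean = setAsAny == listAsAny
```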

So, for what it’s worth, the default behavior of Scala has been tightened in Scala 3:

To add an example to what was already mentioned by @spamegg1:

Things, however, get more complicated:

val list = List(1)
list.contains("a")
//=> false

This happens because of Scala’s OOP. List is a data type with a covariant type parameter, which means that a List[Int] is a subtype of List[Any] (this is also why List declares contains[A1 >: A](elem: A1) rather than contains(elem: A)). So the above is like writing:

val list: List[Int] = List(1)
// No forced casting necessary
val listAny: List[Any] = list
// Obviously, it works because `String` is also a subtype of `Any`
listAny.contains("a")

This doesn’t happen with Set because Set’s type parameter is invariant, so Set[Int] is not a subtype of Set[Any]. That it happened to you, as explained above, was due to type inference driven by the whole expression (which in other cases is actually what you want).
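A minimal sketch of why the two behave differently, assuming Scala 3 run as a script. CovBox and InvBox are made-up stand-ins for List and Set: a covariant type parameter cannot appear as a plain method parameter type, so a covariant container’s contains has to widen its argument with a lower bound, while an invariant container can take A directly.

```scala
import scala.compiletime.testing.typeChecks

// Hypothetical covariant container, shaped like List.
final case class CovBox[+A](value: A):
  // `def contains(elem: A)` would not compile here:
  // a covariant A may not appear as a method parameter type.
  def contains[B >: A](elem: B): Boolean = value == elem

// Hypothetical invariant container, shaped like Set.
final case class InvBox[A](value: A):
  def contains(elem: A): Boolean = value == elem

val covInts: CovBox[Int] = CovBox(1)
val covAnys: CovBox[Any] = covInts               // allowed: CovBox[Int] <: CovBox[Any]
val covString: Boolean = covInts.contains("a")   // compiles: B is widened to fit "a"

val invInts: InvBox[Int] = InvBox(1)
val invString: Boolean = typeChecks("""invInts.contains("a")""")
val invUpcast: Boolean = typeChecks("(invInts: InvBox[Any])")
```

This is the trade-off described above: covariance makes a box of Int usable as a box of Any, at the price of a contains that accepts any supertype of the element type.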

Scala is a very static and expressive language, but it’s also an OOP language. This has some pros and cons. Personally, I love its OOP nature and I love subtyping, but this flexibility comes with implicit upcasting through subtyping and variance that people need to be aware of (a general rule of thumb for all OOP languages).

spamegg1 · October 30, 2024, 11:22am

To give another example, @eggo, have you tried .union?

scala> Set("a").union(Set(1))
val res0: Set[String | Int] = Set(a, 1)

As you can see, the union type is inferred.
Of course, some people might prefer that this .union be outlawed entirely, but Scala chose to allow it. It’s a trade-off.
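If a mixed set is what you actually want, spelling the union out makes the intent explicit, and pinning the element type rules such a .union out again. A short sketch, assuming Scala 3 run as a script; describe is a made-up helper showing that a match recovers the concrete member of the union at runtime.

```scala
import scala.compiletime.testing.typeChecks

// Spelling the union out makes the intent explicit.
val mixed: Set[String | Int] = Set("a", 1)

val hasString: Boolean = mixed.contains("a")
val hasInt: Boolean = mixed.contains(1)
val hasTwo: Boolean = mixed.contains(2)

// Each element is either a String or an Int; a match recovers which.
def describe(x: String | Int): String = x match
  case s: String => s"string $s"
  case i: Int    => s"int $i"

val descriptions: Set[String] = mixed.map(describe)

// Pinning the element type to String rules the mixed union out again.
val pinnedUnion: Boolean = typeChecks("""Set[String]("a").union(Set(1))""")
```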

BalmungSan · October 30, 2024, 2:01pm

Note that, because of universal equality,

"a" == 1

should be valid,
especially because you could do this:

val someData: Any = "a"
val otherData: Any = 1

someData == otherData

And that is valid due to the Liskov substitution principle.
Something similar happens with contains.
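A small sketch of that point, assuming Scala 3 run as a script; typeChecks (from scala.compiletime.testing) is used only to show that the direct comparison is rejected while the one through Any is accepted, and the value names are made up.

```scala
import scala.compiletime.testing.typeChecks

// Comparing a String literal with an Int directly is rejected by Scala 3.
val directCompiles: Boolean = typeChecks("\"a\" == 1")

// After upcasting both sides to Any (always allowed), the same check compiles.
val someData: Any = "a"
val otherData: Any = 1
val viaAny: Boolean = someData == otherData

// contains on a List behaves similarly: List(1) is also a List[Any].
val listContains: Boolean = List(1).contains("a")
```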

Whether universal equality is a good or bad idea, and whether a typeclass-based solution is always better, are two different and long discussions, which are better had with a humble attitude, understanding that programming languages are complex projects where decisions aren’t taken in a vacuum.

Having said that,
while the above is technically correct, everyone agrees that "a" == 1 is always a typo / mistake / error.

As such, the compiler actually tries very hard to be helpful here and reject equality checks that, while sound, are illogical. For the record, the previously mentioned multiversal equality is another attempt at improving this (but, as with everything, it is a trade-off).

However, the compiler can’t catch every situation where the check is illogical, for many reasons: complexity is one, but other features of the language, like union types, also pull in the opposite direction.

So on one hand you want the compiler to be very strict and catch all errors, but on the other you want it to be flexible enough to infer things like A | B.
As usual with Scala, the language tries to sit at a middle point where it is not perfect for anyone but pretty great for everyone.

Also note that type safety is a spectrum.

For example:

final case class Point(x: Double, y: Double)

val x = 3.5
val y = 2.1
val point = Point(y, x)

Everything there is type-safe, yet the end result is a bug.
Another example:

final case class User(name: String, age: Int)
val luis = User(name = "Luis", age = -27)

Again, a negative age doesn’t make sense.

Funnily enough, the language can actually help you avoid these errors, using refined types, opaque types, named arguments, etc.
But if you push it very hard, sooner or later you will find that the compiler can’t keep up.
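As a sketch of the opaque-type and named-argument approach, assuming Scala 3 run as a script: the Geometry and People objects, the XCoord, YCoord and Age types, and the Age.from smart constructor are all made up for illustration and do not come from any library.

```scala
import scala.compiletime.testing.typeChecks

object Geometry:
  // At runtime both are plain Doubles, but outside this object they are distinct types.
  opaque type XCoord = Double
  opaque type YCoord = Double

  object XCoord:
    def apply(d: Double): XCoord = d
  object YCoord:
    def apply(d: Double): YCoord = d

  extension (x: XCoord) def xValue: Double = x
  extension (y: YCoord) def yValue: Double = y

object People:
  // Outside this object, an Age can only be obtained through Age.from.
  opaque type Age = Int

  object Age:
    def from(n: Int): Option[Age] = Option.when(n >= 0)(n)

  extension (a: Age) def years: Int = a

import Geometry.*
import People.*

final case class Point(x: XCoord, y: YCoord)
final case class User(name: String, age: Age)

// Named arguments plus distinct types: swapping x and y no longer compiles.
val point = Point(x = XCoord(3.5), y = YCoord(2.1))
val swappedCompiles: Boolean = typeChecks("Point(YCoord(2.1), XCoord(3.5))")

// A raw Int is not an Age, so a negative age has to go through the check.
val rawAgeCompiles: Boolean = typeChecks("""User("Luis", -27)""")
val luis: Option[User] = Age.from(27).map(age => User("Luis", age))
val invalid: Option[User] = Age.from(-27).map(age => User("Luis", age))
```

The age check still happens at runtime; the types only guarantee that nobody can skip it.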

There are actually other languages with even stronger and more powerful type systems, yet they see little to no commercial use. While there are multiple reasons for that, the main one is that pushing type safety to its limits is usually impractical.

There is a sweet spot where the language can catch a lot of things, but proper automated testing and code review are still useful.
And, IMHO (and probably in the opinion of many others), Scala lives in that middle point, with the possibility of moving in either the stricter or the looser direction if you so desire.
The biggest caveat is universal equality, which we all simply learn to deal with.

A final note:
while we have all indeed been bitten by universal equality here and there, at least in my experience it doesn’t happen that frequently. And for the cases where it matters most, there are alternatives like cats’ Eq.
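For comparison, the typeclass approach can be sketched in a few lines of Scala 3. Eqv and === are hypothetical names; this hand-rolled version only illustrates the technique behind cats’ Eq and is not the cats API. Instances are opt-in, and both sides of === must have the same type with an instance in scope.

```scala
import scala.compiletime.testing.typeChecks

// A type class: an Eqv[A] is evidence that values of type A may be compared.
trait Eqv[A]:
  def eqv(x: A, y: A): Boolean

object Eqv:
  // Opt-in instances; these simply delegate to universal equality.
  given Eqv[Int] = (x, y) => x == y
  given Eqv[String] = (x, y) => x == y
  // Sets are comparable only if their elements are.
  given [A](using Eqv[A]): Eqv[Set[A]] = (xs, ys) => xs == ys

// Both operands must share the type A, and an Eqv[A] must be available.
extension [A](x: A)
  def ===(y: A)(using e: Eqv[A]): Boolean = e.eqv(x, y)

val sameInts: Boolean = 1 === 1
val sameSets: Boolean = Set(1, 2) === Set(2, 1)

val i: Int = 1
val s: String = "a"
val mixedCompiles: Boolean = typeChecks("i === s")
```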

I don’t want to sound rude; just as a closing comment:
this is kind of like saying “Well, seatbelts don’t save everyone, so I may as well not wear one”. Of course the contexts are vastly different, but you get the idea.

I do understand where you are coming from, but I hope this, and everything else everyone has mentioned, helps you realize that it is not so black-and-white.

spamegg1 · October 30, 2024, 2:35pm

Indeed! I’m learning Lean at the moment: not practical, but great for math proofs. It’s actually kind of impressive how far Scala pushes its types while remaining practical and seeing industry use.

alexelcu · October 30, 2024, 3:38pm

BalmungSan:

Note that due the fact of universal equality.
"a" == 1
Should be valid.
Especially, because, you could do this:
val someData: Any = "a"
val otherData: Any = 1

someData == otherData
And that is valid due to liskov.
Something similar happens with contains.

In fairness, that example isn’t necessarily a valid one. One can argue that Any should not have an equals method on it, and that all the equality checks one should be able to do on Any are identity checks (i.e., eq in Scala).
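The difference between the two kinds of check, in a short Scala 3 sketch (Fruit is a made-up case class). In Scala, eq is a method of AnyRef, so it applies to reference types such as this one.

```scala
// Structural equality (== via equals) versus reference identity (eq).
final case class Fruit(name: String)

val first = Fruit("apple")
val second = Fruit("apple")

// Case classes get a structural equals, so two separately built values are equal...
val sameValue: Boolean = first == second
// ...but they are still two distinct objects.
val sameObject: Boolean = first eq second
val selfIdentity: Boolean = first eq first
```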

And even though some ML languages also do this, e.g., OCaml, it’s only a good idea in the absence of something else (notably, OCaml lacks the means for ad-hoc polymorphism and its use of OOP isn’t as pervasive).

What I’m trying to say is that Object#equals is mostly an artifact of Java (with pros and cons), not of OOP, or Liskov’s substitution principle, or what-have-you. And for us fans of static typing, it sucks, regardless of historical reasons for being in the language, which is why it makes sense for Scala to try to alleviate its effects (and it tries doing that in Scala 3 with multiversal equality).

UPDATE: sorry, I edited the message multiple times

BalmungSan · October 30, 2024, 4:06pm

That is literally the point / definition of universal equality.

I said:

That is because:

// Simplified view of how scala.Any declares equals
trait Any {
  def equals(that: Any): Boolean
}

And both String and Int are subtypes of Any, so both must have an equals(Any): Boolean defined on them.

The second snippet and the mention of Liskov is just to show why such equals has to accept an Any.

I didn’t say that was the case.
I just said that this language decided to also have universal equality, regardless of the reasons.

While, yes, it has its gotchas, it also has its advantages.
AFAIK, almost nobody uses cats-collections, and again there are multiple reasons for that, but the typeclass-heavy interface is one of them.
Dynosaur decided to have its own NonEmptySet mostly because the cats one needs an Ordering.
And as much as I like Eq in the context of tests, I rarely use it in main code. The same goes for Show; it is just too obnoxious to use, at least for me.
And the last time I tried to use strictEquality, I simply gave up after 10 minutes of fixing compiler errors.

It may be because the language already has universal equals and toString, it could be because I am already used to it and habits are hard to change, it may be because the alternative is not strictly better, or it may be because the alternatives are half-done and could be improved.
TBH, I am not sure.

eggo · October 30, 2024, 10:04pm

Many thanks to everyone for the informative and detailed answers, I have read them all (and more, e.g. Beyond Liskov Type Safe Equality in Scala) and received valuable advice.

My conclusion, briefly summarized: Scala, like any other programming language, implements a type-checking compromise (since in full generality it is an undecidable problem). The path chosen here is rather dominated by OOP features (subtyping with variant/invariant type constructors). This results in advantages and disadvantages that everyone simply should be aware of. Or, simply put, like it or dislike it.

The reason for my (possibly harsh and premature) statement “say goodbye to Scala and stick to completely type-free languages” is my desire for the “principle of least surprise”. Admittedly, of course, this depends on my background and expectations. My preferred language style is functional programming; these languages often come with Hindley-Milner-like type systems and avoid some issues of OOP-like type systems. As spamegg1 mentioned: “type inference expectations are a big discussion point in the Scala community”.

However, Scala’s general expressiveness, simplicity, usability, tool support, performance, level of surprise, etc. are all characteristics that make up a language, and the overall package is what counts in the end.

Thanks again to all for your help.

spamegg1 · October 30, 2024, 10:33pm

Hindley-Milner is really a wonderful world to live in; it spoils you for all other type systems, so I totally get where you’re coming from (SML / Haskell was also my first functional language). But I had already gotten over my “honeymoon phase”, if you will. Subtyping is a whole other beast, and Scala 3’s DOT calculus is actually quite impressive considering how hard subtyping makes things.