Why does Scala-lang not just provide approximate-equals comparisons?

I know this topic might have been discussed already, but googling I didn’t find any satisfying answers yet.

So, why doesn’t scala-lang (not talking about unit testing, where the frameworks have workarounds) provide an “approximately equals” comparison for floats/doubles? Those types seem to be affected, in all kinds of languages (also JVM-based ones), by the shortcomings of the IEEE 754 formats: https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats .

That being said, the problem doesn’t get solved automatically. Solving it only inside your unit tests (using any of the good frameworks out there) won’t make it go away in production.
Masking the problem in unit tests would, in my opinion, not really help, because it would just postpone the moment you face the issue until runtime in production.

I would really prefer the language itself to provide a solution, so that one can choose between flaky equality comparisons and approximate comparisons.

I would guess the reasons are: (a) it is not that commonly used and (b) it is hard to find a solution that fits everyone’s use cases.

How would you imagine the approximate equals to work?


As described in a lot of places, for example this one: https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch02s06.html

I really do not understand why people are against including any dependency in their code. This is the kind of functionality that is better off in a library, because a library can be optimized for multiple use cases and can fix bugs more quickly than the stdlib.


Sure, one can, but since this might hit anyone using floats/doubles, all those people have to find the solution on their own. So, why not just provide this simple thing in scala-lang? It would help people not to re-invent the wheel here many times.

First, using a library is not reinventing the wheel. Second, as @curoli said, this is not as common as you are saying it is, and some people may need different solutions.

Additionally, it is straightforward to implement it yourself: math.abs(d1 - d2) < error
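For instance, a minimal sketch of an opt-in operator along those lines (the name ~= and the default tolerance are arbitrary choices here):

  implicit class ApproxDouble(private val x: Double) {
    // true if x and y differ by less than the given absolute tolerance
    def ~=(y: Double, tolerance: Double = 1e-9): Boolean =
      math.abs(x - y) < tolerance
  }

  (0.1 + 0.2) ~= 0.3  // true
  0.1 + 0.2 == 0.3    // false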

True, it is not hard to implement, but it might also require corresponding changes for <=, >=, etc.
And it seems to be so common that even books mention it. Why not give it a try?

The problem with a general solution is one of scaling. If the two numbers are around 1e10, you clearly don’t want to use the same distance threshold as you would if they were around 1e-10. So my solution was to compare the ratio:

  import math.abs

  def areClose(x: Double, y: Double): Boolean =
    if (y == 0) x == 0 else abs(x / y - 1) < 1e-13

This should find two numbers equal if the only difference is due to numerical misrepresentation (e.g., it makes 0.1 + 0.2 compare equal to 0.3, even though 0.1 + 0.2 == 0.3 is false for Doubles).
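For example, in the REPL (with the definition above in scope):

scala> areClose(0.1 + 0.2, 0.3)
res1: Boolean = true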

This is also useful for my Scalar class with physical units (because the units cancel out in the ratio if they are the same, as they must be for the comparison to be valid):

  def areClose(x: Scalar, y: Scalar) =
    if (y == 0) x == 0 else abs(x / y - 1) < 1e-13

However, I rarely use this little function.


== has a contract (reflexivity, symmetry, transitivity), and “approximately equals” is not transitive: if a is approximately equal to b (math.abs(a - b) < e) and b is approximately equal to c (math.abs(b - c) < e), then a may still not be approximately equal to c (math.abs(a - c) can be >= e).
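A concrete illustration, with values chosen to sit just inside and just outside an arbitrary tolerance e:

  val e = 1e-9
  val (a, b, c) = (0.0, 0.6e-9, 1.2e-9)
  math.abs(a - b) < e  // true:  a is approximately equal to b
  math.abs(b - c) < e  // true:  b is approximately equal to c
  math.abs(a - c) < e  // false: transitivity fails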


While it is true that an “approximately equals” function is not transitive, that is practically irrelevant if it is used only to avoid spurious inequality due to numerical rounding error of 64-bit floating-point numbers. Consider the following:

scala> 0.1 + 0.2 - 0.3
res0: Double = 5.551115123125783E-17

So let’s say the rounding error is on the order of 1e-16. In my function, I used a threshold of 1e-13:

  def areClose(x: Double, y: Double): Boolean =
    if (y == 0) x == 0 else abs(x / y - 1) < 1e-13

That is three orders of magnitude larger than the typical rounding error for 64 bits. The probability of transitivity becoming a practical issue is virtually zero. I could safely reduce my threshold to 1e-14 or perhaps 1e-15.

Note that an error of 1e-13 corresponds to a precision of less than a micrometer on 2000 miles – close enough for virtually all practical purposes. 64 bits is extremely high precision!
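(For the arithmetic: 2000 miles is about 3.2e6 meters, and 3.2e6 m × 1e-13 ≈ 3.2e-7 m, roughly a third of a micrometer.)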

Having said all that, I see very little use for this function in actual operational software, but it could perhaps be useful in testing.

Errors can accumulate. Imagine an array of numbers a(1) ... a(n) such that a(i) == a(i+1) for all i, but a(1) != a(n).

I don’t think that could happen if the errors are all due to numerical misrepresentation.

If a(i) == a(i+1), doesn’t that mean they are all exactly equal?

But, yes, errors do propagate and accumulate.

For every calculation y = f(x1, ..., xN), the error of y is the sum of all errors of the xi, each multiplied by some factor (the derivative df/dxi), plus an additional error.
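In symbols, to first order:

  Δy ≈ (df/dx1)·Δx1 + ... + (df/dxN)·ΔxN + ε

where ε is the additional rounding error introduced by the operation itself.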

If the propagation factor is close to one, the errors cancel in part, since they have different signs, but they don’t cancel completely. It is like a random walk, and the total error is the error of one step multiplied by the square root of the number of steps.

However, many calculations are unstable, in that errors get propagated and grow exponentially. This is known as chaotic behavior. Other calculations are stable.

Knowing the characteristics of your calculation is key. You can also run an experiment: change an input parameter by a little bit and check how it influences the result.

I was talking about defining == for doubles as “approximately equals” and the consequences of non-transitivity.

I agree that numerical errors can propagate and become unstable, but that is irrelevant to the potential use that I see for this “approximately equal” comparison function.

Let me give an example. Suppose I am calculating some quantity, and the implementation of the calculation may be unnecessarily complicated and/or inefficient (but I am confident that it provides a correct result). So I want to simplify it and/or improve its efficiency. To test and verify the new version, I would compare its results with the results from the old version. The results could be essentially the same but differ ever so slightly due to numerical roundoff. That is where this function could be useful.

Unit-testing libraries tend to have such functionality for this purpose.

The OP is explicitly excluding testing for their use cases, though.

The OP does say, though, that they want “to choose between flaky equality comparisons and approximate comparisons.”

Which gives me pause: what exactly is the use case here, and what is “flaky”? While floating-point arithmetic involves rounding, it is deterministic and should never be flaky.


@martijnhoekstra The use case where I am using Scala is in the data-transformation area: Big Data / Hadoop / Spark. There I need to apply mass data transformations, and this issue can cause an effect where I transform data once into something and later need to transform it back. To be more specific: I encountered this when doing GPS longitude/latitude calculations, where values can be represented either in a degree-based system (-180.0D to 180.0D and -90.0D to 90.0D) or in arc seconds (degrees * 3600). So, when transforming a value into one representation and then back to the other, I get different numbers, which then leads to data comparing as unequal.
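A minimal illustration of the round trip (the coordinate value is arbitrary, and whether a particular value survives the round trip exactly depends on its bit pattern):

  val degrees = 37.123456789           // some latitude-like value in degrees
  val arcSeconds = degrees * 3600.0    // to arc seconds (rounds to nearest Double)
  val roundTrip = arcSeconds / 3600.0  // back to degrees (rounds again)
  degrees == roundTrip                 // may be false, off by an ulp or two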

So you have a very specialized and complex domain, you already depend on one of the biggest dependencies out there (Spark), and you want something that is vital for your project to be implemented in a generic and simple way in the stdlib? Instead of using an appropriate library or writing your own tailored implementation.

I might even say that you shouldn’t be doing those calculations with plain Doubles; rather, you should use an appropriate data type that represents Locations / Coordinates, which may already be implemented in some library.
Or at least, instead of wanting an approximate equality, you should be using a BigDecimal with an appropriate MathContext. Which, surprise, is actually part of the stdlib.
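For illustration, a sketch of that approach (the choice of 12 significant digits is arbitrary; note that scala.math.BigDecimal equality compares numeric values, so trailing zeros don’t matter):

  import java.math.MathContext

  val mc = new MathContext(12)  // keep 12 significant digits
  val x: Double = 0.1 + 0.2     // 0.30000000000000004
  BigDecimal(x).round(mc) == BigDecimal(0.3).round(mc)  // true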

you mean BigDecimal?

Yes, sorry.