Deprecated range syntax

That’s just a bug. The code comment is:

// XXX This may be incomplete.

And with the appropriate override,

scala> Range.Double(0, .7, .1).last
<console>:12: warning: method apply in object Double is deprecated (since 2.12.6): use Range.BigDecimal instead
       Range.Double(0, .7, .1).last
             ^
res1: Double = 0.6

I still insist that in the age of literal types and macros, it’s not too much magic to insist on literals, or at least to warn. For 0 to .7 by .1, give me BigDecimals or Doubles or whatever seems to be expected.

Or at least enable a Propensive library to do it for me.

That doesn’t work right on 0.1 to 0.299999999999 by 0.2.

But it’s “close enough for government work,” as they say!

I actually use something like this to discretize a bounding area for a numerical algorithm, and I definitely need to capture the end point. But I need to capture the end point even if it is in the middle of a step, which is a slightly different problem. I could just add the end point to the end of the sequence, but then I would usually be repeating the end point. So I came up with this little scheme:

def scalarStepsx(start: Scalar, end: Scalar, step: Scalar): Vector[Scalar] = {
  // same as scalarSteps except guaranteed to include the end point
  val steps = scalarSteps(start, end, step)
  if (areClose(steps.last, end)) steps else steps :+ end
}

def areClose(x: Scalar, y: Scalar): Boolean =
  if (y == 0) x == 0 else abs(x / y - 1) < 1e-13

That one also has counterexamples where it does the wrong thing (e.g. 0.1 to 0.300000000001 by 0.1). None of these are suitable for a library method that should act “intuitively”.

@som-snytt - I don’t have any objection to a working macro. I’m not likely to be able to write one in a reasonable amount of time myself, though.

As Russ’s examples indicate, it’s tricky to get it working. The only really safe thing to do is pass literal numeric arguments into the BigDecimal string constructor, picking them directly out of the text of the code (not the Double literal computed by the compiler).

I can’t sneak anything by you!

Seriously though, a person is extremely unlikely to actually use a number like 0.300000000001, and roundoff error will be a couple orders of magnitude less than 1e-12. Hence, I don’t see it as a practical issue. Nevertheless, I can understand that you cannot allow even the tiniest “loophole” in the standard language and library.

Some applications actually hinge upon these kinds of differences: those that have chunked intervals where the intervals are used as a denominator, for instance, or those that count on hitting the endpoint exactly in order to generate a difference between a to b and a until b. This can be really important to get right if you’re, say, trying to generate angles between 0 and 2*Pi; overshooting on the last endpoint giving you a second approximately-zero angle can be a big deal.

I’d love to have a better story here, but unfortunately it is all too easy to have an “intuitive” result that’s just wrong. For example, people will reason, “Well, if I hit the endpoint exactly, to and until will be different, so I’ll just boost the endpoint up/down a tiny bit to make them the same,” and then they get weird unexpected behavior because it’s fighting secret heuristics in the algorithm put there to try to preserve a different kind of intuition.

I can see that arithmetic with Doubles is imprecise, and that makes a naive range of Doubles unintuitive. But I don’t really see the problem anymore when you can make the steps of the range precise by using a BigDecimal underneath. The argument now is that you can give an imprecise result of a calculation with Doubles as input to the range (e.g. 0.1 until 3*0.1 by 0.1). But isn’t this just the case for everything one might do with Doubles? If that’s a reason not to have a range of Doubles, then shouldn’t you just remove Double itself?

For instance:

scala> Ordering[Double].equiv(0.3, 0.1 * 3)
res0: Boolean = false

Should we now deprecate Ordering[Double]?

Also, if you force people to use Range.BigDecimal instead, this is what’s going to happen:

scala> def someInput = 0.1 * 3
someInput: Double

scala> val range = BigDecimal("0.1") until someInput by 0.1
range: scala.collection.immutable.NumericRange.Exclusive[scala.math.BigDecimal] = NumericRange 0.1 until 0.30000000000000004 by 0.1

scala> range.last.toDouble
res1: Double = 0.3

Uglier code for the same result.

I was already thrown by

scala> BigDecimal(.1 * 3)
res0: scala.math.BigDecimal = 0.30000000000000004

As was mentioned on the other thread, I think at least the folded constant should do the more obvious thing:

.1 * 3 : BigDecimal

where the expected type has to guide the conversion somehow. Probably just treat every term as a BigDecimal.
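
For comparison, when every term is a BigDecimal before any folding happens, the arithmetic does come out the “obvious” way (just an illustration of the point, not a proposal):

BigDecimal("0.1") * 3   // 0.3 exactly: the literal never passes through a Double
BigDecimal(0.1 * 3)     // 0.30000000000000004: the compiler folds the Double constant first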

In the meantime, I think a lint rule is called for. Abide, abide. I mean Scalafix.

If anyone watched Agents of S.H.I.E.L.D., I’d like the t-shirt that says, “I can Scalafix this!”

There’s a limit to how much we can protect people. But the bottom line is that Double represents decimal fractions imprecisely, and NumericRange has an API that presupposes accurate treatment of endpoints. There’s an inherent conflict there. We shouldn’t present an API and then, when users assume it works reliably, blame them because “of course Double is imprecise.”

So either we need an alternate API, e.g. 0.1 to 0.7 size 7 and 0.1 to 0.7 every 0.1, where you promise to hit the endpoints regardless (and the step size for every is not strictly adhered to); or we need to bail on Double entirely and/or leave the deprecations in place forever, telling people that what they’re trying to do can’t be made reliable because the endpoint guarantees require something Double can’t deliver.
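
To make the first option concrete, here is a rough sketch of what an endpoint-first size-style operation could compute (the name sized and its shape are purely illustrative, not a proposed API):

// Hypothetical helper: `size` points from start to end, with the endpoints pinned exactly
// and the interior points computed from integer indices so rounding error does not accumulate.
def sized(start: Double, end: Double, size: Int): Vector[Double] =
  Vector.tabulate(size) { i =>
    if (i == size - 1) end
    else start + (end - start) * i / (size - 1)
  }

sized(0.1, 0.7, 7)   // begins at exactly 0.1 and ends at exactly 0.7; interior points may be off in the last bit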

I would add that arithmetic mixing floating-point numbers (e.g. Double) with fixed-point, a.k.a. decimal, numbers (e.g. BigDecimal) strongly smells like a broken design.

Decimals are only precise if your numbers don’t have more digits than your BigDecimal is configured to handle. For example, BigDecimal is imprecise for one third (0.3333…) and, with default precision, for (1e50 + 1).
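
A quick illustration of the first limitation, using the default MathContext (34 significant digits):

BigDecimal(1) / 3   // 0.3333333333333333333333333333333333, cut off after 34 digits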

Decimals are much more expensive than Doubles. Doubles are 8 bytes and their operations are single hardware-supported instructions, i.e. very fast. Decimals are user-level objects some dozens of bytes large, and every operation is a complex software routine, i.e. very slow.

Doubles work excellently for most science, engineering, and applied math use cases when used properly. If you ever find that Doubles are not precise enough, then almost always one of the following three things is true:

(1) You have a pure math problem requiring many digits, like calculating the first million digits of pi, or finding the next biggest known prime number. In that case, neither Double nor BigDecimal will save you, and you will need your own custom types. (Ok, maybe BigDecimal may somehow work, but only if used very cleverly)

(2) You have some financial or legal use case that calls for decimals. For example, calculating an account balance, or apportioning seats in parliament according to election results. In this case, BigDecimal will work, but only after you have made sure the rounding (MathContext) is exactly according to the rules.

(3) You are using the wrong algorithm. Ask yourself whether the end result will critically change if some numbers are slightly altered. If yes, your algorithm will not work. In particular, testing for equality is almost always an error. Testing for ordering is only fine if you can tolerate an unexpected ordering of numbers that end up close to each other.

For example, to get a Range of Doubles, a valid algorithm would be to first calculate the first and last number and number of intervals and then calculate all numbers from Int indices. On the other hand, repeatedly adding and comparing to some boundary is probably not useful.
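
A minimal sketch of the contrast (the method name is illustrative, not a library proposal):

// Index-based: each value is computed directly from an Int index (intervals > 0 assumed),
// so rounding error does not accumulate from one element to the next.
def doubleRange(start: Double, end: Double, intervals: Int): IndexedSeq[Double] =
  (0 to intervals).map(i => start + (end - start) * i / intervals)

// Accumulate-and-compare: every addition rounds, the error compounds, and whether the
// last value clears the boundary depends on how those roundings happened to fall.
Iterator.iterate(0.0)(_ + 0.1).takeWhile(_ <= 0.7).toVector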

Best, Oliver

For starters, what on earth is the problem with “until”, which has also been deprecated?

Double is an imprecise class. The use of precision-less equality for Doubles should certainly be removed from the language. But the use of the comparison operators is precisely what the floating-point classes were designed for. As ranges are built on top of comparison, they are a perfectly legitimate use of Double.

Have I ever been caught out by Double’s imprecision? Yes, of course, but that is part of the problem domain. It is not accidental complexity. BigDecimal should not be the default.

to includes the right endpoint. until does not. When you can’t tell where the endpoint is because Double is imprecise, this results in a pretty non-intuitive API.

The default isn’t supposed to be BigDecimal. The default should be that we don’t provide a confusing API. You can then get the desired functionality some other way that is predictable (like mapping Int ranges, if you want to be fast, or BigDecimal if you want to avoid having to do the math yourself).
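
For example, either of these is predictable in a way the Double range isn’t (just an illustration, not a library addition):

(0 to 7).map(_ * 0.1)   // you pick the count; each value rounds once, independently
(BigDecimal("0.1") to BigDecimal("0.7") by BigDecimal("0.1")).map(_.toDouble)   // exact decimal steps, converted at the end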

Sorry for reawakening such an old discussion, but I just ran into this issue.

It seems like the source of the conundrum lies in the step size and its potentially fraught relationship with the start & end.

We can solve that by instead specifying the number of steps to take and letting the framework compute the step size, thereby guaranteeing sensible behavior.

Example:

0.0 to 3.1415 in 4 steps  // => 0, 1.047166667, 2.094333333, 3.1415
0.0 until 3.1415 in 4 steps  // => 0, 0.785375, 1.57075, 2.356125

So a to b in k steps is guaranteed to start at a, hit k-2 ~evenly spaced points in between, and end at precisely b.

And a until b in k steps is guaranteed to start at a, hit k-1 ~evenly spaced points in between, and then stop before hitting b.

steps is just syntactic sugar:

def steps: this.type = this

So you can equivalently write:

0.0 to 3.1415 in 4 // => 0, 1.047166667, 2.094333333, 3.1415
0.0 until 3.1415 in 4 // => 0, 0.785375, 1.57075, 2.356125

depending on how much you want to Englishify your Scala :wink:

Thoughts?


To further anglicize the DSL:

0.0 to 3.1415 w/in 4

Probably any make-up due to rounding comes in the last interval?

I was thinking something equivalent to this:

case class MyDoubleRange(start: Double, end: Double, steps: Int, inclusive: Boolean) {
  if (inclusive && steps < 2 || !inclusive && steps < 1) throw new IllegalArgumentException(s"Too few steps: $steps, inclusive: $inclusive")

  def foreach(f: Double => Unit): Unit = {
    val steps = if (inclusive) this.steps - 1 else this.steps
    f(start)
    for (i <- 1 until steps) f(start + ((end - start) / steps) * i)
    if (inclusive) f(end)
  }
}

I’m not sure I’m loving the exclusive semantics, maybe just do:

    val steps = this.steps - 1

Then we hit the same numbers for both cases, and exclusive-mode just omits the end.

That’s kind of equivalent to 1 to steps vs 1 until steps.

Since adding 0 to start is presumably a safe floating point operation, we could then simplify it to:

case class MyDoubleRange2(start: Double, end: Double, steps: Int, inclusive: Boolean) {
  def foreach(f: Double => Unit): Unit = {
    for (i <- 0 until (steps - 1)) f(start + ((end - start) / (steps - 1)) * i)
    if (inclusive && steps > 0) f(end)
  }
}

I’d basically be fine with either.

(And maybe steps should be renamed to points?)

Hmm… that was all mildly confused and had some subtle bugs.

Here’s the mental model I think makes more sense:

0.0 to 3.1415 in 4 is equivalent to taking the integer range 0 to 4 and remapping it to an evenly spaced double range where 0 => 0.0 and 4 => 3.1415.

0.0 until 3.1415 in 4 is equivalent to taking the integer range 0 until 4 and remapping it to an evenly spaced double range where 0 => 0.0 and 4 => 3.1415, thereby omitting 3.1415 (since 4 gets omitted in the integer range).

a to b in 0 is ill-defined; you could make an argument for any of a, (a+b)/2, and b, but I think a is the right call.

a until b in 0 is empty.

a to|until b in k for k>0 is well defined.

Corresponding code (hopefully without bugs :wink: ):

case class MyDoubleRange3(start: Double, end: Double, steps: Int, inclusive: Boolean) {
  def foreach(f: Double => Unit): Unit = {
    if (steps == 0) {
      if (inclusive) f(start) // or f(end) or f((start+end)/2)
    } else if (steps > 0) {
      f(start)
      for (i <- 1 until steps) f(start + ((end - start) / steps) * i)
      if (inclusive) f(end)
    }
  }
}
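
For example, with steps read as the number of intervals, an inclusive range visits steps + 1 points and an exclusive one visits steps:

MyDoubleRange3(0.0, 1.0, steps = 4, inclusive = true).foreach(println)    // 0.0, 0.25, 0.5, 0.75, 1.0
MyDoubleRange3(0.0, 1.0, steps = 4, inclusive = false).foreach(println)   // 0.0, 0.25, 0.5, 0.75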

We can bikeshed whether the integer range should start at 0 or 1 (C vs Pascal, anyone!?) :wink:

I guess it depends on whether you interpret k as the “steps between the points” (= start at 0), or the “number of points I encounter in the foreach()” (= start at 1).

For the latter:

case class MyDoubleRange4(start: Double, end: Double, steps: Int, inclusive: Boolean) {
  def foreach(f: Double => Unit): Unit = {
    if (steps == 1) {
      if (inclusive) f(start) // or f(end) or f((start+end)/2)
    } else if (steps > 1) {
      f(start)
      for (i <- 1 until (steps - 1)) f(start + ((end - start) / (steps - 1)) * i)
      if (inclusive) f(end)
    }
  }
}
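
With this interpretation, steps = 4 reproduces the point count of the earlier 0.0 to 3.1415 in 4 steps example:

MyDoubleRange4(0.0, 3.1415, steps = 4, inclusive = true).foreach(println)
// visits exactly 4 points: 0.0, two ~evenly spaced interior points, and 3.1415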