Detecting integer overflow

jimka · February 24, 2019, 10:08am

I have a Scala application which explores a HUGE state space of a particular graph transition function. I’m currently using a Map[String,Int] to detect whether I’m visiting the same space twice. Each key of the Map is a String of exactly 24 characters, and each value is a small integer representing the shortest path length to a given graph node (this integer is always less than 11 as far as I can tell, although the goal of the experiment is to find the maximum such shortest path length). However, there are only 6 possible characters. Thus I can represent the string as an integer between 0 and 6^24-1.
Thus my maximum integer is 555,555,555,555,555,555 (base 6), which I calculate to be 4,738,381,338,321,616,895 (base 10).

Question: Which data type should I use for the integer?
Question: Is there some sort of compressed-string object I can use rather than Integer?
Question: Assuming to use Integer, what is the best (fastest) way to convert back and forth between such Integer and String?

The way I know is to use a fold/reduce to convert the String to Integer and accumulate the integer by successive add and multiply by 6. And to convert from Integer to String, by divide/remainder 24 times, computing 24 characters, then concatenating them into a string.

And as a related question, Can I ask the Scala Integer arithmetic to give me an exception on integer overflow rather than just giving me the wrong answer?

tarsa · February 24, 2019, 10:23am

6^24 is smaller than 2^63 so probably you can use Long for storing that number. To convert it to base-6 String use: java.lang.Long.toString(longValue, 6). To convert it from base-6 String use: java.lang.Long.parseLong(stringValue, 6).

Note that converting from Long to String very frequently can be less efficient that e.g. keeping the String representation directly or using byte arrays of size 24 (to keep all digits separately) wrapped in something that provides proper equals, hashCode, toString and other methods.

Use java.lang.Math.addExact and related methods.

jimka · February 24, 2019, 10:36am

@tarsa, thanks for the suggestion, But recall my strings are not strings of digits, the strings look like "WWWWGGGGYYYYBBBBOOOORRRR" … My though was just to encode these as if they were digits, because as a coincidence I have only 6 possible such characters. I doubt java.lang.Long.parseLong will implicitly understand my character encoding.

jimka · February 24, 2019, 10:40am

Here is my current attempt. I.e., doing it more or less from scratch.
Although, it would be great if I could get the quotient and remainder in a single call, rather than having to divide / and then mod %, as that performs the same calculation twice.

  val cubeColors = Array('W','G', 'Y', 'B', 'O', 'R')

  def cubeToInt(cube:Cube):Long = {
    // convert from string to base 6
    cube.foldLeft(0L)((acc:Long,ch:Char)=>(acc * 6 + cubeColors.indexOf(ch)))
  }

  def intToCube(n:Long):Cube = {
    var arr = new Array[Char](24)
    def recure(residue:Long,digit:Int):Unit = {
      if ( digit == -1)
        Unit
      else {
        arr(digit) = cubeColors((residue % 6).toInt) // have to convert to Int because apparently cannot index an array with a Long even it is guaranteed to be ```0 <= it < 6```
        recure(residue/6,digit-1) // residue/6 is unfortunately a duplicate operation as we just called % on the line above
      }
    }
    recure(n,23)
    arr.mkString
  }

tarsa · February 24, 2019, 10:49am

I think it’s not provided in standard Java, C++ or whatever. You could rely on JIT optimizations, or if you don’t want to then you can replace one division with multiplication:

val quotient = dividend / divisor
val remainder = dividend - divisor * quotient

jimka · February 24, 2019, 10:56am

It would be great if there was a /% function which returned a tuple of (quotient, remainder). I remember Knuth also complaining about this in his MMIX series. His argument is that any calculation of one must necessarily also calculate the other, so why not return both to the user. (sigh)

jimka · February 24, 2019, 10:59am

@tarsa, the addExact looks interesting. Can you explain (for a Java illiterate person such as myself) how to use these methods in Scala. I assume there is some special include or syntax.

tarsa · February 24, 2019, 11:06am

There are two problems:

how to return two values from one function?
are they really always computed together? maybe there are CPUs that don’t compute remainder? C++ and Java were designed to be portable.

Java doesn’t provide /% operator so Scala can’t do it without some sort of emulation, i.e. doing the same thing you’re doing, e.g.
def /* (divisor) = (value / divisor, value % divisor)

Adding import java.lang.Math.{addExact, multiplyExact} will allow you to use it in following code, e.g:

val a = b + c * d // overlowing arithmetics with automatic operator precedence
val a = addExact(b, multiplyExact(c, d)) // exception throwing arithmetics with manual operator precedence

Probably you can write your own class (SafeLong?) which would do that exact arithmetics for + and * operators, but custom AnyVals have their problems: Failed Experiments: The High Cost of AnyVal subclasses...

jimka · February 24, 2019, 11:14am

Interesting point that Scala cannot do it if Java doesn’t provide it. However, as I recall from Knuth’s discussion, every CPU which computes / or % also calculates the other. I can go back and try to find his discussion and cite it here, but as I understand mathematically it is not possible to calculate one without the other. As far as returning multiple values, in Scala we have tuples and pattern matching.
Theres some syntax like

  val (q,r) = numerator OP denominator

But as you say, if OP would be forced to do the division twice because of a Java limitation, then you indeed make a good point.

jimka · February 24, 2019, 1:08pm

Unfortunately the conversion of String to Long fails to make the problem computable on my laptop. The program runs for about about 90 minutes, eventually builds a Map with 14,500,000+ entries, and then dies with the following error.

However, without this optimization, if I use the strings as Map keys, the memory overflow happens in 5 to 10 minutes.

I wonder whether there’s a way to decrease the memory size even more? Or do I just need a larger computer?

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at scala.collection.GenSeqLike$$Lambda$8/592179046.get$Lambda(Unknown Source)
	at java.lang.invoke.LambdaForm$DMH/1099983479.invokeStatic_L_L(LambdaForm$DMH)
	at java.lang.invoke.LambdaForm$MH/559450121.linkToTargetMethod(LambdaForm$MH)
	at scala.collection.GenSeqLike.indexOf(GenSeqLike.scala:146)
	at scala.collection.GenSeqLike.indexOf$(GenSeqLike.scala:146)
	at scala.collection.mutable.ArrayOps$ofChar.indexOf(ArrayOps.scala:227)
	at scala.collection.GenSeqLike.indexOf(GenSeqLike.scala:130)
	at scala.collection.GenSeqLike.indexOf$(GenSeqLike.scala:130)
	at scala.collection.mutable.ArrayOps$ofChar.indexOf(ArrayOps.scala:227)
	at rubik.Rubik$.$anonfun$cubeToInt$1(Rubik.scala:43)
	at rubik.Rubik$.$anonfun$cubeToInt$1$adapted(Rubik.scala:43)

jimka · February 24, 2019, 1:12pm

I’m actually trying to compute the state space of the 2x2x2 Rubiks cube. Apparently there are supposedly only 3 million states reachable by rotations. Since I’ve reached 14 million, perhaps I need to search for an error in my program

BTW, I’m converting the program from my primary implementation in Common lisp which computes 3.6 million states in about 80 seconds. There’s either a bug in my translation to Scala, or (and) something I really misunderstand about the Map class.

I’ll look further.

tarsa · February 24, 2019, 1:24pm

I recommend monitoring your Java process with e.g. VisualVM which is included in OracleJDK and OpenJDK. This way you would detect excessive GC activity sooner and not need to wait 90 minutes for slow death of JVM resulting in OoME. During high GC activity you could take a heap snapshot and analyze it then (also in VisualVM, but there are also other tools, like YourKit Java Profiler) to find what takes majority of memory space.

14 million entries doesn’t sound like much unless you set your Java heap to e.g. 1 gigabyte or something like that. If you’re low on memory you can use some specialized collections that avoid boxing of keys and values. This project references many such optimized collections: https://github.com/mikvor/hashmapTest/blob/master/pom.xml
For example: https://www.eclipse.org/collections/javadoc/9.2.0/index.html?org/eclipse/collections/impl/map/mutable/primitive/IntIntHashMap.html
Unfortunately Java doesn’t yet have generics over primitives, so if you write Map[Int, Long] then both Int and Long will be boxed to objects. The optimized collections I’ve linked avoid that by generating many specialized implementations of the same collection type.

jimka · February 24, 2019, 1:35pm

My Mac is about 8 years old, and I have 8GB memory. Does that sound like enough to handle a Map of 14 million Longs, corresponding to 14 million Integers, plus boxing?

tarsa · February 24, 2019, 1:44pm

I tried allocating 14 million entries in mutable.Map[Long, Int] and it took around 1 GiB on my computer on Scala 2.12. I’ve read somewhere that Java by default allocates 1/4 of RAM to itself on start, so that would amount to 2 GiB maximum in your case. In theory that should suffice, but you can try manipulating -Xmx yourself: https://stackoverflow.com/questions/5374455/what-does-java-option-xmx-stand-for

wrbriggs · February 25, 2019, 12:03pm

The JVM by default allocates 1/4 of RAM, or 1 GB, whichever is smaller. You most likely will need to provide values for -Xmx. I also agree with @tarsa that it would be a good idea to attach to your process with VisualVM and see where the time and memory resources are being spent.

jducoeur · February 25, 2019, 1:17pm

On the general point here: Java functions can be imported and used exactly like Scala functions, in general. There are a few edge cases here and there, but plain old import will usually do what you expect, and you call the functions just as if they had been written in Scala. (This is important: Scala leverages the Java standard libraries pretty heavily.)

shawjef3 · February 25, 2019, 4:23pm

Something cool about Scala is that you can create your own operators. I wouldn’t try to redefine +, but certainly you could create a new operator, like +!, which would call addExact.

Operators can be overused but this is a case I’d be for it.

jimka · February 25, 2019, 5:08pm

@shawjef3, That could be very to have +!, -!, *!, and /!.

Could you post what the syntax for that would be, including the correct incantation of includes?

curoli · February 25, 2019, 5:11pm

Your numbers just about fit into Long, which goes up to 9,223,372,036,854,775,807.

You probably want to use a value class that wraps Long. That would be tough to beat in terms of memory-efficiency.

Adding overflow checks to integer operations makes these quite a bit more expensive. Try to avoid or tolerate integer overflow.

shawjef3 · February 25, 2019, 5:27pm

@jimka https://docs.scala-lang.org/overviews/core/implicit-classes.html will get you started.

One annoyance with regular implicit classes, is that each use of a method in an implicit class creates an instance of that class that is thrown away after it returns. If that is something you need to avoid, then with a little more work you can use an implicit value class. See https://docs.scala-lang.org/overviews/core/value-classes.html. I almost always make implicit classes value classes, since it’s hardly any work and can greatly reduce garbage, depending on how much you’re using it.