Repeatable random numbers

jducoeur · December 1, 2023, 3:41am

Can you say anything more about the nature of the inconsistency? Have you tried removing the .par? This sort of problem always comes down to something non-deterministic in the code, so it’s all about eliminating/controlling every possible source of indeterminacy if absolute consistency really matters.

(I would probably wind up inserting a ton of log statements to try to narrow this down myself, but it depends on the nature of the program.)

som-snytt · December 1, 2023, 5:41am

You might consider posting the code, either in a gist or repo, as a warm-up for folks about to start AoC.

Russ · December 4, 2023, 6:43pm

Quick question: If I run the same program on two different machines (with identical inputs), should the hashCodes of the case class instances be the same?

Ichoran · December 4, 2023, 7:28pm

If you use the same bytecode, yes. Case classes use MurmurHash3 (implemented separately in runtime.Statics); nothing depends on the state of the machine. It’s completely deterministic.

If you build it separately on the new machine, I think it should still be the same, but I’m less confident that there’s no way anything nondeterministic could sneak in.

Russ · December 4, 2023, 10:00pm

My program is a simulation that can be run for an arbitrary number of deconflicted aircraft trajectories. The first eight trajectories appear to be identical across the two machines, then the divergence starts. However, the hash codes are different even for the early part that appears repeatable.

som-snytt · December 4, 2023, 10:44pm

Are we absolutely sure it’s not an Advent of Code problem about Santa’s flight path?

case class hashes depend on the hashes of the product elements. Maybe something in the object graph doesn’t have a proper hash in the sense you need.

Ichoran · December 5, 2023, 12:40am

That sounds like it has to be that you’re hashing something that doesn’t have a content-based hash code. You just need to figure out what. If you find a hash code that doesn’t match across the two instantiations of the same case class with the “same” data, it has to be something contained within it. Eventually you’ll find something which is, almost certainly, a plain class instead of a case class, or an Array instead of a Scala collection.

Russ · December 5, 2023, 1:07am

If a case class contains a Boolean, an Int, or a Double, does that have the same effect of making the hash code unrepeatable? I guess I could run a simple test myself, but it’s easier to just ask the experts.

Ichoran · December 5, 2023, 1:21am

If you’re not using strict math, and the default is to not, then Double calculations might vary a tiny bit between machines (especially if you use things like trig functions; usually stuff like multiplication is the exact same algorithm), and the hashcodes will also consequently vary.

Ichoran · December 5, 2023, 1:26am

If you’re doing the math, you can use the strictfp annotation to make sure the JVM doesn’t do anything sketchy: Scala Standard Library 2.13.12 - scala.annotation.strictfp

If you need math that comes from elsewhere, you might be out of luck. The trig functions and so on have a StrictMath implementation in Java. More exotic stuff like inverse hypergeometric function might not, depending on where you get it from.

SethTisue · December 5, 2023, 1:36am

The strict-math thing shouldn’t be an issue on JDK 17+, thanks to JEP 306: Restore Always-Strict Floating-Point Semantics

Russ · December 5, 2023, 2:29am

I don’t even know which version of Java is used with Scala 3.3.1. Is it JDK 17+?

SethTisue · December 5, 2023, 2:49am

Scala 3 works with any JDK from 8 up, as per JDK Compatibility | Scala Documentation

jducoeur · December 5, 2023, 3:02am

Which brings up a corollary question: are you sure that your environment isn’t the source of the inconsistencies? The JVM tends to stay mostly compatible, but if you’re running different JVMs in different places, that could possibly introduce some sort of inconsistency.

Russ · December 5, 2023, 3:10am

I just use whichever JVM is provided with the Scala version I am using, which is currently the latest stable version, 3.3.1 (with the latest stable verion of sbt as well, or close to it). I know next to nothing about which version of the JVM or the Java compiler is used under the hood.

jducoeur · December 5, 2023, 3:26am

While I have to caution that this probably isn’t the cause of your inconsistency, I think you may be making a conceptual mistake here. Some Scala installers happen to install a JVM when you set them up, but there’s no consistency to that, because Scala isn’t generally dependent on a specific JVM. Unless you’re talking about identical machines, configured the same, with the same path structure, and Scala was installed using identical tooling on all of them, there’s no particular reason to believe they’re using the same JVM.

(Indeed, it’s very easy to accidentally use a different JVM than what you intended, depending on how your environment is set up. Scala uses the environment’s JVM – it doesn’t define that JVM unless you specifically install it that way.)

Russ · December 5, 2023, 6:10am

When I run java -version from the bash shell on my local Linux machine, I get

openjdk version “1.8.0_342”
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)

on the remote machine I get

java version “1.8.0_91”
Java™ SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot™ 64-Bit Server VM (build 25.91-b14, mixed mode)

Are these necessarily the versions that are used for my runs based on the Scala 3.3.1 compiler? If not, how would I determine those versions?

Can these version differences explain the inconsistencies that I am getting?

Ichoran · December 5, 2023, 6:30am

Those versions will allow architecture-specific optimizations to give very slightly different floating point results, as I mentioned before, which would change hashcodes of Double values.

Generally, if you’re hashing Double, something is going to break if you ever independently calculate the same value, because you might not actually calculate the same value.

I guess with JDK17+ as long as you run the exact same code it’s supposed to return identical results. But all it takes is two different ways that are mathematically equivalent but not in the face of floating point operations, and your hashcode is borked.

For example, (2.0 / 7.0 + 3.0 / 7.0)*7.0 is not 5.0 on my machine. It’s 4.999999999999999. The hashcode of 5.0 is 5. The hashcode of 4.999999999999999 is -1075052544.

sageserpent-open · December 5, 2023, 6:33am

Reading back in this thread, you cut over to using Vector instead of Set, so hashing shouldn’t be an issue, unless you use it elsewhere.

You also mentioned that in the early stages of the simulation, you got consistency of results between the two environments, despite the hash codes being different.

You are using floating point values, presumably doing arithmetic on them, and you have parallel operations.

In the absence of seeing your code, I wonder if this is an ill-conditioned arithmetic causing the divergence, or possibly you have mutable state being closed over in the parallel operations? For example, the Random instance.

I’d go with @som-snytt ’s advice and publish the code, or a toy model if it is proprietary, there is a lot of second guessing going on here.

Not saying you’ll definitely get help if you do, especially if it’s War and Peace that you’ve written, but you might get lucky.

Do you have unit tests? If so, do you see consistency in each test? If you are only looking at an all-up output then you’ll have a hard time finding the root cause.

Making that toy model might help there too.

(If you are still using hashing, then I’d heed what @Ichoran mentioned too.)

jducoeur · December 5, 2023, 2:38pm

As @Ichoran says, it could possibly explain the differences, although we’re all speculating wildly, not being able to see your code. Hashing entirely aside, keep in mind that doing math with Double involves approximation down in the low digits, which could vary on different versions of the JVM. If you’re iterating through operations on Double, it’s especially dangerous to assume that that will always produce precisely the same results, since those approximations could multiply and diverge more widely. (This apparently changes from JVM 17 on, but you’re showing JVM 8 here.)

We still don’t know much about your problem, or what kind of inconsistency you’re seeing. Are you doing math operations and expecting the results to be identical to an arbitrary number of significant digits? If so, that’s a fairly likely pain point.

(Backing up to a higher-level question, which came up much earlier in the thread: why are you expecting the results to be 100% reproducible? While I strive for that as a goal during testing personally, it’s rarely true in the real world, and tends not to be a realistic objective. Is it actually a requirement? What’s the content that you are outputting, and in what way is it differing from machine to machine?)

To these questions:

Honestly, it’s hard to be certain. It’s pretty likely that those are the versions you are using, and those are completely different implementations of the JVM, so it’s plausible that there are subtle differences in how they work.

Whether they’re actually being used depends on how you’re invoking the program. Remember, Scala is just a language that compiles to the JVM. You’re actually running the program on the JVM. So the question depends on your environment variables – things like PATH and CLASSPATH. (At this level, this really isn’t a Scala question, it’s a JVM one.)