Scala on a Cluster

  1. I am aware that Akka exists for Scala. There might be an Akka library that solves this problem; I have not found one, but would appreciate pointers if one exists.

  2. Here is the problem: I want to run the JVM on a cluster (say, a bunch of AWS EC2 instances).

  3. We can assume that all the machines have the same CPU / Memory (same EC2 type).

  4. We can assume that all the machines are in the same datacenter and VPN (so network latency between any pair is approximately equal).

How can we program for this model?

Right now, the best I have is to fire up a JVM on each machine and have them send messages to each other.

I was wondering if there is some JVM library that abstracts this a bit, so that instead of having to think about N JVMs, I have one “virtual JVM” with N “nodes” attached to it. When I allocate memory, I specify which “node” the memory is allocated on. When I fire off a thread, I specify which “node” it runs on.

The point here is that instead of reasoning about N JVMs, I would prefer to reason about 1 JVM which has certain memory/compute constraints.
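To make the “virtual JVM” idea concrete, here is a minimal sketch of what such an API might look like. It is not an existing library: `Node`, `VirtualJvm`, and `on` are hypothetical names, and each “node” is simulated by a local thread pool. In a real cluster each node would be a remote JVM, and the work passed to it would have to be serialized and sent over the network.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical: a "node" is simulated here by a local single-thread pool.
// In a real cluster this would stand for a remote JVM.
final class Node(val id: Int) {
  private val pool = Executors.newSingleThreadExecutor()
  private val ec = ExecutionContext.fromExecutor(pool)

  // Run a computation "on" this node; the result comes back asynchronously.
  def run[A](body: => A): Future[A] = Future(body)(ec)

  def shutdown(): Unit = pool.shutdown()
}

// Hypothetical: one "virtual JVM" with n nodes attached to it.
final class VirtualJvm(n: Int) {
  val nodes: Vector[Node] = Vector.tabulate(n)(new Node(_))

  // The caller states explicitly which node the work runs on.
  def on[A](nodeId: Int)(body: => A): Future[A] = nodes(nodeId).run(body)

  def shutdown(): Unit = nodes.foreach(_.shutdown())
}

object VirtualJvmDemo {
  def main(args: Array[String]): Unit = {
    val vm = new VirtualJvm(3)
    val result = vm.on(1) { (1 to 10).sum } // computed on "node" 1
    println(Await.result(result, 1.second))
    vm.shutdown()
  }
}
```

Note that even in this sketch `on` returns a `Future`: anything that may cross a node boundary is asynchronous from the caller's point of view, which is one place where the “single JVM” illusion leaks.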

EDIT: map-reduce is not the answer; this is the server side of a real-time application, so I need latency under 100 ms.

What exactly do you want to do on that cluster?

Do you have a big problem that you want to parallelize / distribute? Or do you only want that bunch of machines to act like a single unit of processing that accepts HTTP requests?

Imagine we are building the server side of a gigantic Quake deathmatch that cannot fit on a single machine.

Conceptually, this is “one problem”, and I would prefer to reason about 1 JVM instead of N JVMs.

However, because we cannot fit it all on a single machine, we have to fire up N JVMs.

So the world is split into shards / grid units. For the most part, each JVM acts independently of the others and is responsible for a few shards / grid units. Occasionally, a user goes from one shard to another (or a bullet / rocket flies from one to another).
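The shard layout described above could be sketched roughly as follows. All names and constants (`Pos`, `Shard`, `cellSize`, `cols`, `nodeCount`) are assumptions made up for illustration, and the row-major modulo assignment is just the simplest possible mapping; a real server would probably give each JVM a contiguous block of cells so that fewer moves cross a JVM boundary.

```scala
// Hypothetical types, for illustration only.
final case class Pos(x: Double, y: Double)
final case class Shard(col: Int, row: Int)

object Sharding {
  val cellSize  = 100.0 // assumed: world units per grid cell
  val cols      = 8     // assumed: number of grid columns
  val nodeCount = 4     // assumed: number of JVMs

  // Which grid cell a world position falls into.
  def shardOf(p: Pos): Shard =
    Shard(math.floor(p.x / cellSize).toInt, math.floor(p.y / cellSize).toInt)

  // Which JVM owns a grid cell (row-major index modulo the node count).
  def nodeOf(s: Shard): Int =
    Math.floorMod(s.row * cols + s.col, nodeCount)

  // Some(target node) if the move ends on a different JVM, None otherwise.
  def handoffTarget(from: Pos, to: Pos): Option[Int] = {
    val source = nodeOf(shardOf(from))
    val target = nodeOf(shardOf(to))
    if (source != target) Some(target) else None
  }
}
```

Only when `handoffTarget` returns `Some(node)` does a player or projectile need to be sent to another JVM; every other update stays local to the JVM that owns the shard.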


I am looking for a library that allows me to think of the server as “one program running on multiple JVMs” rather than “N JVMs each running its own program.”

I doubt that you want to completely abstract over the multiple nodes. If you want low latency, you probably want strict control over where in the code you potentially cross node boundaries.


Could you explain a bit more why being able to control which node memory is allocated on and which node threads run on is insufficient?

What use case do you have in mind that requires more control? And how can you get more control than choosing the node where memory is allocated and threads run?