Why am I running out of heap space?

I do not understand why I am running out of heap space.

I read in 720 assigned aircraft trajectories from an XML file that is around 8 MB. No problem.

Now I want to step through and check each trajectory for conflicts with every other trajectory that preceded it. (Actually it is a bit more complicated than that, but I don’t think it matters here. It is possible that a flight could get reassigned a new trajectory, and I only want to check for conflicts with the latest one.)

The assigned trajectories are stored in a Vector[Traj]. The Traj class is an immutable case class by the way. I step through the index numbers of the Vector and call a function for each check, passing the Vector and the index number. The first check is for time overlaps. If trajectories do not overlap in time, there can be no conflict, so those are filtered out. The number of trajectories that overlap in time in each case varies from around 70 to 90 or so.

It runs fast and works fine – as long as the number of trajectories is less than about 350. When I get to index 350 or so of the 720 trajectories, it bogs down hard and runs out of heap space. I’ve tried several variations, even calling System.gc() directly, but nothing seems to work. I don’t understand why the heap size continues to grow. What is the best way to deal with a problem like this? Is there something that I should look for?

I am running on Red Hat Linux with
export JAVA_OPTS="-Xmx16g -Xms16g"

I think the first way to narrow this down is just to figure out what is accumulating. You can do this with the jmap command-line tool, among other things. If you use external tools rather than VisualVM or somesuch, you might need to add some explicit delays as you work through trajectories to give you time to manually run jmap on the process.
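For example (hypothetical instrumentation, assuming a loop index ind), you could pause periodically and print the process id so you have a window to run jmap by hand:

if ind % 100 == 0 then
  // 30-second window in which to run, e.g.: jmap -histo:live <pid>
  println(s"at index $ind, pid = ${ProcessHandle.current().pid()}")
  Thread.sleep(30000)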

It seems pretty obvious that something is keeping hold of a lot of old copies of something, but without knowing what, it’s hard to give advice. Maybe some non-tail-recursive handling of Vector that is keeping track of every modification ever made? But even that shouldn’t exhaust a 16 GB heap after only 350 items; that feels like cubic scaling. (Note: you can test the scaling by dropping down to 8 GB of memory and seeing where the slowdowns start.)

@Russ

Some random musings…

  1. A great way of running out of heap space is to park things in a mutable cache of static lifetime, i.e. in a field in a Scala object, and never remove the entries. Could this be happening? A variation is to have a cache that is local to a long-running algorithm…
  2. Even better if the things being cached are closures that pull in lots of other state, sometimes without the knowledge of the author.
  3. Even nice pure functional algorithms, or ones that use local mutable state, can chew up lots of memory by working with two- or three-dimensional mappings where some coordinate maps to an associated entry. If the mapping isn’t sparse, or something like an array (or array of arrays etc) is used, then this can use a lot of space. I was bitten by that one recently working on a variant of the longest common subsequence algorithm. You may need to restructure things to avoid this, which is what I did.
  4. If you use VisualVM, you can set a conditional breakpoint on some trajectory index evaluation, say the 200th, and use VisualVM to produce a memory dump for the stopped process, including the object counts by class. Do this for various numbers of trajectories >= 200 and plot the results in Excel, OpenOffice, Jupyter or whatever with regression lines. You’ll spot a gang of offending classes, and that usually leads to a ringleader class whose instances aren’t being reclaimed. Don’t forget to configure the breakpoint to only suspend the current thread, else VisualVM will hang. I’m sure jmap can be used for this too, as per what @Ichoran stated.
  5. As an aside, once you’ve sorted out your memory issue, consider using an interval tree or some other data structure to do your overlap checking. There are several out there to choose from. I tend to end up using the Scala fingertree written by that Sciss chap; the API and comments are a bit quirky, but it works really well. It will save you doing a quadratic comparison of each trajectory with every other, assuming that is what you’re alluding to in the original post. You may be doing this already, though… (a rough sweep-based alternative is sketched after this list).
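To make that last point concrete, here is a rough sweep sketch of my own (assuming, purely for illustration, that Traj exposes startTime and endTime as Doubles; the real fields may differ). Sorting by start time means each trajectory only needs to scan forward while later start times still fall inside its own interval:

def overlappingPairs(trajs: Vector[Traj]): Vector[(Traj, Traj)] =
  val sorted = trajs.sortBy(_.startTime)
  val pairs = Vector.newBuilder[(Traj, Traj)]
  for i <- sorted.indices do
    var j = i + 1
    // later trajectories are sorted by start time, so stop as soon as
    // one starts after trajectory i has already ended
    while j < sorted.length && sorted(j).startTime <= sorted(i).endTime do
      pairs += ((sorted(i), sorted(j)))
      j += 1
  pairs.result()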

Thanks for the replies. I tried jmap -histo and got this:


num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:     129012267     3096294408  java.lang.Double (java.base@21)
   2:      31608903     1264356120  scala.Tuple2$mcDD$sp
   3:      11067053      833175304  [Ljava.lang.Object; (java.base@21)
   4:      22127325      531055800  scala.Tuple2
   5:      13044632      521413984  [B (java.base@21)
   6:      19487019      467688456  scala.collection.immutable.$colon$colon
   7:      13044355      313064520  java.lang.String (java.base@21)
   8:       6496899      259875960  scala.xml.Elem
   9:       3160995      227591640  trajspec.RoutePt
  10:      13014855      208237680  scala.xml.Text
  11:       6489743      207671776  trajspec.Position
  12:       6469991      207039712  trajspec.Track
  13:       6327742      101243872  scala.collection.immutable.Vector1
  14:        157701       21426968  [[Ljava.lang.Object; (java.base@21)
  15:         12334       11762304  Ljdk.internal.vm.FillerArray; (java.base@21)

Not sure what to make of it. Any ideas?

Any clue about what this means?

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects

Not much to go on there. You can tell the JVM to output a heap dump (an hprof file) on OutOfMemoryError with a couple of args (-XX:+HeapDumpOnOutOfMemoryError, plus -XX:HeapDumpPath to choose where it goes). You can open that file with something like VisualVM, which lets you dig into which objects are hanging onto what.
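For example, extending the JAVA_OPTS from the original post:

export JAVA_OPTS="-Xmx16g -Xms16g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.hprof"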

Some general thoughts:

  • be careful with immutable collections and functions like map/flatMap with large amounts of data; these functions produce new collections. You may want to look into iterators instead for your data processing (see the small illustration below)
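A tiny illustration of the difference (standalone, nothing to do with the trajectory code itself):

// each map on a Vector materializes a complete new collection
val strict = Vector.tabulate(1000000)(_.toDouble).map(_ * 2).map(_ + 1)

// an iterator stages the same steps lazily and materializes once at the end
val lazily = Vector.tabulate(1000000)(_.toDouble).iterator.map(_ * 2).map(_ + 1).toVector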

Well, that indicates that you are probably doing a lot of very heavy Vector manipulation. The only reason Vector is as fast as it is is that it optimizes away a lot of the little fiddly allocations: each Vector has an indexing structure that it builds to help with sequential access. I would guess that these work for a bit, then need to be put back on the heap and are the source of the “scalar replaced objects”.

You have an absolutely staggering number of boxed doubles, which are (seemingly) in a huge number of object arrays, at least half of which are in Vector1, which is part of the internals of Vector.

So, part of your calculation is apparently repeatedly updating vectors without releasing the old ones. Because of all the ::s (parts of a list), I would look for something that looks like

list = updatedPositions(v1, v2) :: list

where updatedPositions returns a 2-tuple of vectors that have been altered in only one spot.

Something like that would explain the results you see–tons of crunching on vectors, but keeping them all around anyway.

Can you get another memory map from very early in the calculation, after you’ve loaded all the data but before you actually do the trajectory processing?

It might be helpful to see what is growing.

@Russ Have you tried any of my previous advice yet?

The problem with a single memory histogram, where possibly only one of the readership knows either the source code or the point in execution at which it was generated, is that there isn’t much context.

However, let’s assume there are 750 trajectories in play, and that we are somewhere in the implied loop over trajectory indices being vetted. I’m unsure as to whether all the trajectories are in memory upfront, or are loaded incrementally, or are being whittled down imperatively on the fly, or are being filtered, thus making some new gathering of vetted trajectories.

So several lots of things in the 6 millions, several lots of things in the 13 millions. Presumably we’re looking at clustering of data into instances here, so I’d guess we have Track possessing an XML element (or maybe derived from one), it has a Position and a vector of something or other. Or perhaps the vector contains tracks, and more often than not they are just populated with one track? Would the vectors be owned by trajectories, then? I can’t see 6 million Traj instances, so this feels off.

I can see 3 million RoutePt, though.

Each track seems to be associated with two lots of XML text, a string and an array of Booleans.

Hopefully I’m on the right track here, ahem.

So if I’m right about a trajectory containing a single vector of tracks, why are there so many of these vectors when there are just 750 trajectories?

If we’re talking real-world data here, would you normally expect so many tracks in relation to the trajectories?

What is in those arrays of Booleans, btw? Could they be replaced by sets of integers, if the arrays are sparsely populated with true values?

Something that @Ichoran mentioned also makes me wonder: you have a lot of cons-cells of ::. Are you ripping through lists to make insertions elsewhere than at the front, deleting items midway or at the end, or concatenating lists together? If so, are you holding on to the before and after lists? You won’t get much structure sharing with this approach, so if those lists hold on to track instances, that would bloat things up.
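To illustrate the sharing point with a made-up helper (not from anyone’s actual code): patching a list anywhere but the front rebuilds every cell before the patch, so holding both the before and after lists nearly doubles the cells kept alive.

// replace element k of a list: everything before index k is rebuilt,
// so only the tail after the patch is shared with the original
def patched[A](xs: List[A], k: Int, a: A): List[A] =
  xs.take(k) ::: a :: xs.drop(k + 1)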

At this point, I’m going to bow out and wish you good luck. Hopefully there is something here that might give you a eureka moment. There is real satisfaction (and learning) to be gained when successfully debugging some mystery, so enjoy it…

PS: 750 squared is roughly half a million, so while that’s not 3 or 6 million, it might get you somewhere to explaining those numbers if you have some Cartesian product of trajectories going on.

PPS: One last thing to consider; perhaps there is a genuine need for many tracks per trajectory in the original dataset. If so, then you may be doing the right thing conceptually, but are suffering from having a data-heavy problem. You could optimise the footprint, say by replacing arrays with maps and sets if there are a lot of sentinel entries, or else go the route of loading via proxies or storing things in a database. Why do you have to load them all up at the same time? Could you do your overlap checks with just the time intervals instead?

@Russ Have you tried any of my previous advice yet?

I downloaded visualvm and figured out how to use it. I ran my program to the point of starting to bog down and did a heap dump. It took around an hour to run, but it provided no new information beyond what I got from jmap. I then took the next step, and it sat there for over an hour at “computing GC roots 0%”. I went home from work, and I have not yet returned to find out if it will provide any useful information. I guess I should do it again but start earlier in the run, before the memory is bogged down.

I also tried to look for the kind of things you mentioned. I sometimes use a var in a companion object to accumulate performance data, for example, but those are all deactivated at this point as far as I can tell. It would be nice if there were some way to identify things like that.

I really appreciate all the suggestions I am getting here, but I need more time to reread the posts and think about it. Unfortunately, I am being forced to focus on other work now, so this is more or less a background job. It is not a “show-stopper” for me now, but it is like a cancer in my code that I need to root out.

OK, thanks again for your time, and here are some answers to your questions:

However, let’s assume there are 750 trajectories in play, and that we are somewhere in the implied loop over trajectory indices being vetted. I’m unsure as to whether all the trajectories are in memory upfront, or are loaded incrementally, or are being whittled down imperatively on the fly, or are being filtered, thus making some new gathering of vetted trajectories.

This little script is actually a redundant check for conflicts among trajectory assignments that were generated by another program. The other program dumps the assigned trajectories (which include dynamic tolerances) into an XML file, and this script reads them as the first step. So in this case, the script starts by reading a Vector[Traj] of length 720 (where Traj is an immutable case class). The check for conflicts is a fairly complicated algorithm that has to account for dynamic tolerances, but that should not matter here.

Each trajectory in the Vector[Traj] is supposed to be free of conflicts with all preceding trajectories. So I step through the indices of the Vector, and for each index, I call the “take(i)” method to take the first i trajectories. I then check the last trajectory in that subset for conflicts against all previous trajectories. Now my understanding is that the “take” method does not create a new Vector but rather just creates a “window” on the first n elements. When I am done checking for that index, it should be garbage collected – but even if it is not collected and removed, it should require little if any additional memory because it is shared with the original Vector[Traj].

So several lots of things in the 6 millions, several lots of things in the 13 millions. Presumably we’re looking at clustering of data into instances here, so I’d guess we have Track possessing an XML element (or maybe derived from one), it has a Position and a vector of something or other. Or perhaps the vector contains tracks, and more often than not they are just populated with one track? Would the vectors be owned by trajectories, then? I can’t see 6 million Traj instances, so this feels off.

Yes, 6 million Traj instances would be totally bonkers. There are 720 of them in this run. But each Traj instance is composed of several other types, including a Route, a RefTraj, and dynamic tolerances around the reference trajectory in 4D space. The RefTraj class contains a Vector[Track], where a Track is a time-tagged position in 3D space.

Each track seems to be associated with two lots of XML text, a string and an array of Booleans.

The only XML is due to reading the stored Traj objects from a file at the start of the script, as explained earlier. As for an array of Booleans, I have no idea where that came from.

Something that @Ichoran mentioned also makes me wonder: you have a lot of cons-cells of ::. Are you ripping through lists to make insertions elsewhere than at the front, deleting items midway or at the end, or concatenating lists together? If so, are you holding on to the before and after lists? You won’t get much structure sharing with this approach, so if those lists hold on to track instances, that would bloat things up.

The whole process of constructing the trajectories involves concatenating segments (e.g., climb, cruise, descent), but that is not done in this particular script, and once a trajectory is complete there should not be much more of that.

By the way, the conflict checks are very fast (several per second) until they get to around index 320; then they slow down, and by 350 they come to a grinding halt.

Is all that clear enough?

@Russ

Good morning from the (surprisingly) sunny North of England.

I should have made it clear that my questions were rhetorical, so no pressure to answer, although there is definitely some interesting information there, so apologies and a thank you.

I’m sure @Ichoran and @BalmungSan will be on this like hawks… :grin:

There are several things that come to mind here. They might have a bearing on your memory issue, but are worth considering anyway. Yet another navigational pun, sorry.

Amongst all the other advice and questions you got, I would be thinking about the 6 million Track instances in relation to the 720 trajectories: is this a reasonable proportion in your opinion? What happened to those mythical beasts, the RefTraj instances? I didn’t see them in the memory dump.

From what I’ve seen, I’m at a loss to diagnose whether you have a ‘leak’ - global or local, a transient over-allocation of objects, a systematic superlinear relation of smaller objects to bigger objects or simply a large data set.

Ponder on, experiment and enjoy the bug hunting.

If that’s a reference to the [B in the memory dump, that’s actually a byte array.

I’m very eager to see if you discover the route cause.

Ah - ‘B’ is for byte, not bool. My fault for leading everyone down the wrong track.

Honest I wasn’t just foolean’ around.

Then that is probably where you should start looking, because you clearly do have over a hundred million Double objects, so there is some mismatch between what you intend and what is actually happening.

It’s not the Traj themselves that are taking all the space. It’s the internal routes within the Traj entries. I also think you might be loading the XML file more than once? It seems very odd that you could have 6M XML Elems in only an 8 MB file; that’s almost one per character.

Anyway, somehow or other you’re holding on to a lot of data. If you’re doing something like

def anyCollisions(trajs: Vector[Traj], i: Int): Boolean =
  if i >= trajs.length then false
  else oneCollision(trajs.take(i), trajs(i)) || anyCollisions(trajs, i+1)

then you can be accumulating recursively all the sub-vectors which aren’t freed until the whole thing completes.

Now, even that isn’t all that bad; it’s still under a million. But if you manage to do it one more time in compareOne or something like that, then you get into the realm of the numbers you are seeing.

If you cannot analyze your algorithm in your head to figure out where something like this might be occurring, I recommend the following.

(1) Stop using Vector. For Double, use Array[Double]. For lists of objects, like Vector[Traj], use List.

(2) Instead of slicing up Vector, which has so-so structural sharing, walk through your data by maintaining two lists: a seen list and a pending list. Then your algorithm looks like

def collideAny(seen: List[Traj], pending: List[Traj]): Boolean =
  pending match
    case Nil => false
    case t :: rest =>
      if collideOne(seen, t) then true
      else collideAny(t :: seen, rest)

By using a pair of lists for the prefix/suffix, you’re guaranteed to have maximum structural sharing. And I bet you’re using a similar strategy for points on a route, and I bet that’s where the real problem is.

Alternatively, keep the vectors but don’t keep rebuilding them. Index into them instead. They’re okay for indexing; much better at indexing than rebuilding.
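A sketch of that indexed style, with collideTwo standing in for whatever the real pairwise check is called:

// walk by index only; no slices or sub-vectors are ever built
def collideAnyIndexed(trajs: Vector[Traj]): Boolean =
  trajs.indices.exists(i => (0 until i).exists(j => collideTwo(trajs(i), trajs(j))))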

The reason it’s fast up to 320 or whatever is that it’s fast to consume all the memory on your machine. You can fill the heap in a matter of seconds, if you’re just creating a slew of objects and not releasing them.

You could also use vector.view.take(n) if you truly just need a view.
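A quick REPL check of the difference:

val v = Vector.tabulate(1000)(identity)
val taken  = v.take(500)       // builds a new Vector (with structural sharing)
val viewed = v.view.take(500)  // lazy window; nothing is copied
val forced = viewed.toVector   // materialize only if you actually need to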

I wrote a simplified version of the code that is causing the memory leak. (Yes, I should have done this before my original post. Sorry) I still do not understand how this code can have a memory leak.

Before I get to that, let me also correct something that I wrote earlier. The XML file that I read the 720 trajectories from is around 300 MB, not 8 MB as I said.

OK, here is a simplified version of the function with the memory leak:

def conflictsByPair(trajList: Vector[Traj]): Vector[Encounter] =
    // find conflicts by pair for a list of assigned trajectories
    println(s"\nseparation tests started for ${trajList.length} assignments")

    var conflicts = Vector[Encounter]()

    for ind <- trajList.indices.drop(1) do

      print(s"\r   processing index $ind ")
      val traj1 = trajList(ind)

      var ind2 = ind
      while ind2 > 0 do
        ind2 -= 1
        val traj2 = trajList(ind2)
        val enc = Encounter(traj1, traj2)
        if enc.isaConflict then conflicts :+= enc

    println("\nseparation tests ended")
    conflicts

As I said, the Traj class is an immutable case class. Ditto for the Encounter class, which takes a pair of Traj instances and checks for conflict (loss of minimum required separation). As I said before, the check for conflict is a fairly complicated algorithm that has to account for dynamic tolerances defined in Traj, but I don’t think that should matter.

I am still baffled as to how this simple function can have a memory leak that crashes the program with a heap overrun. The Encounter that is generated in the loop should be garbage collected, shouldn’t it?

By the way, I know from running this code on the same data on another computer that there are no conflicts in this set of trajectories, so it is not the number of conflicts found that is overrunning heap memory.

What can cause this kind of memory leak? Global variables? Mutable classes? What should I look for in my code?

Looking at that code, there isn’t an immediately obvious memory leak. A few things you could look at:

  • Are there any unnecessary lazy vals on the Traj class which might be hanging around?
  • Is there any global mutable state, such as caching? (Both patterns are sketched below.)
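For reference, a hedged sketch (made-up names, not the poster’s code) of both patterns:

object EncounterStats:
  // global mutable state: everything appended here stays reachable for
  // the life of the program, so it can never be collected
  var timings = Vector.empty[Double]

final case class Encounter(a: Traj, b: Traj):
  // a lazy val pins its (possibly large) result for as long as this
  // Encounter instance is reachable
  lazy val samples: Array[Double] = Array.fill(100000)(0.0)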

Happy hunting!

Again, it’s the assumptions that you aren’t checking that are the most likely culprits. If it is true that there are no conflicts in this data set, then you should be able to replace

        if enc.isaConflict then conflicts :+= enc

with

        if enc.isaConflict then throw new Exception("But there are no conflicts!")

and it should run exactly the same. If there are some conflicts, you’d better count how many.

Assuming that you do not, in fact, store any of the conflicts, then the most likely possibility is that you’re somehow keeping global information about encounters every time you create an encounter.

Alternatively, you may have such an inefficient algorithm for checking individual pairs of trajectories (again, the “but I don’t think this matters” assumption always needs to be suspect when things aren’t working sensibly) that you just happen to hit the problem at around index 300: say, you reach a particularly long trajectory, so the later comparisons slow down, and then you hit another big one whose check against the first is something this machine can’t fit in memory but another one can.

Finally, sequentially appending to a Vector in a var is really not what you should be doing unless you need to manipulate and share the vector on each iteration. You don’t in the example code, at least, so that could just be Vector.newBuilder and you’d .result it at the end after += to add any collisions you find.
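Something like this, reusing the names from the simplified function above:

// same pairs as the nested while loop, just counting upward
val builder = Vector.newBuilder[Encounter]
for ind <- trajList.indices.drop(1); ind2 <- 0 until ind do
  val enc = Encounter(trajList(ind), trajList(ind2))
  if enc.isaConflict then builder += enc
val conflicts = builder.result()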

Thanks for the feedback, Ichoran. Unfortunately, the security protocols in my organization make it difficult to run on that particular computer from home, so I can’t do much testing on the weekend, but I will try your suggestion when I can.

Assuming that you do not, in fact, store any of the conflicts, then the most likely possibility is that you’re somehow keeping global information about encounters every time you create an encounter.

This is what I am trying to figure out. What exactly should I be looking for? I do have some variables defined in the class companion objects that I use for recording performance data. For example, I can record the time it takes to calculate a certain result each time, which I store in a Vector var in the companion object. However, I also have a switch for activating or deactivating that storage, and I am running with that switch turned off. But the reference is still there, so could that possibly be the problem? I guess I would have to comment out the reference to find out.

Finally, sequentially appending to a Vector in a var is really not what you should be doing unless you need to manipulate and share the vector on each iteration. You don’t in the example code, at least, so that could just be Vector.newBuilder and you’d .result it at the end after += to add any collisions you find.

Thanks for that feedback, but why? Is it inefficient? I was using a yield, but I changed it to :+= just to eliminate the remote possibility that yield was causing the memory leak. In any case, thanks for bringing Vector.newBuilder to my attention.