@Russ Have you tried any of my previous advice yet?
The problem with just a single memory histogram, where possibly only one of the readership knows either the source code or the point in execution at which it was generated, is that there isn’t much context.
However, let’s assume there are 750 trajectories in play, and that we are somewhere in the implied loop over the trajectory indices being vetted. I’m unsure whether all the trajectories are in memory up front, loaded incrementally, whittled down imperatively on the fly, or filtered into some new collection of vetted trajectories.
So: several lots of things in the 6 millions, several lots of things in the 13 millions. Presumably we’re looking at clustering of data into instances here, so I’d guess we have `Track` possessing an XML element (or maybe derived from one), a `Position` and a vector of something or other. Or perhaps the vector contains tracks, and more often than not it is populated with just one track? Would the vectors be owned by trajectories, then? I can’t see 6 million `Traj` instances, so this feels off. I can see 3 million `RoutePt`, though.
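To make that guess concrete, here is the kind of shape I’m imagining. Every name below is invented for illustration; none of it is known to be in your actual code:

```scala
import scala.xml.Elem // assuming the scala-xml module is on the classpath

// Pure speculation about the shape implied by the histogram.
final case class Position(x: Double, y: Double)

final case class Track(
  element: Elem,          // the XML element (or something derived from one)
  position: Position,
  flags: Array[Boolean],  // the mystery Boolean array
  text: String            // the associated string
)

final case class Traj(tracks: Vector[Track])
```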
Each track seems to be associated with two lots of XML text, a string and an array of Booleans.
Hopefully I’m on the right track here, ahem.
So if I’m right about a trajectory containing a single vector of tracks, why are there so many of these vectors when there are just 750 trajectories?
If we’re talking real-world data here, would you normally expect so many tracks in relation to the trajectories?
What is in those arrays of Booleans, btw? Could they be replaced by sets of integers, if the arrays are sparsely populated with true values?
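A minimal sketch of what I mean, assuming those arrays are the per-track flags (the names here are made up):

```scala
import scala.collection.immutable.BitSet

// If the arrays are mostly false, keeping only the indices of the true
// entries can shrink the footprint considerably.
def toIndexSet(flags: Array[Boolean]): Set[Int] =
  flags.indices.filter(flags(_)).toSet

// A BitSet is another option: one bit per possible index, which can be
// a better middle ground than a boxed Set[Int].
def toBitSet(flags: Array[Boolean]): BitSet =
  BitSet(flags.indices.filter(flags(_)): _*)

// Membership then replaces array indexing: bits(i) instead of flags(i),
// since a BitSet is a Set[Int] and apply means contains.
```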
Something that @Ichoran mentioned also makes me wonder: you have a lot of cons cells of `::`. Are you ripping through lists to make some insertion elsewhere than at the front, or deleting items midway or at the end, or concatenating lists together? If so, are you holding on to the before and after lists? You won’t get much structure sharing with this approach, so if those lists hold on to track instances, that would bloat things up.
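To illustrate the structure-sharing point, a minimal sketch with plain integers standing in for your data:

```scala
// Prepending shares the entire tail, but any operation that rebuilds
// the front of a list copies every cell before the change point.
val tail   = List(2, 3, 4)
val shared = 1 :: tail        // prepend: reuses all of tail's cells

val appended = tail :+ 5      // copies every cell of tail
val joined   = tail ++ shared // copies every cell of tail again

// Holding on to tail, appended and joined at once keeps all those
// copied cells live, along with anything they reference, such as
// track instances.
```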
At this point, I’m going to bow out and wish you good luck. Hopefully there’s something here that might give you a eureka moment. There is real satisfaction (and learning) to be gained from successfully debugging a mystery, so enjoy it…
PS: 750 squared is 562,500, so while that’s not 3 or 6 million, it’s roughly half a million, and it might go some way towards explaining those numbers if you have some Cartesian product of trajectories going on.
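For what it’s worth, here’s a sketch of keeping a pairwise comparison down to each unordered pair exactly once; `overlaps` is a made-up stand-in for whatever your vetting predicate actually is:

```scala
// 750 * 750 = 562,500 ordered pairs, but only 750 * 749 / 2 = 280,875
// if each unordered pair is visited once.
def overlappingPairs[A](trajs: Vector[A])(overlaps: (A, A) => Boolean): Vector[(A, A)] =
  for {
    i <- trajs.indices.toVector
    j <- (i + 1) until trajs.length // never revisit (j, i), never pair (i, i)
    if overlaps(trajs(i), trajs(j))
  } yield (trajs(i), trajs(j))
```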
PPS: One last thing to consider: perhaps there is a genuine need for many tracks per trajectory in the original dataset. If so, you may be doing the right thing conceptually but are simply suffering from a data-heavy problem. You could optimise the footprint, say by replacing arrays with maps and sets if there are a lot of sentinel entries, or else go the route of loading via proxies or storing things in a database. Why do you have to load them all at the same time? Could you do your overlap checks with just the time intervals instead?
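As a minimal sketch of that last idea, assuming each trajectory carries start and end timestamps (again, invented names):

```scala
// Load only the time windows, leave the heavy track data on disk, and
// do the vetting on intervals alone.
final case class Interval(start: Long, end: Long) {
  def overlaps(that: Interval): Boolean =
    start <= that.end && that.start <= end
}

// e.g. Interval(0L, 10L).overlaps(Interval(5L, 20L)) // true
```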