30%, 70% performance improvements for `flatMap`

I did some serious work on this project last year. I meant to go back and polish it before sharing it, but seeing as that hasn’t happened, I want to share this now.

Implementation & ReadMe: https://github.com/shawjef3/DepthFirst
ReadMe as PDF: https://drive.google.com/file/d/1BF1Ps3XMxM_eMu9IesyH_q38TumHn-PN/view?usp=sharing

Last year found myself thinking about data locality and the operations involved in for syntax. It occurred to me that it might not always be optimal. I wondered if there were times when forcing the evaluation to be depth first, and avoiding intermediate CanBuildFroms, would provide performance benefits. I’m not completely sure those are the reasons this library improves performance, but it was my intent.

I found that for collections >= 1000 elements and deep flatMap call stacks, my evaluation strategy could provide up to 35% performance improvement for Vector, and up to 70% improvement for Stream. The PDF contains some 2d charts of call stack depth vs collection size to give you an idea of how different dimensions improve differently. A 3rd dimension I tried to capture is how performance improvement varied with CPU cache size.

I’m grateful for any suggestions for how to improve this project. I could use some inspiration to keep working on it.