Thanks a lot @hmf && @siddhartha-gadgil - much appreciated.
I am definitely coming to the realization that Python is a lot better at ML than anything on the JVM, especially for rapid prototyping and PoCs. A shame, really.
I do wonder what the canonical “production” APIs/libraries for these tasks are - is it still Python, but perhaps compiled? Or is it Spark ML?
Thanks again!