Spark-Scala RDD, group by count from array of array

You should take this course: https://www.coursera.org/learn/scala-spark-big-data
It will save you a huge amount of time compared with endlessly searching the web, reading documentation, and asking questions over and over. (That’s a horrible “learning technique”, if it can even be called that.) Instead of tiny bits and pieces, you’ll gain a serious understanding from the course.

For example, transformations on RDDs such as map and filter are lazy: they are not evaluated until you call an action such as collect, count, or reduce. This is a basic, fundamental aspect of RDDs, and it is taught in the first week of that course. The fact that you didn’t know it immediately stood out to me.
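
To make the laziness point concrete, here is a minimal sketch of counting elements across an RDD of arrays. It assumes a local Spark setup with spark-core on the classpath; the sample data, the object name LazyRddSketch, and the accumulator name touched are all made up for illustration and are not from the original question. The accumulator is only there to show that no element is processed until the action runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; names and data are illustrative assumptions.
object LazyRddSketch {
  // Returns the accumulator value seen before any action, plus the final counts.
  def run(): (Long, Map[String, Int]) = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lazy-rdd-sketch").setMaster("local[*]"))
    try {
      // Assumed input: an RDD whose elements are arrays.
      val nested = sc.parallelize(Seq(Array("a", "b"), Array("a", "c"), Array("a", "b")))
      val touched = sc.longAccumulator("touched")

      // Transformations only: this builds a lineage, nothing executes yet.
      val counts = nested
        .flatMap(_.toSeq)                      // flatten the inner arrays
        .map { x => touched.add(1L); (x, 1) }  // pair each element with 1
        .reduceByKey(_ + _)                    // sum the 1s per key

      // No job has run so far, so the accumulator is still 0.
      val beforeAction = touched.value.longValue

      // collect() is an action, so Spark now runs the whole pipeline.
      val result = counts.collect().toMap
      (beforeAction, result)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val (beforeAction, result) = run()
    println(s"elements processed before the action: $beforeAction")
    println(result)
  }
}
```

Replacing collect() with another action such as count() would trigger evaluation in the same way; without any action, the flatMap, map, and reduceByKey steps never run.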
