You should take this course: https://www.coursera.org/learn/scala-spark-big-data
It will save you a huge amount of time compared with endlessly searching the web, reading documentation, and asking questions over and over. (That’s a horrible “learning technique”, if it can even be called that.) Instead of picking up tiny bits and pieces, you’ll gain a serious understanding from the course.
For example, transformations on RDDs such as `map` and `filter` are lazy: they are not evaluated until you call an action such as `.collect`. This is such a basic, fundamental aspect of RDDs that it is taught in the very first week of that course. The fact that you didn’t know it immediately stood out to me.
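To make the laziness concrete, here is a minimal sketch. It assumes Spark 2.x or later with `spark-core` on the classpath and runs in local mode. The object name `LazyRddDemo`, the `demo` helper, and the use of a `LongAccumulator` to count how many elements `map` has processed are illustrative choices, not taken from the course. The accumulator stays at 0 after `map` and `filter` are declared and only changes once `collect` (an action) triggers a job. Other actions such as `count`, `take` or `reduce` would trigger evaluation in the same way.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object illustrating lazy RDD transformations
object LazyRddDemo {
  // Returns (elements processed before the action, after it, collected result)
  def demo(): (Long, Long, Seq[Int]) = {
    // Local mode so the sketch runs without a cluster
    val conf = new SparkConf().setAppName("lazy-rdd-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Counts how many elements the map function has actually processed
      val processed = sc.longAccumulator("processed")

      // Transformations: map and filter only record lineage; no job runs here
      val multiplesOfFour = sc.parallelize(1 to 10)
        .map { n => processed.add(1L); n * 2 }
        .filter(_ % 4 == 0)

      // Still 0: nothing has been evaluated yet
      val before = processed.sum

      // Action: collect triggers a job that actually runs map and filter
      val result = multiplesOfFour.collect().toSeq

      (before, processed.sum, result)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, result) = demo()
    println(s"processed before collect: $before")
    println(s"processed after collect: $after")
    println(result.mkString(", "))
  }
}
```

Note that Spark only guarantees exactly-once accumulator updates inside actions. Updates made inside a transformation like `map` can be applied more than once if a task is re-executed, so the counter here is a teaching aid rather than something to rely on in production.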