I work as a data scientist using mostly Python and R. We have begun using Spark for some tasks and I started getting into Scala. I want to move more of my workflow from Python to Scala, especially everything I do in Spark. I found this book, Scala for Data Science, and want to know if it’s still up to date.
I suspect that because Scala is a relatively new language many things are under active development. I have wanted to buy other Scala books and the reviews always mention that the versions of libraries used are out of date.
I made a gist with the libraryDependencies used in the book: build.sbt
And you can find the original GitHub repo here.
Given that it was published in January of 2016, I’m guessing that it is rather out of date. I’ve been reading Spark in Action, which came out 10 months later, and the authors only barely got the changes for 2.0 in at the end of writing. I’ve noticed a few things in it that are out of date, but it is still very usable. Spark has been changing quickly. I’d suggest the most recent references you can get your hands on.
The Spark section is definitely out of date. I checked the build.sbt for the chapters using Spark and it specifies Spark 1.4. I’m not too concerned with Spark because there are a ton of resources to learn Spark (I’m going through Advanced Analytics with Spark atm). My concern is with all the other libraries for which there aren’t many learning resources.
The specific versions they recommend using could be out of date, but my expectation would be that you can learn those versions along with the book and then move on up to the latest version later on. Unless the library is poorly designed or just does silly things between versions, the mental context you build up from working with a slightly outdated version should still be useful in the long run, right?
That is certainly true! How would I use the old versions specified in the book though? As a Python user I would create a conda virtual environment and install the libraries I need, specifying their old versions. Is this done through sbt?
Yeah. sbt handles all of that for you, without the need for any system-wide tooling. If you declare that you want Version A for your current project, sbt will use Version A. The current project is determined by the working directory you start sbt from.
Some other folder could have a different project that asks for Version B of some dependency. That won’t interfere with your project using Version A.
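To make that concrete, here is a rough sketch of what a build.sbt pinning old versions might look like. The project name, the Scala version, and the exact Spark patch release are my assumptions for illustration (the thread only says the book specifies Spark 1.4), so take the real values from the book's own build.sbt.

```scala
// build.sbt, placed in the root folder of one project.
// sbt reads this file from the directory you launch it in,
// so these versions apply to this project only.

// Hypothetical project name, for illustration.
name := "scala-for-data-science-exercises"

// Assumed Scala version; it must be one the pinned libraries were published for.
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  // The book's Spark chapters specify Spark 1.4; the patch version here is an assumption.
  // %% appends the Scala binary version (e.g. _2.11) to the artifact name.
  "org.apache.spark" %% "spark-core" % "1.4.1"
  // Add the book's other dependencies here, with the versions its build.sbt lists.
)
```

Running sbt from this folder downloads exactly these versions into sbt's dependency cache. A second folder with its own build.sbt can ask for a newer Spark, and the two won't clash, which is roughly the role a conda environment plays in Python.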