Setting up a worksheet

flibonacci · February 8, 2023, 3:28pm

Hello
I would like to setup a worksheet to check intermediate results in the Scala Capstone project.
I am not sure how to import and access my objects and methods defined in the project into the worksheet.
The project has this structure:

src
- worksheets
  - extraction.worksheet.sc
- main
  - scala
    - observatory
      - common.scala 
      - Extraction.scala
      ...

what statement import would allow me to use methods from the Extraction object in my worksheet?
Thank you

spamegg1 · February 8, 2023, 4:30pm

Welcome to the Scala community @flibonacci
Assuming all the files start with:

package observatory

you need:

import observatory.*

However, to make this work, I had to put the worksheet inside the src/main/scala/observatory folder along with all the other files.

In any case, you might not be able to get much mileage out of the worksheet, because I think that worksheet capabilities are quite limited, especially since Spark configurations are involved (assuming you are using Spark).

It works for basic stuff (like the Location value in the picture) but it probably won’t work well if it has to create a Spark instance and open files etc. (the next one, with the errors in the picture):

Your best option for checking intermediate results is to write tests and put them into the files inside src/test/scala/observatory directory, then run them with the usual sbt test. Here’s an example for src/test/scala/observatory/ExtractionTest.scala (placing .csv files into src/resources):

  // Implement tests for the methods of the `Extraction` object
  @Test def `locateTemperatures`(): Unit = {
    val path: String = "/home/spam/Desktop/ScalaCapstone/observatory/src/main/resources"
    Extraction.locateTemperatures(
      1975,
      path + "/stat.csv",
      path + "/temp.csv"
    ).foreach(println)

    assert(
      Extraction.locateTemperatures(
        1975,
        path + "/stat.csv",
        path + "/temp.csv") ==
      Iterable(
        (LocalDate.of(1975, 12, 6),
          Location(37.358,-78.438),
          0.0: Temperature),
        (LocalDate.of(1975, 1, 29),
          Location(37.358,-78.438),
          2.000000000000001: Temperature)
      )
    )
  }

(it’s been a while since I wrote this, so it might require some work!)

For more fine-grained intermediate results, you’ll probably have to create a small RDD example by hand for test purposes (assuming you are using RDDs) and probably copy/paste portions of your code into the test file, to see intermediate results. This is not going to be easy, and you’ll have to write a lot of test code. Adopt the mentality: test code is normal code, it’s just as serious / important.

Best of luck!

flibonacci · February 9, 2023, 4:05pm

Thank you @spamegg1 for your answer, and for going the extra mile and showing me the test approach. With the import problem solved worksheet complained instantly about creating the spark instance, so I think the approach is indeed limited, and I’d better follow your suggestion.

Thanks!