Use yield with if

Maia · October 18, 2021, 9:32pm

Dear all,
I’d like to concatenate those two methods into one so that I can get two lists:
readCsvDatabase() and readAllCsvDatabase()
https://scastie.scala-lang.org/bVeesZvaQkWCgLFLeEL02w

the difference is that the first one applies a filtering on a field and the second one reads all the list in the csv file.

Also, I’d like to know if it is better to apply the filtering outside the method.

Thank you,
Best regards

crater2150 · October 19, 2021, 8:45am

So your question is, if you could have one method which can filter the invalid entries, but optionally?

Well, you already have all the required parts, you could write something like

def readAllCsvDatabase(filename: String, keepInvalid: Boolean): List[Data] = {
  ...
    if keepInvalid || values(1).toInt == 1
  } yield ...
}

On another note, it isn’t clear to me, if isValid should be 0 or 1 for a valid entry, the filtering line at the end of the scastie checks for 0, but the filter in the for comprehension checks for 1. If there aren’t multiple types of invalid, that you need to keep, it is probably better to make isValid a Boolean and assign it with values(1).toInt == 1 (or 0, depending on which one is valid).

sangamon · October 19, 2021, 1:06pm

Just adding to @crater2150’s suggestions…

I’d consider applying the filtering at the Data level. This would also provide a better API for passing the predicate as an argument instead of hardwiring a single filter, e.g.

def readCsvDatabase(
    filename: String,
    pred: Data => Boolean = Function.const(true)
): List[Data] =
  ???

…to be called as

readCsvDatabase("data/data.csv")

or

readCsvDatabase("data/data.csv", _.isValid == 1)

I’d also apply conversion to Data and subsequent filtering while still on the iterator, before converting to a collection. This way, in the filtering case you’ll only need to fit the filtered result items in memory rather than the full unfiltered result. Further, as there is no dependency on the result from previous computations, no flatMap/for expression is actually required…

def parseLine(line: String): Data = ???

bufferedSource
  .getLines()
  .drop(1)
  .map(parseLine)
  .filter(pred)
  .toList

Finally, the source should be closed after use - consider Using or similar to ensure this.

Maia · October 19, 2021, 1:51pm

Thank you for the response, it is working.
The last line is just an example if I want to filter outside of the method.

Maia · October 19, 2021, 2:27pm

This is good point as I’d like to apply more filtering based on other/several fields in the future.
My main concern is now how to only update filtered data in the csv file. Here is what I would like to achieve:

Best,

Maia · October 19, 2021, 2:54pm

Here is the update code snippet:

And the data.csv csv

What I’d like to achieve is computing a metric for the valid cases (isValid = 1) only.
I have written a function to compute the metric which takes as parameter the dataID list of valid cases and once the metrics are computed for those cases, I’d like to concatenate all so that I just update those valid cases.

Ideally I’d like to keep the initial dataIDs order.
Sorry if it far from the initial post …

Thank you

Maia · October 19, 2021, 3:44pm

How would you use pred in the method ?

Thanks

sangamon · October 19, 2021, 5:00pm

Please ensure that your Scastie snippets compile. Leaving out context like imports or types/classes used in the code may be easier for you, but certainly not for the reader.

DataList.filter(pred).toList

I’d prefer the variant I pasted above in this thread, though.

Maia · October 19, 2021, 5:34pm

Sorry, here is corrected code:

Thank you, it’s working.

Maia · October 19, 2021, 5:37pm

About the variant you talked above, what would be the function parseLine ? I tried with this one but I got error as it outputs Array[String], not Data type.
def parseLine(line: String): Data = line.split(",").map(_.trim)

sangamon · October 20, 2021, 11:44am

In your original code you are already converting a String line to a Data instance, no? (Although there’s some room for improvement there, as well.) Extract the relevant parts and combine them in this method. What remains to be done to go from Array[String] to Data?