Use yield with if

Dear all,
I’d like to combine these two methods into one so that I can get both lists:
readCsvDatabase() and readAllCsvDatabase()
https://scastie.scala-lang.org/bVeesZvaQkWCgLFLeEL02w

The difference is that the first one filters on a field, while the second one reads every row in the CSV file.

Also, I’d like to know if it is better to apply the filtering outside the method.

Thank you,
Best regards

So your question is whether you could have one method that filters out the invalid entries, but only optionally?

Well, you already have all the required parts, you could write something like

def readAllCsvDatabase(filename: String, keepInvalid: Boolean): List[Data] = {
  ...
    if keepInvalid || values(1).toInt == 1
  } yield ...
}

On another note, it isn’t clear to me whether isValid should be 0 or 1 for a valid entry: the filtering line at the end of the Scastie checks for 0, but the filter in the for comprehension checks for 1. If there aren’t multiple types of invalid entries that you need to keep track of, it is probably better to make isValid a Boolean and assign it with values(1).toInt == 1 (or 0, depending on which one is valid).
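
To make the above concrete, here is a minimal sketch of the optional filter combined with a Boolean isValid. It assumes a hypothetical two-column layout (dataID, isValid) and that 1 means valid; neither is confirmed by the Scastie. The name parseRows is made up, and it works on an in-memory list of lines so that it runs without a file.

```scala
// Hypothetical record; the real Data class in the Scastie may have more fields.
final case class Data(dataID: String, isValid: Boolean)

// Parses CSV lines (header already removed). With keepInvalid = true every
// row is kept; otherwise only rows whose second column equals 1 are kept.
def parseRows(lines: List[String], keepInvalid: Boolean): List[Data] =
  for {
    line <- lines
    values = line.split(",").map(_.trim)
    valid = values(1).toInt == 1 // assumption: 1 marks a valid entry
    if keepInvalid || valid
  } yield Data(values(0), valid)
```

The guard sits in the for comprehension exactly as in the snippet above; only the conversion to Boolean is new.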

Just adding to @crater2150’s suggestions…

I’d consider applying the filtering at the Data level. This would also provide a better API for passing the predicate as an argument instead of hardwiring a single filter, e.g.

def readCsvDatabase(
    filename: String,
    pred: Data => Boolean = Function.const(true)
): List[Data] =
  ???

…to be called as

readCsvDatabase("data/data.csv")

or

readCsvDatabase("data/data.csv", _.isValid == 1)

I’d also apply conversion to Data and subsequent filtering while still on the iterator, before converting to a collection. This way, in the filtering case you’ll only need to fit the filtered result items in memory rather than the full unfiltered result. Further, as there is no dependency on the result from previous computations, no flatMap/for expression is actually required…

def parseLine(line: String): Data = ???

bufferedSource
  .getLines()
  .drop(1)
  .map(parseLine)
  .filter(pred)
  .toList

Finally, the source should be closed after use; consider scala.util.Using (available since Scala 2.13) or similar to ensure this.
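
Putting the pieces together, this could look roughly like the sketch below. The Data fields and the body of parseLine are assumptions (two columns, dataID and isValid), since the real class is only in the Scastie; returning a Try is also just one possible choice.

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical record and parser; adapt both to the real CSV layout.
final case class Data(dataID: String, isValid: Int)

def parseLine(line: String): Data = {
  val values = line.split(",").map(_.trim)
  Data(values(0), values(1).toInt)
}

// Using closes the source even if parsing throws; the outcome is a Try.
def readCsvDatabase(
    filename: String,
    pred: Data => Boolean = Function.const(true)
): Try[List[Data]] =
  Using(Source.fromFile(filename)) { source =>
    source
      .getLines()
      .drop(1) // skip the header row
      .map(parseLine)
      .filter(pred)
      .toList // materialise the result before the source is closed
  }
```

The .toList has to stay inside the Using block: getLines() is lazy, so an iterator that escapes the block would read from an already closed source.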

Thank you for the response, it is working.
The last line is just an example if I want to filter outside of the method.

This is a good point, as I’d like to apply more filtering based on other/several fields in the future.
My main concern now is how to update only the filtered data in the CSV file. Here is what I would like to achieve:

Best,

Here is the updated code snippet:

And the data.csv:

What I’d like to achieve is computing a metric for the valid cases (isValid = 1) only.
I have written a function that computes the metric; it takes the list of dataIDs of the valid cases as its parameter. Once the metrics are computed for those cases, I’d like to merge everything back together so that only the valid cases are updated.

Ideally I’d like to keep the initial dataIDs order.
Sorry if this is far from the initial post…

Thank you
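
One possible way to do this while keeping the original order is sketched below. Everything in it is an assumption: the metric field, the names computeMetrics and withMetrics, the stand-in metric itself (the real function is not shown in this thread), and that dataIDs are unique.

```scala
// Hypothetical record with an optional metric column.
final case class Data(dataID: String, isValid: Int, metric: Option[Double] = None)

// Stand-in for the real metric function: takes the valid dataIDs and
// returns one metric per ID. Here the "metric" is just the ID length.
def computeMetrics(ids: List[String]): Map[String, Double] =
  ids.map(id => id -> id.length.toDouble).toMap

def withMetrics(all: List[Data]): List[Data] = {
  val validIds = all.filter(_.isValid == 1).map(_.dataID)
  val metrics = computeMetrics(validIds)
  // map preserves the input order; rows without a metric are left untouched
  all.map(d => metrics.get(d.dataID).fold(d)(m => d.copy(metric = Some(m))))
}
```

Since the update is a single map over the full list, nothing has to be concatenated or re-sorted afterwards.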

How would you use pred in the method?

Thanks

Please ensure that your Scastie snippets compile. Leaving out context like imports or types/classes used in the code may be easier for you, but certainly not for the reader.

DataList.filter(pred).toList

I’d prefer the variant I pasted above in this thread, though.

Sorry, here is the corrected code:

Thank you, it’s working.

About the variant you mentioned above, what would the parseLine function look like? I tried this one, but I got an error because it returns Array[String], not Data:
def parseLine(line: String): Data = line.split(",").map(_.trim)

In your original code you are already converting a String line to a Data instance, no? (Although there’s some room for improvement there, as well.) Extract the relevant parts and combine them in this method. What remains to be done to go from Array[String] to Data?
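
For illustration only, the missing step might look like the sketch below. It assumes a hypothetical Data with just two fields (dataID, isValid) taken from the first two columns; the real class in the Scastie may differ, and throwing on short rows is merely one option.

```scala
// Hypothetical record; the real Data class may have more fields.
final case class Data(dataID: String, isValid: Int)

def parseLine(line: String): Data =
  line.split(",").map(_.trim) match {
    // the first two columns feed the constructor, extra columns are ignored
    case Array(id, valid, _*) => Data(id, valid.toInt)
    case other =>
      throw new IllegalArgumentException(
        s"Expected at least 2 columns, got ${other.length}: $line"
      )
  }
```

The split and trim are the same as in the attempt above; what was missing is feeding the array elements into the Data constructor.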