I’m writing a virus-scanning class. It has a public method that takes a file path (string) and returns a Seq[String], where each string is a description of a virus found.
For some inputs, no viruses will be found. I’m deciding whether to indicate that by:
- returning an empty sequence, or
- changing the method’s return type to Option[Seq[String]], so that it can return None in these cases.
In general, what should I consider when making this sort of decision?
A relevant maxim is “make illegal states unrepresentable”. If you go with Option[Seq[String]], that means that Some(<empty sequence>) is unwanted, yet not forbidden by the type. Sticking with a maybe-empty Seq avoids this.
Returning an empty sequence in this case seems fully satisfactory to me. And it’s what most of your users would expect. Going any other route requires justification.
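To make the comparison concrete, here is a minimal sketch of both signatures side by side. The scanning logic, the marker names, and the `scanOpt`/`scanAll` helpers are all hypothetical stand-ins, not part of the original question.

```scala
object VirusScanSketch {
  // Hypothetical stand-in for real scanning: a path counts as "infected"
  // if it mentions one of these made-up markers.
  private val knownMarkers = Seq("eicar", "trojan")

  // Variant 1: plain Seq, empty when nothing is found.
  def scan(path: String): Seq[String] =
    knownMarkers
      .filter(m => path.toLowerCase.contains(m))
      .map(m => s"found marker: $m")

  // Variant 2: Option[Seq[String]], None when nothing is found.
  // Nothing in the type stops an implementation from returning Some(Nil).
  def scanOpt(path: String): Option[Seq[String]] = {
    val findings = scan(path)
    if (findings.isEmpty) None else Some(findings)
  }

  // With variant 1, combining results needs no special case.
  def scanAll(paths: Seq[String]): Seq[String] =
    paths.flatMap(scan)

  // With variant 2, every caller has to unwrap first.
  def scanAllOpt(paths: Seq[String]): Seq[String] =
    paths.flatMap(p => scanOpt(p).getOrElse(Nil))
}
```

Both `scanAll` and `scanAllOpt` produce the same result; the Option only adds an unwrapping step on the client side.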
Complementing what Seth said: you may want to return Option[NonEmptyList], where NonEmptyList comes from cats, to signal that the method will either find something or nothing.
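A rough sketch of that idea follows. To keep it runnable without the cats dependency, it defines a local stand-in with the same head-plus-tail shape as cats.data.NonEmptyList; `fromFindings` is a hypothetical helper name. With cats on the classpath, `NonEmptyList.fromList` does the same conversion.

```scala
object NonEmptySketch {
  // Local stand-in, assumed to mirror the shape of cats.data.NonEmptyList.
  final case class NonEmptyList[A](head: A, tail: List[A]) {
    def toList: List[A] = head :: tail
  }

  // None means "nothing found"; Some always carries at least one element,
  // so Some(<empty>) cannot be constructed.
  def fromFindings(findings: List[String]): Option[NonEmptyList[String]] =
    findings match {
      case Nil          => None
      case head :: tail => Some(NonEmptyList(head, tail))
    }
}
```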
As suggested above, you need to decide what an empty sequence means and what to do with it, and it’s best to choose something intuitive. In your case it’s trivial, as an empty sequence of virus descriptions intuitively means no viruses were found. But let’s assume a different problem, e.g. searching for trading contracts and filtering by currencies, where the filter is passed as a parameter currenciesOpt of type Option[Seq[T]].
In this case:
- currenciesOpt = None means that filtering by currencies is disabled, i.e. return all types of contracts
- currenciesOpt = Some(Nil) means return nothing: no currencies are allowed, but every contract has a currency
- currenciesOpt = Some(Seq(someCurrency, … /* maybe more currencies */ )) means return the filtered sequence of contracts
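The three cases above could be sketched like this. The `Contract` type, the `findContracts` name, and the in-memory `all` parameter are assumptions made for illustration; a real API would presumably query a data store.

```scala
object ContractFilterSketch {
  // Hypothetical domain type: every contract has exactly one currency.
  final case class Contract(id: Int, currency: String)

  def findContracts(
      all: Seq[Contract],
      currenciesOpt: Option[Seq[String]]
  ): Seq[Contract] =
    currenciesOpt match {
      // Filtering disabled: return everything.
      case None => all
      // Filtering enabled: Some(Nil) falls out naturally as an empty result,
      // because no contract's currency is contained in an empty list.
      case Some(currencies) => all.filter(c => currencies.contains(c.currency))
    }
}
```

Note that Some(Nil) needs no dedicated branch here: the filter handles it on its own.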
While I agree with that principle, in this case I might come at it from the opposite direction.
What additional information does an Option type convey that a plain Seq doesn’t?
If you find 2 viruses, you will return a 2-element sequence. If you find 1, you’d return a 1-element Seq. If you find 0, why not return a 0-element Seq?
What does None convey more than Nil in this case?
An additional angle: which way will be easier for users to work with?
What is the use case for explicitly supporting a query variant with a constant empty result? I agree that using plain Seq[T] and interpreting Nil as “filtering is disabled” is not the way to go, but wouldn’t Option[NonEmptyList[T]] (or a custom data type) be a better representation (because it excludes the “no-op” variant)?
To my understanding, @SethTisue advocated precisely this plain Seq variant in this specific case (and I’d second that).
I could imagine something like
def ordersByUser(user: String): Option[List[Order]]
None: unknown user
Some(Nil): No orders for this user
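A small sketch of how a caller might tell those cases apart. The `Order` type, the in-memory `ordersDb` map, and the `describe` helper are hypothetical, used only to make the example runnable.

```scala
object OrdersSketch {
  final case class Order(id: Int)

  // Hypothetical data: "bob" is a known user with no orders yet.
  private val ordersDb: Map[String, List[Order]] =
    Map("alice" -> List(Order(1), Order(2)), "bob" -> Nil)

  // Map#get already returns None for a missing key.
  def ordersByUser(user: String): Option[List[Order]] = ordersDb.get(user)

  def describe(user: String): String =
    ordersByUser(user) match {
      case None         => s"$user: unknown user"
      case Some(Nil)    => s"$user: no orders"
      case Some(orders) => s"$user: ${orders.size} order(s)"
    }
}
```

Here None and Some(Nil) carry genuinely different information, which is what justifies the Option wrapper.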
The use case is to simplify clients. Suppose a client gets a list of currencies from somewhere and then tries to find contracts for that list:
- if you have the API that I’ve described (Option[Seq[T]] as parameter), then you don’t have to worry about an empty list of currencies. If you’ve got an empty list of currencies, you’ll get an empty list of trades. Intuitive and simple on the client side.
- if you instead have Option[NonEmptyList[T]] (how do you enforce non-emptiness in JSON? how many JSON libraries support that?) or just List[T] (ambiguous / unintuitive) in the API, then the client needs extra conditional code to avoid accidentally violating the NonEmptyList guarantee or querying for all contracts. If you have many clients, all of them have to implement those conditional code paths.
Fair enough. Circe does support Cats NonEmptyList, and it should at least be possible to teach other libs to handle it correctly; JSON Schema covers this as well. But, yeah, if this kind of constraint is an outlier in the overall pipeline, it’s probably more trouble than it’s worth.