How do you choose when to return Option.None vs. an empty sequence?

I’m writing a virus-scanning class. It has a public method that takes a file path (string) and returns a Seq[String], where each string is a description of a virus found.

For some inputs, no viruses will be found. I’m deciding whether to indicate that by:

  • returning an empty sequence, or
  • changing the method’s return type to Option[Seq[String]], so that it can return None in these cases.

In general, what should I consider when making this sort of decision?

A relevant maxim is “make illegal states unrepresentable”. If you go with Option[Seq[String]], that means that Some(<empty sequence>) is unwanted, yet not forbidden by the type. Sticking with a maybe-empty Seq avoids this.

Returning an empty sequence in this case seems fully satisfactory to me. And it’s what most of your users would expect. Going any other route requires justification.
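
To make the empty-sequence option concrete, here is a minimal sketch. It is not the original poster's scanner: the signature table and the `scanContent` and `report` names are hypothetical, and it scans a string of file contents rather than a path so that it stays self-contained.

```scala
object VirusScanSketch {
  // Hypothetical signatures: marker substring -> human-readable description.
  private val signatures: Seq[(String, String)] = Seq(
    "EVIL" -> "Evil.A marker found",
    "WORM" -> "Worm.B marker found"
  )

  // Returns one description per matching signature.
  // An empty Seq simply means "nothing found".
  def scanContent(content: String): Seq[String] =
    signatures.collect {
      case (marker, description) if content.contains(marker) => description
    }

  // The caller needs no special case for a clean file.
  def report(content: String): String = {
    val found = scanContent(content)
    s"${found.size} virus(es) found" + found.map(d => s"\n  - $d").mkString
  }
}
```

With this shape, `size`, `map`, `foreach` and friends all behave sensibly on a clean file, so callers do not have to unwrap anything first.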

Complementing what Seth said: you may want to return Option[NonEmptyList] (NonEmptyList is from cats) to signal that the scan either found something or found nothing.
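
As a rough illustration of that idea without pulling in a dependency, the sketch below uses a hand-rolled `NonEmpty` type as a hypothetical stand-in for cats' `NonEmptyList`. The `fromList` helper here is written for this example; cats offers a `NonEmptyList.fromList` that plays the same role.

```scala
object NonEmptySketch {
  // Hypothetical minimal stand-in for cats' NonEmptyList:
  // a mandatory head plus a possibly empty tail.
  final case class NonEmpty[A](head: A, tail: List[A]) {
    def toList: List[A] = head :: tail
  }

  // None for an empty input, Some(NonEmpty(...)) otherwise,
  // so Some(<empty>) cannot be constructed at all.
  def fromList[A](items: List[A]): Option[NonEmpty[A]] = items match {
    case Nil    => None
    case h :: t => Some(NonEmpty(h, t))
  }
}
```

The point of the design is that "nothing found" has exactly one representation (None), at the cost of callers having to unwrap the Option.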

As suggested above, you need to decide what an empty sequence means and how to handle it, and it's best to choose something intuitive. In your case it's trivial, as an empty sequence of virus descriptions intuitively means no viruses were found. But let's assume a different problem, e.g. searching for trading contracts and filtering by currency:

def findContracts(
  offset: Long,
  limit: Long,
  currenciesOpt: Option[Seq[Currency]]
): Seq[Contract]

in this case:

  • currenciesOpt = None means that filtering by currencies is disabled, i.e. return all types of contracts
  • currenciesOpt = Some(Nil) means to return nothing as no currencies are allowed, but every contract has a currency
  • currenciesOpt = Some(Seq(someCurrency, … /* maybe more currencies */ )) means to return filtered sequence of contracts
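
The three cases above could be implemented roughly as follows. This is a sketch under assumptions: `Currency` and `Contract` are hypothetical stand-ins, the extra `all` parameter replaces whatever data source the real method would query, and `offset` and `limit` are assumed to fit in an Int.

```scala
object ContractSearchSketch {
  // Hypothetical stand-ins for the types named in the signature.
  final case class Currency(code: String)
  final case class Contract(id: Long, currency: Currency)

  def findContracts(
    all: Seq[Contract], // assumption: in-memory data instead of a real store
    offset: Long,
    limit: Long,
    currenciesOpt: Option[Seq[Currency]]
  ): Seq[Contract] = {
    val filtered = currenciesOpt match {
      // None: filtering is disabled, keep every contract.
      case None => all
      // Some(Nil) falls through here too: no currency matches, so nothing is kept.
      case Some(currencies) => all.filter(c => currencies.contains(c.currency))
    }
    // Paging is applied after filtering.
    filtered.drop(offset.toInt).take(limit.toInt)
  }
}
```

Note that Some(Nil) needs no dedicated branch: filtering against an empty list of currencies naturally yields an empty result.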

While I agree with that principle, in this case I might come at it from the opposite direction.

What additional information does an Option type convey that a plain sequence doesn't?

If you find 2 viruses, you will return a 2-element sequence. If you find 1, you’d return a 1-element Seq. If you find 0, why not return a 0-element Seq?

What does None convey more than Nil in this case?

An additional angle: which way will be easier for users to work with?

What is the use case for explicitly supporting a query variant with a constant empty result? I agree that using plain Seq[T] and interpreting Nil as “filtering is disabled” is not the way to go, but wouldn’t Option[NonEmptyList[T]] (or a custom data type) be a better representation (because it excludes the “no-op” variant)?

To my understanding, @SethTisue advocated precisely this plain Seq variant in this specific case (and I’d second that).

I could imagine something like

def ordersByUser(user: String): Option[List[Order]]

…where

  • None: unknown user
  • Some(Nil): No orders for this user
  • Some(List(...)): Orders
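
A minimal sketch of that three-way distinction, assuming a hypothetical in-memory `ordersDb` map and a simplified `Order` type (neither is from a real API):

```scala
object OrdersSketch {
  // Hypothetical simplified order type.
  final case class Order(id: Int)

  // Hypothetical store: known users mapped to their (possibly empty) orders.
  private val ordersDb: Map[String, List[Order]] = Map(
    "alice" -> List(Order(1), Order(2)),
    "bob"   -> Nil
  )

  // Map.get already yields None for an unknown key,
  // so an unknown user and a user without orders stay distinct.
  def ordersByUser(user: String): Option[List[Order]] = ordersDb.get(user)

  def describe(user: String): String = ordersByUser(user) match {
    case None         => s"unknown user: $user"
    case Some(Nil)    => s"$user has no orders"
    case Some(orders) => s"$user has ${orders.size} order(s)"
  }
}
```

Here the Option earns its place because None and Some(Nil) mean different things, which is not the case for the virus scanner.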

The use case is to simplify clients. Suppose a client gets a list of currencies from somewhere and then tries to find contracts for that list:

  • With the API I've described (Option[Seq[T]] as the parameter), you don't have to worry about an empty list of currencies. If you pass an empty list of currencies, you get an empty list of contracts. Intuitive and simple on the client side.
  • With Option[NonEmptyList[T]] (how do you enforce non-emptiness in JSON? how many JSON libraries support that?) or just List[T] (ambiguous and unintuitive) in the API, the client needs extra conditional code to avoid accidentally violating the NonEmptyList guarantee or querying for all contracts. If you have many clients, every one of them has to implement those conditional code paths.

Fair enough. Circe does support Cats NonEmptyList, it should at least be possible to teach other libraries to handle it correctly, and JSON Schema covers this as well. But, yeah, if this kind of constraint is an outlier in the overall pipeline, it's probably more trouble than it's worth.