Too difficult to write a function which accepts Array or List

jimka · April 22, 2020, 12:06pm

Something that I still find difficult and confusing in Scala, even after using the language for almost 3 years, is the distinction between Array and other sequences. Yes, I understand that Array is optimized so that it can use the java native arrays, which is excellent for execution speed.

However, it is a reasonable desire that a user wants to write a function which accepts an Array or a List, and within the function, perform operations which are valid for Array and List, such as foldLeft. I don’t think it is reasonable to expect a programmer to be able to implement a type class to allow this.

What is the correct way to write the following 2 functions without code duplication. BTW IntelliJ also warns me that the code is duplicated.

  def makeAdj[A](edges:List[(A,A)]):Map[A,Set[A]] = {
    edges.foldLeft(Map[A,Set[A]]()){
      case (adj,(src,dst)) =>
        adj + (src -> (adj.getOrElse(src,Set())+dst))
    }
  }
  def makeAdj[A](edges:Array[(A,A)]):Map[A,Set[A]] = {
    edges.foldLeft(Map[A,Set[A]]()){
      case (adj,(src,dst)) =>
        adj + (src -> (adj.getOrElse(src,Set[A]())+dst))
    }
  }

I’d like to write the following, but it’s wrong.

  def makeAdj[M[_]<:Foldable[_],A](edges:M[(A,A)]):Map[A,Set[A]] = {
    edges.foldLeft(Map[A,Set[A]]()){
      case (adj,(src,dst)) =>
        adj + (src -> (adj.getOrElse(src,Set())+dst))
    }
  }

Jasper-M · April 22, 2020, 12:30pm

In your case you can solve it fairly easily with an implicit view:

def makeAdj[C, A](edges: C)(implicit asIterable: C => Iterable[(A, A)]): Map[A, Set[A]] = {
  edges.foldLeft(Map[A, Set[A]]()){
    case (adj,(src,dst)) =>
      adj + (src -> (adj.getOrElse(src,Set())+dst))
  }
}

If you have a function like Coll[A] => Coll[B] and you want to abstract over all collections including arrays, you’ll probably have to do something a bit more complicated with an implicit scala.collection.Factory.

jducoeur · April 22, 2020, 12:39pm

Is there a reason not to just use Seq? I mean, while typeclasses solve many problems, there’s nothing wrong with just reaching for a supertype when it works fine, and that seems to work.

Jasper-M · April 22, 2020, 12:43pm

Ah yes, in this case the implicit view will kick in without explicitly asking for it of course. So what I did wasn’t really necessary yet.
Like I said it’s especially when your output collection depends on the input then it starts getting complicated.

jimka · April 22, 2020, 12:46pm

Ahh, Seq is a good suggestion, because my output type does not depend on the input type. This works well. It does not solve the general annoying problem, but as long as the output type is not Array or List it works.

  def makeAdj_6(edges:Seq[(Int,Int)]):Map[Int,Set[Int]] = {
    edges.foldLeft(Map[Int,Set[Int]]()){
      case (adj,(src,dst)) =>
        adj + (src -> (adj.getOrElse(src,Set())+dst))
    }
  }

sangamon · April 22, 2020, 12:48pm

When passing an array, this will create a copy, though, via LowPriorityImplicits2#copyArrayToImmutableIndexedSeq().

jimka · April 22, 2020, 12:55pm

So in Scala 3 can I do the following to prevent the creation of a copy?

  def makeAdj_6(edges:Seq[(Int,Int)] | Array[(Int,Int)]):Map[Int,Set[Int]] = {
    edges.foldLeft(Map[Int,Set[Int]]()){
      case (adj,(src,dst)) =>
        adj + (src -> (adj.getOrElse(src,Set())+dst))
    }
  }

curoli · April 22, 2020, 1:06pm

Wait, does it also copy the Array when I call Seq methods on an Array? That would be really terrible.

Jasper-M · April 22, 2020, 1:09pm

The reason it creates a copy is that Seq is immutable.Seq. If you take collection.Seq for instance the array will just be wrapped into a mutable.ArraySeq without copying.

I doubt the scala 3 union type will work without jumping through a bunch of hoops.

jimka · April 22, 2020, 1:12pm

Alas, no. In Scala 3 we get the error
value foldLeft is not a member of Array[(Int, Int)] | Seq[(Int, Int)]

This seems wrong to me. foldLeft is a member of the union of those two types. Right? See the example.

Jasper-M · April 22, 2020, 1:13pm

foldLeft is a member of ArrayOps, not of Array itself. I think selections on a union type only work if the member you’re selecting is a member of the LUB of the parts of the union type.

jimka · April 22, 2020, 1:14pm

Which type is the LUB of these two types, the error message reports that the method is missing from Array[(Int, Int)] | Seq[(Int, Int)]? Perhaps just the wrong error message.

Jasper-M · April 22, 2020, 1:15pm

AnyRef?

jimka · April 22, 2020, 1:16pm

it seems to me that the LUB is precisely Array[(Int, Int)] | Seq[(Int, Int)]. Isn’t that what it means to be a type lattice? I.e., if A and B are types then their intersection and union are also types. A lattice is closed under union and intersection.

Jasper-M · April 22, 2020, 1:19pm

I think you’re right. But I meant the non-union-type LUB.

sangamon · April 22, 2020, 1:21pm

Not in general. Usually you’ll end up with the more lightweight ArrayOps, ArraySeq,… As @Jasper-M pointed out, the copying happens only if you force an immutable.Seq.

BalmungSan · April 22, 2020, 1:34pm

IMHO abstracting over collections is a failed attempt of the language and a design that never makes sense. Just either ask for a concrete type or ask for an iterable once and use its iterator (again a concrete type)

Or, if you want to write code that can be expressed in terms of for comprehensions and work for any collection. Just ask for a Monad.

jimka · April 22, 2020, 1:54pm

What do you mean by, “never makes sense”? Do you mean it doesn’t make sense in the presence of static typing? Please elaborate your idea.

BalmungSan · April 22, 2020, 2:21pm

I mean what can you do with something that may either be a List or a Vector or a Set or a Map or a infinite lazy Stream?

Better question, what can you sanely do?
Because you can do a lot of things, but all can be or very efficient or very inefficient.

For example:

Get the head (first elements), actually efficient on all. But what does it means to get the first element of an unordered collection like a set?
Get the tail, forgetting about the ordering problem, this is pretty efficient on List and Stream, but it is pretty inefficient on Vector and Array.
Check if it contains an element, straight forward on all, but if you require it multiple times, you really hope you have a Set which provides it as an O(1) operation.
Get the length, efficient for many of those, inefficient for List, will never terminate for the infinite Stream.
etc.

My point is, for most algorithms you write them with an especial kind of structure in mind. And if you want to reason about the complexity of your whole algorithm, you need the complexity of your collection operations (which you can only do with concretes).

And the few algorithms that can be written for multiple collections fallback to three special cases (IMHO):

Algorithms using map & flatMap; i.e. for Monads.
algorithms that basically iterate the collection with an internal state, e.g. foldLeft, traverse. Again for those, it is better to use a typeclass (Foldable / Traverse in this case) instead of trying to model those with subtyping.
Algorithms that are basically data processing pipelines, which you can do for every collection by requiring a TraversableOnce and asking for its iterator; you then chain all the transformations on that iterator (taking advantage on it being lazy) and finally consume the iterator by transforming to a concrete collection again (you may do black magic to ensure that concrete is the same one it was passed to you, but it has always been extremely complex for me, so I never did that).

PS: For the second use case, I would rather use a proper Stream; e.g. fs2, akka streams, monix observables or zio zstreams.

jimka · April 22, 2020, 2:27pm

So do you think that folding is just a one-off special case? To me the point of fold is that it’s an operation which abstracts the implementation away and just lets you compute something iteratively as a function of the content and a starting value.