Incremental cousin of flatMap

jimka · November 27, 2019, 9:00am

If I have a flatMap-compatible function f:A->List[B] and a List[A] I can use flatMap to produce the List[B]. I suppose that this list concatenation is done efficiently. I.e., I hope the flatMap implementation does not keep searching the list for the end on each iteration, but rather somehow keeps a pointer to “almost” the end.

I’d like to build a list in sort of the same way. I have a recursive function which takes two lists, and on each iteration (depending on program logic) either advances one list or the other list or both, and on each iteration I want to “contribute” another (possibly empty) list of elements which get appended.

Currently I’m using accumlatedList ++ listOfNewLements each time. I could rewrite this to just cons up the lists in reverse order and finally do a reverse and flatten. But is there a better way?

What I envision is a cousin of flatMap where the client function both returns the new list to efficiently append, but also returns the List for the next iteration. The iteration would continue until List() is returned as that component. This is more or less the pattern of tail-recursion.

def flatMapDIY[M[_],N[_],A,B](in:M[A],zero:M[A],f:M[A]=>(M[A],N[B])):N[B] = {...}

Here is an implementation for Seq to sort of show what I mean.

    def flatMapDIY[A,B](in:Seq[A],zero:Seq[A],f:M[A]=>(Seq[A],Seq[B])):Seq[B] = {
      def again(in:Seq[A],acc:Seq[B]):Seq[B] = {
        val (next,incremental) = f(in)
        if (next == zero)
          acc ++ incremental
        else
          // I would like to do this as flatMap does,
          // rather than searching for the end of the sequence each time.
          again(next,acc++incremental)
        }
      again(in,Seq[B]())
    }

jimka · November 27, 2019, 9:06am

When I look back at this, it seems M[_] and A are superfluous. It might as well be

def flatMapDIY[M,N[_],B](in:M,zero:M,f:M=>(M,N[B])):N[B] = {...}

So the Seq based function can be reduced to

    def flatMapDIY[M,B](in:M,zero:M,f:M=>(M,Seq[B])):Seq[B] = {
      def again(in:M,acc:Seq[B]):Seq[B] = {
        val (next,incremental) = f(in)
        if (next == zero)
          acc ++ incremental
        else
          // I would like to do this as flatMap does,
          // rather than searching for the end of the sequence each time.
          again(next,acc++incremental)
        }
      again(in,Seq[B]())
    }

crater2150 · November 27, 2019, 9:34am

The flatMap implementation in the standard library relies on mutability to be fast. But as it does so by using a mutable next field in cons, which is private to the scala package, you can not use the same approach, but may be able to use another mutable collection internally and then converting to a list.

Another possibility is to use a data structure, which has fast concatenation, e.g. Chain from Cats, and then converting to a List in the end (which should be linear for most data structures).

jimka · November 27, 2019, 9:36am

Yes, the way this is done in Common Lisp is also through mutable lists also.

Jasper-M · November 27, 2019, 9:40am

You can use a mutable.ListBuffer internally and call .toList in the end. It exploits the same hidden mutability. It’s explicitly intended to be used as a builder for List.

jimka · November 27, 2019, 2:17pm

Here’s what I ended up with. It accumulates the elements that need to be flatMapped together,
and then at the end flatMaps them, just using a call to flatten. I am supposing that one call to flatten is faster than a bunch of calls to ++ on ever-loner sequences

  def flatMapDIY[M,B](in:M,zero:M)(f:M=>(M,Seq[B])):Seq[B] = {
    def again(in:M,acc:List[Seq[B]]):Seq[B] = {
      val (next,incremental) = f(in)
      if (next == zero)
        (incremental::acc).reverse.flatten
      else
        again(next, incremental::acc)
    }
    again(in,List(Seq[B]()))
  }

LannyRipple · December 2, 2019, 10:24pm

If you have a .reverse and .flatten you are going to have .isEmpty which would mean you won’t need to supply zero.

This reminds me a lot of Iterator.unfold (2.13) in which you supply some init state and a function that acts on the state returning Some(item, newState) or None if there is no further processing. The result is an Iterator[Item]. So something like

Iterator.unfold(init){ in =>
  if (in.isEmpty) None else f(in)
}
  .map(_.iterator)
  .flatten
  .toSeq // .toWhatever

where your f is modified to return Option[(IncrementalType, Next)]

jimka · December 3, 2019, 7:51am

The motivating example here was to merge two pre-sorted. On each iteration I look at the head of the two lists. (head1::tail1, head2:;tail2) Whichever head is smaller I collect that one, and advance that list by passing tail to the next iteration. If (head1 < head2) then collect head1 and return (tail1,list2) otherwise collect head2 and return (list1,tail2). Plus some cases for if one list is exhausted before the other. As you pointed out, we need a way to denote that we’re finished iterating. But (Nil,Nil) is not empty. Right? That’s why I use a given zero value.