Breaking out of a map or other iteration

jimka · July 7, 2017, 12:13pm

I’d like to know the best way to scala-ize this idiom I do from time to time in Common Lisp. Can someone help suggest the cleanest/most-correct way to do it?

While performing an otherwise functional style iteration, I sometimes arrive upon a condition that allows me to short-circuit the rest of the calculation, and just return a result. E.g., If I’m multiplying a sequence of matrices, using some sort of reduce, and I arrive upon the zero matrix, I’d like to just return zero, and abort the rest of the iteration.

In Common Lisp it looks something like the following

(defun Foo (data)
  (reduce (lambda (total value)
            (if (some-predicate value)
                (RETURN-FROM Foo some-default)
                (combine total value)))
          data))

There is of course a similar way to do this from an arbitrary block, not just returning from the top level function. I can also do a similar type of RETURN-FROM from inside a local recursive function, where I return a particular value from the outer function (by name), which terminates the recursion.

(defun Foo (data)
  (labels ((local-fun (data acc)
             (cond ((null data)
                    acc)
                   ((some-predicate data)
                    (RETURN-FROM Foo some-default))
                   (t
                    (local-fun (cdr data) (combine (car data) acc))))))
    (local-fun data 0.0)))

What is the correct way to do this in Scala? I can think of some complicated ways, such as raise an exception, and catch it with try. But perhaps there’s a less obscure way?

jimka · July 7, 2017, 1:48pm

Actually I read an article some months ago by Odersky where he had a limited but already powerful implementation of 1st class continuations such as call/cc in scheme. I don’t know the status of that, but given call/cc, the application programmer can implement many control flow operators for himself, such as return-from and many others.

MarkCLewis · July 7, 2017, 2:06pm

I’m also interested in hearing what the “correct” way to do this would be. Continuations would definitely be an approach. Depending on how expensive the operation you are checking for, the approach that comes to my head is to do a takeWhile on a view followed by the actual operation you wanted. Without a view, the takeWhile would introduce an extra traversal, but with a view, it should be a single traversal. Granted, if the check you need to perform requires some of the operations in the main traversal, that won’t work.

data.view.takeWhile(d => !somePredicate(d)).foldLeft(zero)((acc, d) => combine(d, acc))

farmdawgnation · July 7, 2017, 3:03pm

This is a really interesting question.

I don’t know that I’ve actually ever needed to do this in Scala, but if I did the two ways that come to mind are to either pre-scan the collection for a disqualifying value or to, as you mention, throw an exception and catch it. The pre-scan is only worthwhile if the reduction I want to run is actually somewhat expensive.

Actually… now that I think about it… you could define some case classes like this…

trait FoldResult[R] {
  def item: R
}
case class FoldResultEnvelope[R](item: R) extends FoldResult[R]
case class FoldEscapeEnvelope[R](item: R) extends FoldResult[R]

… and then use them like so…

def foldWithPredicateEscape[T, R](
  data: Seq[T],
  default: R,
  reduction: (R, T) => R,
  escapePredicate: (T)=>Boolean,
  escapeResult: R
): R = {
  val result = foldLeft(default) { (currentResult, item) =>
    if (currentResult.isInstanceOf[FoldEscapeEnvelope]) {
      currentResult
    } else if (escapePredicate(item)) {
      FoldEscapeEnvelope(escapeResult)
    } else {
      FoldResultEnvelope(
        reduction(currentResult.item, item)
      )
    }
  }

  result.item
}

The downside here is that the loop would continue to execute through the rest of data here, but it wouldn’t do anything so that should be the equivalent of n iterations through a no-op loop.

Does that seem like what you’re going for or did I miss the mark?

julienrf · July 8, 2017, 8:01pm

In Scala the standard collections don’t provide a method for that: you have to write the recursion by yourself:

def foo[A, B](data: List[A], combine: (A, B) => B, p: A => Boolean, total: B, someDefault: B): B =
  data match {
    case Nil => total
    case a :: as =>
      if (p(a)) someDefault
      else foo(as, combine, p, combine(a, total), someDefault)
  }

But, more generally, we could have the following method on collections:

def foldLeftBreak[A, B](as: List[A])(init: B)(op: (A, B) => Either[B, B]): B =
  as match {
    case Nil => init
    case a :: as =>
      op(a, init) match {
        case Left(b) => b
        case Right(b) => foldLeftBreak(as)(b)(op)
      }
  }

The difference with foldLeft is that the op function can either signal to stop the recursion with Left or to continue with Right.

An alternative version where you signal continuation or stop with Some and None is given here.

SethTisue · July 10, 2017, 9:15pm

The standard disquisition on this is https://stackoverflow.com/questions/2742719/how-do-i-break-out-of-a-loop-in-scala

Some of the ground in Rex’s answer has been covered again in this thread, but one solution that hasn’t been mentioned here yet is scala.util.control.Breaks, which encapsulates the throw-based approach.

DavidGregory084 · July 11, 2017, 1:06pm

The origami library provides breakable folds, although I’m not sure if the library is maintained anymore.

Peter · July 11, 2017, 1:47pm

Isn’t the problem here that functional operations like Map and Reduce may
be run in parallel? Scala makes no guarantee the lambda on each element
will be run in sequence. What would it mean to return “in the middle” of a
parallel operation?

MarkCLewis · July 11, 2017, 2:59pm

The semantics of map and reduce are sequential on the main collections (Seq, Set, Map). There are Gen collections (GenSeq, GenSet, GenMap) that break that semantics and both the sequential and parallel collections are subtypes of those. However, you are correct that is a big part of why this type of functionality isn’t generally encouraged in the language.

jimka · July 12, 2017, 9:30am

To break out of a parallel calculation, the break-out even, be it exception or continuation would simply need to kill the threads which are running in parallel. Right?

jimka · July 12, 2017, 9:34am

Another motivation for such a break-out feature is if the programmer needs to iterate over an object which is IterableOnce. In this case, he cannot scan the structure for a “zero” element, and if not found decide to do a fold or map. He is forced to map (fold etc) and simultaneously check for the zero condition. Even if this iteration be done in parallel, the semantics don’t change. At least I don’t see why the semantics would be different.

szeiger · July 12, 2017, 3:02pm

IterableOnce gives you an Iterator. You can stop reading from it at any point. The current Traversable and TraversableOnce abstractions will go away in the 2.13 collection library, so can always get an Iterator (which is already the case today in practice, but it is not enforced in the types).

This goes for parallel collections, too. Reading from an Iterator is naturally done sequentially on the current thread, so it works the same way for parallel collections but it may not be the most efficient way of implementing an algorithm in this case. For example, searching for a specific element can be done in parallel. If you iterate manually instead of calling .find you cannot take advantage of this.

LannyRipple · July 14, 2017, 4:34pm

@annotation.tailrec
def unfold[S,A](init: S, zs: List[A] = List.empty[A])(f: S => Option[(A,S)]): List[A] = {
  f(init) match {
    case Some((a, s)) => unfold(s, a :: zs)(f)
    case None => zs.reverse
  }
}

val vs = List(1,3,5,9,2,4,6,8)
unfold(vs){
  case a :: rest if a % 2 == 1 => Option(a -> rest)
  case _ => None
}

Unfold can handle a lot of, “run for a while and then stop (maybe before completion)”, situations.

Here’s a paper on catamorphisms and anamorphisms (like unfold) – https://ris.utwente.nl/ws/portalfiles/portal/6142049

jimka · January 19, 2020, 9:03pm

Here is my latest attempt to implement non-local exit.

  def block[A](body:(A=>Nothing)=>A):A = {
    class NonLocalExit(val data:A) extends Exception {}
    def ret(data:A):Nothing = {
      throw new NonLocalExit(data)
    }
    try{
      body(ret)
    }
    catch{
      case nonLocalExit: NonLocalExit => nonLocalExit.data
    }
  }

This seems to give me a non-local exit.

The function, findIsomorphism, does a potentially exponential recursive search attempting to derive a isomorphic mapping between two application specific objects. Whenever it reaches the bottom of the recursive search, it might have found such an isomorphism. So test it and return it if it’s valid, otherwise backtrack.

  def findIsomorphism(matrixA: SparseAdjacencyMatrix, matrixB: SparseAdjacencyMatrix, possible: List[(Int, Set[Int])]): Option[ISOMORPHISM] = {

    block { <=== =>
      def recur(iso: ISOMORPHISM, possible: List[(Int, Set[Int])], used: Set[Int]): Option[ISOMORPHISM] = {
        possible match {
          case Nil if isIsomorphism(iso, matrixA, matrixB) => {
            // return from findIsomorphism
            <===(Some(iso))
          }
          case Nil => None
          case (i: Int, js: Set[Int]) :: tail =>
            for {j <- js
                 if !used.contains(j)
                 } recur(iso + (i -> j), tail, used + j)
            None
        }
      }

      recur(Map(), possible, Set())
    }
  }

ashwinbhaskar · January 19, 2020, 9:37pm

Hey, seems interesting. Do you happen to have the link to the article?

jimka · January 19, 2020, 11:41pm

which article?

ashwinbhaskar · January 20, 2020, 12:30am

jducoeur · January 20, 2020, 5:59am

On the one hand, I believe that what you’ve shown will work; OTOH, I suspect it’s inefficient. A common idiomatic maxim of Scala is “Don’t use Exceptions for normal program flow”, and one of the reasons is that Exceptions are slow unless you’re careful about it.

I’m not an expert in this area, but my understanding is that the bulk of the slowness is because constructing the stack trace is quite expensive. Since your algorithm isn’t doing anything with the stack trace, that’s wasted effort. So I think you want to add NoStackTrace if you’re going to pursue this seriously – I believe that will suppress the worst of the overhead…

jimka · January 20, 2020, 7:41am

The status is, as far as I understand, that no one wants to support that implementation of continuations, and it will be dropped.

jimka · January 20, 2020, 7:49am

It would indeed be interesting to know the performance compared to alternatives. However, the argument does not use exceptions for normal program flow, rather for exceptional program flow. The example code performs a tree recursion, and possible on one leaf performs the non-local-exit. My suspicion is this is faster than many alternatives, such as returning some wrapper object (? Either ?) and pattern matching on it on every return from the recursive call.

Certainly in terms of development time and code readability, the block/non-local-exit is easier.

I haven’t tested it yet with iterators such as map, fold, etc.