Escaping from search

jimka · June 17, 2020, 10:31am

I’ve asked this same question several times as I’ve been learning Scala. Each time I feel I have a better handle on the situation.
There are several iteration facilities in Scala which traverse a collection and generate some result. Sometime the program logic determines that it is finished and has computed the desired result. Thereafter, it is a challenge to return that value.

Here is something I’ve sort of converged on using

  def block[A](body:(A=>Nothing)=>A):A = {
    // CL-like block/return-from-block, the name of the return() function is provided
    //  by the caller.
    //  Usage:  block{ ret =>  ... ret(someValue) ...}

    // extending Exception with NoStackTrace prevents throwing the
    // exception from computing the stacktrace and storing the information.
    // We don't need a stacktrace because the purpose of this exception
    // is simply to perform a non-local exit.
    import scala.util.control.NoStackTrace

    class NonLocalExit(val data:A,val ident:(A=>Nothing)=>A) extends Exception with NoStackTrace {}

    def ret(data:A):Nothing = {
      throw new NonLocalExit(data,body)
    }
    try{
      body(ret)
    }
    catch{
      case nonLocalExit: NonLocalExit if nonLocalExit.ident eq body => nonLocalExit.data
    }
  }

The challenge is to make it reentrant, so that escaping from a block always escapes from the correct block, regardless of the state of recursion of various functions.

Recently I looked into the implementation of find in the scala library, and discovered there ALREADY IS a class scala.util.control.Breaks to implement this behavior. But I don’t know if it is safe to use.

  def find(p: A => Boolean): Option[A] = {
    var result: Option[A] = None
    breakable {
      for (x <- this)
        if (p(x)) { result = Some(x); break }
    }
    result
  }

I like it because it is already build-in-- I could theoretically just use it.
However, I don’t like the fact that the break keyword is fixed and not a function, and that using it seems imperative, I can only envision using it for side effect or at least with mutable variables.

In my implementation, an argument of block is a function to call with the value you want returned from block. This makes block nestable and any nested level can return from any block, and its never ambiguous.

block{ 
  break_1 =>
    block{
       break_2 =>
       ... some code...
       ... if something break_1(my_return_value_1)
       ... some code
       ... if something break_2(my_return_value_2)
       ... some code
    }
}

Thus the find implementation above would be simplfied:

  def find(p: A => Boolean): Option[A] = {
    block{ found =>
      for (x <- this)
        if (p(x)) { found(Some(x) }
      None
    }
  }

jimka · June 17, 2020, 11:27am

Here is a case in point. The line case Some(_) => acc is there simply so that after we compute the result, foldLeft can finish its iteration without computing anything else. I.e., as soon as growByOne returns a non-empty Option, then we still need to finish the foldLeft loop and just drag that Option along untouched.

  // Given a set of cards, and a target size between 3 and 18 inclusive,
  // find a subset of the given size, consisting of cards which contain no
  // Triptych.
  def findCap(cards: Set[Card], targetSize: Int): Option[Set[Card]] = {

    def growByOne(cap: List[Card], deck: Set[Card]): Option[Set[Card]] = {
      if (deck.isEmpty)
        None
      else if (cap.length == targetSize)
        Some(cap.toSet)
      else {
        deck.foldLeft(None: Option[Set[Card]]) { (acc, c) =>
          acc match {
            case Some(_) => acc
            case None => growByOne(c :: cap, findCapExtensions(c, cap, deck - c))
          }
        }
      }
    }

    val initialCap = List(Set("red", "oval", "one", "solid"),
                          Set("red", "oval", "two", "solid"),
                          Set("green", "oval", "one", "solid"))

    val deck = cards diff initialCap.toSet
    growByOne( initialCap, findCapExtensions(initialCap, deck))
  }

I’m using foldLeft basically for the control structure, which makes the code somewhat obfuscated, in my opinion. What I really want to do here is: when (cap.length == targetSize) then return Some(cat.toSet) from findCap.

  def findCap(cards: Set[Card], targetSize: Int): Option[Set[Card]] = {

    def growByOne(cap: List[Card], deck: Set[Card]): Option[Set[Card]] = {
      if (deck.isEmpty)
        None
      else if (cap.length == targetSize)
        RETURN_FROM_findCap(Some(cap.toSet))  // FOUND THE ANSWER, RETURN IT
      else {
        deck.map{ c =>
            growByOne(c :: cap, findCapExtensions(c, cap, deck - c))
        }
      }
    }

    val initialCap = List(Set("red", "oval", "one", "solid"),
                          Set("red", "oval", "two", "solid"),
                          Set("green", "oval", "one", "solid"))

    val deck = cards diff initialCap.toSet
    growByOne( initialCap, findCapExtensions(initialCap, deck))
  }

BalmungSan · June 17, 2020, 2:09pm

What about writing your ow recursive function? instead of the foldLeft so you can return when you want.
And if you made it a tail recursive function, it doesn’t have to return from many stack calls but just one.

jimka · June 17, 2020, 2:21pm

At each level of the recursion I need to iterate over every element of deck and for each element of deck I need to make another recursive call. The iteration at each level makes the tail recursion difficult, unless I refactor it into continuation passing style.

BalmungSan · June 17, 2020, 2:28pm

The iteration at each level makes the tail recursion difficult

Got it.
A quick and dirty workaround would be something like this:
(it may have typos, but I guess you get the idea)

final case class EarlyReturn[A](a: A) extends Throwable

def findCap(cards: Set[Card], targetSize: Int): Option[Set[Card]] = {
  def growByOne(cap: List[Card], deck: Set[Card]): Option[Set[Card]] = {
    if (deck.isEmpty)
      throw EarlyReturn(None)
    else if (cap.length == targetSize)
      throw EarlyReturn(Some(cap.toSet))
    else
      ...
   }

  try {
    growByOne( initialCap, findCapExtensions(initialCap, deck))
  } catch {
    case EarlyReturn(r) => r
  }

jimka · June 17, 2020, 2:46pm

yes, using throw/catch is indeed what my block/return (shown above) does behind the scenes, and also the scala.util.control.Breaks class.

However, be careful with a home-grown throw/catch solution that the class of exception that you throw includes with NoStackTrace, otherwise the system will compute a stacktrace when the exception is thrown.

jimka · June 18, 2020, 8:53am

It took me a while but I finally was able to convert this to tail recursion. The trick is not to make growByOne tail recursive, but rather just insert yet another local function which is tail recursive in place of the foldLeft.

    def growByOne(cap: List[Card], deck: Set[Card]): Option[Set[Card]] = {
      if (cap.length == targetSize) {
        Some(cap.toSet)
      } else if (deck.isEmpty)
        None
      else {
        @tailrec
        def recur(remaining: List[Card]): Option[Set[Card]] = {
          remaining match {
            case Nil => None
            case c :: cs =>
              growByOne(c :: cap, findCapExtensions(c, cap, deck - c))
                .orElse(recur(cs))
          }
        }
        recur(deck.toList)
      }
    }

I’m not sure it’s really better than the foldLeft. The foldLeft version is very concise, it just has the problem that I have to drag the Some(...) along to the end of the loop, which seems like a hack.

jimka · June 18, 2020, 10:30am

Another interesting way to escape from a search is using a laziness.

def somePredicate(a:Int,b:Int,c:Int):Boolean = {
  (a*a + b*b == c*c) && (c*c - b*b == a*a)
}

val domain = List.tabulate(100){n=>n+2}
val data1 = for {tail1 <- domain.tails
                 if tail1.nonEmpty
                 a = tail1.head
                 tail2 <- tail1.tail.tails
                 if tail2.nonEmpty
                 b = tail2.head
                 c <- tail2.view
                 //_ = println(s"testing $a, $b, $c")
                 if somePredicate(a, b, c)
                 } yield (a,b,c)

Now data1.next() will compute exactly enough of the loop to find an a,b,c satisfying the predicate.

But the following fails. It computes a Vector of all such triples. It’s not clear to me why it fails.

val data2 = for {c <- (1 to 100)
                 b <- (1 to c)
                 a <- (1 to b).view
                 //_ = println(s"testing $a, $b, $c")
                 if somePredicate(a, b, c)
                 } yield (a,b,c)

tarsa · June 20, 2020, 7:48pm

https://github.com/scala/scala-collection-contrib provides an extension method lazyFoldLeft which probably does what you want.

jimka · June 23, 2020, 2:01pm

In the introduction to Scala course which I just finished teaching, I didn’t touch the issue of breaking out of a loop, neither did I talk about exceptions nor return. I assigned a final project where the students needed to find arrangements of cards which satisfy certain predicates, and a majority of the 15 students used return to break out of functions once finding the arrangement they sought.

This is yet more evidence, that such a facility is basic, not seen as an advanced feature, and is missing from the language. People who need it find a way.

Another piece of evidence (not conclusive evidence admittedly) is that the Scala standard library makes use of such a ad-hoc facility (the brittle non-reentrant breakable/break described above). If users wish to use standard library facility, they are forced to write imperative code using var.

tarsa · June 23, 2020, 4:11pm

Nonlocal returns http://dotty.epfl.ch/docs/reference/dropped-features/nonlocal-returns.html worked only because your students used them in strict and synchronous transformations. Look at this code:

def process(seq: Seq[Int]): Seq[Int] = {
  seq.map(num => if (num == 5) return Vector(0) else num + 1)
}

println(process(Seq(1, 2, 3)).mkString(", "))
println(process(Seq(1, 3, 5)).mkString(", "))
println(process(LazyList(1, 2, 3)).mkString(", "))
println(process(LazyList(1, 3, 5)).mkString(", "))

It throws an exception on the last line: https://scastie.scala-lang.org/SXJFVgcHRaCADz3X2RXglg
Is that OK for you?

BalmungSan · June 23, 2020, 5:29pm

I find that as an evidence that you should have teach them that in Scala the return keyword shouldn’t be used in the 99% of the time (I justa assume it doesn’t exist).

@tarsa shows another great example of why such a feature shouldn’t be used (as well as why using Seqs is bad).

Edit: added a note that it sometimes has to be used as @Russ mentioned, I just think it is almost never and certainly not something you should care when learning (again my opinion).

Russ · June 23, 2020, 6:21pm

No, this has been discussed before, and the “return” keyword has legitimate uses.

jducoeur · June 23, 2020, 10:41pm

No, it really doesn’t. It proves that return is common in other languages, which tend to be much more imperatively-focused. So they are used to it from those other languages, so they reflexively want to use it, because they are used to imperative programming.

Scala’s idiom is different. Yes, return has valid uses – but I still maintain that it should be considered an edge case, and only used by advanced users. When used by beginners, it tends to trip them up and cause bugs (I hit a case of this at work just the other day), because it isn’t as easy to use correctly as they think it is. That’s because Scala thinks differently than those other languages, and you wind up with more complex code structures.

As I often point out: in eight years of full-time Scala, I have never used return, and rarely felt the lack. You have to develop different habits, but that’s what learning another language well is about…

Russ · June 24, 2020, 5:43am

That’s one way to look it. The way I see it, “return” is like “var” in that both should be avoided in most cases, but they still have perfectly appropriate and acceptable use cases.

If the entire purpose of a function is to iterate through a sequence to find some value, why not just return that value as soon as it is found?

Here is an example. I have a Vector of altitude segments (climbing, level, or descent), and I want to find the index of the segment at a given distance along the route:

def altSegmentIndex(dist: Scalar): Int = {
  for ((p, i) <- altSegments.zipWithIndex.reverse)
    if (dist >= p.dist) return i
  0
  }

Edit: I just realized that I can use indexWhere for this. But I wonder if that function itself uses a return.

Also, I often find myself developing algorithms involving aircraft trajectories, where the algorithm is undefined or does nothing useful if a particular Vector is empty or has less than a certain number of elements. Why shouldn’t I just check the length up front and return an appropriate value if the length is insufficient? Then I don’t have to worry about accessing the head of an empty Vector, for example. Yes, of course I could avoid return by using an additonal nesting level, but that is just unnecessary complexity as far as I am concerned. I realize that others see it differently, but in the end it is just a matter of style. To my way of thinking, seeing the return right up front makes the intent crystal clear, whereas additional nesting can quickly get confusing.

jimka · June 24, 2020, 12:00pm

I agree that return should be avoided. I never covered this in class, and I suppose that since I failed to mention why to avoid it, some students used it in their final projects.

As I mentioned before, since return doesn’t work, or at least its semantics are difficult to explain, I believe there should be some recommended and robust way to perform non-local exit.

jimka · June 24, 2020, 12:09pm

@jducoeur, Perhaps I misstated. I am not claiming return should be used. I’m rather claiming that a robust non-local-exit is important and missing. The fact that the scala library developers implemented a brittle one, and used it in find and perhaps elsewhere is further evidence. In order to use the block/break from the standard library we are forced to write imperative code using var.

For my own purposes, I can use block which I defined in the original post, which accepts a continuation body:(A=>Nothing)=>A as argument. However, I would not want to try to explain this to beginner students.

jimka · June 24, 2020, 12:16pm

Is there a cousin of find which takes an additional function:

def cousinOfFind[A,B](seq:Findable[A], f:A=>B, predicate:B=>Boolean):(A,B) ...

which searches some input (probably more general than Seq) for an element, a, for which f(a) satisfies the predicate, and when found, returns the pair (a,f(a)), the logic being that find almost does this but only returns a, forcing the caller to call f(a) again which might be expensive or impossible to re-compute. It may be impossible in the case that it depends on an iterator.

jducoeur · June 24, 2020, 12:19pm

That kind of illustrates the problem right there. The issue with non-local exit isn’t implementation, it’s definition. In a relatively deep language like Scala, where closures and nested functions are totally routine, what does an easy-to-use, robust, non-local exit look like?

That’s not a rhetorical question, and there might be a good answer, but I haven’t seen it yet…

tarsa · June 24, 2020, 12:20pm

Deprecated: Nonlocal Returns shows the replacement of non-local returns:

import scala.util.control.NonLocalReturns._

returning { ... throwReturn(x) ... }

That’s how non-local returns work today (except they don’t create a closure), but is written explicitly.