Same funny iteration pattern appears three times, need to make it more efficient

I have a particular type of inefficient iteration which has occurred in at least three methods in an application I’m implementing.
The problem is that in each case, to compute the answer, I logically need to do two different traversals of a data structure. I’d prefer to do the traversal only once, as each traversal applies a function which is considered expensive.

In each case I have a method f (it is actually a method of the same name as the method itself, but I don’t think that’s important for the algorithm). The method, f, should be considered expensive to call. Disregarding efficiency, the semantics of each functions that for two of the methods (disjointDown and subtypep): first to see if f always returns Some(true) . If not, iterate again to see if the call ever returns Some(false) . And if both checks fail, then delegate to the superclass with super.f(t). However, in the third method (inhabited), the roles of always and ever are reversed, (swapping foreach and exists).

Can I do this in one iteration? If I use reduce, I’m forced to finish the iteration, as there’s not clean way to break out of the loop. Also if I use map to compute the sequence of Options, var saveOptions = U.map(_.f(t)) then I’m forced to always iterate entirely through the loop.

case class UnionType(U: Type*) extends Type {

  override def inhabited:Option[Boolean] = {
    // TODO : Only a single pass ?
    if (U.exists(_.inhabited.contains(true)))
      Some(true)
    else if (U.forall(_.inhabited.contains(false)))
      Some(false)
    else
      super.inhabited
  }
  // UnionType(U: Type*)
  override protected def disjointDown(t: Type): Option[Boolean] = {
    // TODO : Only a single pass ?
    if (U.forall(_.disjoint(t).contains(true)))
      Some(true)
    else if (U.exists(_.disjoint(t).contains(false)))
      Some(false)
    else
      super.disjointDown(t)
  }

  // UnionType(U: Type*)
  override def subtypep(t: Type): Option[Boolean] = {
    // TODO : Only a single pass ?
    if (U.forall(_.subtypep(t).contains(true)))
      Some(true)
    else if (U.exists(_.subtypep(t).contains(false)))
      Some(false)
    else
      super.subtypep(t)
  }
}

I could make the code more readable by factoring out the common code into a helper function, so the to methods look like this, but the third method swaps the roles of foreach and exists. I could try to extract that out as well. Not sure if that would just make it even more obfuscated???

I could rewrite the helper function as a recursive function which implements the exists and foreach implicitly, terminating recursion if a Some(false) is found. However, to do this I’d need to convert the U:Seq[Type] to U:List[Type]. Perhaps that’s so bad? But it certainly will be uglier than what exists already.

The way this kind of function is implemented in Clojure is that the map function promises to return a lazy sequence, and iterating over the lazy sequence (or partially over it) a second time, DOES NOT re-evaluate, but just uses the previous results.
This laziness means the foreach and exists terminate early without computing any unnecessary result.

Can I do a similar thing here? Can I create a lazy sequence of U.map(f) and then use that as the LHS of .foreach and .exists guaranteeing that f is never called twice on the same object, and that f is never called unnecessarily?

I took a look at the documentation for view, and it is not clear to me whether code like the following will call f redunantly, or will call f only once per element of U?

val vv = U.view.map(f)
if (vv.forall(something-1))
  something-2
elseif (vv.exists(something-3)
  something-4
else
  something-5

A simple experiment shows that in at least some cases f is called redunantly.


def f(x:Int):Int = {
  println(s"looking at x=$x")
  x + 1
}
val vv = Seq(1,3,2,4,3,5,4).view.map(f)
vv.forall(a => a > 0)
vv.exists(a => a > 3)

This prints the following in the worksheet

f: (x: Int)Int



vv: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)
looking at x=1
looking at x=3
looking at x=2
looking at x=4
looking at x=3
looking at x=5
looking at x=4
res4: Boolean = true
looking at x=1
looking at x=3
res5: Boolean = true

Personally, I would just use a List and a custom recursive algorithm.

But, maybe you can use a LazyList since what you want to avoid is not really the iteration but rather the computation.

1 Like

Just adding some detail to @BalmungSan’s reply…

Sure, that’s what #view is about. It won’t create a new collection instance, but run the calculations on top of the view whenever its elements are accessed. The main use case is chaining transformations without instantiating any intermediate collections.

If you run this without the #view, you’ll see f calculated for each item exactly once (during the #map() application):

val vv = Seq(1,3,2,4,3,5,4).map(f)

If you switch to LazyList and reverse the order of #exists() and #forall() applications, you’ll see an opening for an early return upon a successful #exists() application:

val vv = LazyList(1,3,2,4,3,5,4).map(f)
vv.exists(a => a > 3)
vv.forall(a => a > 0)

If you want a combination of all optimizations (i.e. calculate f at most once, apply each of the two predicates to each result at most once, and succeed early upon #exists()), you’ll probably have to go for the custom recursive traversal.

1 Like

I think I’d write an imperative helper for this case to efficiently check multiple conditions per element and return early without the weird non-local return-via-exception problem. Here’s a rough sketch - I didn’t test it but it should convey what I’m thinking.

case class UnionType(U: Type*) extends Type {
  private def findAGoodName(f: Type => Option[Boolean], earlyExists: Boolean)(orElse: => Option[Boolean]): Option[Boolean] = {
    var allEmpty = true
    var anyMissing = false
    val iter = U.iterator

    while (iter.hasNext) {
      f(iter.next()) match {
        case Some(b) =>
          allEmpty = false
          
          if (b == earlyExists) { return Some(earlyExists) }

        case None =>
          anyMissing = true
      }
    }

    // no early-exit means one of
    // - there were no elements at all
    // - every element that exists contains the non-early exit value
    // but we still want to account for the case that some elements may not exist
    // even if the ones that did contain the non-early exit value
    if (allEmpty | anyMissing) {
      orElse
    } else {
      Some(!earlyExists)
    }
  }

  override def inhabited:Option[Boolean] = {
    findAGoodName(_.inhabited, true)(super.inhabited)
  }
  // UnionType(U: Type*)
  override protected def disjointDown(t: Type): Option[Boolean] = {
    findAGoodName(_.disjoint(t), false)(super.disjointDown(t))
  }

  // UnionType(U: Type*)
  override def subtypep(t: Type): Option[Boolean] = {
    findAGoodName(_.subtypep(t), false)(super.subtypep(t))
  }
}

bizare, I don’t have scala.collection.immutable.LazyList
Looks like my project is still on 2.12. I thought it was on 2.13. Looks like I have work to do to convert it to 2.13.

BTW, what’s the equivalent of LazyList in 2.12 so I can test it out?

Stream.

Hi @jgulotta, that’s an interesting suggestion. However, it seems like a huge amount of code to solve a small problem. And unfortunately, the final code is really obfuscated. I’ll suggest what I think is a simpler and more generic solution below.

I think I’ve come up with a pretty good solution which is both elegant, and also generic. Rather than trying to prevent the second traversal, if we just concentrate on eliminating redundant computation we can introduce a memoization function as follows.

  def memoize[F,T](f:F=>T):F=>T = {
    val hash = scala.collection.mutable.Map[F,T]()
    def mem(i:F):T = {
      hash.getOrElse(i, locally{
        val v:T = f(i)
        hash(i) = v
        v
      })
    }
    mem
  }

The function memoize takes a function, and returns a function of the same type, in particular the computed function is a drop-in replacement for the given function. Calling the function the first time with a particular argument, triggers computing the result. But calling the function a second time with the same input value, simply returns the previously computed result.

The function is inspired by the Clojure function of the same name.

With the function, memoize, being defined. The forall/exists example looks like this:

def f(x:Int):Int = {
  println(s"looking at x=$x")
  x + 1
}
val v = Seq(1,3,2,4,3,5,4)
val ff = memoize(f)
v.exists(a => ff(a) > 3)
v.forall(a => ff(a) > 0)

And the output looks like this. I.e., only x=1 and x=3 is computed in the call to v.exists, and in the call to v.forall x=1 and x=3 are not recomputed, but x=2,4, and 5 are computed.

f: (x: Int)Int



v: Seq[Int] = List(1, 3, 2, 4, 3, 5, 4)
ff: Int => Int = <function>
looking at x=1
looking at x=3
res0: Boolean = true
looking at x=2
looking at x=4
looking at x=5
res1: Boolean = true

Using the memoize function, I can rewrite the methods, inhabited, disjointDown, and subtypep as follows:

case class UnionType(U: Type*) extends Type {

  override def inhabited:Option[Boolean] = {
    val i = memoize((s:Type) => s.inhabited)
    if (U.exists(i(_).contains(true)))
      Some(true)
    else if (U.forall(i(_).contains(false)))
      Some(false)
    else
      super.inhabited
  }
  // UnionType(U: Type*)
  override protected def disjointDown(t: Type): Option[Boolean] = {
    val d = memoize((s:Type) => s.disjoint(t))
    if (U.forall(d(_).contains(true)))
      Some(true)
    else if (U.exists(d(_).contains(false)))
      Some(false)
    else
      super.disjointDown(t)
  }

  // UnionType(U: Type*)
  override def subtypep(t: Type): Option[Boolean] = {
    val s = memoize((s:Type) => s.subtypep(t))
    if (U.forall(s(_).contains(true)))
      Some(true)
    else if (U.exists(s(_).contains(false)))
      Some(false)
    else
      super.subtypep(t)
  }
}

This could be simplified using #getOrElseUpdate().

Which is exactly the behavior you’d get from Stream/LazyList.

1 Like

Thanks. good suggestion.

Correct. Except I think my solution is a bit more general in the sense that the behavior I’ve implemented is associated with the function, not with the type of data structure its values are stored in. And if the same value happens to appear multiple times in the list, the lazy list approach would call the function multiple times.

Right, that’s an advantage, indeed.

If you are open to using cats, you can just use Eval.later.