Custom Collections (Scala 2.13): forcing laziness in infinite lists

I am working with infinite sequences that represent infinite time-series data. So I need to have all the operations perform in a lazy fashion and avoid copying. I have taken the route of using Iterable.

To get a sense of what I am trying to achieve consider the following code:

    val q1 = Iterator.continually('A')
    val q2 = q1.drop(5)
    println("q2")
    val q3 = q2.take(2)
    println(q3.mkString(","))

The last println will execute correctly with no problem. Now if I use an Iterable instead, things don’t work as I expected. Consider the following code:

    val q10 = Iterable.from(Iterator.continually('B'))
    println("q20 init")
    val q20 = q10.drop(5)
    println("q20")
    val q30 = q20.take(2)
    println(q30.mkString(","))

The code above will not terminate. More specifically it seems like the from is iterating through the infinite sequence. The following will also not work:

    val q10 = Iterator.continually('B').to(Iterable) 

Looking at the code I see that:

Factory[-A, +C].fromSpecific(it: IterableOnce[A]): C

is use to create the Iterable. After some searching I found documentation on implementing Custom Collections (Scala 2.13). I basically replicated the code using the final 3rd version of the code (test here done without extending StrictOptimizedIterableOps).

Using this code, if I do the following:

    val q12 = TimeSeriesC.continually('D')
    println("q12 init")
    val q22 = q12.drop(5)
    println("q22")
    val q32 = q22.take(2)
    println(q32.mkString(","))

Its a little better. Now its the drop that does not terminate but I cheated somewhat. More concretely I use the following code to create the initial Iterable:

  def continually[A](elem: => A): TimeSeriesC[A] = {
    //Iterable.from(Iterator.continually(elem)) // not lazy
    //Iterator.continually(elem).to(Iterable) // not lazy
    //Iterator.continually(elem).to(new TimeSeriesCFactory()) // not lazy
    // not lazy on drop
    val iterable = new Iterable[A] {
      override def iterator: Iterator[A] = {
        val converter: AbstractIterator[A] = new AbstractIterator[A] {
          private val iter = Iterator.continually(elem)
          override def hasNext: Boolean = iter.hasNext
          override def next(): A = iter.next()
        } // iterator
        converter
      }
    }
    TimeSeriesC[A](iterable)
  }

Notice that if I used any of the library’s ‘from’ or ‘to’ methods, initializing the infinite Iterable will still hang. For completeness sake I also list below the factory - note how the use of to(Iterable) there still causes issues there.

Currently, the only way I can circumvent this is by using:

    val q12 = TimeSeriesC.continually('D').iterator

So my question is, how does one ensure laziness? I looked up LazyOptimizedIterableOps and LazyIterableOps` :sunglasses: but did not find additional clues.

TIA

class TimeSeriesCFactory() extends IterableFactory[TimeSeriesC] {

  def from[A](source: IterableOnce[A]): TimeSeriesC[A] = TimeSeriesC[A](source.iterator.to(Iterable))

  def empty[A]: TimeSeriesC[A] = TimeSeriesC.empty[A]()

  // used by with StrictOptimizedIterableOps[A, TimeSeriesC, TimeSeriesC[A]]
  def newBuilder[A]: mutable.Builder[A, TimeSeriesC[A]] =
    new mutable.ImmutableBuilder[A, TimeSeriesC[A]](empty) {
      def addOne(elem: A): this.type = { ??? ; this }
    }
}
1 Like

By using a collection that is guaranteed to be lazy. Iterable is an interface. If you request to create an instance of Iterable it will just delegate that to the default implementation of that interface, which is List.

If you want something that has to be traversed just once and discarded, I would just stick with Iterator. If you need an Iterator that can be reused you can do View.from(iterator). If you want the already calculated elements to be memoized use LazyList.

1 Like

Indeed this is what I am looking for. I don’t know how views work, but one possible disadvantage is loosing the ability to create a collection and reuse all underlying methods. Going to look at this.

As a side note, after some more search I found that I already had asked a similar question. There, one of the collection gurus had said:

which, (as seen in the code above) still seems problematic because I have only access to an IterableOnce.

I meant lazy in the sense of processing the sequence. Maybe their is a better word for this?

Thanks

I think lazy is accurate. But a LazyList is lazy in that sense, unless I’m misunderstanding what you mean.

1 Like

It is not lazy in the sense that LazyLists (or streams) use a specific technique to delay the generation of the values from a (non-existant) tail. Did not want to get this confused. So ok, lazy it is. :slight_smile:

Curious on why this is not the same as lazy?

BTW, if you really have something infinite it may be worth using something like fs2 or Akka Streams. But, you may be just ok with Iterator.

Just a quirk of mine. When I see Lazy + stream/lists I associate it to the way these are usually implemented - tails as delayed computation. But yes, essentially it is the same.

The goal is to generate hyper-parameters for machine learning tasks. I just need something light with a low maintenance cost. I also had a very simple way of creating and combining iterators of single values into tuples of multiple values (2.12.xx). The idea is to port this to Dotty/Scala3.

Then probably either Iterator, View or LazyList should be enough then.

Scala 3 uses the same stdlib as Scala 2, so this shouldn’t need porting. Or that the porting should just be change the compiler.

I used 2.12 collections so I am trying to see what changes should be made to better align with the 2.13 collections.

You generally pattern-match on this to use more specific implementations but for a lazy collection an Iterator (which you can get from the IterableOnce) is all you need. You need to have a collection that wraps the Iterator and reads from it on demand. The only optimizations you can really do (for a lazy collection) are recognizing your own type (and just return that directly) and recognizing known empty collections. See the LazyList implementation for an example.

I wrapped it as an IterableOnce directly in order to ensure I can recreate the Iterator. The tests work now. Thank you. The 1st version of the factory was defined as:

class TimeSeriesCFactory() extends IterableFactory[TimeSeriesC] {

  def from[A](source: IterableOnce[A]): TimeSeriesC[A] = TimeSeriesC[A](source)

  def empty[A]: TimeSeriesC[A] = TimeSeriesC.empty[A]()

  // used by with StrictOptimizedIterableOps[A, TimeSeriesC, TimeSeriesC[A]]
  def newBuilder[A]: mutable.Builder[A, TimeSeriesC[A]] =
    new mutable.ImmutableBuilder[A, TimeSeriesC[A]](empty) {
      def addOne(elem: A): this.type = { ??? ; this }
    }
}

TIA.