Extending versus wrapping Vector types

jducoeur · January 1, 2025, 10:38pm

Problem is, there is a fundamental reason for not allowing it – that’s what we’re talking about upthread. Vector isn’t a simple concrete class that you can extend and just use all the methods – it’s somewhat an interface, with a whole lot of inter-related implementations that actually do much of the work. And the concrete methods often assume that all the details are exactly right, not changed at all.

So what you’re asking for probably is impossible: in order to highly optimize it, the Vector class is just the tip of a complex iceberg, and it’s likely not possible to make it easily and safely extensible without losing those optimizations. And optimizing Vector is way, way more important in practice than making it extensible.

tarsa · January 1, 2025, 11:56pm

Russ:

I may be beating a dead horse, but I would still like to be able to just write something like
case class Tracks extends Vector[Track]:
  // add methods and lazy val calculations here
After all, Track and therefore Vector[Track] are my own types, so why should I not be allowed to extend the latter?

i’ve already answered that it wouldn’t get what you want, at least in this form. the methods from Vector return a Vector, not Tracks, so if your class inherits such methods, they are still returning Vector.

otoh, you can extend mutable sequences, like scala.collection.mutable.ArrayBuffer or collections from java, but that’s not really compatible with memoization in lazy val fields in the custom subclass.

the https://github.com/vavr-io/vavr library i.e. functional collections for java also seems to disallow subclassing, like scala’s functional (immutable) collections.

in short, subclassing functional (immutable) collections and imperative (mutable) ones, are very different topics.

i’ve made an example immutable list to illustrate the problem (warning, mostly untested, because it’s just for illustrative purposes):

// functional list, i.e. immutable and with structural sharing
abstract class MyList[+A] {
  import MyList._

  def apply(idx: Int): A = this match {
    case _ if idx < 0 => sys.error("can't find elem of idx < 0")
    case MyNil => sys.error("can't get elem from MyNil")
    case MyCons(head, _) if idx == 0 => head
    case MyCons(_, tail) => tail(idx - 1)
    case unexpected => sys.error(s"unexpected this type: $unexpected")
  }

  def toList: List[A] = List.unfold(this) {
    case MyNil => None
    case MyCons(head, tail) => Some(head -> tail)
    case unexpected => sys.error(s"unexpected this type: $unexpected")
  }

  def ++[B >: A](other: MyList[B]): MyList[B] = {
    toList.foldRight(other)((elem, acc) => MyCons(elem, acc))
  }

  def map[B](fn: A => B): MyList[B] = this match {
    case MyNil => MyNil
    case MyCons(head, tail) => MyCons(fn(head), tail.map(fn))
    case unexpected => sys.error(s"unexpected this type: $unexpected")
  }

  def flatMap[B](fn: A => MyList[B]): MyList[B] = {
    toList.foldRight(MyNil: MyList[B])((elem, acc) => fn(elem) ++ acc)
  }
}

object MyList {
  def apply[A](elems: A*): MyList[A] =
    elems.foldRight(MyNil: MyList[A])((elem, acc) => MyCons(elem, acc))

  case object MyNil extends MyList[Nothing]

  case class MyCons[+A](head: A, tail: MyList[A]) extends MyList[A]
}

case class Track(something: Int)

class Tracks(tracks: MyList[Track]) extends MyList[Track] {
  lazy val negate = map(track => Track(-track.something))
}

import util.Try

val tracks = new Tracks(MyList(Track(5), Track(8)))
// both below statements print: Failure(java.lang.RuntimeException: unexpected this type: Playground$Tracks@...)
println(Try(tracks.negate.toList))
println(Try(tracks.map(track => Track(-track.something))))
// below doesn't compile, says:
// Found:    Playground.MyList[Playground.Track]
// Required: Playground.Tracks
// val tracks2: Tracks = tracks ++ tracks

Russ · January 2, 2025, 2:08am

Obviously, I would want all Vector[Track] instances automatically converted to Tracks before being returned or it would be pointless. If I can force that with an implicit converter, I don’t see why the compiler can’t do it.

Russ · January 2, 2025, 2:20am

Problem is, there is a fundamental reason for not allowing it – that’s what we’re talking about upthread. Vector isn’t a simple concrete class that you can extend and just use all the methods – it’s somewhat an interface, with a whole lot of inter-related implementations that actually do much of the work. And the concrete methods often assume that all the details are exactly right, not changed at all.

Maybe I am just being dense here, but I still don’t get it. Yes, I understand that Vector is not a simple concrete class, but it seems to me that Vector[Track] is a concrete class, simple or not, so why can’t it be extended like any other concrete class? What I am asking for is not much more than “syntactic sugar” for the pattern that I posted above. I don’t see why that would even depend on the implementation of Vector under the hood.

som-snytt · January 2, 2025, 2:49am

The slogan is “prefer composition to inheritance”, and export was added as a convenient mechanism to encourage it.

Conversions that make types indistinguishable are generally considered bad form. For example, the implicit conversions between Scala and Java collections are deprecated. The doc at that link shows an example problem.

The long discussion for Dotty:

where Odersky has the last word in introducing

I may be beating a dead horse

It’s more like you’ve slung the dead horse over one shoulder while pushing a heavy stone uphill.

jducoeur · January 2, 2025, 4:28am

No, there really, seriously isn’t. Vector per se only exists at compile time, not at runtime. It is intentionally impossible to instantiate Vector the way you’re assuming, because there does not exist a concrete, complete Vector class. If you look at the running program, you will never actually see an instance of Vector.

Instead, when you create an initial empty Vector[Track], you actually wind up with the universal, shared Vector0 object. When you add something to it, you get a Vector1 instance. Add another, you get a Vector2. That keeps going through Vector6, and after you add another beyond that, it becomes a BigVector. (And that’s still over-simplifying the situation, I think.)

Basically, there is no single Vector class to extend – there is a compile-time Vector type (which is very different), with many deeply intertwined classes fitting that type, tossing the data hand to hand depending on its size. There’s no complete class that you can extend.

Seriously: try it – copy the code out of stdlib, and try to refactor it to make it extensible, while preserving its performance and reliability. I suspect you’ll find it quite hard. (Probably impossible, but I haven’t tried.)

Russ · January 2, 2025, 6:43am

No, there really, seriously isn’t. Vector per se only exists at compile time, not at runtime. It is intentionally impossible to instantiate Vector the way you’re assuming, because there does not exist a concrete, complete Vector class. If you look at the running program, you will never actually see an instance of Vector.

Interesting. I did not realize that, and I certainly appreciate the explanation. Nevertheless, if I can have an implicit conversion from a Vector[Track] type to the Tracks case class – and I know that I can – then it still seems to me that it should be possible to extend Vector[Track] to Tracks. But maybe that is extension in a some new sense that is not currently supported in the language, although it would be useful for my purposes.

Russ · January 2, 2025, 7:05am

Does that principle apply in all cases, or are there perhaps exceptions? For example, Ints are automatically converted to Doubles (but not vice versa).

For the record, is the following implicit conversion considered bad form?

case class Tracks(tracks: Vector[Track]):
  ...

object Tracks:
  given Conversion[Vector[Track], Tracks] = Tracks(_)
  given Conversion[Tracks, Vector[Track]] = _.tracks

More importantly, it is likely to be disallowed in a future version of the language?

It’s more like you’ve slung the dead horse over one shoulder while pushing a heavy stone uphill.

No pain, no gain!

som-snytt · January 2, 2025, 7:56am

https://docs.scala-lang.org/scala3/reference/dropped-features/weak-conformance.html

and Scala 2 has -Wnumeric-widen, so also playing fast and loose with numbers has fallen out of favor.

BalmungSan · January 2, 2025, 4:40pm

In general yes, most people would consider that kind of implicit conversions bad.
However, since they are not expensive then I guess is mostly fine if that gives you the API you want. Also, I said most people, there surely will be folks that would actually like that solution.

AFAIK, no.
So that can also give yu peace of mind.

I still, however, don’t understand what is the problem you are trying to solve.
IIUC, all you want to do is simply enrich Vector[Track] with some additional methods, why not just use extension for that?

Or, why not just make Tracks a full collection by using the IndexedSeqOps approach rather than focusing on it being a Vector.

Russ · January 2, 2025, 6:40pm

I still, however, don’t understand what is the problem you are trying to solve.
IIUC, all you want to do is simply enrich Vector[Track] with some additional methods, why not just use extension for that?

I just got tired of all the wrapping and unwrapping. I got tired of seeing tracks.tracks and Tracks(tracks) in my code. And I also have three or four other similar classes. In the end, it’s really just cosmetic, I guess, and I tend to be a bit obsessive about clean code, but I figured that once I figured it out, it would reap the benefit from then on.

I considered using extensions, but I don’t like the fact that they require the Python-style explicitly named “self” convention instead of the implicit Java anonymous “this” convention. That makes extensions inconsistent with actual methods. Also, extensions do not have access to private fields, making them a bit less versatile. I addressed those issues on another thread.

Or, why not just make Tracks a full collection by using the IndexedSeqOps approach rather than focusing on it being a Vector.

I’m certainly willing to consider it. Can you give me a brief outline?

BalmungSan · January 2, 2025, 11:38pm

Yes, because they aren’t the same thing.

You won’t have access to private fields of external classes anyways.
And there is simply no point in using extension for classes / traits you control (minus opaque types) other than not wanting to have long files to which I would also

In essence, you want Tracks to be a collection of Track entities right?
Rather than focusing on it being a subtype of Vector[Track] which is not feasible, you can just make it a subclass of IndexedSeq which is totally possible. The implementation would still be done using a Vector so the runtime characteristics would be the same, and you would get all the collections API.

It will be something like this:

final class Tracks (
    private val tracks: Vector[Track]
) extends IndexedSeq[Track] with IndexedSeqOps[Track, Vector, Tracks]:
  // Override abstract methods,
  // some concrete methods to provide more efficient implementations,
  // and overload things like map for Track => Track cases.

object Tracks extends IterableFactory:
  def apply(tracks: Track*): Tracks =
    new Tracks(Vector(tracks))

  // Override factory methods.

But for more details check: Implementing Custom Collections (Scala 2.13) | Scala Documentation

Russ · January 3, 2025, 6:43pm

OK, thanks for that outline. I’ll see what I can do with it when I get a chance (this is a low-priority task that I am doing more or less as a hobby, although it can be applied to my work).

The first step will be to make it generic, otherwise it is pointless. I had a complete non-generic solution to this problem a long time ago, which I outlined in the opening post on this thread. The challenge is to capture the pattern in reusable generic form so I can avoid repeating the method-forwarding boilerplate for each Vector-wrapping class. I’m referring to stuff like this:

  def :+ (t: Track) = copy(tracks :+ t)
  def +: (t: Track) = copy(t +: tracks)
  def ++ (t: Tracks) = copy(tracks ++ t.tracks)
  def ++ (v: Vector[Track]) = copy(tracks ++ v)

  def reverse = copy(tracks.reverse)
  def tail = copy(tracks.tail)
  def take(i: Int) = copy(tracks.take(i))
  def drop(i: Int) = copy(tracks.drop(i))
  def takeRight(i: Int) = copy(tracks.takeRight(i))
  def dropRight(i: Int) = copy(tracks.dropRight(i))

  def takeWhile(f: Track => Bool) = copy(tracks.takeWhile(f))
  def dropWhile(f: Track => Bool) = copy(tracks.dropWhile(f))
  def filter(f: Track => Bool) = copy(tracks.filter(f))

tarsa · January 3, 2025, 11:00pm

i’ve already provided draft of solution here Extending versus wrapping Vector types - #8 by tarsa

to expand on that.

first, there is some upfront investment to create wrapper base class:

import scala.collection.Factory

// idea: this abstract superclass required some elbow grease,
//       but subclasses should have very little boilerplate
abstract class SeqWrapper[Elem, Wrapper <: SeqWrapper[Elem, _]] {
  protected type PartiallyDefinedWrap = PartialFunction[Seq[Elem], Wrapper]
  protected def wrap: PartiallyDefinedWrap
  private def wrapOrDie(rawCollection: Seq[Elem]): Wrapper = {
    wrap.applyOrElse(
      rawCollection,
      (unexpected: Seq[Elem]) =>
        sys.error(s"unexpected collection type: ${unexpected.getClass}")
    )
  }

  protected def unwrap: Seq[Elem]

  // below methods don't need to be overridden in subclasses,
  // so it's one time setup cost here
  def apply(i: Int): Elem = unwrap.apply(i)
  def isDefinedAt(idx: Int): Boolean = unwrap.isDefinedAt(idx)
  def at(idx: Int): Option[Elem] = Option.when(isDefinedAt(idx))(apply(idx))
  def collectFirst[Out](pf: PartialFunction[Elem, Out]): Option[Out] =
    unwrap.collectFirst(pf)

  def ++(other: Wrapper): Wrapper = wrapOrDie(unwrap ++ other.unwrap)
  def +:(elem: Elem): Wrapper = wrapOrDie(elem +: unwrap)
  def :+(elem: Elem): Wrapper = wrapOrDie(unwrap :+ elem)

  def head: Elem = unwrap.head
  def headOption: Option[Elem] = unwrap.headOption
  def tail: Wrapper = wrapOrDie(unwrap.tail)
  def last: Elem = unwrap.last
  def lastOption: Option[Elem] = unwrap.lastOption
  def init: Wrapper = wrapOrDie(unwrap.init)

  def iterator: Iterator[Elem] = unwrap.iterator
  def to[OutColl](factory: Factory[Elem, OutColl]): OutColl =
    unwrap.to(factory)

  def fold[Out >: Elem](z: Out)(op: (Out, Out) => Out): Out =
    unwrap.fold(z)(op)
  def foldLeft[Out](z: Out)(op: (Out, Elem) => Out): Out =
    unwrap.foldLeft(z)(op)
  def foldRight[Out](z: Out)(op: (Elem, Out) => Out): Out =
    unwrap.foldRight(z)(op)

  def map(f: Elem => Elem): Wrapper =
    wrapOrDie(unwrap.map(f))
  def flatMap(f: Elem => Wrapper): Wrapper =
    wrapOrDie(unwrap.flatMap(f(_).unwrap))
  def collect(pf: PartialFunction[Elem, Elem]): Wrapper =
    wrapOrDie(unwrap.collect(pf))

  def partition(p: Elem => Boolean): (Wrapper, Wrapper) =
    wrapTwoOrDie(unwrap.partition(p))
  def span(p: Elem => Boolean): (Wrapper, Wrapper) =
    wrapTwoOrDie(unwrap.span(p))
  def splitAt(n: Int): (Wrapper, Wrapper) =
    wrapTwoOrDie(unwrap.splitAt(n))

  private def wrapTwoOrDie(pair: (Seq[Elem], Seq[Elem])): (Wrapper, Wrapper) =
    (wrapOrDie(pair._1), wrapOrDie(pair._2))
}

then you can create many subclasses with little boilerplate in each of them:

case class Track(whatever: Int)

case class Tracks(tracks: Vector[Track]) extends SeqWrapper[Track, Tracks] {
  override protected def wrap: PartiallyDefinedWrap = {
    case vector @ Vector(_*) => Tracks(vector)
  }

  override protected def unwrap: Seq[Track] = tracks
}

object Tracks {
  def apply(tracks: Track*): Tracks = apply(Vector(tracks: _*))
}

case class Shape(whatever: String)

case class Shapes(shapes: List[Shape]) extends SeqWrapper[Shape, Shapes] {
  override protected def wrap: PartiallyDefinedWrap = { case list @ List(_*) =>
    Shapes(list)
  }

  override protected def unwrap: Seq[Shape] = shapes
}

object Shapes {
  def apply(shapes: Shape*): Shapes = apply(List(shapes: _*))
}

object Test {
  def main(args: Array[String]): Unit = {
    val tracks1 = Tracks((1 to 10).toVector.map(Track))
    val tracks2 = Tracks((10 to 1 by -3).toVector.map(Track))
    val tracks3 = tracks1.init ++ tracks2.tail
    println(tracks3)
    val shapes1 = Shapes(Shape("abc"), Shape("xyz"))
    val shapes2 = Shapes(Shape("012"))
    val shapes3 = shapes1.map(shape => Shape(shape.whatever + "!!!")) ++ shapes2
    println(shapes3)
  }
}

Russ · January 4, 2025, 2:46am

Looks interesting. I’ll give it a try when I get a chance.

som-snytt · January 5, 2025, 10:15am

A random sample use of collection interface in the wild, via a recent link on discord.

github.com

JD557/qrgen/blob/master/core/src/main/scala/eu/joaocosta/qrgen/QrCode.scala#L36


      
            *     final padding, excluding error correction codewords), supply the appropriate version number,
            *     and call the {@link QrCode#QrCode(int,Ecc,byte[],int) constructor}.</p></li>
            * </ul>
            * <p>(Note that all ways require supplying the desired error correction level.)</p>
            * @param version the version number to use, which must be in the range 1 to 40 (inclusive)
            * @param errorCorrectionLevel the error correction level to use
            * @param dataCodewords the bytes representing segments to encode (without ECC)
            * @param mask the mask pattern to use, which is either -1 for automatic choice or from 0 to 7 for fixed choice
            */
          final case class QrCode(version: Int, errorCorrectionLevel: Ecc, dataCodewords: ArraySeq[Byte], mask: Option[Int])
              extends IndexedSeq[IndexedSeq[Boolean]] {
            // Check arguments and initialize fields
            require(version >= QrCode.MIN_VERSION && version <= QrCode.MAX_VERSION, "Version value out of range")
            require(mask.forall(x => x >= 0 && x <= 7), "Mask value out of range")
          
            val length = version * 4 + 17
          
            val (modules, bestMask) = {
              // Private grids of modules/pixels, with dimensions of size*size
              val builder = new QrCodeBuilder(size)

tarsa · January 5, 2025, 4:35pm

just extending IndexedSeq means that any operation that produces new collection based on Tracks (the subclass) will return IndexedSeq instead of Tracks.

case class Track(whatever: Int)

class Tracks(underlying: Vector[Track]) extends IndexedSeq[Track] {
  override def apply(i: Int): Track = underlying(i)

  override def length: Int = underlying.length
}


val tracks = new Tracks(Vector(Track(1)))
// below prints: class scala.collection.immutable.Vector1
println((tracks ++ Seq(Track(2))).getClass)
// compilation fails with error:
// Found:    IndexedSeq[Playground.Track]
// Required: Playground.Tracks
val tracks2: Tracks = tracks ++ Seq(Track(2))

also the performance is bad if only the required methods are overridden and nothing else. example benchmark:

case class Track(whatever: Int)

class Tracks(underlying: Vector[Track]) extends IndexedSeq[Track] {
  override def apply(i: Int): Track = underlying(i)

  override def length: Int = underlying.length
}


def benchmark(name: String, coll: IndexedSeq[Track]): Unit = {
  val startTime = System.currentTimeMillis()
  val controlValue = coll.indices.foldLeft(0) { (acc, idx) =>
    acc + coll.dropRight(idx).length
  }
  val totalTime = System.currentTimeMillis() - startTime
  println(s"$name: controlValue = $controlValue")
  println(s"$name: duration = $totalTime ms")
}

val underlying = (1 to 12345).toVector.map(Track(_))
val wrapped = Tracks(underlying)

benchmark("Vector", underlying)
benchmark("Tracks", wrapped)

prints:

Vector: controlValue = 76205685
Vector: duration = 10 ms
Tracks: controlValue = 76205685
Tracks: duration = 827 ms