I’m migrating a large 2.12 codebase to 2.13 and am running into a bit of an issue with converting my use of ArrayBuffer to immutable.Seq. I have a lot of code that requires the following:
No ClassTag present for the element type T
Need ability to read the size of the builder while building.
Want to continue using a flat array backed data structure for both the builder and the immutable.Seq result since I’m concerned about performance changes and don’t want to re-qualify library code that’s been in production for years.
A simple example of my old code:
def all[T](rowParser: (DbResultRow, Int) => T): Seq[T] = {
val ab = new ArrayBuffer[T]()
while (nextRow) ab += rowParser(this, ab.size)
ab
}
The simplest solution is to change the last line to ab.toSeq, but that effectively returns a non-array-backed, linked List[T], which may (or may not) have downstream performance impacts I’d like to avoid even contemplating.
I can’t use ArraySeq.newBuilder because I don’t have a ClassTag present (and can’t add one since that would percolate up my inheritance chain and wreak all sorts of havoc), and that builder doesn’t disclose its current size.
new VectorBuilder[T] does expose its size and doesn’t require a ClassTag, but the Vector[T] class is a radix-balanced finger tree of width 32, not a flat array.
The only “solution” I’ve come up with is to implement a custom toSeq method (modeled on ArraySeq.map):
def toArraySeq[T](ab: ArrayBuffer[T]): ArraySeq[T] = {
val a = new Array[Any](ab.size)
var i = 0
while (i < a.length){
a(i) = ab(i)
i += 1
}
ArraySeq.unsafeWrapArray(a).asInstanceOf[ArraySeq[T]]
}
(which may be better in a MyArrayBuffer subclass, that’s ok)
I guess my question is if there’s a more idiomatic 2.13 way to accomplish this instead of the toArraySeq() above?
ArrayBuffer is still a mutable.Seq, but the “default” Seq is no longer collection.Seq but immutable.Seq.
So it should be sufficient to just tweak which Seq you intend.
(I’m trying out the use case to see if warnings should tell you about this migratory change.)
Edit: by “Want to continue using a flat array backed data structure for both the builder and the immutable.Seq”, I assume you were previously using mutable.Seq, and that you were misled by the current error message?
Also, depending on your needs, if you only need to populate an array and then use it immutably, consider using ArrayBuilder to build and wrap the resulting array in whatever the “unsafe wrapper” is called (that does not copy, if that sounds appealing).
I’m confused - yes I’m aware that ArrayBuffer is mutable, and that the default collection Seq is now immutable, that’s why I’m trying to change my code to have the methods that previously returned an ArrayBuffer-as-mutable-Seq to now return an immutable Seq.
Basically I’m shooting for Option 3: use immutable sequences, so I want all my Seq returning methods to now return an immutable Seq instead of the mutable ArrayBuffer.
Picking that Seq is a little tricky, since my building code has the three requirements I mentioned, and it’s not entirely clear which Seq class would satisfy those.
The new VectorBuilder[T] or my custom toArraySeq() are the closest options I’ve found, but maybe I’m missing something?
This is what I meant about turning an array into a Seq:
scala> val xs: Seq[Int] = Array(42)
^
warning: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray
val xs: Seq[Int] = ArraySeq(42)
scala> val xs: Seq[Int] = ArraySeq.unsafeWrapArray(Array(42))
^
error: not found: value ArraySeq
scala> val xs: Seq[Int] = collection.immutable.ArraySeq.unsafeWrapArray(Array(42))
val xs: Seq[Int] = ArraySeq(42)
That is helpful if you don’t need a Seq with particular performance characteristics.
(Are you adding elements later, indexing, or iterating, etc.)
(Edit: ArrayBuilder has length instead of size member. I don’t know if length counts more than size.)
Ok, so I ended up going in a bunch of different directions depending on the precise scenario of each case. After having done that on 1 project out of 10+ I realized the errors of my ways and just implemented:
class MyArrayBuffer[T](initialSize: Int) extends scala.collection.mutable.ArrayBuffer[T](initialSize) {
def this() = this(ArrayBuffer.DefaultInitialSize)
override def toSeq: Seq[T] = toIndexedSeq
override def toIndexedSeq: IndexedSeq[T] = {
// Totally cribbed from ArraySeq.map[B](f: A => B): ArraySeq[B] which doesn't require a ClassTag[B]
val a = new Array[Any](size)
var i = 0
while (i < a.length) {
a(i) = this (i)
i += 1
}
ArraySeq.unsafeWrapArray(a).asInstanceOf[ArraySeq[T]]
}
}
Why? Because the builders are very frustrating to work with:
It’s not convenient to specify a sizeHint (see code snippet below).
I can’t iterate over the items accumulated so far, only over the eventual result.
Querying the size requires calling knownSize, which raises later code reading alarm bells since it is allowed to return -1, so then you have to double-check precisely what builder you’re working with to be sure you didn’t mess it up.
Specifying the sizeHint is very inconvenient in e.g.:
// This clear one-liner...
val m = if (a) new ArrayBuffer[M](32) else null
// ... turns into a muddy 5-liner:
val m = if (a) {
val out = ArraySeq.newBuilder[M]
out.sizeHint(32)
out
} else null
Let’s not start a “don’t use null”-discussion, it’s just to illustrate that there are cases where size hinting adds a lot more than one line. Having sizeHint return this.type would resolve this.
Having ArrayBuffer.toSeq return a List and ArrayBuffer.toIndexedSeq return a Vector seem like odd choices (I guess made by default?) - maybe they could be updated with something like the code above?
I happen to be writing doc (or setting expectations) for sizeHint, so that is interesting feedback. You’re right that it would be better to return this.
The idiom is ArraySeq.newBuilder[M].tap(_.sizeHint(32)). That is very OK if using the inliner -opt:inline:mypackage.*.
The other API is sizeHint(collection, delta). It uses collection.knownSize judiciously.
Also BTW you can always classTag[Any] to summon a ClassTag[Any]. Of course, if you wanted an array of a specific type, that is suboptimal. I would be inclined to just use ArrayBuilder[Any] if I only needed to populate an array. It doesn’t much matter, since ArrayBuilder and ArrayBuffer now use the same logic for sizing the internal capacity.
By coincidence, I just tried tweaking tap to return self.type. The PR isn’t green yet.
Wow. That was quite a rabbit hole. I love Scala, but the level of abstraction in the collections here almost take it to a Ruby level of opaqueness.
ArraySeq.untagged is a SeqFactory[A], but it’s actually a ClassTagSeqFactory.AnySeqDelegate(ArraySeq), at which point it’s no longer really statically traceable (= I just get lost in the world of delegates ctrl-clicking in IDEA).
Setting a breakpoint and stepping through the code reveals that we go through two (or was it more?) layers of delegation and eventually end up with Array.from which calls Iterator.toArray which calls copyToArray. Those normally require a ClassTag so it seems like ClassTag.Any is summoned along the way.
Anyway, I mostly just wanted to check that we didn’t go through yet another builder (like ArrayBuffer.toIndexedSeq does with VectorBuilder → Vector), which we don’t, so yes, buffer.to(ArraySeq.untagged) can indeed be used!
(But at the end of the day it net-net does essentially the same thing as MyArrayBuffer.toIndexedSeq)