How to avoid double storage in class hierarchies?

I am looking at class hierarchies where parameters are passed down the hierarchy via primary constructors.

I am having trouble getting bytecode that is both fast and memory-efficient.

Example setting: Objects of type A are constructed from an integer id. B subclasses A and its constructor calls A’s constructor with the given id.

Let’s start with the Java equivalent of my Scala code.

public class A {

    public final int id;

    public A(int id) {
        this.id = id;
    }

    public String toString() {
        return String.format("%d", id);
    }

}

public class B extends A {

    public B(int id) {
        super(id);
    }

    public String toString() {
        return String.format("%d", id);
    }

}

(Do not get distracted by the double implementation of toString. I was interested in bytecode differences between A.toString and B.toString and also needed B.toString to make the Scala compiler produce the bytecode shown below.)

This compiles to:

public class java.A {
  public int id;

  public java.A(int);
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: aload_0
       5: iload_1
       6: putfield      #2                  // Field id:I
       9: return
    LineNumberTable:
      line 7: 0
      line 8: 4
      line 9: 9
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      10     0  this   Ljava/A;
          0      10     1    id   I

  public java.lang.String toString();
    Code:
       0: ldc           #3                  // String %d
       2: iconst_1
       3: anewarray     #4                  // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: getfield      #2                  // Field id:I
      12: invokestatic  #5                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #6                  // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 12: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Ljava/A;
}

public class java.B extends java.A {
  public java.B(int);
    Code:
       0: aload_0
       1: iload_1
       2: invokespecial #1                  // Method java/A."<init>":(I)V
       5: return
    LineNumberTable:
      line 6: 0
      line 7: 5
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       6     0  this   Ljava/B;
          0       6     1    id   I


  public java.lang.String toString();
    Code:
       0: ldc           #2                  // String %d
       2: iconst_1
       3: anewarray     #3                  // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: getfield      #4                  // Field id:I
      12: invokestatic  #5                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #6                  // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 10: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Ljava/B;
}

Things to notice:

  • id is only stored in A.
  • B.toString accesses A.id via getfield.

Now let’s turn to Scala:

class A(id: Int) {
    override def toString = String.format("%d", id)
}

class B(id: Int) extends A(id) {
    override def toString = String.format("%d", id)
}

Scala 3 compiles A and B to:

public class scala.A {
  private final int id;

  public scala.A(int);
    Code:
       0: aload_0
       1: iload_1
       2: putfield      #11                 // Field id:I
       5: aload_0
       6: invokespecial #14                 // Method java/lang/Object."<init>":()V
       9: return
    LineNumberTable:
      line 3: 0
      line 4: 9
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      10     0  this   Lscala/A;
          0      10     1    id   I

  public java.lang.String toString();
    Code:
       0: ldc           #20                 // String %d
       2: iconst_1
       3: anewarray     #4                  // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: getfield      #11                 // Field id:I
      12: invokestatic  #26                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #32                 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 4: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Lscala/A;
}

public class scala.B extends scala.A {
  private final int id;

  public scala.B(int);
    Code:
       0: aload_0
       1: iload_1
       2: putfield      #11                 // Field id:I
       5: aload_0
       6: iload_1
       7: invokespecial #13                 // Method scala/A."<init>":(I)V
      10: return
    LineNumberTable:
      line 7: 0
      line 8: 10
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      11     0  this   Lscala/B;
          0      11     1    id   I

  public java.lang.String toString();
    Code:
       0: ldc           #19                 // String %d
       2: iconst_1
       3: anewarray     #21                 // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: getfield      #11                 // Field id:I
      12: invokestatic  #27                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #33                 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 8: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Lscala/B;
}

Things to notice:

  • id is stored twice, both in A and B.
  • B.toString accesses B.id via getfield.
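
The double storage can also be checked without reading javap output, by walking the class hierarchy with Java reflection and listing the instance fields. A minimal sketch: FieldCheck and instanceFields are names made up for this illustration, and A and B simply repeat the classes above.

```scala
import java.lang.reflect.Modifier

// Same shape as above: id is a plain constructor parameter in both classes.
class A(id: Int) {
    override def toString = String.format("%d", id)
}

class B(id: Int) extends A(id) {
    override def toString = String.format("%d", id)
}

// Hypothetical helper, not part of any library.
object FieldCheck {
    // Returns "DeclaringClass.fieldName" for every non-static field
    // declared by cls or by any of its superclasses.
    def instanceFields(cls: Class[_]): List[String] =
        Iterator.iterate[Class[_]](cls)(_.getSuperclass)
            .takeWhile(_ != null)
            .flatMap(_.getDeclaredFields.toList)
            .filterNot(f => Modifier.isStatic(f.getModifiers))
            .map(f => s"${f.getDeclaringClass.getName}.${f.getName}")
            .toList
}
```

Calling FieldCheck.instanceFields(classOf[B]) should list one id entry per class that keeps its own copy, so for the bytecode shown above I would expect two entries, one declared by B and one by A.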

We can avoid the double storage by making A.id a val:

class A(final val id: Int) {
    override def toString = String.format("%d", id)
}

class B(id: Int) extends A(id) {
    override def toString = String.format("%d", id)
}

This compiles to:

public class scala.A {
  private final int id;

  public scala.A(int);
    Code:
       0: aload_0
       1: iload_1
       2: putfield      #11                 // Field id:I
       5: aload_0
       6: invokespecial #14                 // Method java/lang/Object."<init>":()V
       9: return
    LineNumberTable:
      line 3: 0
      line 4: 9
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      10     0  this   Lscala/A;
          0      10     1    id   I

  public int id();
    Code:
       0: aload_0
       1: getfield      #11                 // Field id:I
       4: ireturn
    LineNumberTable:
      line 3: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       5     0  this   Lscala/A;

  public java.lang.String toString();
    Code:
       0: ldc           #21                 // String %d
       2: iconst_1
       3: anewarray     #4                  // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: invokevirtual #23                 // Method id:()I
      12: invokestatic  #29                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #35                 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 4: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Lscala/A;
}

public class scala.B extends scala.A {
  public scala.B(int);
    Code:
       0: aload_0
       1: iload_1
       2: invokespecial #10                 // Method scala/A."<init>":(I)V
       5: return
    LineNumberTable:
      line 7: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       6     0  this   Lscala/B;
          0       6     1    id   I

  private int id$accessor();
    Code:
       0: aload_0
       1: invokespecial #17                 // Method scala/A.id:()I
       4: ireturn
    LineNumberTable:
      line 7: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       5     0  this   Lscala/B;

  public java.lang.String toString();
    Code:
       0: ldc           #21                 // String %d
       2: iconst_1
       3: anewarray     #23                 // class java/lang/Object
       6: dup
       7: iconst_0
       8: aload_0
       9: invokespecial #25                 // Method id$accessor:()I
      12: invokestatic  #31                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
      15: aastore
      16: invokestatic  #37                 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
      19: areturn
    LineNumberTable:
      line 8: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      20     0  this   Lscala/B;
}

Things to notice:

  • id is only stored in A.
  • The Scala compiler extended A with the getter A.id.
  • The Scala compiler extended B with the private method id$accessor, which calls A.id via invokespecial.
  • B.toString calls id$accessor.

In conclusion, B.toString now needs two method calls (id$accessor, then A.id) to retrieve the id where a single getfield would be enough: A.id is declared final, so no precautions against someone overriding id are needed.

The overhead of the additional method calls is significant: in a four-hour benchmark running on four cores, I measured a 10% loss in throughput after changing a few central classes to avoid double storage.

I am aware of another way to avoid double storage:

abstract class A {
    val id: Int
    override def toString = String.format("%d", id)
}

class B(override val id: Int) extends A {
    override def toString = String.format("%d", id)
}

This code is elegant and yields elegant bytecode, too. However, in the benchmark mentioned above, the additional virtual method call reduced throughput by 7%.

Is this a shortcoming of the Scala compiler? Does it ignore the final modifier on id?
Or is there another way to produce bytecode like Java’s?

Looks like a Scala 3 regression to me. If I compile this code:

class A(val id: Int) {
    override def toString = String.format("%d", id)
}
class B(id: Int) extends A(id) {
    override def toString = String.format("%d", id)
}

with the Scala 2.13.8 compiler, the bytecode for B lacks id$accessor, as hoped.
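
In the meantime, one thing that might be worth trying is to give B’s constructor parameter a different name, so that id inside B refers to the member inherited from A rather than to B’s own parameter. This is only a sketch under that assumption: id0, WorkaroundCheck and declaredMembers are hypothetical names, and the resulting bytecode should still be checked with javap on the compiler version in use.

```scala
class A(val id: Int) {
    override def toString = String.format("%d", id)
}

// id0 is only passed on to A's constructor. The id used in toString
// is the member inherited from A, not a parameter of B.
class B(id0: Int) extends A(id0) {
    override def toString = String.format("%d", id)
}

// Hypothetical helper, not part of any library.
object WorkaroundCheck {
    // Names of the fields and methods declared directly by cls
    // (inherited members are not included).
    def declaredMembers(cls: Class[_]): List[String] =
        (cls.getDeclaredFields.map(_.getName) ++
            cls.getDeclaredMethods.map(_.getName)).toList.sorted
}
```

If the assumption holds, WorkaroundCheck.declaredMembers(classOf[B]) contains neither an id field nor an id$accessor method. Note that B.toString would still go through the getter A.id rather than a raw getfield, so this removes at most one of the two calls.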

You might try digging around in the lampepfl/dotty issue tracker on GitHub to see if you’re the first person to notice the problem.

Now that I recognize the disconcerting avatar (I guess informarte and SethTisue may compete on this point)

https://github.com/lampepfl/dotty/issues/14600
