I am looking at class hierarchies where parameters are passed down the hierarchy via primary constructors.
I am having trouble getting bytecode that is both fast and memory-efficient.
Example setting: Objects of type A are constructed from an integer id. B subclasses A and its constructor calls A’s constructor with the given id.
Let’s start with the Java equivalent of my Scala code.
public class A {
    public final int id;
    public A(int id) {
        this.id = id;
    }
    public String toString() {
        return String.format("%d", id);
    }
}
public class B extends A {
    public B(int id) {
        super(id);
    }
    public String toString() {
        return String.format("%d", id);
    }
}
(Do not get distracted by the double implementation of toString. I was interested in bytecode differences between A.toString and B.toString and also needed B.toString to make the Scala compiler produce the bytecode shown below.)
This compiles to:
public class java.A {
public int id;
public java.A(int);
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: aload_0
5: iload_1
6: putfield #2 // Field id:I
9: return
LineNumberTable:
line 7: 0
line 8: 4
line 9: 9
LocalVariableTable:
Start Length Slot Name Signature
0 10 0 this Ljava/A;
0 10 1 id I
public java.lang.String toString();
Code:
0: ldc #3 // String %d
2: iconst_1
3: anewarray #4 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: getfield #2 // Field id:I
12: invokestatic #5 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #6 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 12: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Ljava/A;
}
public class java.B extends java.A {
public java.B(int);
Code:
0: aload_0
1: iload_1
2: invokespecial #1 // Method java/A."<init>":(I)V
5: return
LineNumberTable:
line 6: 0
line 7: 5
LocalVariableTable:
Start Length Slot Name Signature
0 6 0 this Ljava/B;
0 6 1 id I
public java.lang.String toString();
Code:
0: ldc #2 // String %d
2: iconst_1
3: anewarray #3 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: getfield #4 // Field id:I
12: invokestatic #5 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #6 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 10: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Ljava/B;
}
Things to notice:
- id is only stored in A.
- B.toString accesses A.id via getfield.
Now let’s turn to Scala:
class A(id: Int) {
  override def toString = String.format("%d", id)
}
class B(id: Int) extends A(id) {
  override def toString = String.format("%d", id)
}
Scala 3 compiles A and B to:
public class scala.A {
private final int id;
public scala.A(int);
Code:
0: aload_0
1: iload_1
2: putfield #11 // Field id:I
5: aload_0
6: invokespecial #14 // Method java/lang/Object."<init>":()V
9: return
LineNumberTable:
line 3: 0
line 4: 9
LocalVariableTable:
Start Length Slot Name Signature
0 10 0 this Lscala/A;
0 10 1 id I
public java.lang.String toString();
Code:
0: ldc #20 // String %d
2: iconst_1
3: anewarray #4 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: getfield #11 // Field id:I
12: invokestatic #26 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #32 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 4: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Lscala/A;
}
public class scala.B extends scala.A {
private final int id;
public scala.B(int);
Code:
0: aload_0
1: iload_1
2: putfield #11 // Field id:I
5: aload_0
6: iload_1
7: invokespecial #13 // Method scala/A."<init>":(I)V
10: return
LineNumberTable:
line 7: 0
line 8: 10
LocalVariableTable:
Start Length Slot Name Signature
0 11 0 this Lscala/B;
0 11 1 id I
public java.lang.String toString();
Code:
0: ldc #19 // String %d
2: iconst_1
3: anewarray #21 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: getfield #11 // Field id:I
12: invokestatic #27 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #33 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 8: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Lscala/B;
}
Things to notice:
- id is stored twice, both in A and B.
- B.toString accesses B.id via getfield.
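The double storage can also be checked without reading javap output, by asking the JVM which fields each class declares. The following is a minimal sketch of that check for the variant above, not a definitive tool: the wrapper object `PlainParams` and the helper `declaredFieldNames` are names made up for this illustration, and `toString` is simplified to `id.toString`.

```scala
object PlainParams {
  // Same shape as the variant above: `id` is a plain constructor parameter
  // in both classes and is used in a method body, so it has to be kept.
  class A(id: Int) {
    override def toString: String = id.toString
  }
  class B(id: Int) extends A(id) {
    override def toString: String = id.toString
  }

  // Hypothetical helper: names of the non-synthetic fields declared directly
  // by `cls`. getDeclaredFields does not report inherited fields.
  def declaredFieldNames(cls: Class[_]): List[String] =
    cls.getDeclaredFields.toList.filterNot(_.isSynthetic).map(_.getName)

  def main(args: Array[String]): Unit = {
    println(s"A declares: ${declaredFieldNames(classOf[A])}")
    println(s"B declares: ${declaredFieldNames(classOf[B])}")
  }
}
```

If both classes report their own `id` field, every instance of B carries the value twice, which matches the javap listing above.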
We can avoid the double storage by making A.id a val:
class A(final val id: Int) {
  override def toString = String.format("%d", id)
}
class B(id: Int) extends A(id) {
  override def toString = String.format("%d", id)
}
This compiles to:
public class scala.A {
private final int id;
public scala.A(int);
Code:
0: aload_0
1: iload_1
2: putfield #11 // Field id:I
5: aload_0
6: invokespecial #14 // Method java/lang/Object."<init>":()V
9: return
LineNumberTable:
line 3: 0
line 4: 9
LocalVariableTable:
Start Length Slot Name Signature
0 10 0 this Lscala/A;
0 10 1 id I
public int id();
Code:
0: aload_0
1: getfield #11 // Field id:I
4: ireturn
LineNumberTable:
line 3: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lscala/A;
public java.lang.String toString();
Code:
0: ldc #21 // String %d
2: iconst_1
3: anewarray #4 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: invokevirtual #23 // Method id:()I
12: invokestatic #29 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #35 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 4: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Lscala/A;
}
public class scala.B extends scala.A {
public scala.B(int);
Code:
0: aload_0
1: iload_1
2: invokespecial #10 // Method scala/A."<init>":(I)V
5: return
LineNumberTable:
line 7: 0
LocalVariableTable:
Start Length Slot Name Signature
0 6 0 this Lscala/B;
0 6 1 id I
private int id$accessor();
Code:
0: aload_0
1: invokespecial #17 // Method scala/A.id:()I
4: ireturn
LineNumberTable:
line 7: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lscala/B;
public java.lang.String toString();
Code:
0: ldc #21 // String %d
2: iconst_1
3: anewarray #23 // class java/lang/Object
6: dup
7: iconst_0
8: aload_0
9: invokespecial #25 // Method id$accessor:()I
12: invokestatic #31 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
15: aastore
16: invokestatic #37 // Method java/lang/String.format:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
19: areturn
LineNumberTable:
line 8: 0
LocalVariableTable:
Start Length Slot Name Signature
0 20 0 this Lscala/B;
}
Things to notice:
- id is only stored in A.
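- The Scala compiler extended A with the public getter A.id, and A.toString now reads the field through it (invokevirtual).
- The Scala compiler extended B with the private method B.id$accessor, which calls A.id via invokespecial.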
- B.toString calls id$accessor.
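The same observations can be reproduced at runtime with reflection. This is a rough sketch under the same assumptions as the variant above: `ValParam`, `declaredFieldNames`, and `declaredMethodNames` are illustrative names, `toString` is simplified to `id.toString`, and the names of compiler-generated methods may differ between compiler versions.

```scala
object ValParam {
  // Same shape as the variant above: A exposes `id` as a final val,
  // B only forwards its constructor parameter to A.
  class A(final val id: Int) {
    override def toString: String = id.toString
  }
  class B(id: Int) extends A(id) {
    override def toString: String = id.toString
  }

  // Hypothetical helpers: members declared directly by `cls`, inherited ones excluded.
  def declaredFieldNames(cls: Class[_]): List[String] =
    cls.getDeclaredFields.toList.filterNot(_.isSynthetic).map(_.getName)

  def declaredMethodNames(cls: Class[_]): List[String] =
    cls.getDeclaredMethods.toList.map(_.getName).sorted

  def main(args: Array[String]): Unit = {
    println(s"A fields:  ${declaredFieldNames(classOf[A])}")
    println(s"B fields:  ${declaredFieldNames(classOf[B])}")
    println(s"B methods: ${declaredMethodNames(classOf[B])}")
  }
}
```

Going by the Scala 3 listing above, B should declare no field of its own, and its method list should show the generated accessor next to toString.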
In conclusion, B.toString needs two method calls (id$accessor, then A.id) to retrieve the id where a single getfield would be enough. Since A.id is declared final, nobody can override it, so no such precaution should be necessary.
The overhead of the additional method calls is significant: in a four-hour benchmark running on four cores, I measured a 10% loss in throughput after changing a few central classes to avoid double storage.
I am aware of another way to avoid double storage:
abstract class A {
  val id: Int
  override def toString = String.format("%d", id)
}
class B(override val id: Int) extends A {
  override def toString = String.format("%d", id)
}
This code is elegant and yields elegant bytecode, too. However, in the benchmark mentioned above, the additional virtual method call reduced throughput by 7%.
Is this a shortcoming of the Scala compiler? Does it ignore the final modifier on id?
Or is there another way to make Scala produce bytecode like Java's?