Refactoring class hierarchy into ADT

jimka · January 2, 2021, 6:39pm

I agree. finding the right abstraction to make code correct, extensible, maintainable and testable is difficult, and may require several iterations. Often the implementation language provides some tools to make the job easier, sometimes the language’s feature become obstacles for the application to overcome.

jimka · January 3, 2021, 11:14am

Actually, what is the difference between an ADT and a sealed class? Perhaps, I shouldn’t call my construction an ADT, but rather just call it a sealed class? I’m not really relying on its algebraic properties, but only is closed-ness property.

sangamon · January 3, 2021, 3:25pm

ADT is an abstract concept that doesn’t have a direct representation in Scala, sealed classes/traits is a Scala language feature often used (in conjunction with others) to express this concept.

I was referring the difference between class/object (includes behavior, OO dispatch) and data type (no intrinsic behavior, pattern matching dispatch). Given that they imply different approaches to overall code organization, it shouldn’t come as a surprise if they don’t complement each other perfectly within one single “type family”.

jimka · January 4, 2021, 8:58am

Yes, that is my understanding as well. So perhaps I should not refer to my construction as an ADT, but rather simply as a sealed class hierarchy. Referring to it as an ADT will lead to confusing as I’m not depending on any of the other algebraic properties an ADT implies.

BalmungSan · January 4, 2021, 2:46pm

More than what are properties of an ADT, in Scala it is common to split data from logic and sealed traits + case classes are commonly used to model data.

Whereas, you have a more OOP approach of mixing data and computation in the same entity.

jimka · January 11, 2021, 9:42am

I’m not really sure how to think about Boolean algebra in terms of data and logic. There is a very small amount of data. However, the logic is a mixture of mathematical correctness tempered with attempt to aid human understanding, testability, and not too badly performing.

jimka · January 11, 2021, 9:54am

I thought I would report back on this refactoring effort. I managed to refactor my code into a set of abstract classes (could have also been traits I supposed) and a sealed class having a small number of final case classes.

It was an interesting exercise, and I learned quite a lot about Scala’s concept of type-oriented programming. However, in the end, I think I will abandon this branch. There were indeed a few places I found where my pattern matching was non-exhaustive. However, to do this I had to do a lot of extra work, and I think the code is far less readable. Introducing extra classes just to satisfy the compiler, does not seem like an over all win.

One problem that kept happening was the following.
I have one abstract class, SimpleTypeDA, and a sealed class SimpleTypeD which extends it.
I, the programmer know, that every object of type SimpleTypeDA is also of type SimpleTypeD, the compiler does not know this. This mean I have to put in extra cases of either ??? or explicit exception raising, which makes the code obfuscated, or even works I have to pattern match SimpleTypeDA against a singleton case SimpleTypeDA to make the return value be sufficiently specific.

In the end, my conclusion, admittedly from one experiment, is that the Scala restriction that a sealed class can only be extended in a single file is annoying and counter productive. There ought to be some other mechanism such as perhaps allow extension within the same package, or within the same directory. Forcing code into a single file is unfortunate.

It would be great if sealed class hierarchies could be refactored across multiple files.

sangamon · January 11, 2021, 11:08am

Ok, now small files are a problem, as well?

A very simplistic take on an “FP design”:

sealed trait Bool

case object True extends Bool
case object False extends Bool

// can go to another file
def render(b: Bool): String =
  b match {
    case True => "true"
    case False => "false"
  }

// can go to another file
def and(a: Bool, b: Bool): Bool =
  (a, b) match {
    case (True, True) => True
    case (True, False) => False
    case (False, True) => False
    case (False, False) => False
  }

If I understand correctly, you’d rather want something like this:

sealed trait Bool {
  def render(): String
  def and(other: Bool): Bool
}

case object True extends Bool {

  override def render(): String = "true"

  override def and(other: Bool): Bool =
    other match {
      case True => True
      case False => False
    }

}

case object False extends Bool {

  override def render(): String = "false"

  override def and(other: Bool): Bool = False

}

There’s nothing wrong with this, it’s just very likely not the primary use case the language designers had in mind - this would rather be the FP approach, maybe plus providing a drop-in replacement for Java enums (where all the code needs to go into a single file, too, IIRC).

jimka · January 11, 2021, 12:13pm

Several people have asked why the code is so long, why not implement it in a single file, why not separate the data from the program logic, why use object orientation, etc… I’ll try to explain what’s going on in hopes that my choses become clearer.

The code is available freely and publicly if anyone cares to look at it. The code relevant to the Boolean algebra is here. Of course I welcome any constructive feedback on the code.

The application is a simple embeddable type system, a term which is formally defined elsewhere.

Basically, what the code does is to allow the user to designate sets of values at run-time by so-called type designators. There are several ways to designate such a set. In each case the user invokes a constructor of one of the case classes.

SEmpty the empty set
STop the set of all values
SAtomic the set of all values whose class is a subclass of the given class, the given class being an object of type java.lang.class
SCustom the set of all values for which the given Any=>Boolean predicate returns true
SEql the set of values = to the given object
SMember the set of values = to any of the given finite sequence of objects
SAnd the set of objects designated simultaneously by all of the given type designators
SOr the set of objects designated by at least one of the given type designators
SNot the set of objects not designated by the given type designator

The abstract class which is a superclass of these classes is SimpleTypeD standing for simple-type-designator. SimpleTypeD advertises several methods for use by applications:

subtypep: (SimpleTypeD,SimpleTypeD)=>Option[Boolean] – determines whether one type is a subtype of the other, i.e, whether the set designated by the first argument is a subset of the set designated by the second argument, returning yes, no, or dont-know
disjoint: (SimpleTypeD,SimpleTypeD)=>Option[Boolean] – determines whether two sets provably have empty intersection, returning yes, no, or dont-know
inhabited: SimpleTypeD=>Option[Boolean] – indicates whether there is an element of this type, returning yes, no, or dont-know
canonicalize:SimpleTypeD=>SimpleTypeD – computes a SimpleTypeD in one of the three supported canonical forms, disjunctive-normal-form, conjunctive-normal-form, simple-normal-form
typep:(Any,SimpleTypeD)=>Boolean – set membership test

Even if these 5 functions/methods are the ones intended for end users to call, the methods themselves are implemented using lower level methods which must be implemented for all the case classes. E.g., canonicalize is implemented as the fixed point of another method named canonicalizeOnce, and disjoint is a commutative function which is implemented by calling a non-commutative version disjointDown twice, once with arguments reversed.

Some of these methods are quite lengthy—implemented as a series of reduction/simplification rules. This reduction rule management also has some infrastructure, for example for dealing with 3-way logic Some(true), Some(false), None, and for stopping when it finds the correct simplification rule.

Additionally, in some cases there is a 9 by 9 set of tests, for example for testing whether the set designated by a type designator of type A is disjoint from a set designated by a type designator of type B; each of A and B may be any of the 9 leaf level classes. It never really amounts really to 81 cases, because many of the checks are subsumed in others. E.g., SEmpty is disjoint from everything and STop is disjoint with nothing except SEmpty. SEqv is really a special case of SMember, so often I can check simultaneously for SMember and SEqv.

BalmungSan · January 11, 2021, 12:57pm

As I told you the other day, you should make every intermediate step private and forget about it.

You should never accept a SympleTypeA rather a SimpleTypeD as such pattern matching will always be exhaustive.

The same with the implementation traits / abstract classes, those shouldn’t exists you only ask and return the final case classes that extend SimpleTypeD + the implementation trait.

jimka · January 11, 2021, 1:15pm

The problem I found with that suggestion when I tried to implement it is that yes every method paraemter is of type SimpleTypeD, but this is still of type SimpleTypeA, not SimpleTypeD. So every time I need to pattern match this I had this openness issue. What I ended up doing was to declare a new var called self of type SimpleTypeD whose value is a singleton pattern match of this against SimpleTypeD. This feels like a real hack just to satisfy the compiler.

BalmungSan · January 11, 2021, 1:30pm

Ah you are pattern matching inside SimpleTypeA you can just do a cast, or suppress the warning, or you could do this:

abstract class SimpleTypeA(...) { self: SympleTypeD =>
  def foo: Bar = self match {
    // Exhaustive. 
  }
}

sealed trait SimpleTypeD extends SimpleTypeA(...)

If you followed all my other advices it should work without problems.

jimka · January 11, 2021, 1:46pm

Yes that’s what I did. However, as I mentioned above, I found that I ended up doing a lot of work just to satisfy the compiler’s type checker. Every method in SimpleTypeA contained a local function like your foo function above, just so I could tell the compiler the only class I’m interested in is SimpleTypeD. But in fact, that is what the sealed class is supposed to do for me, except that sealed does not work across file boundaries.

In the end the code worked, with only a 50 line code penalty; all my tests passed, and I discovered a couple of bugs in my original code in so doing. However, in the end, I believe all this extra work is not worth it. It is an interesting exercise to help me learn more about type oriented programming à-la Scala, it has advantages and disadvantages, and in my judgement the disadvantages make the code more difficult to understand. I feel like I’m successfully forcing the compiler to do something it does not want to do.

jimka · January 11, 2021, 1:48pm

Again, I don’t understand the issue with private-ness. We spoke about this before, and as I understand, you said in the end that it didn’t really matter. Sorry if I misunderstood you.

I don’t understand the advantage of making it more difficult for someone to extend my code later, unless I have a real reason for doing so.

BalmungSan · January 11, 2021, 1:56pm

Well as I said the other day, your code looks like the code a compiler would generate.

If someday Scala (or some plugin) provides cross-files sealed ADTs that is how I would expect it to implement it.

In any case, not sure what you mean with type oriented programming? Also, and I hope I do not sound rude, that code doesn’t look ala-Scala. It looks like Java + pattern matching; which is not to say it is bad, at the end that is the kind of flexibility Scala provides.

sangamon · January 11, 2021, 2:00pm

I’d expect such a mechanism to be at odds with low level compiler/VM constraints, particularly incremental compilation and the JVM’s late binding/“open world” approach. Source file/compilation unit is likely the coarsest-grained compile-time entity where a “sealed” constraint can reliably be enforced.

jimka · January 11, 2021, 2:04pm

nice to hear. I’m not sure if you’re referring to my attempt to create the artificial hierarchy to satisfy the compiler, or if you’re referring to the shorter version where I just refactored the sealed trait among multiple files, and removed the word sealed from the declaration.

But I can assure you that any resemblance to Java is just circumstantial. As I’m mentioned several times before I don’t know Java, and have never written a single line of Java code. All I know about Java comes from what I’ve learned using Scala and Clojure.

jimka · January 11, 2021, 2:08pm

This is an impression. I probably should indeed think about this to see if I can articulate it more consequently and elegantly.

jimka · January 11, 2021, 2:15pm

Yes I’m quite sure you are right. My suggestion is not that cross-file sealed hierarchies would be easy to implement. And I’m pretty sure that the majority of Scala users expressly DO NOT WANT them.

I do think it is unfortunate that I cannot implement it myself as a Scala user, as some OO systems attempt to do.

BTW, the assumption that the file is the compile-unit is seen also in Clojure. Clojure doesn’t really have classes (at least nobody uses them), but it does have packages. While the language/compiler does not assume that every package corresponds to exactly one file, unfortunately all the development tools do assume this. If you want to use the standard IDEs you have to change package when you change files. This is a nightmare for refactoring.

sangamon · January 11, 2021, 2:17pm

I still don’t see a convincing reason here why this functionality couldn’t be implemented based on a plain ADT and pattern matching instead of OO methods. The latter is fine, of course, but then you’ll just have to live with the constraints of the “sealed” feature - or don’t use “sealed” at all, or go old school and use the Visitor pattern instead.

Note that you’ll find vaguely similar constraints in other languages, as well - e.g. in Haskell you can’t (or shouldn’t) have “orphan instances”, i.e. type class instances for a type defined in another module than the type class or the type itself.