FP/functional programming - unit testing function composition

Hi there!

I’ve been happily using Scala for a while now in a (mostly) purely functional environment. I use classes to model functions and compose them to create the application workflow.

What I keep wondering is: what is the best way to test function composition in a pure way?

Here’s an attempt at illustrating the issue via a very simple example:

```scala
package com.example

import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.matchers.should.Matchers

private final class ExampleTest extends AnyFunSuite with Matchers {

  /*
   At some point we get to a leaf function - these are just there to show
   the general implementation style...
   In the real world it's reasonable to assume that they are more
   complicated than the mock functions we'll use in the tests.
   */
  class A extends (List[String] => String) {
    override def apply(value: List[String]): String = ???
  }

  class B extends (String => Int) {
    override def apply(value: String): Int = ???
  }

  // This is the function that we actually want to test. We use default
  // values for production and supply mocks in tests.
  // This is just an example. In reality, there can be more functions, and
  // arity and types usually vary...
  class Composition(
      a: List[String] => String = new A,
      b: String => Int = new B
  ) extends (List[String] => Int) {
    // andThen is an extremely simple example. Assume there is additional logic.
    // The way a, b, ... are combined depends on the domain problem. The important thing is:
    // they are intended to be used in a certain way to solve the problem.
    override def apply(value: List[String]): Int = a.andThen(b)(value)
  }

  test("Composition should apply functions in correct order - v1") {
    /*
    Pure variant - create mock functions in a way that the data flows through
    them and the result shows that each function has been applied in the
    correct order.

    Unfortunately, data and mock behavior are usually far more complex in
    real-world applications, and mock functions become harder to write and
    equally hard to understand.
    In the worst case, mocks start to emulate the production logic :(

    Example: using one field in a complex object and only changing that
    field from mock to mock
     */
    val testee = new Composition(a = _.mkString(" "), b = _.length)
    testee.apply("hello" :: "world" :: "!" :: Nil) shouldBe 13
  }

  test("Composition should apply functions in correct order - v2") {
    /*
     Pure variant - but relying on the compiler's type checks.

     This way we can radically simplify the mocks and just return stub
     values, checking that the last stub value is the overall result.

     Assuming that null doesn't exist (we have outlawed the use of null
     in our projects), since all types are distinct, there is only one
     way the functions can be composed, which is backed by the compiler's
     type checking.

     Is this reasonably safe? Or is this test insufficient?

     Obviously this does not work when the types returned by the
     functions are not distinct, e.g.
     a: T => T
     b: T => T
     In that example we cannot verify that a has been called at all...
     */
    val testee = new Composition(a = _ => "", b = _ => 42)
    testee.apply("any" :: "input" :: Nil) shouldBe 42
  }

  test("Composition should apply functions in correct order - v3") {
    /*
     Impure variant using closures - manually check that each function
     has been called with the expected input.

     This is a little more work than the example above but ensures
     everything is connected correctly - even when the types are
     not distinct.
     Unfortunately this implementation is impure :(

     Is there a pure way to write this? Or is this a bad idea
     in general?
     */
    var actualInputA: List[String] = Nil
    var actualInputB: String = ""
    val testee = new Composition(
      a = input => {
        actualInputA = input
        "input-b"
      },
      b = input => {
        actualInputB = input
        42
      }
    )

    val input = "hello" :: "world" :: "!" :: Nil

    testee.apply(input) shouldBe 42
    actualInputA shouldEqual input
    actualInputB shouldEqual "input-b"
  }

}
```

Has anyone else considered these questions? How do you handle testing composition in a purely functional environment?

Edit: Formatting
Edit2: Added some comments for clarification
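
The v2 comment notes that stub-only checks break down for `T => T`, and v3 asks for a pure alternative to capturing inputs in `var`s. One pure option, sketched here with a hypothetical `SameTypeComposition`, is to let each stub return a trace of its own call, so the final value alone shows which stubs ran and in what order. This assumes `T` is a type (like `String`) that can carry such a trace; it is an illustration, not a general recipe.

```scala
// Hypothetical variant of Composition in which both steps share one type T,
// i.e. the case where v2's stub-only check cannot tell the steps apart.
final class SameTypeComposition[T](a: T => T, b: T => T) extends (T => T) {
  // Intended semantics: apply a first, then b.
  override def apply(value: T): T = a.andThen(b)(value)
}

// Pure "tracing" stubs: each stub wraps its input in its own name, so the
// returned value is a record of which stubs ran and in which order.
val traced = new SameTypeComposition[String](a = s => s"a($s)", b = s => s"b($s)")

// Expected "b(a(x))": a ran first on the input, then b ran on a's output.
// Swapping in `compose` would yield "a(b(x))"; skipping a would yield "b(x)".
val result = traced("x")
```

No `var`s are needed, because the evidence of each call travels inside the result instead of being written to shared state.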

Don’t test something that the library authors already tested.

You don’t need to test that andThen works, what you need to test is that your logic does what it needs to do.

I’m not really sure what the actual model/use case is here.

Scala does provide a perfectly fine function algebra/API, so why would you want to model composition on top of it? What’s wrong with using Function1#compose() (or Function1#andThen(), or just g(f(...))) directly, assuming these have been sufficiently tested in the platform build? Then you could just move on to testing the actual composites, which can be considered black boxes, i.e. test cases should focus on inputs and outputs rather than internal structure.

If you really want to come up with your own composition, why specialize to specific input and output types - why not Composition[-A, B, +C], covering all potential cases?

That being said, if I wanted to go with this approach to composition at all, I’d go with v1 and test the composition laws, using concrete examples as you do, and perhaps pull in ScalaCheck as well.
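
A minimal sketch of that suggestion, assuming the generic `Composition[-A, B, +C]` shape from the previous paragraph: the identity and associativity laws checked on a handful of concrete inputs. The functions `f`, `g`, `h` and the input list are hypothetical; ScalaCheck could replace the fixed list with generated values.

```scala
// Generic composition as suggested above: apply f first, then g.
final class Composition[-A, B, +C](f: A => B, g: B => C) extends (A => C) {
  override def apply(a: A): C = g(f(a))
}

// Hypothetical example functions; f and g deliberately do not commute.
val f: Int => Int = _ + 1
val g: Int => Int = _ * 2
val h: Int => String = _.toString
val id: Int => Int = identity

// Fixed sample inputs (a property-based test would generate these instead).
val inputs = List(-3, 0, 1, 42)

// Identity laws: composing with the identity on either side changes nothing.
val leftIdentity = inputs.forall(x => new Composition[Int, Int, Int](id, f)(x) == f(x))
val rightIdentity = inputs.forall(x => new Composition[Int, Int, Int](f, id)(x) == f(x))

// Associativity: (f then g) then h behaves like f then (g then h).
val associative = inputs.forall { x =>
  val leftNested = new Composition[Int, Int, String](new Composition[Int, Int, Int](f, g), h)
  val rightNested = new Composition[Int, Int, String](f, new Composition[Int, Int, String](g, h))
  leftNested(x) == rightNested(x)
}

// Order matters: f then g at 1 is (1 + 1) * 2 = 4, whereas g then f would give 3.
val ordered = new Composition[Int, Int, Int](f, g)(1) == 4
```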

Thank you for your replies :slight_smile:

I’m afraid I had simplified my example too much. I’m not trying to re-implement Function1#andThen() - sorry for the confusion!

Composition is a problem-domain-specific function that solves a problem using any number of functions, which may also vary in arity. I used the term “composition” because the injected functions are combined in a specific way required to solve the problem. That could be actual function composition, a path-dependent choice (e.g. in a pattern match), or applying the functions to one or more collections.

See below for a real world example.

The goal of the test is to make sure the injected functions are used correctly, with the correct arguments (which may be tricky e.g. when accepting multiple arguments of the same type) and in the correct combination.

That is of course good advice in general! I only test library functions when I’m not sure how they are behaving in edge cases and the documentation provides no clarification. Of course that is rare.

Assuming that Composition really just consists of andThen, my intention with the test would be to prevent someone from accidentally changing the implementation to, e.g., Function1#compose(), because that would break the logic.

I suppose that’s the answer right there :slight_smile: It’s just that writing mocks that make a more complex black box work (covering all potential pitfalls) is hard :wink: After all, the mocks should be simple and concise, so that they can be easily understood and their logic is trivial enough not to need tests of its own.
Even in the example above I had to make sure the mocks don’t model a commutative combination; otherwise the order in which the functions are applied could not be verified…
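
The point about the stubs can be shown with two hypothetical implementations that differ only in `andThen` versus `compose`. Stubs that commute (adding constants to an `Int`) give the same result either way, so a test built on them cannot catch the swap; stubs that do not commute (appending to a `String`) can.

```scala
// Two hypothetical implementations that differ only in application order.
def intended[T](a: T => T, b: T => T): T => T = a.andThen(b) // a first, then b
def mistaken[T](a: T => T, b: T => T): T => T = a.compose(b) // b first, then a

// Commuting stubs: +1 then +2 equals +2 then +1, so the swap goes unnoticed.
val addOne: Int => Int = _ + 1
val addTwo: Int => Int = _ + 2
val commutingAgree = intended(addOne, addTwo)(0) == mistaken(addOne, addTwo)(0)

// Non-commuting stubs: appending "a" then "b" gives "ab", the reverse gives "ba".
val appendA: String => String = _ + "a"
val appendB: String => String = _ + "b"
val nonCommutingDiffer = intended(appendA, appendB)("") != mistaken(appendA, appendB)("")
```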

Here is a real world example for Composition:

```scala
import java.util.Locale

private[bizzlogic] final class GetFullReport(
    readFile: String => String = new ReadFile,
    getStyle: () => String = new GetStyle,
    getReportTitle: ExportDataModel => String = new GetReportTitle(),
    getMenu: (ExportDataModel, Locale) => String = new GetMenu(),
    getReport: (ExportDataModel, Locale) => String = new GetReport(),
    readTemplate: (String, List[String]) => String = new ReadTemplate,
) extends (ExportDataModel => String) {

  /**
    * Creates a report in its entirety, including localization, interactive menu, etc.
    * @param exportDataModel the data to be exported
    * @return the parsed full report template
    */
  override def apply(exportDataModel: ExportDataModel): String =
    readTemplate(
      "full/full-report.html",
      List(
        readFile("plotly-min.js"),
        getStyle(),
        getReportTitle(exportDataModel),
        getMenu(exportDataModel, Locale.ENGLISH),
        getMenu(exportDataModel, Locale.GERMAN),
        getReport(exportDataModel, Locale.ENGLISH),
        getReport(exportDataModel, Locale.GERMAN)
      )
    )

}
```

This is the respective test, analogous to v2 above:

```scala
  test("should add plotly, style, menus and finished reports to template") {
    val testee = new GetFullReport(
      s => s,
      () => "style",
      _ => "title",
      (_, locale) => s"menu_$locale",
      (_, locale) => s"report_$locale",
      (template, report) => s"$template ${report}_translated"
    )

    val expected = "full/full-report.html List(plotly-min.js, style, title, menu_en, menu_de, report_en, report_de)_translated"
    testee.apply(null) shouldBe expected
  }
```

The rationale for using v2 is: there is only one ExportDataModel in scope, and it is arguably unreasonable to assume a developer would summon another one out of thin air. That’s why mocks 3, 4 and 5 above don’t use the first argument.
If I wanted to go for v1 I would have to provide an ExportDataModel with at least one meaningful value inside, and I’d have to use that value in each of the mocks.
In this specific case, that wouldn’t be hard because the result is a String, but I often have to deal with data objects that contain only numbers and booleans, and that makes creating meaningful return values a lot harder.

Sorry for the wall of text - I hope it clarifies what I’m actually trying to ask, though :wink:

I have never really looked into property-based testing. I know it’s being advocated in dynamically typed functional languages like Elixir.
I’m sure it works really well for simple input types, but does it make sense for complex data objects?

When a test needs to check multiple paths and edge cases, I use TableDrivenPropertyChecks from ScalaTest. That way I can specify the inputs (and possibly the expected outputs) myself without duplicating the test code. Of course, that is very different from having a library generate cases automatically…

You are either still simplifying too much or I simply don’t understand what you mean by that.
All I see is a normal class with a normal method and a normal implementation, so, again, I don’t see why you are so focused on testing composition… it doesn’t make sense; just test the logic.

Again, you should not care about that but rather about this:

So, again, don’t test the composition.
That is an internal implementation detail. Don’t test implementation details; doing that would make your tests brittle.

Rather, focus on testing the contract of the method; i.e. its logic.

Hello, since you are interested in composition, this may (or may not) be of interest:

this may also be of interest:

I agree, but now in the concrete example, isn’t this what the test is doing…?

The unit under test is GetFullReport#apply(). The test needs to supply the constructor arguments (the “component” functions) and the argument to #apply() itself (the model instance) and assert on the output. The question seems to be about the structure of the constructor argument functions used in the tests.

The style is somewhat unusual; it would probably be more common to model this as a trait, e.g.

```scala
trait GetFullReport extends (ExportDataModel => String):
  protected def readFile(path: String): String
  protected def getStyle(): String
  protected def getReportTitle: ExportDataModel => String
  // ...

  override def apply(exportDataModel: ExportDataModel): String =
    ??? // as in the original
```

…but the approach to testing would be similar, a dummy implementation of the trait replacing the dummy constructor arguments. Am I missing a better approach to modeling or testing?
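
A self-contained sketch of that trait-based variant, written in brace syntax so it compiles on Scala 2 and 3. `ExportDataModel` here is a hypothetical one-field stand-in, and the stub return values mirror the constructor-argument test from earlier in the thread; the test double is an anonymous subclass rather than a set of lambdas.

```scala
import java.util.Locale

// Hypothetical stand-in for the real model, just enough for the sketch to compile.
final case class ExportDataModel(name: String)

// Trait-based variant: the component functions become abstract members.
trait GetFullReport extends (ExportDataModel => String) {
  protected def readFile(path: String): String
  protected def getStyle(): String
  protected def getReportTitle(model: ExportDataModel): String
  protected def getMenu(model: ExportDataModel, locale: Locale): String
  protected def getReport(model: ExportDataModel, locale: Locale): String
  protected def readTemplate(path: String, parts: List[String]): String

  // Same wiring as the constructor-based GetFullReport#apply.
  override def apply(model: ExportDataModel): String =
    readTemplate(
      "full/full-report.html",
      List(
        readFile("plotly-min.js"),
        getStyle(),
        getReportTitle(model),
        getMenu(model, Locale.ENGLISH),
        getMenu(model, Locale.GERMAN),
        getReport(model, Locale.ENGLISH),
        getReport(model, Locale.GERMAN)
      )
    )
}

// Test double: an anonymous subclass with stub implementations.
val testee = new GetFullReport {
  protected def readFile(path: String): String = path
  protected def getStyle(): String = "style"
  protected def getReportTitle(model: ExportDataModel): String = s"title_${model.name}"
  protected def getMenu(model: ExportDataModel, locale: Locale): String = s"menu_$locale"
  protected def getReport(model: ExportDataModel, locale: Locale): String = s"report_$locale"
  protected def readTemplate(path: String, parts: List[String]): String =
    s"$path ${parts.mkString(",")}"
}
```

Calling `testee(ExportDataModel("m"))` should produce `full/full-report.html plotly-min.js,style,title_m,menu_en,menu_de,report_en,report_de`; here the title stub uses the model, so a real instance is passed instead of `null`.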

The initial “compose” question still feels kind of a red herring, though. :slight_smile:

Yes, I’m not saying otherwise.
But my point is that, in such a case, you are not really testing composition (or rather, you could say every test is testing composition).
So that is my point: you actually don’t care about the real order; all you care about is that, given some inputs, you get the expected output… be it in a unit-test style or in a property-based testing style.

Hmm… :thinking: But GetFullReport#apply() is exclusively about chaining and nesting calls to the constructor argument functions the right way, correct? IOW, it doesn’t (need to) know anything about ExportDataModel specifics. In principle you could just replace this with a generic parameter to GetFullReport and use a simple type like String in the tests.

You might even get more extreme and extend this reasoning to the result type, as well, thus get rid of the requirement to squeeze the captured results into a String in the tests, at the cost of conjuring an appropriate ADT for the capture. The resulting model might look like this:

```scala
final class GetFullReport[M, R](
  readFile: String => R,
  getStyle: () => R,
  getReportTitle: M => R,
  getMenu: (M, Locale) => R,
  getReport: (M, Locale) => R,
  readTemplate: (String, List[R]) => R,
) extends (M => R):
  // ...
```

Test case:

```scala
val testee = new GetFullReport[String, Capture[String]](
  File.apply,
  () => Style,
  Title.apply,
  Menu.apply,
  Report.apply,
  Template.apply
)
val model = "model"
val expected =
  Template(
    templatePath,
    List(
      File(plotlyPath),
      Style,
      Title(model),
      Menu(model, Locale.ENGLISH),
      Menu(model, Locale.GERMAN),
      Report(model, Locale.ENGLISH),
      Report(model, Locale.GERMAN),
    )
  )
testee(model).shouldBe(expected)
```

(The same could be applied to the trait-based model, of course.)
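
To make that generic test case compile on its own, here is one possible set of definitions. The `Capture` ADT and its cases (`File`, `Style`, `Title`, `Menu`, `Report`, `Template`) are hypothetical, written to match the names used in the test; the template and plotly paths are inlined as literals instead of `templatePath` and `plotlyPath`, and the component functions are passed as placeholder lambdas rather than `.apply` references.

```scala
import java.util.Locale

// Generic variant: M is the model type, R the result type.
final class GetFullReport[M, R](
    readFile: String => R,
    getStyle: () => R,
    getReportTitle: M => R,
    getMenu: (M, Locale) => R,
    getReport: (M, Locale) => R,
    readTemplate: (String, List[R]) => R
) extends (M => R) {
  // Same wiring as the non-generic GetFullReport#apply.
  override def apply(model: M): R =
    readTemplate(
      "full/full-report.html",
      List(
        readFile("plotly-min.js"),
        getStyle(),
        getReportTitle(model),
        getMenu(model, Locale.ENGLISH),
        getMenu(model, Locale.GERMAN),
        getReport(model, Locale.ENGLISH),
        getReport(model, Locale.GERMAN)
      )
    )
}

// Hypothetical capture ADT: each case records which component was called, and with what.
sealed trait Capture[+M]
final case class File(path: String) extends Capture[Nothing]
case object Style extends Capture[Nothing]
final case class Title[M](model: M) extends Capture[M]
final case class Menu[M](model: M, locale: Locale) extends Capture[M]
final case class Report[M](model: M, locale: Locale) extends Capture[M]
final case class Template[M](path: String, parts: List[Capture[M]]) extends Capture[M]

val testee = new GetFullReport[String, Capture[String]](
  File(_), () => Style, Title(_), Menu(_, _), Report(_, _), Template(_, _)
)
val model = "model"
val expected: Capture[String] =
  Template(
    "full/full-report.html",
    List(
      File("plotly-min.js"),
      Style,
      Title(model),
      Menu(model, Locale.ENGLISH),
      Menu(model, Locale.GERMAN),
      Report(model, Locale.ENGLISH),
      Report(model, Locale.GERMAN)
    )
  )
```

Because each capture is plain data, a wrong argument (say, `Locale.GERMAN` passed to the first `getMenu` call) or a swapped order shows up as a structural inequality rather than a mismatch buried in a concatenated string.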

I’m still not sure whether I really understood your design approach and your question, so this may make no sense at all. :slight_smile:

Generators work quite well. Of course it requires some effort to write (and fine tune) them, but hard coding example instances of complex data types for unit tests requires effort, too, and it’s usually more boring. :slight_smile:

That being said, I’d consider property based tests as complement to classical unit tests rather than a replacement.

Agreed. Classic tests are good for representing known, usually “normal” examples, as well as for regression tests covering failure modes that you’re aware of. Property tests are good for teasing out the unknown unknowns: a good generator will automatically build weird edge cases with negative numbers, empty Strings, weird Unicode, and things like that, helping you make sure that you’ve properly covered your bases.
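
A minimal sketch of the generator idea without pulling in a library: a hand-rolled generator (a stand-in for ScalaCheck's `Gen`, not its API) that starts with a few deliberately awkward inputs and then adds seeded random ones, checked against the v1 stubs from the question. The property and the edge cases are illustrative assumptions.

```scala
import scala.util.Random

// The Composition from the question, with its functions passed explicitly.
final class Composition(a: List[String] => String, b: String => Int) extends (List[String] => Int) {
  override def apply(value: List[String]): Int = a.andThen(b)(value)
}

// Hand-rolled generator: fixed edge cases first, then n random lists of short strings.
def genInputs(rnd: Random, n: Int): List[List[String]] = {
  val edgeCases = List(Nil, List(""), List("", ""), List("ü", "日本"), List(" "))
  val randomOnes = List.fill(n) {
    List.fill(rnd.nextInt(5))(rnd.alphanumeric.take(rnd.nextInt(6)).mkString)
  }
  edgeCases ++ randomOnes
}

// Property for the v1 stubs: total length plus one separator between adjacent elements.
def property(xs: List[String]): Boolean = {
  val testee = new Composition(a = _.mkString(" "), b = _.length)
  testee(xs) == xs.map(_.length).sum + math.max(xs.size - 1, 0)
}

// A fixed seed keeps runs reproducible; a real PBT library would also shrink failures.
val allHold = genInputs(new Random(42), 100).forall(property)
```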

Au contraire - you are spot on :slight_smile:

That is a very interesting idea and it would solve the exact problem I have! Awesome :smiley: Thank you!

I’ll have to look at some real-life examples - in some cases there is additional logic involved that does require knowledge of the concrete types. But in those cases I might still be able to make some of the types generic or simply extract another function that encapsulates the logic that depends on the concrete type.

At the very least this approach would help with some of the most awkward tests :wink:

Type inference should also make the new generic functions easy to use since the concrete types should be inferred from the use site, e.g. in the example above:
```scala
getFullReport: ExportDataModel => String = new GetFullReport
```
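
One caveat on inference, sketched with a simplified, hypothetical `Render[M, R]` and a stand-in `ExportDataModel`: Scala default arguments are type-checked against the declared parameter type, so a default like `= new ReadFile` (a `String => String`) cannot be given for a parameter of type `String => R`. The concrete types can still be inferred, but from explicit arguments, for example in a companion factory.

```scala
// Hypothetical stand-in for the real model, only so the sketch compiles.
final case class ExportDataModel(title: String)

// Simplified generic component (hypothetical): two injected functions instead of six.
// A default such as `title: M => R = (m: ExportDataModel) => m.title` would not
// compile here, because it does not work for arbitrary M and R.
final class Render[M, R](title: M => R, wrap: (String, List[R]) => R) extends (M => R) {
  override def apply(model: M): R = wrap("template.html", List(title(model)))
}

object Render {
  // Production wiring: M and R are inferred from the concrete argument types.
  def production: ExportDataModel => String =
    new Render(
      (m: ExportDataModel) => m.title,
      (path: String, parts: List[String]) => s"$path:${parts.mkString(",")}"
    )
}

// Tests can still instantiate Render with simple stand-in types, e.g. M = String.
val testee = new Render[String, String](m => s"title($m)", (path, parts) => s"$path:${parts.mkString(",")}")
```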

Sorry for the confusion! I had tried to come up with a simple example but it ended up using a wrong term (function composition being way too specific) and then even using an example that matches that term :man_facepalming:

Thank you for your suggestions! The wording of my original question was too specific - it’s not about function composition, I guess. But there’s always something to learn (for instance, I didn’t know there’s a mathematical symbol for forward composition) so I’ll gladly check out your suggestions :wink:

Interesting! I’ll add it to my to-do list :wink: And I’ll start thinking about where it might be useful in my current projects :slight_smile:

…but in these cases you very likely wouldn’t have got away with passing null/ignoring the passed model value in the first place.

That is true :wink: Thanks again for your help! :slight_smile: