Upper case letters in pattern matching

jimka · April 23, 2020, 7:23am

What is the general feeling about using upper case letters in pattern matching? Is this a feature which is seen in retrospect as a mistake, or is it a feature which programmers should exploit to simplify their code?

In the following example, if I take advantage of the fact that pattern matching treats S and D as tests for equality rather than as variables to bind, the code becomes considerably simpler. But to me it feels like I’m relying on an obscure feature.

In the first example I have variables src, dst, i, j, and I’m looping to find when src == i and dst==j. In the second example, I just have S and D, and let the pattern matcher do all the work.


  def makeAdj_5(n:Int, edges:Array[(Int,Int)]):Array[Set[Int]] = {
    def connectionsTo(i:Int):Set[Int] = {
      (0 until n).filter {
        j => edges.exists { case (src, dst) => src == i && dst == j }
      }.toSet
    }
    Array.tabulate(n)(connectionsTo)
  }

  def makeAdj_5b(n:Int, edges:Array[(Int,Int)]):Array[Set[Int]] = {
    def connectionsTo(S:Int):Set[Int] = {
      (0 until n).filter {
        D => edges.exists {
          case (S, D) => true
          case _ => false
        }
      }.toSet
    }
    Array.tabulate(n)(connectionsTo)
  }

guilgaly · April 23, 2020, 9:05am

Note that it should also work with backticks:

  def makeAdj_5c(n:Int, edges:Array[(Int,Int)]):Array[Set[Int]] = {
    def connectionsTo(i:Int):Set[Int] = {
      (0 until n).filter {
        j => edges.exists {
          case (`i`, `j`) => true
          case _ => false
        }
      }.toSet
    }
    Array.tabulate(n)(connectionsTo)
  }

I’d say it’s a bit obscure either way, but I think the version with backticks has the advantage of looking a bit weird, so someone who doesn’t know the feature is at least more likely to guess that something is going on here. In practice, though, I’d probably just stick with the first version for something like this.

jducoeur · April 23, 2020, 12:44pm

Huh – hadn’t even occurred to me that that would work.

I think you’re slightly abusing the feature here. As far as I know (certainly the way I’ve always seen it taught), the intent is about matching on a concrete object: it signals based on the capitalization of the first letter because by convention object declarations are capitalized. I’ve never seen it mentioned that this could be used to force an equality check like this, although it’s not entirely surprising that that works.
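For anyone who hasn’t run into it, here is a minimal sketch of the "matching on a concrete object" use described above. The Color type and its members are hypothetical and exist only for illustration; they are not from the thread.

```scala
object ObjectPatternSketch {
  // Hypothetical types, made up for this illustration.
  sealed trait Color
  case object Red extends Color
  case object Green extends Color

  def describe(c: Color): String = c match {
    // Red starts with an upper case letter, so the pattern refers to the
    // existing object Red and is tested with ==.
    case Red => "red"
    // other starts with a lower case letter, so it is a fresh variable
    // that binds whatever value reaches this case.
    case other => s"not red: $other"
  }
}
```

Calling ObjectPatternSketch.describe with Green falls through to the second case and binds other to Green.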

Personally, I think this reification of convention in pattern matching is a language wart, albeit a deeply-established one – it’s caused me bugs more than a few times, and I think it’s pretty unintuitive. I probably wouldn’t use it like this, because I wouldn’t intuitively see what the code is trying to do. (That is, I would always write your first version, never your second.)

jimka · April 23, 2020, 2:58pm

As I recall from the Coursera Scala course, this was presented as a feature: the pattern matching operation binds lower case variables and treats capitalized variables as constants. I remember remarking at the time that this seemed obscure, and a bit of a hack.

The use of capitalized variable names seems to be encouraged in the Odersky Scala (stair step) book, in Chapter 15, section 15.2, page 276. (I’m using the 2nd edition, (c) 2010.) The example shown in the book contrasts pattern matching on E and Pi as constants with e and pi as binding variables.
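A rough sketch of that contrast, written for this thread rather than quoted from the book; the object and method names are my own assumptions.

```scala
object ConstantVsVariableSketch {
  import scala.math.Pi

  // Pi is capitalized, so it is a constant pattern: the case only matches
  // when x == Pi.
  def withUpper(x: Double): String = x match {
    case Pi => "matched Pi"
    case _  => "no match"
  }

  // pi is lower case, so it is a variable pattern: it matches any value
  // and binds it to a new local name pi.
  def withLower(x: Double): String = x match {
    case pi => s"bound pi = $pi"
  }
}
```

With scala.math.E as input, withUpper falls through to the default case, while withLower matches and binds pi to the value of E.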

martijnhoekstra · April 23, 2020, 3:14pm

I agree with @jducoeur that it’s a bit of a wart with an ugly mechanism to make scala usually do the right thing, and that intentionally using the ugly mechanism is abusing the feature.

I also seem to remember that there was a linter warning, or at least talk of one, for the most dangerous gotcha: accidentally and silently binding a fresh variable that shadows a variable in scope, rather than matching against its value. If there isn’t one, that is probably just wishful thinking on my part.
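To make that gotcha concrete, here is a small sketch. The function names and the Option-based setup are hypothetical, chosen only to keep the example short.

```scala
object ShadowingGotchaSketch {
  // Intended: true only when found holds exactly target.
  // Actual: the lower case target in the pattern is a fresh variable that
  // shadows the parameter, so any Some(...) matches.
  def buggyContains(target: Int, found: Option[Int]): Boolean = found match {
    case Some(target) => true
    case None         => false
  }

  // Backticks turn target into a reference to the parameter, so the
  // pattern compares against its value instead of binding a new name.
  def fixedContains(target: Int, found: Option[Int]): Boolean = found match {
    case Some(`target`) => true
    case _              => false
  }
}
```

Here buggyContains(1, Some(2)) returns true even though 1 and 2 differ, while fixedContains(1, Some(2)) returns false.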

sangamon · April 23, 2020, 3:19pm

That blends well with the convention that constant names should be upper camel case. But in your example, S and D are not constants. I prefer to use back-ticked name references in these cases, as pointed out by @guilgaly.

Jasper-M · April 24, 2020, 8:27am

I don’t really agree with the apparent consensus here that upper case patterns are a hack or anti-pattern. I find it pretty convenient and also kind of logical, even though I don’t use it that often either. But that’s just because in general it doesn’t happen very often that you have to pattern match on a variable. Which contributes to people not being used to seeing this.

martijnhoekstra · April 24, 2020, 12:47pm

The primary reason for me to consider it a hack is that the match in one part of the code influences the choice of name in a completely unrelated part. At the point of naming you may not know the value is going to be matched against. The match may not even exist when you name the value. Or the match may be removed at some point. Distinguishing (through naming) between values that should or shouldn’t be matched on is a concern that doesn’t belong to naming IMO.