Scala Rage: Pattern-Matching

Sometimes Scala makes me happy. Sometimes Scala makes me angry. This post is about the latter.

def contrivedExample[A, B, C](a: A, b: B, c: C): Unit = a match {
  case b => println("matched b")
  case c => println("matched c")
  case _ => println("matched neither")
}

Somebody learning Scala might reasonably intuit that this method would be logically equivalent to:

def contrivedExample[A, B, C](a: A, b: B, c: C): Unit =
  if (a == b)
    println("matched b")
  else if (a == c)
    println("matched c")
  else
    println("matched neither")

Instead, they get the cryptic compiler error:

error: unreachable code
case c => println("matched c")
                 ^

error: unreachable code
case _ => println("matched neither")
                 ^

This is because the patterns written above are variable patterns: “A variable pattern x is a simple identifier which starts with a lower case letter” (§8.1.1). Variable patterns are irrefutable: they always match, thus subsequent patterns will never be reached. The patterns above are not, as somebody learning Scala might expect, stable identifier patterns, because “a stable identifier pattern may not be a simple name starting with a lower-case letter” (§8.1.5). Hence the following compiles and works as one would expect:

def contrivedExample[A, B, C](A: A, B: B, C: C): Unit = A match {
  case B => println("matched B")
  case C => println("matched C")
  case _ => println("matched neither")
}

First of all, I wonder how many new Scala developers have no freaking idea that the “unreachable code” error is actually telling them this. Also, this contradicts most typical naming conventions. Upper-case method parameter names? No, thanks. Instead, we have to explicitly indicate that the patterns are stable identifier patterns by referencing our lower-case identifiers in backquotes:

def contrivedExample[A, B, C](a: A, b: B, c: C): Unit = a match {
  case `b` => println("matched b")
  case `c` => println("matched c")
  case _ => println("matched neither")
}

As one of the lucky few people in the world right now with the rare privilege of teaching new Scala developers, guess how much time I feel like spending on stupid shit like this. Really, guess how much! Well, here are two hints:

  1. Upon seeing this quirk, students invariably ask questions: Why the arbitrary choice to use upper and lower case to distinguish between stable identifier patterns and variable patterns? Why does Scala permit name shadowing? What’s the special backquote syntax about? I can easily explain that the backquote syntax is a nice, general way of allowing Scala keywords to be used as identifiers (which is important when referencing code built in other JVM languages), so we can reuse that mechanism for disambiguation here… But to the other two questions, I seriously can’t provide justifiable answers.

  2. Teaching any topic, Scala included, involves some degree of “cheerleading,” motivating students to bring a sense of excitement about the topic back to their respective development teams. If they go back to work utterly stoked about Scala, that will clearly benefit more people than if they return with any sense of skepticism. What impression should it leave, if in the middle of introducing pattern matching — one of Scala’s most powerful features — they see this glaring issue?

Yeah. This doesn’t make me happy.


Edit: On Twitter, @missingfaktor asked the very fair question:

How else should that have been done? What do you propose?

A few ideas:

  1. Simply disallow name shadowing, since I think permitting shadowing (§2) is a dangerous idea to begin with, and directly gives rise to this problem. If a case clause could be interpreted ambiguously as either declaring and binding a new variable or referencing an existing one in scope, obviously it must be the latter because the former would shadow the existing binding. The compiler should at a minimum warn on name shadowing (like the erstwhile -Ywarn-shadowing) and issue a more helpful error message than “unreachable code” in the above case.

  2. Use an obvious syntax to distinguish between the two types of pattern. For example, case a might be a variable pattern and case == a might be a stable identifier pattern. Or case a might be a stable identifier pattern, and case val a might be a variable pattern. I don’t know if either of these choices are good, but I like them better from a “principle of least surprise” perspective than what we have currently.

  3. Fix inconsistencies in the spec regarding the different kinds of identifier. For example, §1.1 introduces the terms “plain identifier”, “variable identifier” and “constant identifier.” §8.1.1 and §8.2 introduce the term “simple identifier.” These aren’t all well defined, and aren’t used consistently in the places where they matter. I’d argue that case shouldn’t matter anywhere, since Scala allows code written in most natural languages (i.e. identifiers need not be written in Latin characters) and many natural languages don’t have case in their writing systems. At any rate, going through the process of cleaning this up might bring us to a solution where, for example, variable references in case clauses must always be enclosed in backquotes.

On the other hand, a ludicrous non-starter in my opinion would be to say that upper and lower case matter… It’s amusing at best to observe that case affects case.

Tagged , ,

One thought on “Scala Rage: Pattern-Matching

  1. […] By using the is/to combo and restricting eq to literals and guards, we can get rid of the is, to, and eq keywords, leaving only as. Even with eq gone, we cannot get rid of as without introducing ambiguity. When we write A, does that mean is A or as A. Nevertheless, most of the full-featured pattern matching languages eliminate it, introducing some of the weirdest corner cases in programming history. They parse a lowercase identifier (like name) as a binding and parse an uppercase identifier (like Name) as an instance test. In languages where types must begin with a capital letter, like Haskell and Ceylon, it is not a big deal. However, in languages where this is the only place where capitalization matters, like Scala and F#, it is particularly jarring. […]

Comments are closed.