Chapter 4. Pattern Matching

Scala’s pattern matching provides deep inspection and decomposition of objects in a variety of ways. It’s one of my favorite features in Scala. For your own types, you can follow a protocol that lets you control which internal state is visible and how it is exposed to users. The terms extraction and destructuring are sometimes used for this capability.

Pattern matching can be used in several code contexts, as we’ve already seen in “A Sample Application” and “Partial Functions”. We’ll start with a change in Scala 3 for better type safety, followed by a quick tour of common and straightforward usage examples, then explore more advanced scenarios. We’ll cover a few more pattern-matching features in later chapters, once we have the background to understand them.

Safer Pattern Matching with Matchable

Let’s begin with an important change in Scala 3’s type system that is designed to make compile-time checking of pattern-matching expressions more robust.

Scala 3 introduced an immutable wrapper around Arrays called scala.IArray. Arrays in Java are mutable, so this is intended as a safer way to work with them. In fact, IArray is a type alias for Array to avoid the overhead of wrapping arrays, which means that pattern matching introduces a hole in the abstraction. Using the Scala 3.0 REPL without the -source:future setting, observe the following:

// src/script/scala/progscala3/patternmatching/Matchable.scala
scala> val iarray = IArray(1,2,3,4,5)
     | iarray match
     |   case a: Array[Int] => a(2) = 300 // Scala 3 warning!!
     | println(iarray)
val iarray: opaques.IArray[Int] = Array(1, 2, 300, 4, 5)

There are other examples where this can occur. To close this loophole, the Scala type system now has a trait called Matchable. It fits into the type hierarchy as follows:

abstract class Any:
  def isInstanceOf
  def getClass
  def asInstanceOf      // Cast to a new type: myAny.asInstanceOf[String]
  def ==
  def !=
  def ##   // Alias for hashCode
  def equals
  def hashCode
  def toString

trait Matchable extends Any

class AnyVal extends Any, Matchable

class AnyRef extends Any, Matchable

Note that Matchable is a marker trait, as it currently has no members. However, a future release of Scala may move getClass and isInstanceOf to Matchable, as they are closely associated with pattern matching.

The intent is that pattern matching can only occur on values of type Matchable, not Any. Since almost all types are subtypes of AnyRef or AnyVal, they already satisfy this constraint, but attempting to pattern match on the following types will trigger warnings in future Scala 3 releases or when using -source:future with Scala 3.0:

Type Any. Use Matchable instead, when possible.
Type parameters and abstract types without bounds. Add <: Matchable.
Type parameters and abstract types bounded only by universal traits. Add <: Matchable.

We’ll discuss universal traits in “Value Classes”. We can ignore them for now. As an example of the second bullet, consider the following method definition in a REPL session with the -source:future flag restored:

scala> def examine[T](seq: Seq[T]): Seq[String] = seq map {
     |   case i: Int => s"Int: $i"
     |   case other => s"Other: $other"
     | }
2 |  case i: Int => s"Int: $i"
  |          ^^^
  |          pattern selector should be an instance of Matchable,
  |          but it has unmatchable type T instead

Now the type parameter T needs a bound:

scala> def examine[T <: Matchable](seq: Seq[T]): Seq[String] = seq map {
     |   case i: Int => s"Int: $i"
     |   case other => s"Other: $other"
     | }
def examine[T <: Matchable](seq: Seq[T]): Seq[String]

scala> val seq = Seq(1, "two", 3, 4.4)
     | examine(seq)
val seq: Seq[Matchable] = List(1, two, 3, 4.4)
val res0: Seq[String] = List(Int: 1, Other: two, Int: 3, Other: 4.4)

Notice the inferred common supertype of the values in the sequence, seq. In Scala 2, it would be Any.

Back to IArray, the example at the beginning now triggers a warning because the IArray alias is not bounded by Matchable:

scala> val iarray = IArray(1,2,3,4,5)
     | iarray match
     |   case a: Array[Int] => a(2) = 300
     |
3 |  case a: Array[Int] => a(2) = 300
  |          ^^^^^^^^^^
  |          pattern selector should be an instance of Matchable,
  |          but it has unmatchable type opaques.IArray[Int] instead

IArray is considered an abstract type by the compiler. Abstract types are not bounded by Matchable, which is why we now get the warning we want.

This is a significant change that will break a lot of existing code. Hence, warnings will only be issued starting in a future Scala 3 release or when compiling with -source:future.

Values, Variables, and Types in Matches

Let’s cover several kinds of matches. The following example matches on specific values and on all values of specific types, and it shows one way of writing a default clause that matches anything:

// src/script/scala/progscala3/patternmatching/MatchVariable.scala

val seq = Seq(1, 2, 3.14, 5.5F, "one", "four", true, (6, 7))    // (1)
val result = seq.map {
  case 1                   => "int 1"                           // (2)
  case i: Int              => s"other int: $i"
  case d: (Double | Float) => s"a double or float: $d"          // (3)
  case "one"               => "string one"                      // (4)
  case s: String           => s"other string: $s"
  case (x, y)              => s"tuple: ($x, $y)"                // (5)
  case unexpected          => s"unexpected value: $unexpected"  // (6)
}
assert(result == Seq(
  "int 1", "other int: 2",
  "a double or float: 3.14", "a double or float: 5.5",
  "string one", "other string: four",
  "unexpected value: true",
  "tuple: (6, 7)"))

(1) Because of the mix of values, seq is of type Seq[Matchable].
(2) If one or more case clauses specify particular values of a type, they need to occur before more general clauses that just match on the type. So we first check if the anonymous value is an Int equal to 1. If so, we simply return the string "int 1". If the value is another Int value, the next clause matches. In this case, the value is cast to Int and assigned to the variable i, which is used to construct a string.
(3) Match on any Double or Float value. Using | is convenient when two or more cases are handled the same way. However, for this to work, the logic after the => must be type compatible for all matched types. In this case, the interpolated string works fine.
(4) Two case clauses for strings.
(5) Match on a two-element tuple where the elements are of any type, and extract the elements into the variables x and y.
(6) Match all other inputs. The variable unexpected has an arbitrary name. Because no type declaration is given, Matchable is inferred. This functions as the default clause. The Boolean value from the sequence seq is assigned to unexpected.

We passed a partial function to Seq.map(). Recall that the literal syntax requires case clauses, and we have to put the partial function inside parentheses or braces to pass it to map. However, this function is effectively total, because the last clause matches any Matchable. (It would be Any in Scala 2.) This means it wouldn’t match instances of the few other types that aren’t Matchables, like IArray, but these types are no longer candidates for pattern matching. From now on, I’ll just call partial functions like this total.

Avoid clauses that match on specific floating-point literals. Rounding errors mean that two values you might expect to be the same may actually differ.

Matches are eager, so more specific clauses must appear before less specific clauses. Otherwise, the more specific clauses will never get the chance to match. So the clauses matching on particular values of types must come before clauses matching on the type (i.e., on any value of the type). The default clause shown must be the last one. Fortunately, the compiler will issue an “Unreachable case” warning if you make this mistake. Try switching the two Int clauses to see what happens.

Match clauses are expressions, so they return a value. In this example, all clauses return strings, so the return type of the match expression (and the partial function) is String. Hence, the return type of the map call is Seq[String]. The compiler infers the least upper bound, the closest supertype, for the types of values returned by all the case clauses.

This is a contrived example, of course. When designing pattern-matching expressions, be wary of relying on a default case clause. Under what circumstances would “none of the above” be the correct answer? It may indicate that your design could be refined so you know more precisely all the possible matches that might occur, like a sealed type hierarchy or enum, which we’ll discuss further. In fact, as we go through this chapter, you’ll see more realistic scenarios and no default clauses.
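One way to avoid leaning on a default clause is to model the possible inputs as an enum, so the compiler can check that every case is handled. The following is only a minimal sketch. The Shape type and the area method are hypothetical names invented for this illustration and are not part of the book’s example code:

```scala
// Hypothetical ADT: every possible input is one of these three cases.
enum Shape:
  case Circle(radius: Double)
  case Rectangle(width: Double, height: Double)
  case Square(side: Double)

import Shape.*

// No default clause is needed. If a case were omitted, the compiler
// would warn that the match may not be exhaustive.
def area(shape: Shape): Double = shape match
  case Circle(r)       => math.Pi * r * r
  case Rectangle(w, h) => w * h
  case Square(s)       => s * s
```

If a new case is later added to Shape, the compiler flags every match that needs updating, which a catch-all clause would silently hide.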

Here is a similar example that passes an anonymous function to map, rather than a partial function, plus some other changes:

// src/script/scala/progscala3/patternmatching/MatchVariable2.scala

val seq2 = Seq(1, 2, 3.14, "one", (6, 7))
val result2 = seq2.map { x => x match
  case _: Int  => s"int: $x"                                    // (1)
  case _       => s"unexpected value: $x"                       // (2)
}
assert(result2 == Seq(
  "int: 1", "int: 2", "unexpected value: 3.14",
  "unexpected value: one", "unexpected value: (6,7)"))

(1) Use _ for the variable name, meaning we don’t capture it.
(2) Catch-all clause that also uses x instead of capturing to a new variable.

The first case clause doesn’t need to capture the variable because it doesn’t exploit the fact that the value is an Int. For example, it doesn’t call Int methods. Otherwise, just using x wouldn’t be sufficient, as it has type Matchable.

Once again, braces are used around the whole anonymous function, but the optional braces syntax is used inside the function for the match expression. In general, using a partial function is more concise because we eliminate the need for x => x match.

Tip

When you use pattern matching with any of the collection methods, like map and foreach, use a partial function.

There are a few rules and gotchas to keep in mind for case clauses. The compiler assumes that a term that begins with a lowercase letter is the name of a variable that will hold a matched value. If the term starts with a capital letter, it will expect to find a definition already in scope.

This lowercase rule can cause surprises, as shown in the following example. The intention is to pass some value to a method, then see if that value matches an element in the collection:

// src/script/scala/progscala3/patternmatching/MatchSurprise.scala

def checkYBad(y: Int): Seq[String] =
  for x <- Seq(99, 100, 101)
  yield x match
    case y => "found y!"
    case i: Int => "int: "+i  // Unreachable case!

The first case clause is supposed to match on the value passed in as y, but this is what we actually get:

def checkYBad(y: Int): Seq[String]
10 |      case i: Int => "int: "+i  // Unreachable case!
   |           ^^^^^^
   |           Unreachable case

We treat warnings as errors in our build.sbt settings, but if we didn’t, then calling checkYBad(100) would return found y! for all three numbers.

The case y clause means “match anything because there is no type declaration, and assign it to this new variable named y.” The y in the clause is not interpreted as a reference to the method parameter y. Rather, it shadows that definition. Hence, this clause is actually a default, match-all clause and we will never reach the second case clause.

There are two solutions. First, we could use capital Y, although it looks odd to have a method parameter start with a capital letter:

def checkYGood1(Y: Int): Seq[String] =
  for x <- Seq(99, 100, 101)
  yield x match
    case Y => "found y!"
    case i: Int => "int: "+i

Calling checkYGood1(100) returns List(int: 99, found y!, int: 101).

The second solution is to use backticks to indicate we really want to match against the value held by y:

def checkYGood2(y: Int): Seq[String] =
  for x <- Seq(99, 100, 101)
  yield x match
    case `y` => "found y!"
    case i: Int => "int: "+i

Warning

In case clauses, a term that begins with a lowercase letter is assumed to be the name of a new variable that will hold an extracted value. To refer to a previously defined variable, enclose it in backticks or start the name with a capital letter.
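The three possibilities can be put side by side in one match expression. This is an illustrative sketch only. MaxRetries and describe are hypothetical names chosen for the example:

```scala
// Hypothetical constant; the capitalized name makes it usable as a
// reference to an existing value inside a pattern.
val MaxRetries = 3

def describe(n: Int, limit: Int): String = n match
  case MaxRetries => "hit MaxRetries"          // compares n to the val above
  case `limit`    => "hit the caller's limit"  // compares n to the parameter
  case other      => s"other value: $other"    // binds a new variable
```

Here describe(3, 5) takes the first clause, describe(5, 5) takes the second, and describe(1, 5) falls through to the last one.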

Finally, most match expressions should be exhaustive:

// src/script/scala/progscala3/patternmatching/MatchExhaustive.scala

scala> val seq3 = Seq(Some(1), None, Some(2), None)
val seq3: Seq[Option[Int]] = List(Some(1), None, Some(2), None)

scala> val result3 = seq3.map {
     |   case Some(i)  => s"int $i"
     | }
5 |  case Some(i)  => s"int $i"
  |  ^
  |  match may not be exhaustive.
  |
  |  It would fail on pattern case: None

The compiler knows that the elements of seq3 are of type Option[Int], which could include None elements. At runtime, a MatchError will be thrown if a None is encountered. The fix is straightforward:

// src/script/scala/progscala3/patternmatching/MatchExhaustiveFix.scala

scala> val result3 = seq3.map {
     |   case Some(i)  => s"int $i"
     |   case None     => ""
     | }
val result3: Seq[String] = List(int 1, "", int 2, "")

“Problems in Pattern Bindings” will discuss additional points about exhaustive matching.
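When the None elements should simply be skipped rather than mapped to a placeholder, one alternative is the collect method, which takes a partial function and keeps only the elements where it is defined. A small sketch, using a hypothetical sequence named options:

```scala
val options = Seq(Some(1), None, Some(2), None)

// collect applies the partial function only where it is defined,
// so the None elements are dropped instead of causing a MatchError.
val defined = options.collect {
  case Some(i) => s"int $i"
}
```

The result contains two strings, one for each Some element, in their original order.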

Matching on Sequences

Let’s examine the classic idiom for iterating through a Seq using pattern matching and recursion and, along the way, learn some useful fundamentals about sequences:

// src/script/scala/progscala3/patternmatching/MatchSeq.scala

def seqToString[T](seq: Seq[T]): String = seq match                  // (1)
  case head +: tail => s"($head +: ${seqToString(tail)})"            // (2)
  case Nil => "Nil"                                                  // (3)

(1) Define a recursive method that constructs a String from a Seq[T] for some type T, which will be inferred from the sequence passed in. The body is a single match expression.
(2) There are two match clauses and they are exhaustive. The first matches on any nonempty Seq, extracting the first element as head and the rest of the Seq as tail. These are common names for the parts of a Seq, which has head and tail methods. However, here these terms are used as variable names. The body of the clause constructs a String with the head followed by +: followed by the result of calling seqToString on the tail, all surrounded by parentheses, (). Note this method is recursive, but not tail recursive.
(3) The only other possible case is an empty Seq. We can use the special case object for an empty List, Nil, to match all the empty cases. This clause terminates the recursion. Note that any type of Seq can always be interpreted as terminating with a Nil, or we could use an empty instance of the actual type (examples follow).

The operator +: is the cons (construction) operator for sequences. Recall that methods that end with a colon (:) bind to the right, toward the Seq tail. However, +: in this case clause is actually an object named +:, so we have a nice syntax symmetry between construction of sequences, like val seq = 1 +: 2 +: Nil, and deconstruction, like case 1 +: 2 +: Nil =>…. We’ll see later in this chapter how an object is used to implement deconstruction.

These two clauses are mutually exclusive, so they could be written with the Nil clause first.

Now let’s try it with various empty and nonempty sequences:

scala> seqToString(Seq(1, 2, 3))
     | seqToString(Seq.empty[Int])
val res0: String = (1 +: (2 +: (3 +: Nil)))
val res1: String = Nil

scala> seqToString(Vector(1, 2, 3))
     | seqToString(Vector.empty[Int])
val res2: String = (1 +: (2 +: (3 +: Nil)))
val res3: String = Nil

scala> seqToString(Map("one" -> 1, "two" -> 2, "three" -> 3).toSeq)
     | seqToString(Map.empty[String,Int].toSeq)
val res4: String = ((one,1) +: ((two,2) +: ((three,3) +: Nil)))
val res5: String = Nil

Note the common idiom for constructing an empty collection, like Vector.empty[Int]. The empty methods are in the companion objects.

Map is not a subtype of Seq because it doesn’t guarantee a particular order when you iterate over it. Calling Map.toSeq creates a sequence of key-value tuples that happen to be in insertion order, which is a side effect of the implementation for small Maps and not true for arbitrary maps. The nonempty Map output shows parentheses from the tuples as well as the parentheses added by seqToString.

Note the output for the nonempty Seq (actually List) and Vector. They show the hierarchical structure implied by a linked list, with a head and a tail:

(1 +: (2 +: (3 +: Nil)))

So we process sequences with just two case clauses and recursion. This implies something fundamental about all sequences: they are either empty or not. That sounds trite, but once you recognize fundamental structural patterns like this, it gives you a surprisingly general tool for “divide and conquer.” The idiom used by seqToString is widely reusable.
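The same two-clause idiom adapts to a tail-recursive form when the result can be carried in an accumulator. The following is a sketch under that assumption. The sumSeq method and its nested loop helper are hypothetical names, not part of the chapter’s example code:

```scala
import scala.annotation.tailrec

// Hypothetical helper: sums the elements with an accumulator so the
// recursive call is in tail position, which @tailrec verifies.
def sumSeq(seq: Seq[Int]): Int =
  @tailrec
  def loop(remaining: Seq[Int], acc: Int): Int = remaining match
    case head +: tail => loop(tail, acc + head)  // nonempty: keep going
    case _            => acc                     // empty: recursion ends
  loop(seq, 0)
```

Because the match is the last thing loop does, the compiler can turn the recursion into a loop, so long sequences won’t exhaust the stack.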

To demonstrate the construction versus deconstruction symmetry, we can copy and paste the output of the previous examples to reconstruct the original objects. However, we have to add quotes around strings:

scala> val is = (1 +: (2 +: (3 +: Nil)))
val is: List[Int] = List(1, 2, 3)

scala> val kvs = (("one",1) +: (("two",2) +: (("three",3) +: Nil)))
val kvs: List[(String, Int)] = List((one,1), (two,2), (three,3))

scala> val map = Map(kvs*)
val map: Map[String, Int] = Map(one -> 1, two -> 2, three -> 3)

The Map.apply method expects a repeated parameter list of two-element tuples. In order to use the sequence kvs, we use the * idiom so the compiler converts the sequence to a repeated parameter list.

Try removing the parentheses that we added in the preceding string output.

For completeness, there is an analog of +:, the operator :+, that can be used to process the sequence elements in reverse:

// src/script/scala/progscala3/patternmatching/MatchReverseSeq.scala

scala> def reverseSeqToString[T](l: Seq[T]): String = l match
     |   case prefix :+ end => s"(${reverseSeqToString(prefix)} :+ $end)"
     |   case Nil => "Nil"

scala> reverseSeqToString(Vector(1, 2, 3, 4, 5))
val res6: String = (((((Nil :+ 1) :+ 2) :+ 3) :+ 4) :+ 5)

Note that Nil comes first this time in the output. A Vector is used for the input sequence to remind you that accessing a nonhead element is O(1) for a Vector, but O(N) for a List of size N! Hence, reverseSeqToString is O(N) for a Vector of size N and O(N²) for a List of size N!

As before, you could use this output to reconstruct the collection:

scala> val revList1 = (((((Nil :+ 1) :+ 2) :+ 3) :+ 4) :+ 5)
val revList1: List[Int] = List(1, 2, 3, 4, 5)       // but List is returned!

scala> val revList2 = Nil :+ 1 :+ 2 :+ 3 :+ 4 :+ 5  // unnecessary () removed
val revList2: List[Int] = List(1, 2, 3, 4, 5)

scala> val revList3 = Vector.empty[Int] :+ 1 :+ 2 :+ 3 :+ 4 :+ 5
val revList3: Vector[Int] = Vector(1, 2, 3, 4, 5)   // how to get a Vector

Pattern Matching on Repeated Parameters

Speaking of repeated parameter lists, you can also use them in pattern matching:

// src/script/scala/progscala3/patternmatching/MatchRepeatedParams.scala

scala> def matchThree(seq: Seq[Int]) = seq match
     |   case Seq(h1, h2, rest*) =>    // same as h1 +: h2 +: rest => ...
     |     println(s"head 1 = $h1, head 2 = $h2, the rest = $rest")
     |   case _ => println(s"Other! $seq")

scala> matchThree(Seq(1,2,3,4))
     | matchThree(Seq(1,2,3))
     | matchThree(Seq(1,2))
     | matchThree(Seq(1))
head 1 = 1, head 2 = 2, the rest = List(3, 4)
head 1 = 1, head 2 = 2, the rest = List(3)
head 1 = 1, head 2 = 2, the rest = List()
Other! List(1)

We see another way to match on sequences. If we don’t need rest, we can use the placeholder, _, that is case Seq(h1, h2, _*). In Scala 2, rest* was written rest @ _*. The Scala 3 syntax is more consistent with other uses of repeated parameters.
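Here is a short sketch of the placeholder form, combined with a fixed-length Seq pattern. The firstTwo method is a hypothetical helper written for this illustration:

```scala
// Hypothetical helper: report only the first two elements, if present.
def firstTwo(seq: Seq[String]): String = seq match
  case Seq(a, b, _*) => s"$a and $b"   // two or more elements; rest ignored
  case Seq(a)        => s"only $a"     // exactly one element
  case _             => "empty"
```

Note that Seq(a) without a trailing _* matches only a sequence of exactly one element.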

Matching on Tuples

Tuples are also easy to match on, using their literal syntax:

// src/script/scala/progscala3/patternmatching/MatchTuple.scala

val langs = Seq(
  ("Scala",   "Martin", "Odersky"),
  ("Clojure", "Rich",   "Hickey"),
  ("Lisp",    "John",   "McCarthy"))

val results = langs.map {
  case ("Scala", _, _) => "Scala"                               // (1)
  case (lang, first, last) => s"$lang, creator $first $last"    // (2)
}

(1) Match a three-element tuple where the first element is the string “Scala” and we ignore the second and third elements.
(2) Match any three-element tuple, where the elements could be any type, but they are inferred to be Strings due to the input langs. Extract the elements into variables lang, first, and last.

A tuple can be taken apart into its constituent elements. We can match on literal values within the tuple, at any positions we want, and we can ignore elements we don’t care about.

In Scala 3, tuples have enhanced features to make them more like linked lists, but where the specific type of each element is preserved. Compare the following example to the preceding implementation of seqToString, where *: replaces +: as the operator:

scala> langs.map {
     |   case "Scala" *: first *: last *: EmptyTuple =>
     |     s"Scala -> $first -> $last"
     |   case lang *: rest => s"$lang -> $rest"
     | }
val res0: Seq[String] = List(Scala -> Martin -> Odersky,
 Clojure -> (Rich,Hickey), Lisp -> (John,McCarthy))

The analog of Nil for tuples is EmptyTuple. The second case clause can handle any tuple with one or more elements. Let’s create a new list by prepending EmptyTuple itself and a one-element tuple:

scala> val l2 = EmptyTuple +: ("Indo-European" *: EmptyTuple) +: langs
val l2: Seq[Tuple] = List((), (Indo-European,), (Scala,Martin,Odersky),
 (Clojure,Rich,Hickey), (Lisp,John,McCarthy))

scala> l2.map {
     |   case "Scala" *: first *: last *: EmptyTuple =>
     |     s"Scala -> $first -> $last"
     |   case lang *: rest => s"$lang -> $rest"
     |   case EmptyTuple => EmptyTuple.toString
     | }
val res1: Seq[String] = List((), Indo-European -> (),
 Scala -> Martin -> Odersky, Clojure -> (Rich,Hickey), Lisp -> (John,McCarthy))

You might think that ("Indo-European") would be enough to construct a one-element tuple, but the compiler just interprets the parentheses as unnecessary wrappers around the string! ("Indo-European" *: EmptyTuple) does the trick.

Just as we can construct pairs (two-element tuples) with ->, we can deconstruct them that way too:

// src/script/scala/progscala3/patternmatching/MatchPair.scala

val langs2 = Seq("Scala" -> "Odersky", "Clojure" -> "Hickey")

val results = langs2.map {
  case "Scala" -> _ => "Scala"                           // (1)
  case lang -> last => s"$lang: $last"                   // (2)
}
assert(results == Seq("Scala", "Clojure: Hickey"))

(1) Match on a tuple with the string “Scala” as the first element and anything as the second element.
(2) Match on any other two-element tuple.

Recall that I said +: in patterns is actually an object in the scala.collection package. Similarly, there is an *: object and a type alias for -> to Tuple2.type (effectively the companion object for the Tuple2 case class) in the scala package.

Parameter Untupling

Consider this example using tuples:

// src/script/scala/progscala3/patternmatching/ParameterUntupling.scala

val tuples = Seq((1,2,3), (4,5,6), (7,8,9))
val counts1 = tuples.map {    // result: List(6, 15, 24)
  case (x, y, z) => x + y + z
}

A disadvantage of the case syntax inside the anonymous function is the implication that it’s not exhaustive, when we know it is for the tuples sequence. It is also a bit inconvenient to add case. Scala 3 introduces parameter untupling that simplifies special cases like this. We can drop the case keyword:

val counts2 = tuples.map {
  (x, y, z) => x + y + z
}

We can even use anonymous variables:

val counts3 = tuples.map(_+_+_)

However, this untupling only works for one level of decomposition:

scala> val tuples2 = Seq((1,(2,3)), (4,(5,6)), (7,(8,9)))
     | val counts2b = tuples2.map {
     |   (x, (y, z)) => x + y + z
     | }
     |
3 |  (x, (y, z)) => x + y + z
  |      ^^^^^^
  |      not a legal formal parameter

Use case for such, uh, cases.
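For instance, the nested tuples can be handled as follows. This is a small sketch with hypothetical names nested and nestedSums:

```scala
val nested = Seq((1, (2, 3)), (4, (5, 6)), (7, (8, 9)))

// The case keyword enables full pattern matching, so the inner
// tuple can be taken apart in the same clause.
val nestedSums = nested.map {
  case (x, (y, z)) => x + y + z
}
```

The sums are the same as for the flat tuples shown earlier.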

Guards in Case Clauses

Matching on literal values is very useful, but sometimes you need a little additional logic:

// src/script/scala/progscala3/patternmatching/MatchGuard.scala

val results = Seq(1,2,3,4).map {
  case e if e%2 == 0 => s"even: $e"                          // (1)
  case o             => s"odd:  $o"                          // (2)
}
assert(results == Seq("odd:  1", "even: 2", "odd:  3", "even: 4"))

(1) Match only if e is even.
(2) Match the only other possibility, that o is odd.

Note that we didn’t need parentheses around the condition in the if expression, just as we don’t need them in for comprehensions. In Scala 2, this was true for guard clause syntax too.
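Guards can also follow type patterns and tuple patterns, not just simple variables. The sketch below illustrates this with a hypothetical method named classify:

```scala
// Hypothetical classifier combining type patterns, extraction, and guards.
def classify(value: Matchable): String = value match
  case i: Int if i < 0            => s"negative int: $i"
  case i: Int                     => s"non-negative int: $i"
  case s: String if s.isEmpty     => "empty string"
  case (a: Int, b: Int) if a == b => s"pair of equal ints: $a"
  case other                      => s"something else: $other"
```

When a guard fails, matching simply continues with the next clause, which is why the guarded Int clause must come before the unguarded one.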

Matching on Case Classes and Enums

It’s no coincidence that the same case keyword is used for declaring special classes and for case expressions in match expressions. The features of case classes were designed to enable convenient pattern matching. The compiler implements pattern matching and extraction for us. We can use it with nested objects, and we can bind variables at any level of the extraction, which we are seeing for the first time now:

// src/script/scala/progscala3/patternmatching/MatchDeep.scala

case class Address(street: String, city: String)
case class Person(name: String, age: Int, address: Address)

val alice   = Person("Alice",   25, Address("1 Scala Lane", "Chicago"))
val bob     = Person("Bob",     29, Address("2 Java Ave.",  "Miami"))
val charlie = Person("Charlie", 32, Address("3 Python Ct.", "Boston"))

val results = Seq(alice, bob, charlie).map {
  case p @ Person("Alice", age, a @ Address(_, "Chicago")) =>      // (1)
    s"Hi Alice! $p"
  case p @ Person("Bob", 29, a @ Address(street, city)) =>         // (2)
    s"Hi ${p.name}! age ${p.age}, in ${a}"
  case p @ Person(name, age, Address(street, city)) =>             // (3)
    s"Who are you, $name (age: $age, city = $city)?"
}
assert(results == Seq(
  "Hi Alice! Person(Alice,25,Address(1 Scala Lane,Chicago))",
  "Hi Bob! age 29, in Address(2 Java Ave.,Miami)",
  "Who are you, Charlie (age: 32, city = Boston)?"))

(1) Match on any person named “Alice”, of any age, at any street address in Chicago. Use p @ to bind variable p to the whole Person, while also extracting fields inside the instance, in this case age. Similarly, use a @ to bind a to the whole Address, while ignoring the street and requiring the city to be “Chicago”.
(2) Match on any person named “Bob”, age 29, at any street and city. Bind p to the whole Person instance and a to the nested Address instance, while also binding street and city inside the Address.
(3) Match on any person, binding p to the Person instance and name, age, street, and city to the nested fields.

If you aren’t extracting fields from the Person instance, you can just write p: Person => …
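A minimal sketch of that simpler form follows. The case classes are repeated so the example stands alone, and the greeting method is a hypothetical helper:

```scala
case class Address(street: String, city: String)
case class Person(name: String, age: Int, address: Address)

// Hypothetical helper: a typed pattern is enough when the fields are
// read through the bound variable rather than extracted.
def greeting(value: Matchable): String = value match
  case p: Person => s"Hello, ${p.name} from ${p.address.city}"
  case other     => s"Not a person: $other"
```

The trade-off is that a typed pattern can’t constrain field values, so any filtering on name or city would need a guard.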

This nested matching can go arbitrarily deep. Consider this example that revisits the enum Tree[T] algebraic data type from “Enumerations and Algebraic Data Types”. Recall the enum definition, which also supports “automatic” pattern matching:

// src/main/scala/progscala3/patternmatching/MatchTreeADTEnum.scala
package progscala3.patternmatching

enum Tree[T]:
  case Branch(left: Tree[T], right: Tree[T])
  case Leaf(elem: T)

Here we do deep matching on particular structures:

// src/script/scala/progscala3/patternmatching/MatchTreeADTDeep.scala
import progscala3.patternmatching.Tree
import Tree.{Branch, Leaf}

val tree1 = Branch(
  Branch(Leaf(1), Leaf(2)),
  Branch(Leaf(3), Branch(Leaf(4), Leaf(5))))
val tree2 = Branch(Leaf(6), Leaf(7))

for t <- Seq(tree1, tree2, Leaf(8))
yield t match
  case Branch(
    l @ Branch(_,_),
    r @ Branch(rl @ Leaf(rli), rr @ Branch(_,_))) =>
      s"l=$l, r=$r, rl=$rl, rli=$rli, rr=$rr"
  case Branch(l, r) => s"Other Branch($l, $r)"
  case Leaf(x) => s"Other Leaf($x)"

The same extraction could be done for the alternative version we defined using a sealed class hierarchy in the original example. We’ll try it in “Sealed Hierarchies and Exhaustive Matches”.

The last two case clauses are relatively easy to understand. The first one is highly tuned to match tree1, although it uses _ to ignore some parts of the tree. In particular, note that it isn’t sufficient to write l @ Branch. We need to write l @ Branch(_,_). Try removing the (_,_) here and you’ll notice the first case no longer matches tree1, without any obvious explanation.

Warning

If a nested pattern match expression doesn’t match when you think it should, make sure that you capture the full structure, like l @ Branch(_,_) instead of l @ Branch.

It’s worth experimenting with this example to capture different parts of the trees, so you develop an intuition about what works, what doesn’t, and how to debug match expressions.

Here’s an example using tuples. Imagine we have a sequence of (String,Double) tuples for the names and prices of items in a store, and we want to print them with their index. The Seq.zipWithIndex method is handy here:

// src/script/scala/progscala3/patternmatching/MatchDeepTuple.scala

val itemsCosts = Seq(("Pencil", 0.52), ("Paper", 1.35), ("Notebook", 2.43))

val results = itemsCosts.zipWithIndex.map {
  case ((item, cost), index) => s"$index: $item costs $cost each"
}
assert(results == Seq(
  "0: Pencil costs 0.52 each",
  "1: Paper costs 1.35 each",
  "2: Notebook costs 2.43 each"))

Note that zipWithIndex returns a sequence of tuples of the form (element, index), or ((item, cost), index) in this case. We matched on this form to extract the three elements and construct a string with them. I write code like this a lot.

Matching on Regular Expressions

Regular expressions (or regexes) are convenient for extracting data from strings that have a particular structure. Here is an example:

// src/script/scala/progscala3/patternmatching/MatchRegex.scala

val BookExtractorRE = """Book: title=([^,]+),\s+author=(.+)""".r     // (1)
val MagazineExtractorRE = """Magazine: title=([^,]+),\s+issue=(.+)""".r

val catalog = Seq(
  "Book: title=Programming Scala Third Edition, author=Dean Wampler",
  "Magazine: title=The New Yorker, issue=January 2021",
  "Unknown: text=Who put this here??"
)

val results = catalog.map {
  case BookExtractorRE(title, author) =>                             // (2)
    s"""Book "$title", written by $author"""
  case MagazineExtractorRE(title, issue) =>
    s"""Magazine "$title", issue $issue"""
  case entry => s"Unrecognized entry: $entry"
}
assert(results == Seq(
  """Book "Programming Scala Third Edition", written by Dean Wampler""",
  """Magazine "The New Yorker", issue January 2021""",
  "Unrecognized entry: Unknown: text=Who put this here??"))

(1) Match a book string, with two capture groups (note the parentheses), one for the title and one for the author. Calling the r method on a string creates a regex from it. Also match a magazine string, with capture groups for the title and issue (date).
(2) Use the regular expressions much like using case classes, where the string matched by each capture group is assigned to a variable.

Because regexes use backslashes for constructs beyond the normal ASCII control characters, you should either use triple-quoted strings for them, as shown, or use raw interpolated strings, such as raw"foo\sbar".r. Otherwise, you must escape these backslashes; for example "foo\\sbar".r. You can also define regular expressions by creating new instances of the Regex class, as in new Regex("""\W+""").

Warning

Using interpolation in triple-quoted strings doesn’t work cleanly for the regex escape sequences. You still need to escape these sequences (e.g., s"""$first\\s+$second""".r instead of s"""$first\s+$second""".r). If you aren’t using interpolation, escaping isn’t necessary.

scala.util.matching.Regex defines several methods for other manipulations, such as finding and replacing matches.
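As a brief sketch of a few of those methods, consider finding and replacing dates in a string. The pattern dateRE and the input notes are hypothetical values made up for this illustration:

```scala
import scala.util.matching.Regex

// Hypothetical input and pattern for ISO-style dates.
val dateRE: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
val notes = "Released 2021-05-14, updated 2021-06-01."

val allDates  = dateRE.findAllIn(notes).toList        // every match, in order
val firstDate = dateRE.findFirstIn(notes)             // Option of the first match
val redacted  = dateRE.replaceAllIn(notes, "<date>")  // replace every match
```

Unlike the extractor patterns shown earlier, which must match the whole string, these methods search for the pattern anywhere inside it.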

Matching on Interpolated Strings

If you know the strings have an exact format, such as a precise number of spaces, you can even use interpolated strings for pattern matching. Let’s reuse the catalog:

// src/script/scala/progscala3/patternmatching/MatchInterpolatedString.scala

val results = catalog.map {
  case s"""Book: title=$t, author=$a""" => ("Book" -> (t -> a))
  case s"""Magazine: title=$t, issue=$d""" => ("Magazine" -> (t -> d))
  case item => ("Unrecognized", item)
}
assert(results == Seq(
  ("Book", ("Programming Scala Third Edition", "Dean Wampler")),
  ("Magazine", ("The New Yorker", "January 2021")),
  ("Unrecognized", "Unknown: text=Who put this here??")))
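To see how strict the format requirement is, consider this small sketch. The describe method is hypothetical. It only matches when the separator is exactly a comma followed by one space:

```scala
// Hypothetical helper: the literal parts of the pattern must appear verbatim.
def describe(entry: String): String = entry match
  case s"Book: title=$t, author=$a" => s"book: $t / $a"
  case _ => "no match"

val exact = describe("Book: title=Programming Scala, author=Dean Wampler")
// Two spaces after the comma, so the literal ", author=" is never found.
val extraSpace = describe("Book: title=Programming Scala,  author=Dean Wampler")
```

If the input format can vary like this, a regular expression with \s+ is the more forgiving choice.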

Sealed Hierarchies and Exhaustive Matches

Let’s revisit the need for exhaustive matches and consider the situation where we have an enum or the equivalent sealed class hierarchy.

First, let’s use the enum Tree[T] definition from earlier. We can pattern match on the leaves and branches, knowing we’ll never be surprised to see something else:

// src/script/scala/progscala3/patternmatching/MatchTreeADTExhaustive.scala
import progscala3.patternmatching.Tree
import Tree.{Branch, Leaf}

val enumSeq: Seq[Tree[Int]] = Seq(Leaf(0), Branch(Leaf(6), Leaf(7)))
val tree1 = for t <- enumSeq yield t match
  case Branch(left, right) => (left, right)
  case Leaf(value) => value
assert(tree1 == List(0, (Leaf(6),Leaf(7))))

Because it’s not possible for a user of Tree to add another case to the enum, these match expressions can never break. They will always remain exhaustive.

As an exercise, change the case Branch to recurse on left and right (you’ll need to define a method), then use a deeper tree example.
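Here is one possible approach to that exercise, sketched with a stand-in enum so it runs on its own. SketchTree and leaves are my names, not part of the example code. SketchTree simply mirrors the shape of Tree:

```scala
// Stand-in for Tree[T], defined here only so the sketch is self-contained.
enum SketchTree[T]:
  case Branch(left: SketchTree[T], right: SketchTree[T])
  case Leaf(elem: T)

import SketchTree.{Branch, Leaf}

// Recurse into both sides of a Branch and collect the leaf values in order.
def leaves[T](tree: SketchTree[T]): List[T] = tree match
  case Branch(left, right) => leaves(left) ++ leaves(right)
  case Leaf(value) => List(value)

val deep: SketchTree[Int] =
  Branch(Branch(Leaf(1), Leaf(2)), Branch(Leaf(3), Branch(Leaf(4), Leaf(5))))
val flattened = leaves(deep)
```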

Let’s try a corresponding sealed hierarchy:

// src/main/scala/progscala3/patternmatching/MatchTreeADTSealed.scala
package progscala3.patternmatching

sealed trait STree[T]               // "S" for "sealed"
case class SBranch[T](left: STree[T], right: STree[T]) extends STree[T]
case class SLeaf[T](elem: T) extends STree[T]

The match code is essentially identical:

import progscala3.patternmatching.{STree, SBranch, SLeaf}

val sealedSeq: Seq[STree[Int]] = Seq(SLeaf(0), SBranch(SLeaf(6), SLeaf(7)))
val tree2 = for t <- sealedSeq yield t match
  case SBranch(left, right) => (left, right)
  case SLeaf(value) => value
assert(tree2 == List(0, (SLeaf(6),SLeaf(7))))

A corollary is to avoid sealed hierarchies and enums when the type hierarchy needs to evolve. In that case, use an “open” object-oriented type hierarchy with polymorphic methods rather than match expressions. We discussed this trade-off in “A Sample Application”.

Chaining Match Expressions

Scala 3 changed the parsing rules for match expressions to allow chaining, as in this contrived example:

// src/script/scala/progscala3/patternmatching/MatchChaining.scala

scala> for opt <- Seq(Some(1), None)
     | yield opt match {
     |  case None => ""
     |  case Some(i) => i.toString
     | } match {  // matches on the String returned from the previous match
     |  case "" => false
     |  case _ => true
     | }
val res10: Seq[Boolean] = List(true, false)

Pattern Matching Outside Match Expressions

Pattern matching is not restricted to match expressions. You can use it in assignment statements, called pattern bindings:

// src/script/scala/progscala3/patternmatching/Assignments.scala

scala> case class Address(street: String, city: String, country: String)
scala> case class Person(name: String, age: Int, address: Address)

scala> val addr = Address("1 Scala Way", "CA", "USA")
scala> val dean = Person("Dean", 29, addr)
val addr: Address = Address(1 Scala Way,CA,USA)
val dean: Person = Person(Dean,29,Address(1 Scala Way,CA,USA))

scala> val Person(name, age, Address(_, state, _)) = dean
val name: String = Dean
val age: Int = 29
val state: String = CA

They work in for comprehensions:

scala> val people = (0 to 4).map {
     |   i => Person(s"Name$i", 10+i, Address(s"$i Main Street", "CA", "USA"))
     | }
val people: IndexedSeq[Person] = Vector(Person(Name0,10,Address(...)), ...)

scala> val nas = for
     |   Person(name, age, Address(_, state, _)) <- people
     | yield (name, age, state)
val nas: IndexedSeq[(String, Int, String)] =
  Vector((Name0,10,CA), (Name1,11,CA), ...)

Suppose we have a function that takes a sequence of doubles and returns the count, sum, average, minimum value, and maximum value in a tuple:

// src/script/scala/progscala3/patternmatching/AssignmentsTuples.scala

/** Return the count, sum, average, minimum value, and maximum value. */
def stats(seq: Seq[Double]): (Int, Double, Double, Double, Double) =
  assert(seq.size > 0)
  val sum = seq.sum
  (seq.size, sum, sum/seq.size, seq.min, seq.max)

val (count, sum, avg, min, max) = stats((0 until 100).map(_.toDouble))

Pattern bindings can be used with interpolated strings:

// src/script/scala/progscala3/patternmatching/AssignmentsInterpStrs.scala

val str = """Book: "Programming Scala", by Dean Wampler"""
val s"""Book: "$title", by $author""" = str : @unchecked
assert(title == "Programming Scala" && author == "Dean Wampler")

I’ll explain the need for @unchecked in a moment.

Finally, we can use pattern bindings with a regular expression to decompose a string. Here’s an example for parsing (simple!) SQL strings:

// src/script/scala/progscala3/patternmatching/AssignmentsRegex.scala

scala> val c = """\*|[\w, ]+"""  // cols
     | val t = """\w+"""         // table
     | val o = """.*"""          // other substrings
     | val selectRE =
     |   s"""SELECT\\s*(DISTINCT)?\\s+($c)\\s*FROM\\s+($t)\\s*($o)?;""".r

scala> val selectRE(distinct, cols, table, otherClauses) =
     |   "SELECT DISTINCT col1 FROM atable WHERE col1 = 'foo';": @unchecked
val distinct: String = DISTINCT
val cols: String = "col1 "
val table: String = atable
val otherClauses: String = WHERE col1 = 'foo'

See the source file for other examples. Because I used string interpolation, I had to add extra backslashes (e.g., \\s instead of \s) in the last regular expression.

Next I’ll explain why the @unchecked type annotation was used.

Problems in Pattern Bindings

In general, keep in mind that pattern matching will throw MatchError exceptions when the match fails. This can make your code fragile when used in assignments because it’s harder to make them exhaustive. In the previous interpolated string and regex examples, the String type for the righthand side values can’t ensure that the matches will succeed.

Assume I didn’t have the : @unchecked type annotation. In Scala 2 and 3.0, both examples would compile and work without MatchErrors. Starting in a future Scala 3 release or when compiling with -source:future, the examples fail to compile. For example:

scala> val selectRE(distinct, cols, table, otherClauses) =
     |   "SELECT DISTINCT col1 FROM atable WHERE col1 = 'foo';"
     |
2 |  "SELECT DISTINCT col1 FROM atable WHERE col1 = 'foo';"
  |  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |pattern's type String is more specialized than the righthand side
  |expression's type String
  |
  |If the narrowing is intentional, this can be communicated by adding
  |`: @unchecked` after the expression.

This compile-time enforcement makes your code more robust, but if you know the declaration is safe, you can add the @unchecked type annotation, as we did earlier, and the compiler will not complain.

However, if we silence these warnings, we may get runtime MatchErrors. Consider the following examples with sequences:

// src/script/scala/progscala3/patternmatching/AssignmentsFragile.scala

scala> val h4a +: h4b +: t4 = Seq(1,2,3,4) : @unchecked
val h4a: Int = 1
val h4b: Int = 2
val t4: Seq[Int] = List(3, 4)

scala> val h2a +: h2b +: t2 = Seq(1,2) : @unchecked
val h2a: Int = 1
val h2b: Int = 2
val t2: Seq[Int] = List()

scala> val h1a +: h1b +: t1 = Seq(1) : @unchecked     // MatchError!
scala.MatchError: List(1) (of class scala.collection.immutable.$colon$colon)
  ...

Seq doesn’t constrain the number of elements, so the lefthand matches may work or fail. The compiler can’t verify at compile time if the match will succeed or throw a MatchError, so it will report a warning unless the @unchecked type annotation is added as shown. Sure enough, while the first two cases succeed, the last one raises a MatchError.
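When you can’t guarantee the shape of the sequence, one safer alternative is to use a full match expression and return an Option. This is only a sketch, and firstTwo is a hypothetical helper:

```scala
// Hypothetical helper: returns None instead of throwing a MatchError
// when the sequence has fewer than two elements.
def firstTwo[T](seq: Seq[T]): Option[(T, T, Seq[T])] = seq match
  case a +: b +: rest => Some((a, b, rest))
  case _ => None

val four = firstTwo(Seq(1, 2, 3, 4))
val two = firstTwo(Seq(1, 2))
val one = firstTwo(Seq(1))
```

The caller now has to handle the None case explicitly, which is the point.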

Pattern Matching as Filtering in for Comprehensions

However, in a for comprehension, matching that isn’t exhaustive functions as a filter instead:

// src/script/scala/progscala3/patternmatching/MatchForFiltering.scala

scala> val elems = Seq((1, 2), "hello", (3, 4), 1, 2.2, (5, 6))
val elems: Seq[Matchable] = List((1,2), hello, (3,4), 1, 2.2, (5,6))

scala> val what1 = for (case (x, y) <- elems) yield (y, x)        
     | val what2 = for  case (x, y) <- elems  yield (y, x)
val what1: Seq[(Any, Any)] = List((2,1), (4,3), (6,5))
val what2: Seq[(Any, Any)] = List((2,1), (4,3), (6,5))

: The case keyword is required for matching and filtering. The parentheses are optional.

Note that the inferred common supertype for the elements in elems is Matchable, not Any. For what1 and what2, the inferred type is a tuple—a subtype of Matchable. The tuple members can be Any.

The case keyword was not required in Scala 2 or 3.0. Starting with a future Scala 3 release, or when compiling with -source:future, omitting the case keyword triggers the “narrowing” warning:

scala> val nope = for (x, y) <- elems yield (y, x)
1 |val nope = for (x, y) <- elems yield (y, x)
  |               ^^^^^^
  |pattern's type (Any, Any) is more specialized than the right hand side
  |expression's type Matchable
  |
  |If the narrowing is intentional, this can be communicated by writing `case`
  |before the full pattern.

When we discussed exhaustive matching previously, we used an example of a sequence of Option values. We can filter out values in a sequence using pattern matching:

scala> val seq = Seq(None, Some(1), None, Some(2.2), None, None, Some("three"))
scala> val filtered = for case Some(x) <- seq yield x
val filtered: Seq[Matchable] = List(1, 2.2, three)
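The same filtering can be written with collection methods. As a rough comparison, assuming the same kind of sequence as above (I add an explicit element type here to keep the sketch self-contained), collect with a partial function and flatten both drop the None values:

```scala
val options: Seq[Option[Matchable]] =
  Seq(None, Some(1), None, Some(2.2), None, None, Some("three"))

// collect applies the partial function only where the pattern matches.
val viaCollect = options.collect { case Some(x) => x }

// flatten treats each Option as a collection of zero or one elements.
val viaFlatten = options.flatten
```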

Pattern Matching and Erasure

Consider the following example, where we attempt to discriminate between the inputs Seq[Double] and Seq[String]:

// src/script/scala/progscala3/patternmatching/MatchTypesErasure.scala

scala> val results = Seq(Seq(5.5,5.6,5.7), Seq("a", "b")).map {
     |   case seqd: Seq[Double] => ("seq double", seqd)   // Erasure warning
     |   case seqs: Seq[String] => ("seq string", seqs)   // Erasure warning
     |   case other             => ("unknown!", other)
     | }
2 |  case seqd: Seq[Double] => ("seq double", seqd)   // Erasure warning
  |       ^^^^^^^^^^^^^^^^^
  |       the type test for Seq[Double] cannot be checked at runtime
3 |  case seqs: Seq[String] => ("seq string", seqs)   // Erasure warning
  |       ^^^^^^^^^^^^^^^^^
  |       the type test for Seq[String] cannot be checked at runtime

These warnings result from type erasure, where the information about the actual types used for the type parameters is not retained in the compiler output. Hence, while we can tell at runtime that the object is a Seq, we can’t check whether it is a Seq[Double] or a Seq[String]. In fact, if we ignore the warnings, the second case clause for Seq[String] is unreachable, because the first clause matches all Seqs.

One ugly workaround is to match on the collection first, then use a nested match on the head element to determine the type. We now have to handle an empty sequence too:

// src/script/scala/progscala3/patternmatching/MatchTypesFix.scala

def doSeqMatch[T <: Matchable](seq: Seq[T]): String = seq match
  case Nil => ""
  case head +: _ => head match
    case _ : Double => "Double"
    case _ : String => "String"
    case _ => "Unmatched seq element"

val results = Seq(Seq(5.5,5.6), Nil, Seq("a","b")).map(seq => doSeqMatch(seq))
assert(results == Seq("Double", "", "String"))

Extractors

So how do pattern matching and destructuring or extraction work? Scala defines a pair of object methods that are implemented automatically for case classes and for many types in the Scala library. You can implement these extractors yourself to customize the behavior for your types. When those methods are available on suitable types, they can be used in pattern-matching clauses.

However, you will rarely need to implement your own extractors. You also don’t need to understand the implementation details to use pattern matching effectively. Therefore, you can safely skip the rest of this chapter now and return to this discussion later, when needed.

unapply Method

Recall that the companion object for a case class has at least one factory method named apply, which is used for construction. Using symmetry arguments, we might infer that there must be another method generated called unapply, which is used for deconstruction or extraction. Indeed, there is an unapply method, and it is invoked in pattern-match expressions for most types.

There are several ways to implement unapply, specifically what is returned from it. We’ll start with the return type used most often: an Option wrapping a tuple. Then we’ll discuss other options for return types.

Consider again Person and Address from before:

person match
  case Person(name, age, Address(street, city)) => ...
  ...

Scala looks for Person.unapply(…) and Address.unapply(…) and calls them. They return an Option[(…)], where the tuple type corresponds to the number of values and their types that can be extracted from the instance.

By default for case classes, the compiler implements unapply to return all the fields declared in the constructor argument list. That will be three fields for Person, of types String, Int, and Address, and two fields for Address, both of type String. So the Person companion object has methods that would look like this:

object Person:
  def apply(name: String, age: Int, address: Address) =
    new Person(name, age, address)
  def unapply(p: Person): Some[(String,Int,Address)] =
    Some((p.name, p.age, p.address))

Why is an Option used if the compiler already knows that the object is a Person? Scala allows an implementation of unapply to veto the match for some reason and return None, in which case Scala will attempt to use the next case clause. Also, we don’t have to expose all fields of the instance if we don’t want to. We could suppress our age, if we’re embarrassed by it. We could even add additional values to the returned tuples.

When a Some wrapping a tuple is returned by an unapply, the compiler extracts the tuple elements for use in the case clause or assignment, such as comparison with literal values, binding to variables, or dropping them for _ placeholders.

However, note that the simple compiler-generated Person.unapply never fails, so Some[…] is used as the return type, rather than Option[…].

The unapply methods are invoked recursively when necessary. Person.unapply runs first to expose the fields, then Address.unapply is applied to the extracted address value.
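To make the veto idea concrete, here is a sketch of a hand-written extractor. AdultInCity is a hypothetical object, and the two-field Address is defined locally so the snippet stands alone. It hides the age, exposes only the name and city, and returns None for anyone under 18:

```scala
case class Address(street: String, city: String)
case class Person(name: String, age: Int, address: Address)

// Hypothetical extractor: vetoes the match for minors and hides the age.
object AdultInCity:
  def unapply(p: Person): Option[(String, String)] =
    if p.age >= 18 then Some((p.name, p.address.city)) else None

def describe(p: Person): String = p match
  case AdultInCity(name, city) => s"$name lives in $city"
  case Person(name, _, _) => s"$name was vetoed by AdultInCity"

val adult = describe(Person("Dean", 29, Address("1 Scala Way", "Chicago")))
val minor = describe(Person("Sam", 10, Address("2 Scala Way", "Chicago")))
```

When AdultInCity.unapply returns None, Scala falls through to the next case clause, which uses the compiler-generated Person.unapply.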

Recall the head +: tail expression we used previously. Now let’s understand how it actually works. We’ve seen that the +: (cons) operator can be used to construct a new sequence by prepending an element to an existing sequence, and we can construct an entire sequence from scratch this way:

val list = 1 +: 2 +: 3 +: 4 +: Nil

Because +: is a method that binds to the right, we first prepend 4 to Nil, then prepend 3 to that list, and so forth.

If the construction of sequences is done with a method named +:, how can extraction be done with the same syntax, so that we have uniform syntax for construction and deconstruction/extraction?

To do that, the Scala library defines a special singleton object named +:. Yes, that’s the name. Like methods, types can have names with a wide variety of characters.

It has just one method, the unapply method the compiler needs for our extraction case statement. The declaration of unapply is conceptually as follows (some details removed):

def unapply[H, Coll](collection: Coll): Option[(H, Coll)]

The head is of type H, which is inferred, and the tail is of some collection type Coll. So an Option of a two-element tuple with the head and tail is returned.

We learned in “Defining Operators” that types can be used with infix notation, so head +: tail is valid syntax, equivalent to +:(head, tail). In fact, we can use the normal notation in a case clause:

scala> def seqToString2[T](seq: Seq[T]): String = seq match
     |   case +:(head, tail) => s"($head +: ${seqToString2(tail)})"
     |   case Nil => "Nil"

scala> seqToString2(Seq(1,2,3,4))
val res0: String = (1 +: (2 +: (3 +: (4 +: Nil))))
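To see that there is nothing magical here, we can sketch our own version. The +:: object below is a hypothetical stand-in I made up for this illustration, specialized to Seq. The real library object is more general:

```scala
// Hypothetical extractor object with a symbolic name, usable infix in patterns.
object +:: {
  def unapply[A](seq: Seq[A]): Option[(A, Seq[A])] =
    if seq.isEmpty then None else Some((seq.head, seq.tail))
}

def render[A](seq: Seq[A]): String = seq match
  case head +:: tail => s"($head +:: ${render(tail)})"  // same as +::(head, tail)
  case _ => "Nil"

val rendered = render(Seq(1, 2, 3))
```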

Here’s another example, just to drive home the point:

// src/script/scala/progscala3/patternmatching/Infix.scala

infix case class And[A,B](a: A, b: B)

val and1: And[String,Int] = And("Foo", 1)
val and2: String And Int  = And("Bar", 2)
// val and3: String And Int  = "Baz" And 3  // ERROR

val results = Seq(and1, and2).map {
  case s And i => s"$s and $i"
}
assert(results == Seq("Foo and 1", "Bar and 2"))

We mentioned earlier that you can pattern match pairs with ->. This feature is implemented with a val defined in Predef, ->. This is an alias for Tuple2.type, which subtypes Product2, which defines an unapply method that is used for these pattern-matching expressions.
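A tiny sketch of that usage, with made-up data:

```scala
val pairs = Seq("one" -> 1, "two" -> 2)

// The pattern k -> v destructures each pair, just like (k, v) would.
val swapped = pairs.map { case k -> v => v -> k }
```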

Alternatives to Option Return Values

While it is common to return an Option from unapply, any type that defines the following members, as Option does, is allowed:

def isEmpty: Boolean
def get: T
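As a sketch of this idea, here is an extractor whose unapply returns a custom type instead of an Option. PositiveResult and Positive are hypothetical names used only for this illustration:

```scala
// Hypothetical result type: it only needs isEmpty and get.
class PositiveResult(val value: Int):
  def isEmpty: Boolean = value <= 0   // "empty" means the match fails
  def get: Int = value                // the extracted value on success

object Positive:
  def unapply(i: Int): PositiveResult = PositiveResult(i)

def classify(i: Int): String = i match
  case Positive(n) => s"positive: $n"
  case _ => "not positive"

val five = classify(5)
val minusOne = classify(-1)
```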

A Boolean can also be returned, as can a Product type, which is a supertype of tuples. Here’s an example using Boolean, where we want to discriminate between two kinds of strings and the match is really implementing a true-versus-false analysis:

// src/script/scala/progscala3/patternmatching/UnapplyBoolean.scala

object ScalaSearch:                                                  
  def unapply(s: String): Boolean = s.toLowerCase.contains("scala")

val books = Seq(
  "Programming Scala",
  "JavaScript: The Good Parts",
  "Scala Cookbook").zipWithIndex   // add an "index"

val result = for s <- books yield s match                            
  case (ScalaSearch(), index) => s"$index: found Scala"              
  case (_, index) => s"$index: no Scala"

assert(result == Seq("0: found Scala", "1: no Scala", "2: found Scala"))

: Define an object with an unapply method that takes a string, converts to lowercase, and returns the result of a predicate; does it contain “scala”?
: Try it on a list of strings, where the first case match succeeds only when the string contains “scala.”
: Empty parentheses required.

Other single values can be returned. Here is an example that converts a Scala Map to a Java HashMap:

// src/script/scala/progscala3/patternmatching/UnapplySingleValue.scala

import java.util.{HashMap as JHashMap}

case class JHashMapWrapper[K,V](jmap: JHashMap[K,V])
object JHashMapWrapper:
  def unapply[K,V](map: Map[K,V]): JHashMapWrapper[K,V] =
    val jmap = new JHashMap[K,V]()
    for (k,v) <- map do jmap.put(k, v)
    new JHashMapWrapper(jmap)

In action:

scala> val map = Map("one" -> 1, "two" -> 2)
val map: Map[String, Int] = Map(one -> 1, two -> 2)

scala> map match
     |   case JHashMapWrapper(jmap) => jmap
val res0: java.util.HashMap[String, Int] = {one=1, two=2}

However, it’s not possible to implement a similar extractor for Java’s HashSet and combine them into one match expression (because there are two possible return values, not one):

// src/script/scala/progscala3/patternmatching/UnapplySingleValue2.scala
scala> ...
scala> val map = Map("one" -> 1, "two" -> 2)
scala> val set = map.keySet
scala> for x <- Seq(map, set) yield x match
     |   case JHashMapWrapper(jmap) => jmap
     |   case JHashSetWrapper(jset) => jset
... errors ...

See the source file for the full details. The Scala collections already have tools for converting between Scala and Java collections. See “Conversions Between Scala and Java Collections” for details.

Another option for unapply is to return a Product, or more specifically an object that mixes in this trait. Product is an abstraction for types where it is useful to treat the member fields uniformly, such as retrieving them by index or iterating over them. Tuples implement Product. We can use it as a way to provide several return values extracted by unapply:

// src/script/scala/progscala3/patternmatching/UnapplyProduct.scala

class Words(words: Seq[String], index: Int) extends Product:         
  def _1 = words                                                     
  def _2 = index

  def canEqual(that: Any): Boolean = ???                             
  def productArity: Int = ???
  def productElement(n: Int): Any = ???

object Words:
  def unapply(si: (String, Int)): Words =                            
    val words = si._1.split("""\W+""").toSeq                         
    new Words(words, si._2)

val books = Seq(
  "Programming Scala",
  "JavaScript: The Good Parts",
  "Scala Cookbook").zipWithIndex   // add an "index"

val result = books.map {
  case Words(words, index) => s"$index: count = ${words.size}"
}
assert(result == Seq("0: count = 2", "1: count = 4", "2: count = 2"))

: Now we need a class Words to hold the results when a match succeeds. Words implements Product.
: Define two methods for retrieving the first and second items. Note the method names are the same as for two-element tuples.
: The Product trait declares these methods too, so we have to provide definitions, but we don’t need working implementations. This is because Product is actually a marker trait for our purposes. All we really need is for Words to mix in this type. So we simply invoke the ??? method defined in Predef, which always throws NotImplementedError.
: Matches on a tuple of String and Int.
: Split the string on runs of nonword characters (\W+), which covers whitespace and punctuation.

unapplySeq Method

When you want to return a sequence of extracted items, rather than a fixed number of them, use unapplySeq. It turns out the Seq companion object implements apply and unapplySeq, but not unapply:

def apply[A](elems: A*): Seq[A]
final def unapplySeq[A](x: Seq[A]): UnapplySeqWrapper[A]

UnapplySeqWrapper is a helper class.

This variation of our previous example for +: uses unapplySeq to examine a sliding window of pairs of elements at a time:

// src/script/scala/progscala3/patternmatching/MatchUnapplySeq.scala

// Process pairs
def windows[T](seq: Seq[T]): String = seq match
  case Seq(head1, head2, tail*) =>                                   
    s"($head1, $head2), " + windows(seq.tail)                        
  case Seq(head, tail*) =>                                           
    s"($head, _), " + windows(tail)
  case Nil => "Nil"                                                  

val nonEmptyList   = List(1, 2, 3, 4, 5)
val emptyList      = Nil
val nonEmptyMap    = Map("one" -> 1, "two" -> 2, "three" -> 3)

val results = Seq(nonEmptyList, emptyList, nonEmptyMap.toSeq).map {
  seq => windows(seq)
}
assert(results == Seq(
  "(1, 2), (2, 3), (3, 4), (4, 5), (5, _), Nil",
  "Nil",
  "((one,1), (two,2)), ((two,2), (three,3)), ((three,3), _), Nil"))

: It looks like we’re calling Seq.apply(…), but in a match clause, we’re actually calling Seq.unapplySeq. We grab the first two elements separately, and the rest of the repeated parameters list as the tail.
: Format a string with the first two elements, then move the window by one (not two) by calling seq.tail, which is also equivalent to head2 +: tail.
: We also need a match for a one-element sequence, such as near the end, or we won’t have exhaustive matching. This time we use the tail in the recursive call, although we actually know that this call to windows(tail) will simply return the string "Nil".
: The Nil case terminates the recursion.

We could rewrite the second case statement to skip the final invocation of windows(tail), but I left it as is for simplicity.

We could still use the +: matching we saw before, which is more elegant and what I would do:

// src/script/scala/progscala3/patternmatching/MatchWithoutUnapplySeq.scala

val nonEmptyList   = List(1, 2, 3, 4, 5)
val emptyList      = Nil
val nonEmptyMap    = Map("one" -> 1, "two" -> 2, "three" -> 3)

// Process pairs
def windows2[T](seq: Seq[T]): String = seq match
  case head1 +: head2 +: _ => s"($head1, $head2), " + windows2(seq.tail)
  case head +: tail => s"($head, _), " + windows2(tail)
  case Nil => "Nil"

val results = Seq(nonEmptyList, emptyList, nonEmptyMap.toSeq).map {
  seq => windows2(seq)
}
assert(results == Seq(
  "(1, 2), (2, 3), (3, 4), (4, 5), (5, _), Nil",
  "Nil",
  "((one,1), (two,2)), ((two,2), (three,3)), ((three,3), _), Nil"))

Working with sliding windows is actually so useful that Seq gives us two methods to create them:

scala> val seq = 0 to 5
val seq: scala.collection.immutable.Range.Inclusive = Range 0 to 5

scala> seq.sliding(2).foreach(println)
ArraySeq(0, 1)
ArraySeq(1, 2)
ArraySeq(2, 3)
ArraySeq(3, 4)
ArraySeq(4, 5)

scala> seq.sliding(3,2).foreach(println)
ArraySeq(0, 1, 2)
ArraySeq(2, 3, 4)
ArraySeq(4, 5)

Both sliding methods return an iterator, meaning they are lazy and don’t immediately make a copy of the collection, which is desirable for large collections. The second method takes a stride argument, which is how many steps to go for the next sliding window. The default is one step. Note that the last window from sliding(3,2) holds only the two remaining elements, 4 and 5.
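As a sketch of how sliding could replace the hand-written recursion, here is a hypothetical pairs method that returns adjacent pairs as tuples. Unlike windows, it drops a trailing window that has fewer than two elements:

```scala
// Hypothetical helper: adjacent pairs via sliding(2).
// collect keeps only the windows that really have two elements.
def pairs[T](seq: Seq[T]): Seq[(T, T)] =
  seq.sliding(2).collect { case Seq(a, b) => (a, b) }.toSeq

val adjacent = pairs(Seq(1, 2, 3, 4))
val single = pairs(Seq(1))
```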

Implementing unapplySeq

Let’s implement an unapplySeq method adapted from the preceding Words example. We’ll tokenize the words as before but also remove all words shorter than a specified value:

// src/script/scala/progscala3/patternmatching/UnapplySeq.scala

object Tokenize:
  // def unapplySeq(s: String): Option[Seq[String]] = Some(tokenize(s))
  def unapplySeq(lim_s: (Int,String)): Option[Seq[String]] =           
    val (limit, s) = lim_s
    if limit > s.length then None
    else
      val seq = tokenize(s).filter(_.length >= limit)
      Some(seq)

  def tokenize(s: String): Seq[String] = s.split("""\W+""").toSeq      

val message = "This is Programming Scala v3"
val limits = Seq(1, 3, 20, 100)

val results = for limit <- limits yield (limit, message) match
  case Tokenize() => s"No words of length >= $limit!"
  case Tokenize(a, b, c, d*) => s"limit: $limit => $a, $b, $c, d=$d"   
  case x => s"limit: $limit => Tokenize refused! x=$x"

assert(results == Seq(
  "limit: 1 => This, is, Programming, d=ArraySeq(Scala, v3)",
  "limit: 3 => This, Programming, Scala, d=ArraySeq()",
  "No words of length >= 20!",
  "limit: 100 => Tokenize refused! x=(100,This is Programming Scala v3)"))

: If we didn’t match on the limit value, this is what the declaration would be.
: We match on a tuple with the limit for word size and the string of words. If successful, we return Some(Seq(words)), where the words are filtered for those with a length of at least limit. We consider it unsuccessful and return None when the input limit is greater than the length of the input string.
: Split on runs of nonword characters (\W+).
: Capture the first three words returned and the rest of them as a repeated parameters list (d).

Try simplifying this example to not do length filtering. Uncomment the line for comment 1 and work from there.
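One possible result of that exercise might look like the following sketch. SimpleTokenize and firstTwoWords are names I chose to avoid clashing with Tokenize above:

```scala
// Hypothetical simplified extractor: no limit, so the match never fails.
object SimpleTokenize:
  def unapplySeq(s: String): Option[Seq[String]] = Some(tokenize(s))
  def tokenize(s: String): Seq[String] = s.split("""\W+""").toSeq

def firstTwoWords(s: String): String = s match
  case SimpleTokenize(a, b, rest*) => s"$a, $b, and ${rest.size} more"
  case SimpleTokenize(only) => s"just $only"
  case _ => "nothing"

val many = firstTwoWords("This is Programming Scala v3")
val oneWord = firstTwoWords("Scala")
```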

Recap and What’s Next

Along with for comprehensions, pattern matching makes idiomatic Scala code concise, yet powerful. It provides a protocol for extracting data inside data structures in a principled way, one you can control by implementing custom unapply and unapplySeq methods. These methods let you extract that information while hiding other details. In fact, the information returned by unapply might be a transformation of the actual fields in the instance.

Pattern matching is a hallmark of many functional languages. It is a flexible and concise technique for extracting data from data structures. We saw examples of pattern matching in case clauses and how to use pattern matching in other expressions too.

The next chapter discusses a unique, powerful, but controversial feature in Scala—context abstractions, formerly known as implicits, which are a set of tools for building intuitive DSLs, reducing boilerplate, and making APIs both easier to use and more amenable to customization.


Programming Scala, 3rd Edition by Dean Wampler
