In the previous post in this series, I gave an example of trees in Python and Haskell:
class Tree: def __init__(self, value=None, left=None, right=None): self.value = value self.left = left self.right= right def preorder(self): if self.value is None: return  else: if self.left is None: left_pre =  else: left_pre = self.left.preorder() if self.right is None: right_pre =  else: right_pre = self.right.preorder() return left_pre + [self.value] + right_pre
data Tree = Empty | Branch Int Tree Tree preorder Empty =  preorder (Branch val left right) = preorder left ++ [val] ++ preorder right
It’s probably possible to capture some of the clarity of the Haskell version in our Python (perhaps by making the
__init__ code a bit more complex, but that’s okay since most users of the class won’t have to modify
class Tree2: def __init__(self, value=None, left=None, right=None): if value is None and left is None and right is None: self.shape = 'empty' elif (value is not None and left is not None and right is not None): self.shape = 'branch' self.value = value self.left = left self.right = right else: raise SomeKindOfError # finally, the actual function def preorder(tree): if tree.shape == 'empty': return  elif tree.shape == 'branch' # We know the left and right subtrees are not None return preorder(tree.left) + [tree.value] + preorder(tree.right) else: raise ValueError("%s is not a valid tree!" % tree)
Indeed, this captures some of the clarity of the Haskell code: instead of writing
preorder as a series of nested binary tests that interleave detective work (figuring out what shape our data has) with application logic (actually computing the preorder of the tree), we hide all the detective work in the
Tree constructor, and all
preorder has to do is query the information that’s already there. To someone reading
preorder (and who knows what a
Tree is), it’s immediately obvious what we’ve learned in each branch of the conditional: either the tree is
'empty' or we’re looking at a
But clarity is only one of the things that case analysis buys us. I can immediately think of two other things that you don’t get by simulating case analysis with a series of conditionals:
First, they’re easier for the compiler to understand, and more things the compiler understands means fewer things the programmer has to remember. For example, in our Python code above, it’s up to the programmer to remember that
tree.shape == 'branch' means that
value attributes that are not
None. In Haskell, the compiler knows what attributes each constructor has, and if the programmer tries to use an attribute that isn’t present, the compiler will catch that mistake. Also, the compiler knows how many cases there are and can issue a warning if you miss any — this can be very helpful with more complex patterns or with data types have more constructors. For example, consider the following type for representing mathematical expressions (perhaps we’re implementing a calculator and we want to echo the expression back to the user instead of just displaying its result):
data Exp = Add Exp Exp | Sub Exp Exp | Mult Exp Exp | Neg Exp | Constant Int eval :: Exp -> Int eval (Constant x) = x eval (Add x y) = eval x + eval y eval (Sub x y) = eval x - eval y eval (Mult x y) = eval x * eval y eval (Neg x) = - (eval x)
That’s kind of a lot of constructors, and it’s easy to imagine extending this type with even more (variables are an obvious next step). Translating this into a Python class would be kind of tedious, and users of the class would need to remember what “shapes” to test and what data each shape contains. In Haskell, though, the compiler will immediately tell you whether you’ve handled every case correctly!
Second, case analysis can be compiled into more efficient code than a series of conditionals. In Python, the entire logic of “We know this data type has one of the following shapes:” is handled in user-level code. In this implementation, that means repeatedly fetching a string attribute and comparing it to one of the alternatives. But in a language where the compiler knows about case analysis, we can hide all of this dispatching logic away from the programmer, and that means that we can make all kinds of optimizations that would make the Python less readable. For example, in GHC, each data constructor is represented as just part of an integer (it only needs to be a couple bits, so it’s stored alongside some other pieces of data that only need to be a couple bits), and in many cases a pattern match can be compiled as a jump table lookup instead of sequentially testing alternatives.
Finally, in case this isn’t immediately obvious: boolean conditional branching can be implemented as a special case of case analysis. First, case analysis isn’t just a top-level function definition thing: we could rewrite our earlier tree functions with a so-called “case expression” instead.
preorder tree = case tree of Empty ->  Branch val left right -> preorder left ++ [val] ++ preorder right
So now we see that doing case analysis on a boolean is pretty much the same as an
if cond then trueVal else falseVal -- is the same as case cond of True -> trueVal False -> falseVal
Taking case analysis as a language primitive and deriving boolean branching from there is much nicer than taking boolean branching as primitive and trying to derive case analysis from that.