Algebraic Data Types, pt 2: Case analysis is not a nested conditional

In the previous post in this series, I gave an example of trees in Python and Haskell:

class Tree:
    def __init__(self, value=None, left=None, right=None):
        self.value = value
        self.left = left
        self.right= right
    def preorder(self):
        if self.value is None: return []
            if self.left is None: left_pre = []
            else: left_pre = self.left.preorder()
            if self.right is None: right_pre = []
            else: right_pre = self.right.preorder()
            return left_pre + [self.value] + right_pre
data Tree = Empty | Branch Int Tree Tree
preorder Empty = []
preorder (Branch val left right) = preorder left ++ [val] ++ preorder right

It’s probably possible to capture some of the clarity of the Haskell version in our Python (perhaps by making the __init__ code a bit more complex, but that’s okay since most users of the class won’t have to modify __init__):

class Tree2:
    def __init__(self, value=None, left=None, right=None):
        if value is None and left is None and right is None:
            self.shape = 'empty'
        elif (value is not None and left is not None
              and right is not None):
            self.shape = 'branch'
            self.value = value
            self.left = left
            self.right = right
            raise SomeKindOfError
# finally, the actual function
def preorder(tree):
    if tree.shape == 'empty':
        return []
    elif tree.shape == 'branch'
        # We know the left and right subtrees are not None
        return preorder(tree.left) + [tree.value] + preorder(tree.right)
        raise ValueError("%s is not a valid tree!" % tree)

Indeed, this captures some of the clarity of the Haskell code: instead of writing preorder as a series of nested binary tests that interleave detective work (figuring out what shape our data has) with application logic (actually computing the preorder of the tree), we hide all the detective work in the Tree constructor, and all preorder has to do is query the information that’s already there. To someone reading preorder (and who knows what a Tree is), it’s immediately obvious what we’ve learned in each branch of the conditional: either the tree is 'empty' or we’re looking at a 'branch'.

But clarity is only one of the things that case analysis buys us. I can immediately think of two other things that you don’t get by simulating case analysis with a series of conditionals:

First, they’re easier for the compiler to understand, and more things the compiler understands means fewer things the programmer has to remember. For example, in our Python code above, it’s up to the programmer to remember that tree.shape == 'branch' means that tree has left, right, and value attributes that are not None. In Haskell, the compiler knows what attributes each constructor has, and if the programmer tries to use an attribute that isn’t present, the compiler will catch that mistake. Also, the compiler knows how many cases there are and can issue a warning if you miss any — this can be very helpful with more complex patterns or with data types have more constructors. For example, consider the following type for representing mathematical expressions (perhaps we’re implementing a calculator and we want to echo the expression back to the user instead of just displaying its result):

data Exp = Add Exp Exp | Sub Exp Exp | Mult Exp Exp | Neg Exp | Constant Int
eval :: Exp -> Int
eval (Constant x) = x
eval (Add x y) = eval x + eval y
eval (Sub x y) = eval x - eval y
eval (Mult x y) = eval x * eval y
eval (Neg x) = - (eval x)

That’s kind of a lot of constructors, and it’s easy to imagine extending this type with even more (variables are an obvious next step). Translating this into a Python class would be kind of tedious, and users of the class would need to remember what “shapes” to test and what data each shape contains. In Haskell, though, the compiler will immediately tell you whether you’ve handled every case correctly!

Second, case analysis can be compiled into more efficient code than a series of conditionals. In Python, the entire logic of “We know this data type has one of the following shapes:” is handled in user-level code. In this implementation, that means repeatedly fetching a string attribute and comparing it to one of the alternatives. But in a language where the compiler knows about case analysis, we can hide all of this dispatching logic away from the programmer, and that means that we can make all kinds of optimizations that would make the Python less readable. For example, in GHC, each data constructor is represented as just part of an integer (it only needs to be a couple bits, so it’s stored alongside some other pieces of data that only need to be a couple bits), and in many cases a pattern match can be compiled as a jump table lookup instead of sequentially testing alternatives.

Finally, in case this isn’t immediately obvious: boolean conditional branching can be implemented as a special case of case analysis. First, case analysis isn’t just a top-level function definition thing: we could rewrite our earlier tree functions with a so-called “case expression” instead.

preorder tree = case tree of
  Empty -> []
  Branch val left right -> preorder left ++ [val] ++ preorder right

So now we see that doing case analysis on a boolean is pretty much the same as an if expression:

if cond then trueVal else falseVal
-- is the same as
case cond of
  True -> trueVal
  False -> falseVal

Taking case analysis as a language primitive and deriving boolean branching from there is much nicer than taking boolean branching as primitive and trying to derive case analysis from that.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s