The Problem with Subtle Shades of Mu

This post was provoked by a real event, unlike many of my posts, which are born of pure imagination.

JavaScript distinguishes between null and undefined. A cursory Google search suggests that null is meant to be used by the programmer to represent an intentionally absent value, while undefined is supposed to be a value that “doesn’t exist”. But this is a pretty vaguely worded distinction, and it lives entirely in the programmer’s mind and culture. The language itself has no idea what the difference between the two use cases is.
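To make the distinction a little more concrete, here is a rough sketch of where each value tends to come from. The order object, its fields, and the noReturn function are made up for illustration; they are not from any real codebase.

```javascript
// Hypothetical data, for illustration only.
const order = { id: 42, coupon: null };

// null: somebody explicitly wrote "there is no value here".
const explicit = order.coupon;

// undefined: nobody wrote anything, and the language filled the gap.
const missingProperty = order.paymentMethod; // property was never set
function noReturn() {}
const missingReturn = noReturn();            // function returned nothing
let declaredOnly;                            // variable was never assigned

console.log(explicit);        // null
console.log(missingProperty); // undefined
console.log(missingReturn);   // undefined
console.log(declaredOnly);    // undefined
```

Nothing stops a programmer from assigning undefined by hand or from treating null as “not filled in yet”, which is why the convention only holds as long as everyone remembers it.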

I was using a webpage a while ago and noticed that one of their buttons didn’t work. (It was a pretty important button too, rendering the page completely unusable.) On a whim, I decided to open up the Firefox dev tools console and see whether there was some kind of error. Sure enough, TypeError: paymentMethod is undefined showed up in the log, with a handy line number attached. I clicked on the line number and scanned through the code, wondering if the fix would be obvious, and… wait a minute, they’re checking whether paymentMethod is undefined just a few lines above! Control flow should never hit here unless paymentMethod has a value…

Oh wait. The check is paymentMethod !== null. And sure enough, undefined == null evaluates to true, but undefined === null evaluates to false, so undefined !== null is true and the check lets undefined straight through. They checked to make sure paymentMethod wasn’t null, and then acted as if they knew paymentMethod was neither null nor undefined, relying on assumptions that were not provided by the check they made.[1]

if (x !== null) ... is an insidious mistake, because it looks like within its scope you should be safe. Even an experienced programmer could be forgiven for allowing their eyes to gloss over it, and proceeding to assume that in the following code block x has a value.
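A minimal sketch of the failure, assuming the page’s code looked roughly like the describePayment function below. I never saw more than a few lines of the real thing, so every name here apart from paymentMethod is hypothetical.

```javascript
// The check from the story: false for null, but true for undefined.
function hasValueStrict(x) {
  return x !== null;
}

// Loose inequality against null is false for both null and undefined,
// and true for everything else (including 0, '' and false).
function hasValueLoose(x) {
  return x != null;
}

// The same test spelled out, for codebases whose linters ban == and !=.
function hasValueExplicit(x) {
  return x !== null && x !== undefined;
}

// Hypothetical stand-in for the broken button handler.
function describePayment(paymentMethod, hasValue) {
  if (hasValue(paymentMethod)) {
    // Throws a TypeError if paymentMethod is undefined.
    return paymentMethod.type;
  }
  return 'no payment method';
}

console.log(describePayment(null, hasValueStrict));        // no payment method
console.log(describePayment(undefined, hasValueLoose));    // no payment method
console.log(describePayment(undefined, hasValueExplicit)); // no payment method

try {
  describePayment(undefined, hasValueStrict);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

The strict check and the loose check differ by a single character, which is exactly why the wrong one is so easy to read past.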

Some language features can demand an unreasonable amount of your attention.

A vast maze of choice

Clojure has two falsy values, false and nil. false is a pretty straightforward boolean; nil is a bizarre mashup of falsity, failure, every kind of empty sequence, and Java’s null.

It just so happens that idiomatic Clojure uses a lot of functions that return “either nil or some non-nil value” as pseudo predicates, since you can conditionally branch on any kind of value.

But nil corresponds to null on the Java side, so it generally shouldn’t be used with Clojure’s Java interop. I once had to write (if x x false)[2] instead of x, because we had a function that called out to Java and therefore wasn’t allowed to return nil. (I’m lucky the function’s docstring said as much, or it could have taken me weeks to figure out what was going wrong.)

And then there’s the fact that a lot of Clojure data is represented by key-value maps. What’s the difference between {:some-property false}, {:some-property nil}, and {}?[3] Are they all allowed? Do you prefer one form over the others? You decide!

Or well, you try to decide. But it’s not really your choice: it’s your codebase’s choice. Every time someone checks to see whether a given map has :some-property or not, they’re making an implicit choice as to what those three things mean. And if sometimes people are sloppy, because in the heat of the moment one interpretation was obvious and the others never even came to mind… well, you just have to hope that all of these accidental, unconscious choices happened to match up.

(Even though you know in your heart that they don’t. They never do, for the universe is cruel.)

Whatever shall we do?

If you’ve been reading my blog for very long, you might expect me to now embark on a rant about how Haskell does it better. This is, in fact, what I’m going to do.[4]

For very simple sources of failure, there’s the Maybe type, which I have probably written a lot about (and if it doesn’t have its own dedicated blog post it will probably get one eventually). Maybe is useful when there’s only one kind of failure possible (or there might as well be only one for all you care), and when you want to respond to that failure just by abandoning the control flow that led to the error. When you’re in this situation, you can use Maybe to introduce a single failure value, guaranteed to be distinct from any of the “success” values, which basically has the behavior of causing control flow to be abandoned whenever something tries to operate on it.

What about when you want to have multiple failure values, with subtle distinctions between them?

Well, first of all, think twice. Dealing with subtly different shades of mu can be a huge headache.

But if you really need to, Haskell has an idiomatic way of easing the headache. First, define an explicit data type enumerating the ways something might have failed: data Error = KeyboardNotFound | SegmentationFault | PEBKAC. Then you use the Either type, which is like Maybe except that instead of one failure value there are many. (In particular, Either errorType introduces as many failure values as there are values of errorType.) But while there are many different failure values, there’s a bastion of order and sanity for you to cling to: all of the different ways of failure are guaranteed to have the same effect on control flow (namely, aborting it). And that effect is guaranteed to be distinct from the effect of successful values on control flow, which are in turn all the same as each other.

And most importantly, there is a single way to check for this composite notion of failure, and this check is easy to remember and doesn’t hide in a room full of distracting wrong alternatives.

  1. This is one of the more important, nonobvious things a static type system can give you. A really good type checker can, to some degree, infer what assumptions a code block relies on and what properties a dynamic check verifies. If these things don’t match up, as in this JavaScript case, you can get an error at compile time.
  2. I now realize that this could be replaced by the equally horrible but more compact (or x false).
  3. It gets even worse if you think about collections instead of booleans. false vs nil vs “not present” works out the way you’d expect barring occasional subtle interactions, but [] vs () vs {} vs #{} vs “not present” requires a lot more care, especially since almost all functions that act on collections return seqs[5] (like ()) regardless of what the input was, and because nil can be treated to some extent like an empty collection of any built-in collection type.
  4. Note, however, that this section glosses over a few things. It is in praise of Haskell’s ideal philosophy, which I do believe is closer to perfection than any language before it, but which Haskell does not unfailingly embody. I’m happy to talk about the various unfortunate departures Haskell makes, but in this section of this blog post a lengthy discussion of such would merely distract from the point.
  5. Except, ironically, for seq, which coerces the input collection to a seq unless the input was empty, in which case it returns nil.
