Static Function Arglist Mismatch

An unsettling amount of my time working at Prismatic has been spent looking for errors whose ultimate cause turns out to be a function statically applied to the wrong number of arguments. By this I mean that you can tell that the function is applied to too many or too few arguments just by looking at the function definition and call site, without doing any sophisticated control flow analysis or reasoning about the values of any variable. For example:

(defn foo [x y z] ...)
...
(foo x)

Things aren’t always this easy, of course. If we want to be able to check something like this trivially, there are a few wrinkles:

Declaration site


The function needs to have a static number of arguments at the declaration site — variadic functions are harder to check. If the declaration looks like (defn foo [x y z] ...), the compiler should be able to know exactly how many arguments the function takes. But if it’s like (defn foo [x y z & rest]) instead, knowing whether a given number of arguments is an appropriate input is much harder and probably impossible in general.

Call site


The function also needs to be called with a static number of arguments — which is to say, we need to provide an argument list in the source code rather than using a list variable. (foo 5 6 7) is statically applied to three arguments, so if its declaration statically requires some other number of arguments, we know this call is wrong. But something like (apply foo list-of-arguments) applies foo to a dynamic number of arguments, and we have to find out the length of list-of-arguments to know whether the call is correct. This is of course impossible in general.
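
To make the first two wrinkles concrete, here is a minimal sketch of this kind of check, written in Python over toy parsed forms (nested lists standing in for s-expressions; all the helper names are mine, not any real linter's):

def collect_arities(forms):
    # Wrinkle 1: only record functions whose parameter list is static,
    # i.e. has no "&" marker introducing variadic parameters.
    arities = {}
    for form in forms:
        if form and form[0] == "defn":
            name, params = form[1], form[2]
            if "&" not in params:
                arities[name] = len(params)
    return arities

def check_calls(forms, arities):
    # Wrinkle 2: we only see calls whose arguments are spelled out in the
    # source. (apply foo args) starts with "apply", not "foo", so dynamic
    # applications are naturally skipped.
    errors = []
    for form in forms:
        if form and form[0] in arities:
            expected, actual = arities[form[0]], len(form) - 1
            if actual != expected:
                errors.append((form[0], expected, actual))
    return errors

forms = [["defn", "foo", ["x", "y", "z"], "..."],
         ["foo", "x"]]  # statically applied to one argument
print(check_calls(forms, collect_arities(forms)))  # [('foo', 3, 1)]

Note that collect_arities quietly assumes each function has exactly one definition, which is exactly the assumption the next wrinkle breaks.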

The entire codebase


This one is probably less of an issue in practice than the other two, but it’s much harder to check for, and I expect that it is the main reason function argument counting is not usually implemented.

Basically, suppose we have something like this:

(defn foo [x y z] ...)
(defn foo [x y] ...)
(foo 5 6)

This is perfectly legal in Clojure and in most other languages, and it means that we can’t just nominate a single declaration form as the definitive place to find foo’s argument count. And while in this particular case it’s pretty obvious that the function being called has an arity of exactly 2, there are obfuscations we can apply to make it much less clear whether a function has been redefined between its first definition and any particular call site. So whereas the first two restrictions are easy to detect (and the static checking can simply be switched off where they apply), it is certainly very difficult and maybe impossible to determine whether a definition has been superseded, and that means that static argument counting can disallow valid code.

(defn foo [x y z] ...)
;; something here that redefines foo
;; so that it can take two arguments
(foo 5 6) ;; this is valid!

If the compiler can’t tell that this unspecified code in the middle is redefining foo… it will disallow this code even though it’s technically valid and won’t crash.


So does this mean that in a dynamic language we can’t have the extremely helpful sanity checking of counting a function invocation’s arguments and comparing that to the definition? Actually, no!

The thing that really prevents us from performing this check is the third caveat, simply because it’s difficult to detect whether the caveat is applicable or not. But this caveat doesn’t inescapably apply to all dynamic languages: it’s a side effect of tying the top-level namespace to the execution model. If top-level functions were immutable1, there wouldn’t even be the possibility of caveat 3, and we could safely protect programmers from passing the wrong number of arguments.

Of course, if you’re not willing to disallow the redefinition of top-level names, there are still ways to perform the check. They’re not quite as obvious, and I don’t actually know any of them, but I’m certain it’s possible because the SBCL compiler for Common Lisp can very often emit warnings if you call a function with the wrong number of arguments.

It would be really nice if languages like Clojure and Python could take a leaf out of SBCL’s book and start doing some straightforward checking of function argument count. It’s not applicable in every case, but where it doesn’t work it can just be disabled with no downside. And over the rest of my life, I calculate that it will save me seven years and seven days of trouble.


  1. Technically, if top-level variables were immutable. Clojure functions already are, in the sense that once you have a reference to a function it’s set in stone, but if you’re calling a function by name (which is what happens almost all of the time), the referent of that name (and hence the meaning of the function call) can still change. 

Type-directed termination

I’ve been distracted from Serpent lately, not just by work but by a fascinating theoretical idea.

The Inspiration


A couple months ago I came across this great paper1 detailing a way to do guardedness checking in a type system.

(A quick detour, because there are going to be a lot of strange words here. “Coprogramming” is a fancy way to talk about programs that are supposed to be infinite loops (like an operating system or game). “Guardedness” is a property of coprograms that can be used to test whether your coprogram is “productive”, which is a particular way for it to be well-behaved. For more information, see previous posts, and if I ever find a nice introductory article I will announce it somewhere.)
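
For a very loose analogy in Python (which of course has no guardedness checker; its recursion limit stands in for divergence), compare these two generators:

def ones():
    # "Guarded": an element is produced before the recursive step, so any
    # finite prefix of the infinite stream arrives in finite time.
    yield 1
    yield from ones()

def stuck():
    # Unguarded: the recursive step comes first and nothing is ever
    # produced. A true coprogram like this would spin forever; Python
    # turns that into a RecursionError instead.
    yield from stuck()

next(ones())     # 1, immediately: ones is productive
# next(stuck())  # never yields anything: RecursionError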

Anyway. Type-directed guardedness checks are great because traditional productivity checkers are based on syntactic guardedness, which means (roughly) they just consider the program’s text without thinking about what’s actually happening in the program. This is okay in the sense that it never produces any false positives: if a syntactic guardedness check returns “true” for your program, then your program is definitely “well-behaved” (in the specific sense of productivity). But failing to take into account the program’s semantics means we misclassify a lot of well-behaved things as badly behaved. Usually a false negative can be rescued by rewriting the program so that its guardedness is obvious to the compiler, but this often produces code that is harder to write and harder to read.

Atkey and McBride’s framework doesn’t directly take program semantics into account, but it does bring productivity checking into the type system, which is almost as good. Basically, each infinite structure has a type that represents “how much of it has already been completed”, and this allows the type system to control how much of the structure you’re allowed to use. An incompletely constructed infinite sequence has a type that only supports accessing the pieces that have already been definitely placed, but after you’ve constructed it (i.e. outside the definition) it has a type that supports retrieving arbitrary pieces. This ensures that information never depends on Information From The Future, which means your infinite data structure is “productive”, or well-behaved.

So that’s why Atkey and McBride’s paper is exciting. What’s this idea of mine that I mentioned in the introduction?

Termination checking


So, it turns out that termination checking (proving that a computation that isn’t supposed to be an infinite loop, well, isn’t an infinite loop) is also traditionally done using syntactic checks. This has the same drawbacks as productivity checking: lots of false negatives, and rewriting the false negatives so that the checker can see they’re well-behaved gives you code that’s a lot uglier and more distracting.
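
To see the same pattern here, transplant it into Python (which has no termination checker either; this is just the shape of the syntactic criterion). A syntactic check typically accepts recursion only on a visibly smaller piece of the input:

def length(xs):
    # Accepted: the recursive argument xs[1:] is a strict sub-structure
    # of xs, so the recursion visibly shrinks.
    if not xs:
        return 0
    return 1 + length(xs[1:])

def log2(n):
    # Terminates for every n >= 1, but n // 2 is not a syntactic piece
    # of n, so a purely structural check rejects this definition even
    # though it is perfectly well-behaved: a false negative.
    if n <= 1:
        return 0
    return 1 + log2(n // 2)

In a language with a syntactic checker, the second kind of definition usually has to be rewritten or annotated before it is accepted, even though it plainly terminates.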

Now it might be obvious from the way we name these things that type theorists consider recursion and corecursion to be closely related. (Precisely, in category theory the relationship is called being “dual” to each other. As I am not an expert category theorist, I won’t go too far into that in this post.) The relationship is pretty simple to define verbally, but its implications are not necessarily easy to grasp: I can’t quite visualize what arrows I need to flip to transform one into the other. But I have the powerful intuition that, if I can figure out how to “flip the arrow” to turn recursion into corecursion, I can use the same “arrow-flipping” procedure on the productivity checking method described in this paper and get…

a method of termination checking. One that’s type-directed rather than relying on dumb syntactic checks.

This is pretty exciting, because when programming in Idris I upset the termination checker a lot more often than the productivity checker. It’s exciting because here is an academic problem that I don’t know the answer to, and it’s been a while since I’ve seen one of those. It’s exciting because as far as I can tell from a cursory Google Scholar search, no one has done this yet.

A part of me would really love to reread this paper a couple more times, until I understand exactly what they’ve done and (hopefully) how they came up with the idea. That part of me would go on to perhaps do some research on the category-theoretic formulation of recursion and corecursion, so that she can visualize the arrow to flip to turn one into the other. That part of me would be ecstatic if it turned out there was a publishable paper somewhere in here, but even if it turns out someone else has already done it, she’d feel that the independent discovery was worth her time, if only as a recreational activity.

Harsh Reality


Perhaps my intuition is mistaken, and there is no straightforward transformation of this productivity checker into a termination checker. But before I even think about that, I need to remember Serpent.

If I really wanted to, I’m sure I could convince myself that abandoning Serpent is the right choice here. And I’m sorely tempted, but… I’m trying to set a precedent for myself. There are important skills that I need but don’t have, and I’m not going to develop those skills by returning to my old ways of abandoning projects every time I come across an idea that excites me.

So as much as I’d love to drop back into type theory, and think about clock variables and flipping arrows… I think it’s pretty important that I finish Serpent first.


  1. Atkey and McBride, Productive Coprogramming with Guarded Corecursion 

C is declarative, kind of

Inconsequential musing of the day:

If you look at say, a Python file — or Bash or Ruby or almost any other scripting language — you can’t find any information1 about the names defined in a program without actually running the script. This is because in Python, the only way to associate a name with a value is to execute a line of code that causes it to get some value (maybe an assignment statement, maybe a function definition, maybe an eval that contains one of these things). The association of top-level names to values is inherently a procedural process, defined only by the language’s execution model.

Contrast this with the situation in C. C has actual declarations, which associate certain information with a name completely independently of the execution model, and in fact the execution model usually doesn’t2 cause values to become associated with top-level names at all. C’s top-level namespace is specified declaratively: it states directly which names are associated with which values.

Does this matter? In practice, probably not much. In theory, a declarative program layout should be easier to manipulate through automated tooling (for example, to list out the documentation for all of the top-level names defined in a module/file without running any of its code; see the sketch after the example below). But most reasonable Python programs are pretty declarative at the top level anyway: you rarely see things like

# webserver.py
turing_machines = [...]

if does_it_halt(turing_machines[0]):
    def post(blah, bleh):
        blee
if does_it_halt(turing_machines[2]):
    def get(bloo):
        bluh

globals()[running_time(turing_machines[1])].__doc__ = (
    ''.join(accepted_language(turing_machines[1])))
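
In fact, because most real Python files are that declarative in practice, static inspection often works anyway. Here is a small sketch using the standard ast module:

import ast

# Parse the module's source; none of its code is executed.
with open("webserver.py") as f:
    tree = ast.parse(f.read())

# List every statically visible top-level function and its docstring.
for node in tree.body:
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        print(node.name, "->", ast.get_docstring(node) or "(no docstring)")

Run against the monstrosity above, of course, this finds nothing: post and get are hidden inside if statements, which is exactly the problem.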

On the other hand, languages like C do make it a lot easier to catch typos, which I consider to be an appreciable benefit. Not that typos are a particularly difficult class of bug to track down.

But mostly I just found it interesting to think about how C’s model is, in a way, more declarative than lots of more modern languages.


  1. In general. Of course most specific scripts aren’t quite so horrible. 
  2. I think there are a couple exceptions to this having to do with extern and/or static top level variables and perhaps also function pointers (can those be defined at the top level?). But the top level of a basic C program consists entirely of function declarations and no assignment. 

The False God Convenience

I will admit immediately that the title of this blog post is somewhat sensational. I do not consider myself an enemy of convenient things. I love syntactic sugar in my programming languages as long as I know the translation is principled. I certainly don’t think that any programming language should be made purposely difficult to use or learn.

But… I can’t help but feel that a lot of the time, when people set out with the goal of creating a system that is convenient and easy to use, they end up with something that is neither. It seems to me that the languages most people think of as “convenient” are really those with the most features that look convenient and easy in isolation, but that take painstaking care to use properly when combined with other seeming conveniences. There is something to be said for building systems that make intuitive sense to people, but intuition is too unstable to build a core foundation on — combine a bunch of intuitions, and you might get something that doesn’t even make intuitive sense.

Comparisons


For example, take equality comparison in languages like Perl or PHP, where comparing values of disparate types causes some coercion to take place before the comparison happens. The user might expect two similar things (like a number and a string encoding that number) to compare as equal. But when this intuition is encoded into the language, we destroy lots of other intuitions that a user might have about equality, such as transitivity (“if a == b and b == c, then a == c”) or symmetry (“if a == b, then b == a”). By conveniently accommodating one of the user’s intuitions, we’ve decoupled the “==” operator from the most important intuition of all: “are these two things the same?”. So if a user of the language wants to test for two things being the same, it requires a nontrivial and inconvenient analysis of what comparison operators are available in the language.

Look at the table of weak comparisons in the PHP manual. There are no obvious rules like “Falsy things are equal to each other” or “If two things are equal, then anything equal to one of them is equal to the other” or even “two things-that-represent-numbers are equal if they represent the same number”. There’s just a giant table of arbitrary booleans that you must memorize if you want to use nonstrict equality.
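
To see how coercion eats transitivity, here is a toy model of a coercing equality in Python (my own simplification, not PHP’s actual algorithm):

def weak_eq(a, b):
    # If exactly one side is a string, coerce it to a number and compare
    # numerically; an empty string acts as 0. Otherwise compare directly.
    def as_number(x):
        if isinstance(x, str):
            return float(x) if x.strip() else 0.0
        return float(x)
    if isinstance(a, str) != isinstance(b, str):
        try:
            return as_number(a) == as_number(b)
        except ValueError:
            return False
    return a == b

print(weak_eq(0, ""))    # True:  "" coerces to 0
print(weak_eq(0, "0"))   # True:  "0" coerces to 0
print(weak_eq("", "0"))  # False: two strings compare directly

Two things each equal to 0, yet unequal to each other: transitivity is gone, and every caller now has to know the coercion rules to predict what “==” will say.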

Error Codes


Another misfeature often incorporated in the name of convenience is for functions to silently return some kind of “sentinel value” on failure, where by “sentinel value” I mean something that can easily be distinguished from a successful return value given the semantics of the function, but that nevertheless supports all of the same operations that would be supported by a value returned in case of success. For example, in C, a function to search an array for a particular element might return that element’s index if it is found in the array, and -1 otherwise. Clearly -1 isn’t a valid successful return value of the function — index -1 isn’t even part of the array. But for the program to know that there has been a failure, it’s the programmer’s responsibility to check for one before the return value is ever used. Failure to do so is invisible — using a “sentinel value” without such a check, by definition, doesn’t crash the program, since the failure value supports all the same operations as a successful value. If the program later goes on to crash or do something nonsensical, it could be difficult to trace the error back to its source.
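
Python’s own str.find has exactly this shape: it returns -1 on failure, and -1 happens to be a perfectly valid index.

s = "hello"
i = s.find("z")  # "z" is absent, so find returns the sentinel -1
print(s[i])      # no crash: -1 indexes the last character, printing "o",
                 # and the failure silently flows onward as a bogus answer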

Interoperation of Different Types


Contributing to this problem is the practice of having every built-in type support many more operations than it should. In Python, for example, a string supports the interface of being “multiplied” by an integer — something like "hands" * 3 evaluates to "handshandshands". Similarly, in many dynamic languages, every type supports the interface of “being used as a boolean”. In some languages like PHP, even the null value supports some operations that are surprising, such as being compared to integers. These behaviors, while they may seem convenient and easy when you want them (why shouldn’t you be able to repeat a string three times?), mean that a string is not a good error value for a function that normally returns integers, and nothing is a good error value for a function that normally returns a boolean, and NULL isn’t a good error value for any function, because in each of these cases if the user forgets to check for an error, the language will offer them no help in spotting their mistake.

This is especially a problem because it seems, intuitively, that (for example) integers are different enough from strings that they should be okay to use as an exceptional value — and dynamic language designers and library writers often do use “different enough” types as exceptional values. But because both types support too-diverse operations, the language offers no help in formalizing the programmer’s intuition that “I want an integer here, and if for some reason this variable is a string then that’s an error”. Instead, the user has to make the relevant checks manually (and these can often be difficult to get right, if the language goes too far out of its way to make different types interchangeable).

So what instead?


All of these features have a kernel of goodness about them. Sometimes we really do want to test things like “is this string, interpreted as a number, the same as this other number?”. It’s nice to be able to check for errors using the return value of a function instead of wrapping some code in a try/catch block. And it’s good to be able to write functions that work on lots of different types of similar data and treat all the different types the same. But trying to make all of these nice features implicit is a false convenience — it hides important details of what’s going on from the user, causing much more trouble than it gains.

So how do we get features that are actually convenient and easy to use? One thing that helps is to make them explicit.

For example, don’t coerce things before a comparison unless the user asks for it. 11 == toInt("11") is hardly less intuitive than 11 == "11", and arguably is even more intuitive — the former reads as “is 11 the number represented by the string ‘11’”, whereas the latter reads more like “is the number 11 the same thing as the string ‘11’”, not to mention that requiring toInt saves all the tangled madness of PHP’s weak comparisons. All we have to do is have the user tell us how they want to compare things, instead of trying to guess.

In Haskell there are types for explicitly representing the possibility of failure, which can’t be used like ordinary values unless the failure has been acknowledged and handled. (This could be done in most languages that aren’t C, but almost never is.) So instead of returning -1 on failure to find the index of an element in an array, we return Nothing (which sounds suspiciously like null but for the crucial difference that it doesn’t inhabit arbitrary types, which means there is no risk of accidentally using Nothing where you meant to use an integer). And in case of success we return Just 7 (for example) instead of 7. Since neither of these is an integer (one is an empty box representing failure, and the other is a wrapped integer that is never transparently unwrapped), the user can never silently ignore the possibility of failure: if they forget to check for one, the compiler will point out their mistake.
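
The same shape can be mimicked in most languages. A sketch in Python (hand-rolled here, not a standard library type):

from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")

@dataclass
class Just(Generic[T]):
    value: T  # a wrapped success value, never transparently unwrapped

@dataclass
class Nothing:
    pass      # an empty box representing failure

def find_index(xs, x) -> Union[Just[int], Nothing]:
    for i, v in enumerate(xs):
        if v == x:
            return Just(i)
    return Nothing()

result = find_index(["a", "b"], "z")
# result + 1 fails immediately with a TypeError: the failure can't pose
# as an integer, so the caller is forced to unwrap explicitly.
if isinstance(result, Just):
    print("found at", result.value)
else:
    print("not found")

In Python the mistake surfaces as an immediate TypeError (or a warning from a static checker like mypy) rather than a compile error, but the important property is preserved: the failure value no longer supports the operations of a success value.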

And instead of making built-in types support lots of different operations, you can have functions that allow the user to ask explicitly for the various behaviors that those operations can have for different types. repeat("hands", 3) is like "hands" * 3, but makes explicit that you’re expecting a string here — this is especially important if you’re using a variable instead of "hands", because otherwise it’s not immediately clear what the “multiplication” is supposed to do. Similarly, if isEmpty([]) ... makes explicit the emptiness test that if [] ... leaves to the language’s truthiness rules, and those rules differ from language to language. Do you know off the top of your head which scripting languages treat [] as true and which treat it as false? But with an explicit isEmpty it’s obvious what circumstances will lead into which branch of the conditional.
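
Sketches of those two helpers in Python (the definitions are mine, matching the names used above):

def repeat(s, n):
    # Explicit about expecting a string: fail loudly instead of quietly
    # "multiplying" whatever happened to arrive.
    if not isinstance(s, str):
        raise TypeError("repeat expects a string")
    return s * n

def isEmpty(xs):
    # One unambiguous question: does this collection have zero elements?
    return len(xs) == 0

print(repeat("hands", 3))  # handshandshands
print(isEmpty([]))         # True, regardless of whether [] is truthy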

My intuitions about the “right” way to design a programming language are pretty complex. I’m not sure I could describe them completely to myself, let alone to someone who doesn’t live in my brain. But I’m pretty sure that there should be mathematical rigor involved, even if you’re designing a language that will never be used by mathematicians, even if you don’t want your users to have to learn any math to use the language.

On Giving Things Up, pt 2

Yesterday’s post was supposed to be about a general principle, rather than a particular example — I got a bit carried away. That principle is: there are a lot of decisions being made in modern programming language theory that feel like sacrifices, like giving things up. But they’re not, really. They’re gaining things — additional structure — and in almost every case, whatever you wanted to do would have fit into the structure already.

Example: static typing
I hope I don’t get carried away again, but let’s try one more example before we continue about the general principle. A lot of people really like dynamically typed languages as opposed to statically typed languages. I used to feel like this too. Type systems like Java’s and C’s felt incredibly restrictive to me — why couldn’t I just do it in Python, where I write a function once and it automatically works on any value?

Except, of course, it doesn’t automatically work on any value. When you write something like

  def double_and_subtract_one(x):
      return x*2 - 1

you are expecting x to be a number. Sure, you can pass it a dictionary if you define a subclass of dict with a __mul__ method that returns a number, or whatever, but it’s not what you’re expecting. You, the programmer, have specific ideas about the types1 of most of your variables, even if Python doesn’t.
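
To be fair, the weird case really is expressible. A contrived sketch (WeirdDict is made up for the occasion):

class WeirdDict(dict):
    def __mul__(self, n):
        # "multiplying" a dict by an integer, arbitrarily defined
        return len(self) * n

def double_and_subtract_one(x):
    return x * 2 - 1

print(double_and_subtract_one(WeirdDict(a=1, b=2)))  # 3: legal, but almost
# certainly not what the author of the function had in mind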

If I may be so bold: I don’t think you’ve ever wanted or will ever want to write code that couldn’t, in principle, pass a sufficiently smart type checker. You don’t want to write code where it’s impossible to say that you won’t ever subtract an object from a string, or call a method on an object that doesn’t have one. Maybe there are no modern typecheckers smart enough2 to have the same ideas as you do about how your code works, and in that case I would agree that no existing statically typed language is suitable for you. But nothing you want is inherently opposed to the idea of type analysis itself.

Generalities
Okay, back to the general principle: you can “give up” these “freedoms”, and often it will turn out that you never would have used these freedoms anyway. But why would you want to do this? Isn’t it still better to have more freedom, just in case?

Well… it’s not that simple. You can look at it as losing freedom, but as I touched on in the opening paragraph, you can also think of it as gaining structure or knowledge. And this means that if you keep those freedoms (which, again, you were probably never ever going to use), you give up structure that you and the compiler could use to write better, safer code.

Type systems are a bit polarizing as an example, so let’s return to the previous subject of termination checking. You can either allow general recursion and potentially shoot yourself in the foot, or you can have a termination checker and restrict yourself a tiny bit.

What do you lose if you go with a termination checker? The ability to write a very small class of bizarre3 programs — those that cannot be proven not to loop forever without doing useful work. Even if you’re writing an operating system you don’t want to do this: you don’t want to get stuck forever incrementing a single counter because you missed a loop boundary condition; you want to loop forever doing the bookkeeping required to run an operating system.

What do you gain? Well, it’s not the ability to write programs that do useful work — you can write those in a language with general recursion. What you gain is help from the compiler. Without a termination checker, you have to step through your code manually before you can understand whether it’s going to run the way you want it to — keeping track along the way of every possible state transition, every combination of variable values at any point. The freedom to write malformed programs comes with an exacting obligation to ensure that you’re not writing malformed programs. Accept a termination checker, and this requirement is lifted — leaving you with more time to attend to the interesting, application-specific logic of your program.

Freedom and the Turing tarpit are two sides of the same coin. All human applications are built of structure.


  1. If you’re a fan of duck typing, maybe you have ideas about interfaces, rather than particular types. But a lot of statically-typed languages support interfaces too — tracking the interfaces supported by a particular variable is not really much different from tracking the types. 
  2. But if you haven’t looked at Haskell or OCaml or another language of that family, please don’t conclude this until you have. Haskell is an entirely different world than C or Java. 
  3. Actually, what termination checkers lock you out of is a bit more worrying than my previous post suggests. Since I wrote that post, I have learned the following theorem: a language whose programs always terminate is unable to express certain always-terminating programs (unsurprising), including (gasp) the interpreter for that language itself. I certainly didn’t expect this and I will admit it makes me uneasy. Nevertheless, it seems more like the same kind of thing as the famous “no system can represent itself” result from Goedel, rather than “this system is unexpectedly weak”, so I think my point still stands. 

On Giving Things Up

There are a few functional programming languages (like Agda, Coq, and Idris) that take a radical step: their compilers will attempt to prove that a given program eventually terminates, and if it can’t prove this, that’s a compiler error.

“But wait!” you say. “What about programs that must run forever?” These languages have an answer to this problem, but it’s a bit beyond the scope of this post. I might write about it in the future; if you’re interested, look up codata.

No, it’s another “problem” with termination checking that I want to talk about. Some of you are doubtless familiar with the famous halting problem — and the fact that it is undecidable. And termination checkers for languages like Agda and Coq don’t try to solve the halting problem — instead, they’re conservative. There are a couple techniques you can use to try to determine whether a program halts, techniques which will never give you a false positive — never tell you that a program halts when it runs forever. Agda’s termination checker tries a bunch of these techniques, and if none of them work, then it assumes the program doesn’t terminate and signals a compile error. This situation is often described as “Agda has a very powerful type system, but it’s not Turing complete. It gave up Turing completeness in exchange for being really really sure about correctness.” As if this is a drawback of focusing so hard on correctness.

I used to feel this way too — that Agda might be a cool language, but it wasn’t Turing complete, so it was ultimately useless, a thing that belonged locked away in some ivory tower. But actually, if you think about it, Turing completeness is not all that big a deal.

Yes, you want a programming language to be more powerful than a calculator. In fact, you want it to be very powerful indeed — you want it to be powerful enough to express anything you’d ever want to write.

But now I’m going to make a bold claim. I don’t think you’ve ever wanted to write a program for which you can’t know whether it terminates. I don’t think you ever will want to, or need to, write such a program.

You might want to write programs that loop infinitely. Agda can do that. What it can’t do is express programs that loop infinitely without accomplishing anything. I don’t think that’s very important.

Showing that a language is Turing-complete is a good, convenient way of proving without a doubt that the language is powerful enough for our purposes: it can represent anything representable, so even if it’s not good enough, we can’t do any better. But I’m not convinced that a language must be Turing complete in order to be worthwhile, even as a general-purpose language. “Turing space” is full of incomprehensible madness that is no use to us anyway, so why not “give up” the ability to express those nonsense programs in exchange for a more helpful compiler?