It is not uncommon to see functions declared without fully specified parameters.
In a header file, instead of writing
void *Arena_alloc (T arena, long nbytes, const char *file, int line);
you can write
void *Arena_alloc();
I used to do that, omitting the parameters in header files and in forward declarations. The reason? Keeping function prototypes up to date felt like a drag when function signatures were constantly changing during development.
I knew this was bad engineering practice, but I had never thought about it from the compiler’s perspective. “It’s going to be fine, as long as the compiler can compile it without warnings.”
Recently I wrote a code generator for my FL compiler, and I realized that this kind of code can mislead a compiler into generating bizarre code.
Function Prototype Declaration
C allows you to declare a function that is not defined in the current compilation unit. A declaration that also specifies the parameter types is called a function prototype.
From the compiler’s point of view, function names are identifiers, and identifiers are symbols. Symbols are resolved at link time. Therefore, if foo calls bar, which is defined in another file, C allows you to declare bar before its use in foo:
void bar(int);
int foo(int a) {
...
bar(a);
...
}
All looks good, except that C also allows you to declare bar with an empty parameter list.
void bar();
int foo(int a) {
bar(a);
}
Declaration with No Parameter Specified
An empty parameter list doesn’t mean that bar takes zero parameters; it means that bar takes an unspecified number of parameters. (C23 changed this rule so that an empty list means no parameters, which is what void bar(void) has always meant.) So in the last example, the compiler stays silent when foo calls bar with one argument. foo could just as well call bar with any number of arguments, even zero.
Isn’t that an arity mismatch? Yes, it is. And the compiler happily accepts it.
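The difference between an empty list and an explicit one can be sketched as follows. This is a minimal illustration, not code from the original project: the names bar_none, bar_one and call_both are hypothetical. With real prototypes in scope, the commented-out calls would be rejected at compile time instead of slipping through.

```c
/* Hypothetical names, for illustration only. */
int bar_none(void);   /* prototype: takes exactly zero arguments */
int bar_one(int a);   /* prototype: takes exactly one int */

int call_both(void) {
    /* bar_none(1);   rejected by the compiler: too many arguments */
    /* bar_one();     rejected by the compiler: too few arguments  */
    return bar_none() + bar_one(41);
}

/* Definitions that match the prototypes above. */
int bar_none(void) { return 1; }
int bar_one(int a) { return a; }
```

Uncommenting either call should turn the arity mismatch into a compile error, which is exactly the check that the empty-parentheses form gives up.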
What Are the Semantics, Then?
What happens when an arity mismatch occurs? To answer that, we need to understand how the compiler sets up the call stack. I wrote a small sample program to illustrate the semantics.
The file lib.c contains two functions:
// lib.c
int arg1(int a) {
return a;
}
int arg2(int a, int b) {
return a + b;
}
However, lib.h declares the two functions without specifying their parameters:
// lib.h
int arg1();
int arg2();
Now main.c includes the header lib.h and calls arg1 and arg2 with mismatched arity.
// main.c
#include "lib.h"
#include <stdio.h>
int main()
{
int x = arg1(1, 2);
int y = arg2(3);
printf("x = %d, y = %d\n", x, y);
}
The sample code can be compiled with
clang lib.c main.c
Running the program, we see
$ ./a.out
x = 1, y = 5
(The above code was run on macOS.)
It not only compiles without warnings, but also runs fine! Let’s understand where the answer comes from.
It’s All About Stack Frames
Understanding how stack frames are reserved and released is the key to understanding the result. I’ve written about how a compiler generates stack frames in Tiger Compiler Notes: I386 Backend.
Specifically, the call arg1(1, 2) produces the stack frames below:
high
 |        +---+
 |     ADR|...|<-EBP      |...|
 |        +---+           +---+
 |        | 2 |           | 2 |
 |        +---+           +---+
 |        | 1 |<-ESP      | 1 |
 |        +---+           +---+
 |                        |EIP|
 |                        +---+
 |                        |ADR| <-ESP,EBP
 |                        +---+
 V
low
The left one shows the stack before calling arg1. The right one shows the stack while arg1 runs. To access the first argument in arg1, the compiler generates code that reads from EBP+8, which holds the value 1 in our example.
When arg1 returns, its frame is popped without erasing the contents, so 2 and 1 are still there. The call to arg2(3) that follows immediately places 3 at ESP, and the stack becomes
high
 |        +---+
 |     ADR|...|<-EBP      |...|
 |        +---+           +---+
 |        | 2 |           | 2 |
 |        +---+           +---+
 |        | 3 |<-ESP      | 3 |
 |        +---+           +---+
 |                        |EIP|
 |                        +---+
 |                        |ADR| <-ESP,EBP
 |                        +---+
 V
low
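The left one shows the stack before calling arg2. The right one shows the stack while arg2 runs. Even though arg2 was never called with two arguments, it uses whatever was left on the stack as its second argument. The first argument is 3, and the second is 2. The answer? 5! Note that the C standard treats such a call as undefined behavior, so this result is an accident of the calling convention, not something to rely on.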
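The stale-slot effect can be reproduced with well-defined code by modeling the argument area explicitly. The sketch below is a toy model, not what the compiler actually emits: the array slot stands in for the two words at ESP and ESP+4, and the names slot, model_arg1, model_arg2, struct result and simulate are all hypothetical.

```c
/* Toy model of the caller's outgoing-argument area.
   slot[0] plays the word at ESP, slot[1] the word at ESP+4. */
static int slot[2];

struct result { int x; int y; };

/* The callees read their arguments from fixed offsets, like EBP+8 and EBP+12. */
static int model_arg1(void) { return slot[0]; }
static int model_arg2(void) { return slot[0] + slot[1]; }

struct result simulate(void) {
    struct result r;

    /* arg1(1, 2): the caller writes both arguments; the callee reads only the first. */
    slot[0] = 1;
    slot[1] = 2;
    r.x = model_arg1();

    /* arg2(3): the caller writes only the first slot; the 2 is left over. */
    slot[0] = 3;
    r.y = model_arg2();

    return r;
}
```

In this model simulate() yields x = 1 and y = 5, matching the program's output. The model assumes the caller stores arguments into a fixed area rather than pushing and popping them, which is what the diagrams above depict.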
Function Prototypes That Misinform Compilers
So what did I learn? Function declarations without full parameter specifications misinform the compiler! Even if the compiler generates code that runs, that code may be barely correct. They also push more responsibility onto the programmer to keep the semantics of the code correct. In sum, just don’t do it.
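For the sample program, the fix could look like the sketch below, shown as a single file for brevity. The declarations at the top are what lib.h would contain; sum_of_calls is a hypothetical caller standing in for main.c.

```c
/* lib.h (sketch): full prototypes instead of empty parameter lists */
int arg1(int a);
int arg2(int a, int b);

/* lib.c: definitions, checked against the prototypes above */
int arg1(int a) { return a; }
int arg2(int a, int b) { return a + b; }

/* Hypothetical caller standing in for main.c */
int sum_of_calls(void) {
    /* int x = arg1(1, 2);   now a compile error: too many arguments */
    /* int y = arg2(3);      now a compile error: too few arguments  */
    int x = arg1(1);
    int y = arg2(3, 2);
    return x + y;
}
```

Including lib.h from lib.c as well lets the compiler check each definition against its declaration. Both GCC and Clang also provide -Wstrict-prototypes, which should flag declarations such as int arg1(); before they cause trouble.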