The Folly codebase makes extensive use of a macro called UNLIKELY. For instance, here is a code snippet from Notes on Folly: Thread Pool Executors:
if (UNLIKELY(!task || task.value().poison)) {
  // Actually remove the thread from the list.
  SharedMutex::WriteHolder w{&threadListLock_};
  ...
}
The Linux kernel uses similar likely()/unlikely() macros. How do they work?
Let’s take a look at how the macro is defined in folly:
#if defined(__GNUC__)
#define LIKELY(x) (__builtin_expect((x), 1))
#define UNLIKELY(x) (__builtin_expect((x), 0))
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif
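As a quick illustration (my own sketch, not code from Folly), such a macro is typically used to annotate a rarely-taken path, for example an error check; the helper name and error handling below are made up:

#include <cstdio>
#include <cstdlib>

#if defined(__GNUC__)
#define UNLIKELY(x) (__builtin_expect((x), 0))
#else
#define UNLIKELY(x) (x)
#endif

// Hypothetical helper: parse a non-negative size from a string.
// The error check is wrapped in UNLIKELY because valid input is the
// common case and the bail-out path should be rare.
long parse_size(const char* s) {
  char* end = nullptr;
  long value = std::strtol(s, &end, 10);
  if (UNLIKELY(end == s || value < 0)) {
    std::fprintf(stderr, "bad size: %s\n", s);
    std::exit(1);
  }
  return value;
}

int main(int argc, char** argv) {
  if (argc > 1) {
    std::printf("%ld\n", parse_size(argv[1]));
  }
}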
The macro expands to __builtin_expect. From the GCC documentation:
You may use __builtin_expect to provide the compiler with branch prediction information.
The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0)) foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
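Following that advice, the hint is normally attached to an explicit comparison so that the expression already evaluates to 0 or 1. Here is a minimal sketch (GCC/Clang only; the buffer-flush scenario and names are invented for illustration):

#include <cstddef>

// The semantics are "exp == c": we tell the compiler we expect
// (count >= limit) to evaluate to 0, i.e. the flush branch is rarely taken.
bool should_flush(std::size_t count, std::size_t limit) {
  if (__builtin_expect(count >= limit, 0)) {
    return true;   // rare path: the buffer is full, flush it
  }
  return false;    // common path: keep buffering
}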
Ultimately the secret is in the compiled code. Let’s compile a small program and take a look at the generated code:
#include <iostream>

int cond() { return 1; }

void bar() { std::cout << "bar\n"; }
void zoo() { std::cout << "zoo\n"; }

void foo(int x) {
  // we don't expect to call bar
  if (__builtin_expect(x, 0)) {
    bar();
  } else {
    zoo();
  }
}

int main(int argc, char **argv) {
  foo(argc);
}
Save this code to t.cc and compile it with g++ -g -O1 t.cc. Note that optimization must be enabled for the branch-prediction hint to influence code layout.
Instead of passing a constant value to foo, we pass in argc so that the compiler cannot inline the call and fold the branch away. Let's check the assembly of function foo:
(gdb) Dump of assembler code for function foo(int):
<+0>: sub $0x8,%rsp
<+4>: test %edi,%edi
<+6>: jne 0x894 <foo(int)+18>
<+8>: callq 0x861 <zoo()>
<+13>: add $0x8,%rsp
<+17>: retq
<+18>: callq 0x840 <bar()>
<+23>: jmp 0x88f <foo(int)+13>
End of assembler dump.
To reproduce the dump, run echo disassemble foo | gdb a.out.
The call to zoo sits on the fall-through path, immediately after the test of x, while the call to bar is moved to the end of the function. If we switch to:
void foo(int x) {
  // we expect to call bar
  if (__builtin_expect(x, 1)) {
    bar();
  } else {
    zoo();
  }
}
The generated code is
(gdb) Dump of assembler code for function foo(int):
<+0>: sub $0x8,%rsp
<+4>: test %edi,%edi
<+6>: je 0x894 <foo(int)+18>
<+8>: callq 0x840 <bar()>
<+13>: add $0x8,%rsp
<+17>: retq
<+18>: callq 0x861 <zoo()>
<+23>: jmp 0x88f <foo(int)+13>
End of assembler dump.
where bar is now on the fall-through path right after the test, and zoo has been moved to the end.
What UNLIKELY does is actually simple: it tells the compiler to place the instructions of the likely branch (here, the else branch) on the fall-through path, right after the instructions generated for the if condition, and to move the unlikely branch out of line. This layout makes execution faster because processors execute instructions in a pipeline and fetch sequentially along the fall-through path, so the common case takes fewer taken branches, stays dense in the instruction cache, and is less likely to make the processor wastefully load instructions and then flush them after a mispredicted branch.
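Tying this back to where we started: the test program could equally be written with the UNLIKELY macro instead of the raw builtin, and since the macro simply expands to __builtin_expect((x), 0), the same compiler and flags should produce the same layout as the first dump:

#include <iostream>

#if defined(__GNUC__)
#define UNLIKELY(x) (__builtin_expect((x), 0))
#else
#define UNLIKELY(x) (x)
#endif

void bar() { std::cout << "bar\n"; }
void zoo() { std::cout << "zoo\n"; }

void foo(int x) {
  // we don't expect to call bar
  if (UNLIKELY(x)) {
    bar();
  } else {
    zoo();
  }
}

int main(int argc, char** argv) {
  foo(argc);
}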