Notes on Folly: How UNLIKELY Macro Works

2020-09-17
c++

Have you seen the UNLIKELY annotation before? How does it work behind the scene?

Folly code base extensively uses a macro called UNLIKELY. For instance, this is a code snippet in Notes on Folly: Thread Pool Executors:

if (UNLIKELY(!task || task.value().poison)) {
  // Actually remove the thread from the list.
  SharedMutex::WriteHolder w{&threadListLock_};
  ...
}

Similarly, Linux kernel also uses such instruments. How does it work?

Linux kernel also uses __builtin_expect.

Let’s take a look at how the macro is defined in folly:

#if defined(__GNUC__)
#define LIKELY(x) (__builtin_expect((x), 1))
#define UNLIKELY(x) (__builtin_expect((x), 0))
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

This macro is transformed to __builtin_expect. In GCC documentation,

You may use __builtin_expect to provide the compiler with branch prediction information.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
  foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);

when testing pointer or floating-point values.

Ultimately the secret is in the compiled code. Let’s compile a small program and take a look at the generated code:

#include <iostream>
int cond() { return 1; }
void bar() { std::cout << "bar\n"; }
void zoo() { std::cout << "zoo\n"; }
void foo(int x) {
  // we don't expect to call bar
  if (__builtin_expect(x, 0)) {
    bar();
  } else {
    zoo();
  }
}

int main(int argc, char **argv) {
  foo(argc);
}

Save this code to t.cc, and compile it with g++ -g -O1 t.cc. Note that we must enable optimization to enable branch predication.

Instead of passing a static value to foo, we pass in argc to disable inlining. Let’s check the assembly of function foo:

(gdb) Dump of assembler code for function foo(int):
   <+0>:     sub    $0x8,%rsp
   <+4>:     test   %edi,%edi
   <+6>:     jne    0x894 <foo(int)+18>
   <+8>:     callq  0x861 <zoo()>
   <+13>:    add    $0x8,%rsp
   <+17>:    retq
   <+18>:    callq  0x840 <bar()>
   <+23>:    jmp    0x88f <foo(int)+13>
End of assembler dump.

Run echo disassemble foo | gdb a.out

The call zoo is raised to the next instruction of x != 0. If we switch to use

void foo(int x) {
  // we expect to call bar
  if (__builtin_expect(x, 1)) {
    bar();
  } else {
    zoo();
  }
}

The generated code is

(gdb) Dump of assembler code for function foo(int):
   <+0>:     sub    $0x8,%rsp
   <+4>:     test   %edi,%edi
   <+6>:     je     0x894 <foo(int)+18>
   <+8>:     callq  0x840 <bar()>
   <+13>:    add    $0x8,%rsp
   <+17>:    retq
   <+18>:    callq  0x861 <zoo()>
   <+23>:    jmp    0x88f <foo(int)+13>
End of assembler dump.
(gdb) quit

where the code has placed bar close to x != 0.

What UNLIKELY does is actually simple: it arranges instructions in the else branch to make it closer to instructions generated from if‘s condition. This optimization makes instruction execution faster because processors execute instructions in pipelines, and the arrangement of the code makes it unlikely for processors to wastefully load instructions and then flush them for failed branch prediction.