

Getting Familiar with GCC Parameters
Pages: 1, 2, 3, 4

Options Related To Function Calling

gcc offers several ways to control how a function is called. Let's look at inlining first. Inlining removes the cost of a function call because the body of the function is substituted directly into the caller. Note that gcc does not do this by default; it happens only when you use -O3 or at least -finline-functions.

How does the finished binary look when gcc does inlining? Observe Listing 2:


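#include <stdio.h>

inline void test(int a, int b, int c)
{
        int d = a * b * c;
        printf("%d * %d * %d is %d\n", a, b, c, d);
}

static inline void test2(int a, int b, int c)
{
        int d = a + b + c;
        printf("%d + %d + %d is %d\n", a, b, c, d);
}

int main(int argc, char *argv[])
{
        test(1, 2, 3);
        test2(4, 5, 6);
}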

Listing 2. Inline in action

Compile Listing 2 using the following parameter:

$ gcc -S -O3 -o <result-file.s> <listing-2.c>

-S makes gcc stop right after the compilation stage and write assembly output (we'll cover it later in this article). The relevant part of the result is as follows:

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        movl    $6, 16(%esp)
        movl    $3, 12(%esp)
        movl    $2, 8(%esp)
        movl    $1, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    $15, 16(%esp)
        movl    $6, 12(%esp)
        movl    $5, 8(%esp)
        movl    $4, 4(%esp)
        movl    $.LC1, (%esp)
        call    printf

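Both test() and test2() are indeed inlined into main(), but the full output also contains a standalone copy of test() outside main(). This is where the static keyword plays a role. By declaring a function static, you tell gcc that no other object file will call it, so there is no need to emit a standalone copy of its code. Marking functions static whenever possible is therefore a space saver. (This describes gcc's traditional GNU89 inline semantics. Under the C99 semantics that newer gcc releases use by default, a plain inline definition does not emit a standalone copy; -fgnu89-inline restores the older behavior.) On the other hand, choose carefully which functions to inline: the body is duplicated at every call site, and a larger binary for a small speedup isn't always worthwhile.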

gcc uses heuristics to decide whether a function should be inlined. One consideration is the function's size in terms of pseudo-instructions. By default, the limit is 600. You can change it with -finline-limit=<n>. Experiment to find a better limit for your own case. It is also possible to override the heuristics so that gcc always inlines a function. Simply declare your function like this:

__attribute__((always_inline)) static inline void test(int a, int b, int c)

Now, on to parameter passing. On 32-bit x86, the caller pushes parameters onto the stack by default, and the function reads them from there. gcc lets you change this behavior and pass parameters in registers instead. Pass -mregparm=<n>, where <n> is the number of registers to use; up to three integer parameters can travel this way. If we apply this option (n=3) to Listing 2, remove the inline keyword, and use no optimization, we get this:

        pushl   %ebp
        movl    %esp, %ebp
        subl    $56, %esp
        movl    %eax, -20(%ebp)
        movl    %edx, -24(%ebp)
        movl    %ecx, -28(%ebp)
        movl    $3, %ecx
        movl    $2, %edx
        movl    $1, %eax
        call    test

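Instead of the stack, gcc uses EAX, EDX, and ECX to hold the first, second, and third parameters. In this excerpt, the three stores to -20(%ebp), -24(%ebp), and -28(%ebp) appear to be the unoptimized callee saving its incoming registers, and the three loads of $3, $2, and $1 are the caller setting up the call. Because register access is faster than memory access, this is one way to reduce run time. However, you must pay attention to these issues: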

  • You MUST compile all your code with the same -mregparm value. Otherwise, calls into another object file will misbehave, because caller and callee assume different calling conventions.
  • By using -mregparm, you break the standard Intel x86-compatible Application Binary Interface (ABI). Therefore, you should mention it when you distribute your software in binary-only form.

You have probably noticed this kind of sequence at the beginning of every function:

push   %ebp
mov    %esp,%ebp
sub    $0x28,%esp

This sequence, known as the function prologue, sets up the frame pointer (EBP) and reserves stack space for local variables. The frame pointer helps the debugger produce a stack trace. The layout below helps you visualize this [6]:

[ebp-01] Last byte of the last local variable
[ebp+00] Old ebp value
[ebp+04] Return address
[ebp+08] First argument

Can we omit it? Yes, with -fomit-frame-pointer, the prologue will be shortened so the function just begins with a stack reservation (if there are local variables):

sub    $0x28,%esp

If the function gets called very frequently, cutting out the prologue saves your program several CPU cycles. But be careful: by doing this, you also make it hard for the debugger to investigate the stack. For example, let's add test(7,7,7) at the end of test2() and recompile with -fomit-frame-pointer and no optimization. Now fire up gdb to inspect the binary:

$ gdb inline
(gdb) break test
(gdb) r
Breakpoint 1, 0x08048384 in test ()
(gdb) cont
Breakpoint 1, 0x08048384 in test ()
(gdb) bt
#0  0x08048384 in test ()
#1  0x08048424 in test2 ()
#2  0x00000007 in ?? ()
#3  0x00000007 in ?? ()
#4  0x00000007 in ?? ()
#5  0x00000006 in ?? ()
#6  0x0000000f in ?? ()
#7  0x00000000 in ?? ()

On the second call of test, the program is stopped and gdb prints the stack trace. Normally, main() should come up in Frame #2, but we only see question marks. Recall what I said about the stack layout: the absence of a frame pointer prevents gdb from finding the location of the saved return address in Frame #2.
