BSD DevCenter

Diving into Gcc: OpenBSD and m88k

by Miod Vallat
10/02/2003

This article describes how the m88k-specific backend of the GNU C compiler, gcc, was fixed, from the discovery and analysis of the problems to the real fixing work. Since I started with almost no knowledge of gcc internals, the article should be understandable by anyone able to read C code, and it shows that diving into gcc is not as hard as one might imagine.

Most of the code snippets displayed below come from the gcc 2.95 sources, in the m88k-specific code (found inside gcc/config/m88k/m88k.*). For more details about the gcc internals, the reader is welcome to refer to the Resources.

Some Background History

The Motorola 88000 architecture is not well-known today. Think of it as a bridge between the famous Motorola 68000 family and the well-known PowerPC family. Although the architecture is now dead, many fine m88k-based systems were produced from 1988 to 1992, such as the Data General AViiON workstations and Motorola's own embedded systems.

Due to the elegance of its design and the availability of second-hand machines, m88k systems became and remain quite popular among hobbyists. No wonder several free operating systems have been ported, or are being ported, to these machines. The most advanced effort was the OpenBSD/mvme88k port to the Motorola VME boards.

Nivas Madhur started the OpenBSD/mvme88k port in 1995. Back then, OpenBSD still shipped with gcc 2.7.2.1 and local patches (those were the days!), and the problems Nivas had to face were dire kernel bugs, so no real effort went into the toolchain. Optimization was disabled in order to prevent compiler bugs, if any, from interfering.

Nivas Madhur eventually stopped working on the port. Dale Rahn integrated it into the OpenBSD main sources, but Dale did not have the resources to maintain it. Steve Murphree, Jr eventually took over. At some point, OpenBSD started to use gcc 2.8, which did not fix the optimizer problems.

Steve was very close to producing an OpenBSD/mvme88k release. Unfortunately, some weeks before the code freeze, the in-tree gcc was updated to egcs 1.1 and compiler problems started to plague the port: even at the -O0 (no optimization) level, the compiler would not always output correct code. As the kernel was compiled with -O as an exception to the -O0 rule, people started running unreliable kernels.

At some point, the userret() code path factorization in the OpenBSD kernel for all architectures required kernels to be built with optimization, relying on userret() being an inline function. Unfortunately, gcc, by design, will never inline functions (even if explicitly requested) at -O0. The point of no return had been crossed: gcc had to be fixed.

Starting Debugging

Finding and fixing compiler problems is never easy. It's like climbing a mountain: you need a good rope, and good assets. Moreover, a debugger is mostly useless: you're not trying to find why code behaves incorrectly, but rather what causes gcc to produce incorrect code.

In my case, I made sure to keep a known working gcc 2.8 binary in a safe place, which could be used to bootstrap gcc 2.95 first, then as a working reference. I also made sure I had a good backup.

Stack Me Harder

After compiling gcc 2.95 with gcc 2.8, my first test was to compile and run a kernel and hope for the best. After a long compilation, my hopes were smashed very quickly: the kernel failed very early in an assertion:

panic: kernel diagnostic assertion "obj == NULL || anon == NULL" failed:
  file "/usr/src/sys/uvm/uvm_page.c", line 899

dropping me into the debugger. Yet the function parameters from the traceback were apparently correct!

From the assertion message and the debugger traceback, I could easily reconstruct the code flow. Since the assertion failure would happen in a uvm_pagealloc_strat invocation, I built the following simple program to reproduce a similar flow:

$ cat assert.c
#include <stdio.h>
#include <sys/types.h>

#define KASSERT(e)      ((e) ? (void) 0 : __assert( __FUNCTION__, #e))

void
__assert(const char *funcname, const char *error)
{
    printf("Assertion failed in %s: %s\n", funcname, error);
    /* exit(0); */
}

void *
uvm_pagealloc_strat(void *obj, u_int64_t off, void *anon, int flags, int strat,
    int free_list)
{
    KASSERT(anon == NULL);

    return obj;
}

void *
uvm_pagealloc(void *obj, u_int64_t off, void *anon, int flags)
{
    KASSERT(anon == NULL);

    return obj;
}

main()
{
    char *pg;
    char *kobj = "kobj";

    pg = uvm_pagealloc(kobj, 0, NULL, 0);
    pg = uvm_pagealloc(kobj, 0, NULL, 0);
    pg = uvm_pagealloc_strat(kobj, 0, NULL, 0, 0, 0);
    pg = uvm_pagealloc(kobj, 0, NULL, 0);
}
$

When compiled with gcc 2.95, this program reproduced the problem:

$ gcc295 -O0 -o assert assert.c
$ ./assert
Assertion failed in uvm_pagealloc: anon == NULL
$

gcc 2.8 worked fine:

$ gcc28 -O0 -o assert assert.c
$ ./assert
$

More interestingly, the assertion would only fail for the first call but not afterwards. This would hint toward either incorrect stack or incorrect register usage, eventually corrected by the side effects of multiple function calls.

After some tinkering, I finally ended up with this interesting sample program:

$ cat eighty2.c
#include <stdio.h>
#include <sys/types.h>

void
odd64(int oddmaker, u_int64_t stamper, int value)
{
    printf("odd64: value=%d\n", value);
}

void
odd32(int oddmaker, u_int32_t stamper, int value)
{
    printf("odd32: value=%d\n", value);
}

void
even64(int oddmaker, int evenmaker, u_int64_t stamper, int value)
{
    printf("even64: value=%d\n", value);
}

void
even32(int oddmaker, int evenmaker, u_int32_t stamper, int value)
{
    printf("even32: value=%d\n", value);
}

main()
{
    odd64(0, 0, 1);
    odd32(0, 0, 1);

    even64(0, 0, 0, 1);
    even32(0, 0, 0, 1);
}
$

When run, this program would produce:

$ gcc295 -O0 -o eighty2 eighty2.c
$ ./eighty2
odd64: value=3
odd32: value=1
even64: value=1
even32: value=1
$

instead of:

$ gcc28 -O0 -o eighty2 eighty2.c
$ ./eighty2
odd64: value=1
odd32: value=1
even64: value=1
even32: value=1
$

The problem was apparently tied to the use of a 64-bit argument, but only in some cases. Why?

Let's examine the m88k calling convention for these routines. The canonical m88k calling convention mandates that arguments are passed in registers r2 to r9, with extra arguments passed on the stack. If an argument cannot fit in one register (such as a double, or an int64_t), it is put in two consecutive registers starting at an even number, so that double-word load and store instructions can be used. In our case, the calling convention would be:

Calling convention for even64()

  r2 - oddmaker
  r3 - evenmaker
  r4, r5 - stamper
  r6 - value

Calling convention for odd64()

  r2 - oddmaker
  r3   (wasted)
  r4, r5 - stamper
  r6 - value

However, looking at the code generated by gcc 2.95, odd64() would be invoked with

  r2 - oddmaker
  r3   (unused)
  r4, r5 - stamper
  stack - value

Why would the last parameter be passed on the stack with the new compiler?

Let's dive into the gcc sources. A large piece of code in gcc/calls.c is responsible for function invocations, choosing where to pass arguments, whether on the stack or in a specific register. To do this, it relies upon a set of macros provided by the processor-dependent backend of gcc: the FUNCTION_ARG macro set.

In this particular case, the macros misbehave as soon as they encounter a 64-bit parameter for which it is necessary to skip and waste an odd-numbered register. As such, the problem probably lies in FUNCTION_ARG or FUNCTION_ARG_ADVANCE. The first macro decides where to put the argument, while the second updates a position counter so that the next FUNCTION_ARG invocation knows at which register number or stack location to start.

A simple grep through the gcc sources shows that gcc 2.8 invokes FUNCTION_ARG_ADVANCE in five places:

$ grep FUNCTION_ARG_ADVANCE *.c
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, TYPE_MODE (type), type,
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, Pmode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
function.c:      FUNCTION_ARG_ADVANCE (args_so_far, promoted_mode,
$

as does gcc 2.95:

$ grep FUNCTION_ARG_ADVANCE *.c
calls.c:      FUNCTION_ARG_ADVANCE (*args_so_far, TYPE_MODE (type), type,
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, Pmode, (tree) 0, 1);
calls.c:      FUNCTION_ARG_ADVANCE (args_so_far, mode, (tree) 0, 1);
function.c:      FUNCTION_ARG_ADVANCE (args_so_far, promoted_mode,
$

but note how the first invocation uses a pointer dereference now? Of course, the FUNCTION_ARG_ADVANCE macro is not entirely expansion-safe:

$ cd config/m88k
$ head -1069 m88k.h | tail -18
/* A C statement (sans semicolon) to update the summarizer variable
   CUM to advance past an argument in the argument list.  The values
   MODE, TYPE and NAMED describe that argument.  Once this is done,
   the variable CUM is suitable for analyzing the *following* argument
   with `FUNCTION_ARG', etc.  (TYPE is null for libcalls where that
   information may not be available.)  */
#define FUNCTION_ARG_ADVANCE(CUM, MODE, TYPE, NAMED)                    \
  do {                                                                  \
    enum machine_mode __mode = (TYPE) ? TYPE_MODE (TYPE) : (MODE);      \
    if ((CUM & 1)                                                       \
        && (__mode == DImode || __mode == DFmode                        \
            || ((TYPE) && TYPE_ALIGN (TYPE) > BITS_PER_WORD)))          \
      CUM++;                                                            \
    CUM += (((__mode != BLKmode)                                        \
             ? GET_MODE_SIZE (MODE) : int_size_in_bytes (TYPE))         \
            + 3) / 4;                                                   \
  } while (0)
$

Notice how CUM, the first parameter, is used without protective parentheses? Guess what happens in the CUM++; statement when CUM happens to be *args_so_far: it expands to *args_so_far++;, and since postfix ++ binds more tightly than unary *, the pointer itself is incremented rather than the counter it points to. That is our exact problem! Compiling a call to odd64() would trigger the CUM++; statement from the macro invocation using the pointer. This is just one more bug caused by preprocessor-unsafe code, all the harder to notice because the macro had been working correctly in previous gcc versions.

As a result of the bug, args_so_far would end up incremented, pointing to a semi-random memory location holding a huge value. Subsequent FUNCTION_ARG invocations would then consider all of the r2-r9 registers to be in use and place the remaining arguments on the stack.

That's not all. There is another bug left in there. Look more closely at the macro expansion:

enum machine_mode __mode = (TYPE) ? TYPE_MODE (TYPE) : (MODE);      \
...
    CUM += (((__mode != BLKmode)                                        \
             ? GET_MODE_SIZE (MODE) : int_size_in_bytes (TYPE))         \

It is obvious that the GET_MODE_SIZE macro should be applied to the recomputed __mode, not the initial MODE. Yet people keep telling me gcc 2.95 works for them under SysV/m88k. Lucky fellows.

I eventually rewrote this macro as a function in my tree, as it had more bugs and needed to follow the FUNCTION_ARG logic more closely, which would make it too big to be worth keeping as a macro:

/* Update the summarizer variable CUM to advance past an argument in
   the argument list.  The values MODE, TYPE and NAMED describe that
   argument.  Once this is done, the variable CUM is suitable for
   analyzing the *following* argument with `FUNCTION_ARG', etc.  (TYPE
   is null for libcalls where that information may not be available.)  */
void
m88k_function_arg_advance (args_so_far, mode, type, named)
     CUMULATIVE_ARGS *args_so_far;
     enum machine_mode mode;
     tree type;
     int named;
{
  int bytes;

  if ((type != 0) &&
      (TREE_CODE(type) == RECORD_TYPE || TREE_CODE(type) == UNION_TYPE))
    mode = BLKmode;
  bytes = (mode != BLKmode) ? GET_MODE_SIZE (mode) : int_size_in_bytes(type);
  if ((*args_so_far & 1) && (mode == DImode || mode == DFmode
       || ((type != 0) && TYPE_ALIGN (type) > BITS_PER_WORD)))
    (*args_so_far)++;
  (*args_so_far) += (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}
