In the previous articles, we explained what was necessary in order to run both statically and dynamically linked Linux binaries on the NetBSD/PowerPC platform, but so far we have only worked on program launching and system calls.

Because most real-life programs cannot live without signals, it's time to focus on the way signals are handled in Linux emulation. At the end of this article, we will also have a look at a few system-call-specific bug fixes.

Signals

Now that dynamic executables work and get their arguments, the next important step is setting up signal handling. Signals are the way the kernel notifies a user program of events. The kernel can have signals pending for a process for various reasons: the process caused a memory fault, it has data ready for asynchronous I/O, or the user decided to suspend the process.

A process may choose to trap a signal or not. If a signal is delivered to a process that does not trap it, the kernel applies the signal's default action, which for most signals terminates the program and possibly dumps core. This is the easy part, because there is no actual interaction between the kernel and the process.

To trap a signal, the process uses the signal() or sigaction() system call to install a signal handler for it. When the kernel has a signal to deliver to the process, it has to get the process to run the signal handler in user space. This is slightly more complicated, because the kernel has to trick the process into running the signal handler on its return to user space.

Here is how the job is done. To make a system call, the program uses a software exception instruction. This instruction transfers control to the kernel, and a trap frame is built on the kernel stack (this is done by locore.S or by the hardware, depending on the architecture). The trap frame holds the registers' values, so that control can later be returned to the user process.

When the kernel has to invoke a signal handler, it modifies the context stored in the trap frame so that on return to user space, the process jumps to a "signal trampoline" set up on the user stack. The kernel copies the signal trampoline code and the signal context expected by the signal handler onto the user stack, and the trampoline then invokes the signal handler. All of this is done by the sendsig() function in the kernel.

After the signal handler returns, the trampoline code calls the sys_sigreturn() system call. This system call must undo everything that was done on the user stack to invoke the signal handler.


This is the global picture for NetBSD signal handling. Let us now look at what differs between invoking a signal handler in a Linux process and in a NetBSD process. The only thing that the user process will see is the user stack, and even in the user stack, there is no reason why it should assume a particular layout beyond the signal context. We can use NetBSD code to set up the user stack, except for the signal context, which must be defined in a Linux fashion.

The job can therefore easily be done by reusing the NetBSD signal trampoline from sys/arch/powerpc/powerpc/sigcode.S, and the NetBSD sendsig() and sys_sigreturn() implementations, which can be found in sys/arch/powerpc/powerpc/sig_machdep.c for the PowerPC port of NetBSD.

Of course, this code needs to be adjusted so that a few necessary translations are done between the Linux and NetBSD structures, and we must set up a struct linux_sigcontext instead of a native NetBSD struct sigcontext on the user stack. The modified code fits in sys/compat/linux/arch/powerpc/linux_machdep.c.

After fixing a few mistakes in the way the stack is set up, the emulated program should be able to catch signals. Here is a sample program that tests signal catching (on Linux/PowerPC, a handler receives a struct sigcontext pointer as its second argument):

/*
 * signal.c - A signal-catching test program
 * Linux/PowerPC specific: handlers get a struct sigcontext * as 2nd argument
 */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>

void func (int, struct sigcontext *);
void func2 (int, struct sigcontext *);

int main (int argc, char **argv) {
  printf ("Starting execution\n");
  /* signal() returns SIG_ERR on failure, not a non-zero value */
  if (signal (SIGHUP, (void (*)(int))func) == SIG_ERR)
   perror ("signal() failed");
  if (signal (SIGINT, (void (*)(int))func2) == SIG_ERR)
   perror ("signal() failed");
  printf ("signal() successful. Now sleeping\n");
  while (1)
   sleep (600);
  printf ("I should not come here\n");
  return 0;
}

void func (int sig, struct sigcontext *scp) {
  printf ("Signal Handler: sig=%d scp=0x%lx\n", sig, (unsigned long)scp);
  printf ("context.signal=0x%lx\n", (unsigned long)scp->signal);
  printf ("context.handler=0x%lx\n", (unsigned long)scp->handler);
  printf ("context.oldmask=0x%lx\n", (unsigned long)scp->oldmask);
  /* Wait inside the handler: SIGHUP stays blocked until we return */
  pause ();
  printf ("func() exiting\n");
  sleep (2);
}

void func2 (int sig, struct sigcontext *scp) {
  printf ("Signal Handler: sig=%d scp=0x%lx\n", sig, (unsigned long)scp);
  printf ("context.signal=0x%lx\n", (unsigned long)scp->signal);
  printf ("context.handler=0x%lx\n", (unsigned long)scp->handler);
  printf ("context.oldmask=0x%lx\n", (unsigned long)scp->oldmask);
  printf ("func2() exiting\n");
}

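When sent a kill -1 (SIGHUP), this program should print the signal context and then wait in pause() inside the handler. Because SIGHUP stays blocked while its handler runs, a new kill -1 should have no effect until a kill -2 (SIGINT) is sent.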

Tuning: Fixing system-call-specific issues

A simple bug fix: ioctl() issues
Now the time has come to try running real Linux binaries and see what happens. We discover many small problems here. For example, the Linux ioctl() TIOCGETA and TIOCGWINSZ calls fail for no apparent reason.

ioctl() is used to perform device-specific operations. It is widely used to get and set terminal parameters: for example, ioctl() TIOCGETA gets the terminal's struct termios, and ioctl() TIOCGWINSZ gets the terminal window size. For more information about ioctl(), refer to the ioctl(2) man page.

After some investigation with ktrace(1), it became obvious that the ioctl() com argument was wrong: Linux tried to do an ioctl() TIOCGETA, and NetBSD understood it as a different ioctl(), which failed. This is caused by a struct linux_termios mismatch.

The ioctl() com parameter is computed from the ioctl type (read, write, read/write, or nothing), its group (the letter in the ioctl definition), its number, and the size of the third argument to ioctl(). Here the problem is that our NetBSD struct linux_termios is not the same size as the real Linux struct termios. This happens because struct linux_termios is defined in sys/compat/linux/common/linux_termios.h, where it is considered architecture-independent, but it is not. Moving the definition to an architecture-dependent file fixes the problem.

One fake bug: lstat() issues
There are also fake problems. For example, lstat() fails with glibc-2. A program built on a glibc-1 LinuxPPC system worked fine on that Linux system, but broke on NetBSD when using glibc-2. Had I had a glibc-2 LinuxPPC system on which to try my binary built on a glibc-1 LinuxPPC system, I would quickly have understood that the failure was normal: a program using lstat() and dynamically linked against glibc-1 cannot work with glibc-2. Let's study why it fails.

The glibc-2.1.3 sources are available for download, and can also be browsed using CVSWeb.

Here is a simple program that tests lstat(). It was built on a LinuxPPC system that uses glibc-1.

/*
 * lstat.c - A lstat() tester
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main (int argc, char **argv) {
 const char *file_name = "/etc";
 struct stat buf;
 int res;

 if (argc >= 2)
  file_name = argv[1];

 res = lstat (file_name, &buf);
 if (res < 0) {
  /* Report errno first: printf() may overwrite it */
  perror ("lstat() failed");
  printf ("res=%d file_name=%s &buf=%p\n", res, file_name, (void *)&buf);
  exit (EXIT_FAILURE);
 }
 return 0;
}

Now, if we try to use glibc-2.1.3, the same binary fails. According to the kernel trace, the lstat() system call succeeds, but the program gets a -1 return value with errno set to EINVAL. The result is altered by glibc glue code. Looking at the glibc-2.1.3 sources, we discover a mechanism for dealing with the multiple versions of struct stat that exist on Linux systems (Linux-2.4 defines a struct old_kernel_stat and a struct stat). glibc has to detect the version of the stat structure expected by the program and, if the kernel does not provide that structure, convert it. Here's how it works:

  • lstat() is defined in glibc/io/lstat.c, and it calls __lxstat(), with _STAT_VER as the first argument. This function gets statically linked into the executable, and therefore the _STAT_VER parameter is hard-coded into the executable with a value specific to the struct stat that is expected. When linking with glibc-1.99, the value is 0.

  • __lxstat() is defined in glibc/sysdeps/unix/sysv/linux/lxstat.c. The call from lstat() to __lxstat() is dynamic, so it goes to whichever glibc is installed. __lxstat() compares its first argument, which it calls vers, with _STAT_VER_KERNEL, a value specific to the current kernel's struct stat (on glibc-2.1.3, this value is 3). If they match, it returns directly; otherwise, it calls xstat_conv(), passing vers as the first argument.

  • xstat_conv() is defined in glibc/sysdeps/unix/sysv/linux/xstatconv.c. Its job is to convert the kernel's struct stat into what the executable expects. It checks the vers parameter against two known values:

    • If it is equal to _STAT_VER_KERNEL, just return.
    • If it is equal to _STAT_VER_LINUX, the struct old_kernel_stat is converted to a struct stat, and we return.
    • Otherwise, return an error (EINVAL).

Obviously, when a binary linked with glibc-1 runs on a glibc-2 system, we hit the "otherwise" case in xstat_conv(). The conclusion is that glibc-2 does not expect lstat() to be used by a binary built for glibc-1. Building the binary on a glibc-2 Linux system fixes the problem, and the binary then works fine with NetBSD's Linux emulation. There was nothing to fix in the NetBSD emulation code; if anything, this could be considered a glibc-2 bug.

open() unable to create files

This is a really annoying bug: it causes open() to ignore the O_CREAT flag, so open() calls that need to create a file fail because the file does not exist. The reason is silly: in Linux's fcntl.h, the O_CREAT flag is defined like this: #define O_CREAT 0100. If you do not use C octal notation every day, you may take this for a hexadecimal value and assume that the Linux code adds the leading "0x" wherever it uses it. You might then write this in NetBSD's linux_fcntl.h file: #define LINUX_O_CREAT 0x0100

If you rarely use octal notation, just remember that in C, a leading zero means octal: "0100" is 100 in octal, which is 40 in hexadecimal (64 in decimal). You may think this is the silliest mistake described in this document. Well, I made it, so I hope this section will be useful for people who have forgotten how to define an octal value in C.

X11 client failures

The problem first appears when trying to run an X client: all X programs fail the same way, with the same error. This simple program reproduces the problem:

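/*
 * simplex.c -- A simple X tester
 * build with gcc -I/usr/X11R6/include -o simplex simplex.c
 * -L/usr/X11R6/lib -lX11
 */

#include <stdio.h>
#include <stdlib.h>
#include <X11/Xlib.h>

int main (int argc, char **argv) {
        Display *display;

        /* argv[1] may be NULL, in which case $DISPLAY is used */
        if (!(display = XOpenDisplay (argv[1]))) {
                /* XOpenDisplay() is not documented to set errno */
                fprintf (stderr, "XOpenDisplay failed\n");
                exit (1);
        }
        XCloseDisplay (display);
        return 0;
}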

When executed, this code produces the following error:

XIO: fatal IO error -11 (Unknown error 4294967285) on
     X server "10.0.12.137:0.0" after 0 requests (0 
     known processed) with 0 events remaining.

This problem is a side effect of a nasty bug in the way errno was handled. Here the program expects either no error or errno = 11 (EAGAIN), but it actually gets errno = -11, which means nothing to a Linux binary (4294967285 is -11 printed as an unsigned 32-bit value). The program was thus confused, and reported that an unknown error occurred. In fact, it got the right error number, except that it was negative. The following test program highlights the bug:

/*
 * errno tester: run as a non-root user, so that setuid(0) fails
 */
#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int main (int argc, char **argv) {
        int dontcare;

        dontcare = setuid(0);
        printf ("errno = %d\n", errno);

        return 0;
}

Run as a non-root user natively on Linux, this program printed errno = 1 (EPERM); emulated on NetBSD/PowerPC, it printed errno = -1, demonstrating the bug.

The negative sign has an origin: Linux uses negative error codes inside the kernel. On most platforms, this negative code is returned to the user, and glibc converts it to a positive errno, which is what a userland Unix program expects.

This operation can be found in the glibc sources. On most platforms, errno is set through the use of the __set_errno macro in the INLINE_SYSCALL macro, which is used as a wrapper for all system calls. For Linux/i386, this is defined in sysdeps/unix/sysv/linux/i386/sysdep.h.

In i386, ARM, and m68k Linux, __set_errno is used with a minus sign, so that the negative error code returned by the kernel turns into a positive errno:

__set_errno (-_sys_result);

On the PowerPC, things are quite different. The Linux kernel returns a positive value. When the kernel returns an error, glibc system call handlers jump to the __syscall_error() function, which is defined in sysdeps/unix/sysv/linux/powerpc/sysdep.c. This function sets errno using the __set_errno macro, but here there is no minus sign:

int
__syscall_error (int err_no)
{
  __set_errno (err_no);
  return -1;
}

In its Linux emulation, NetBSD mimics the Linux way of using negative error codes inside the kernel, and returns negative error codes to userland. This is okay for i386, alpha, and m68k, but it causes a bug on the PowerPC platform, because Linux's libc expects the kernel to return a positive errno, and does not make it positive if it is negative.

So let's have a closer look at how error numbers are handled in NetBSD's Linux emulation. Most error numbers are defined in sys/compat/linux/common/linux_errno.h, and some architecture-dependent error numbers are defined in sys/compat/linux/arch/powerpc/linux_errno.h for the PowerPC port.

These error codes are used in an array that translates native NetBSD error codes to Linux error codes. This is the native_to_linux_errno array, which is built in sys/compat/linux/common/linux_errno.c. Here are the first four lines of the array definition:

const int native_to_linux_errno[] = {
   0,
   -LINUX_EPERM,
   -LINUX_ENOENT,
   -LINUX_ESRCH,
(snip)

This array is used in sys/compat/linux/common/linux_exec.c as the e_errno field of the struct emulsw (defined in sys/sys/proc.h). The e_errno field is then used when leaving the kernel, in trap() in sys/arch/powerpc/powerpc/trap.c:

if (p->p_emul->e_errno)
         error = p->p_emul->e_errno[error]; 
frame->fixreg[FIRSTARG] = error;

The way errno is handled is now architecture-independent, except for the final step in trap(). To make errno positive on return to userland, we have two options. The first is to modify trap() so that, if the current program is a Linux binary, errno is made positive before returning to userland. This would make the above code look something like this:

#ifdef COMPAT_LINUX 
if (p->p_emul == &emul_linux)
        /*
         * Linux uses negative errno in kernel, but   
         * returns a positive errno to userland.  
         */ 
        frame->fixreg[FIRSTARG] = -error; 
else
        frame->fixreg[FIRSTARG] = error; 
#else 
frame->fixreg[FIRSTARG] = error; 
#endif

The other option is to make all errno values positive for the PowerPC in sys/compat/linux/common/linux_errno.c. That latter option may seem like a bad choice because it requires the modification of an architecture-independent source file in order to fix an architecture-dependent problem. On the other hand, modifying trap.c is just fixing an architecture-dependent problem in an architecture-dependent file, so it does not have this drawback.

Introducing positive numbers in linux_errno.c turns out to be the better choice, because other Linux ports could have the same problem. Being able to choose the errno sign in a machine-dependent header file, without adding tests to machine-dependent code, was therefore a good idea. This is achieved by introducing a LINUX_SCERR_SIGN macro in the architecture-dependent linux_errno.h, which expands to - for all ports that need negative errno values returned to userland, and to + for ports that need positive errno values. So far, + only applies to the PowerPC.

This is how the native_to_linux_errno array then gets defined in sys/compat/linux/common/linux_errno.c:

const int native_to_linux_errno[] = {
   0,
   LINUX_SCERR_SIGN LINUX_EPERM,
   LINUX_SCERR_SIGN LINUX_ENOENT,
   LINUX_SCERR_SIGN LINUX_ESRCH,
(snip)

With this fix, errno is correctly handled on the PowerPC, and X binaries (and other programs) are fixed! After this last bug fix, Linux compatibility on NetBSD/PowerPC reaches a state where it is possible to run interesting real-life Linux binaries such as Netscape Communicator.
