This document deals with the main problems encountered when implementing Linux binary compatibility for PowerPC-based NetBSD ports. It is intended to document various parts of the emulation subsystem, and to highlight some architecture-dependent issues that can arise in argument passing, signal handling, and with the way some system calls work. I hope it will help potential developers to do further work on the NetBSD binary compatibility framework.
Most, if not all, of this paper is intended for technically oriented readers. It is assumed that the reader has some understanding of the C programming language and has a good understanding of how processes are managed on a Unix system. Information about this, and much more, can be found in Design and Implementation of the 4.4BSD Operating System, or The Linux Kernel.
In this part, we will introduce Linux emulation and the way it is implemented. Then we will describe the different steps required in order to run statically linked Linux binaries on a NetBSD/PowerPC system.
Some programs such as Netscape or Sun's JDK are not distributed with
source code, so it is not possible to port them to NetBSD. We have to make
do with a Linux binary, sometimes a FreeBSD binary, but never a NetBSD
binary. Nevertheless, users want these kinds of applications to run on
their NetBSD machines. To address this problem, Linux compatibility was
developed on NetBSD. This Linux emulation is available through the
COMPAT_LINUX kernel option on the NetBSD ports that support it (i386,
alpha, and m68k). The compatibility subsystem emulates Linux system
calls, and not the program itself. From the Linux program's point of
view, the NetBSD kernel just looks like the Linux kernel. The Linux
binary is thus able to run on NetBSD, at normal CPU speed. All its
system calls are intercepted and mapped to native NetBSD system calls.
The overhead of Linux compatibility is hence very small.
A userland executable interacts in only two ways with the kernel. On one hand, we have calls from the executable to the kernel, which are system calls. On the other hand, we have the interaction from the kernel to the executable, which is signal delivery.
In order to emulate Linux binaries, the NetBSD kernel must mimic Linux kernel behaviour for system calls and signal delivery. Signal delivery is the trickiest part of the job, and not all executables actually need signals for normal operation, so we will keep signal handling for later. On the other hand, system calls are mandatory. If you want your program to do simple operations, such as reading a file or writing some text to a terminal, you need to make system calls. You might build an executable that does not make any system calls, but I am not sure that running it would actually be of any interest. So let us talk about system call emulation.
The main idea is to translate system calls. Each system call has a
system call number and some arguments. If you run a Linux binary on
NetBSD without writing any compatibility support in the NetBSD kernel, it
will not work, because the executable will use a system call number
that is incorrect for NetBSD. For instance, let us assume that our
program uses the nice() system call, which is syscall #34 on
Linux/PowerPC. If you run it as a Linux binary on NetBSD/PowerPC, it
will actually call chflags(), which is the syscall #34 on
NetBSD/PowerPC. And even if the syscall is the same, the arguments will
probably not fit. For instance, Linux will use a 32-bit long where NetBSD
uses a 64-bit long long for the same argument, and this will cause the
program to fail.
The NetBSD kernel must therefore first match the Linux executable. That is, it must recognise it as a Linux binary and not as a NetBSD binary. Then, when the program makes a system call, the NetBSD kernel will translate the Linux system call to a NetBSD system call. Of course, executables matched as NetBSD binaries have their system calls unchanged.
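The translation idea can be pictured with a small user-space model. This is only a sketch under stated assumptions: the struct emul_sketch type, the handler identifiers, and dispatch() are hypothetical names chosen for illustration and are not the kernel's real data structures. It shows one table per emulation, indexed by syscall number, so that the same number reaches a different handler depending on how the executable was matched.

```c
#include <stddef.h>

/* Hypothetical identifiers for the kernel functions a syscall can reach. */
enum handler {
    H_NOSYS, H_SYS_EXIT, H_SYS_FORK, H_SYS_READ, H_SYS_WRITE,
    H_SYS_OPEN, H_LINUX_SYS_OPEN
};

/* One table per emulation, indexed by syscall number (illustrative type). */
struct emul_sketch {
    const char *name;
    size_t nsys;
    const enum handler *table;
};

/* Native binaries: numbers 1 to 5 reach the native functions directly. */
static const enum handler native_table[] = {
    H_NOSYS, H_SYS_EXIT, H_SYS_FORK, H_SYS_READ, H_SYS_WRITE, H_SYS_OPEN
};

/* Linux binaries: exit and fork reuse the native functions, while open
 * goes through a wrapper that translates its arguments first. */
static const enum handler linux_table[] = {
    H_NOSYS, H_SYS_EXIT, H_SYS_FORK, H_SYS_READ, H_SYS_WRITE, H_LINUX_SYS_OPEN
};

static const struct emul_sketch emul_native = { "netbsd", 6, native_table };
static const struct emul_sketch emul_linux  = { "linux",  6, linux_table };

/* Pick the handler from the table of the emulation the binary matched. */
enum handler dispatch(const struct emul_sketch *e, size_t code)
{
    if (code >= e->nsys)
        return H_NOSYS;       /* number outside the table */
    return e->table[code];
}
```

In this model, dispatch(&emul_linux, 5) selects the wrapper, while dispatch(&emul_native, 5) selects the native function.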
For this step, we will need the kernel sources of both NetBSD and Linux. NetBSD kernel sources can be found here, but if you plan to actually work on the kernel sources, you would do better using CVS (see the documentation to learn how to use CVS to track NetBSD-current). You can also browse the source files using CVSWeb.
Linux sources can be found on various FTP sites, for example,
ftp://ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.0.tar.gz for the
2.4 kernel. Grab the latest kernel, which will certainly be something
other than 2.4 when you read this paper. It is not mandatory to get
the latest kernel, but it is better to do so.
First, let us have a look at NetBSD syscalls. They are defined in the
machine-independent part of the kernel sources, in
sys/kern/syscalls.master. This file is used to automatically create the
files sys/kern/syscalls.c, sys/sys/syscall.h, and sys/sys/syscallargs.h.
Each syscall in syscalls.master is basically the system call name with
"sys_" prepended to it. Here are a few lines from the
sys/kern/syscalls.master file:
0 INDIR { int sys_syscall(int number, ...); }
1 STD { void sys_exit(int rval); }
2 STD { int sys_fork(void); }
3 STD { ssize_t sys_read(int fd, void *buf, size_t nbyte); }
4 STD { ssize_t sys_write(int fd, const void *buf, \
size_t nbyte); }
Now, the Linux syscalls: Here the job is a bit more complicated, since
the system call definitions are architecture dependent on Linux. The
different architectures supported by the Linux kernel are in linux/arch.
Each architecture has its directory. For instance, the PowerPC port of
Linux has its machine-dependent source code in linux/arch/ppc/. The
syscalls definition file lives in the kernel subdirectory of the
architecture directory, but the name of the file is not the same on all
Linux ports! If you are working on another COMPAT_LINUX port, you can
find the file by grepping for system call names, such as mmap() or
uname(). For the PowerPC, the file is linux/arch/ppc/kernel/misc.S. Here
are a few lines from that file:
.long sys_ni_syscall /* 0 - old "setup()" system call */
.long sys_exit
.long sys_fork
.long sys_read
.long sys_write
This Linux file lists all the syscalls, using the syscall number order.
The arguments to the syscalls are not shown. To find out the arguments
of a given system call, you will have to grep for its name in
linux/arch/ppc/kernel and/or linux/kernel, find the function
implementing the system call, and look at the function parameters.
And now, let us move to the compat directory in the NetBSD sources,
which is where we will have to write a few files. For Linux
compatibility on the PowerPC, it is sys/compat/linux/arch/powerpc. Here
we must create a syscalls.master file and fill it with the Linux system
call numbers and the function that implements them in the NetBSD kernel.
The easiest way is by grabbing the syscalls.master file from another port
(I used the syscalls.master from i386 Linux compatibility, which can be
found at sys/compat/linux/arch/i386/syscalls.master), and modify it so
that it reflects Linux syscalls on our target port, here PowerPC.
Some Linux system calls need a wrapper function in the NetBSD kernel.
For instance, the open() system call (syscall #5) is implemented by the
linux_sys_open() function. Here is the open() system call definition in
Linux compatibility, from sys/compat/linux/arch/i386/syscalls.master:
5 STD { int linux_sys_open(const char *path, int flags, \
int mode); }
This linux_sys_open() wrapper function lives in a file in the
sys/compat/linux/common directory. Its job is to do appropriate argument
translation, and then to transfer control to the sys_open() function of
the NetBSD kernel.
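To make the wrapper idea concrete, here is a minimal user-space sketch. Everything in it is an assumption made for illustration: the flag values are examples that would have to be checked against the Linux and NetBSD headers of the target port, fake_sys_open() merely stands in for the native function, and linux_sys_open_sketch() is not the real kernel code.

```c
/* Illustrative Linux-side flag values (octal); check the real headers. */
#define LINUX_O_CREAT   0100
#define LINUX_O_TRUNC   01000
#define LINUX_O_APPEND  02000

/* Illustrative native-side flag values; check the real headers. */
#define NATIVE_O_APPEND 0x0008
#define NATIVE_O_CREAT  0x0200
#define NATIVE_O_TRUNC  0x0400

/* Access mode bits (read, write, read/write) are assumed identical. */
#define ACCMODE_MASK    03

/* Translate Linux open() flags into their native equivalents, bit by bit. */
int linux_to_native_open_flags(int lflags)
{
    int nflags = lflags & ACCMODE_MASK;

    if (lflags & LINUX_O_CREAT)
        nflags |= NATIVE_O_CREAT;
    if (lflags & LINUX_O_TRUNC)
        nflags |= NATIVE_O_TRUNC;
    if (lflags & LINUX_O_APPEND)
        nflags |= NATIVE_O_APPEND;
    return nflags;
}

/* Stand-in for the native open: it only records the flags it received. */
static int last_native_flags;

static int fake_sys_open(const char *path, int flags, int mode)
{
    (void)path;
    (void)mode;
    last_native_flags = flags;
    return 3;                 /* pretend file descriptor */
}

/* The wrapper: translate the arguments, then hand over to the native call. */
int linux_sys_open_sketch(const char *path, int flags, int mode)
{
    return fake_sys_open(path, linux_to_native_open_flags(flags), mode);
}
```

The point of the design is that the wrapper contains no file handling of its own. It only converts what differs between the two systems and lets the native code do the work.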
Other Linux system calls are implemented directly by the corresponding
NetBSD system call. This is the case for exit() or fork() (syscalls #1 and #2), which are defined by the sys_exit() and sys_fork() kernel
functions. Here are Linux exit() and fork() definitions, from
sys/compat/linux/arch/i386/syscalls.master:
1 NOARGS { int sys_exit(int rval); }
2 NOARGS { int sys_fork(void); }
Most of the job is quite straightforward: It is just about reordering
system calls. But sometimes, you will find that a given syscall has no
equivalent for the target port. This is true, for example, for the
Linux/i386 vm86() system call, which is left unimplemented in the
sys/compat/linux/arch/powerpc/syscalls.master, using the UNIMPL option in
the second column of the file.
Some other syscalls do not work the same way on different architectures,
due to different argument sizes or different argument transmission
mechanisms (in registers vs on stack). For some of them, there are
already alternative implementations of the wrapper function. For
instance, a call to mmap() is implemented by linux_sys_mmap() on the
Alpha, and it is implemented by linux_old_mmap() on the i386.
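The difference between the two argument transmission styles can be sketched as follows. This is a user-space model under stated assumptions: struct mmap_args_sketch and the three functions are hypothetical, and a real kernel would fetch the argument block from user space (with copyin() on NetBSD) rather than dereference the pointer directly. Both entry points end up calling the same back end with the same six values.

```c
#include <stddef.h>
#include <stdint.h>

/* The six mmap() arguments, gathered in one block (illustrative layout). */
struct mmap_args_sketch {
    uint32_t addr;
    uint32_t len;
    int      prot;
    int      flags;
    int      fd;
    uint32_t offset;
};

/* Stand-in for the shared implementation: it records what it was given. */
static struct mmap_args_sketch last_call;

static int common_mmap(const struct mmap_args_sketch *a)
{
    last_call = *a;
    return 0;
}

/* Style 1: six scalar arguments, as when they arrive in registers. */
int mmap_from_registers(uint32_t addr, uint32_t len, int prot, int flags,
                        int fd, uint32_t offset)
{
    struct mmap_args_sketch a = { addr, len, prot, flags, fd, offset };

    return common_mmap(&a);
}

/* Style 2: one pointer to an argument block stored in memory. */
int mmap_from_block(const struct mmap_args_sketch *user_block)
{
    struct mmap_args_sketch a;

    if (user_block == NULL)
        return -1;            /* a kernel would return an error code here */
    a = *user_block;          /* a kernel would copy this in from user space */
    return common_mmap(&a);
}
```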
Now, the idea is to get a good but not perfect syscalls.master, and to
fix problems as they arise later. So once syscalls.master looks good, we
build linux_syscallargs.h and linux_syscalls.h by typing "make" in
sys/compat/linux/arch/powerpc, and we can start trying to build a
kernel.
Now when we try to build a kernel, of course it will fail, because most of the required source code is still missing, but the idea is that a failed build will tell us which gaps to fill.
First, we want to tell the config(8) tool that we added files to
the kernel. Here we will work on the NetBSD/macppc port, but everything
remains true for other ports. In the sys/arch/macppc/conf directory, we
have a file called files.macppc that lists the files used to build a
kernel for macppc. In order to modularize the compatibility code in the
kernel, we will just add two include statements. These statements will
tell config(8) to include the file describing what is needed for the machine-independent part of Linux compatibility (sys/compat/linux/files.linux),
and the file describing what is needed for the machine-dependent part
(sys/compat/linux/arch/powerpc/files.linux_powerpc):
# Linux binary compatibility (COMPAT_LINUX)
include "compat/linux/files.linux"
include "compat/linux/arch/powerpc/files.linux_powerpc"
There is also the OSS audio compatibility framework, which is required in order to link a kernel with Linux compatibility. This is included by the following lines:
# OSS audio driver compatibility
include "compat/ossaudio/files.ossaudio"
We then have to create the latter files.linux_powerpc file, and fill it
with all the source files created in sys/compat/linux/arch/powerpc so far.
Again, the idea is just to grab the i386 version of that file from
sys/compat/linux/arch/i386, and to comment out or remove every line
referencing files that are not yet in the powerpc directory.
Then we can add the COMPAT_LINUX option to our favourite kernel config
file, and start a kernel build. (If you need some documentation, please
read the documentation here).
Of course it will fail; we expected it. During the various failures, we
can discover that the source code in sys/compat/linux/common needs a lot
of macros prefixed with LINUX_ and a lot of typedefs and struct
definitions prefixed by linux_. The idea is always the same: to
grab the i386 version of the file containing the requested
macro/typedef/struct definition, and to adapt it for the PowerPC.
During this work, the linux/include/asm-ppc and linux/include/linux
directories from the Linux kernel sources will be useful. It is
essential to avoid just copying the i386 version of the different files
needed in the powerpc directory such as linux_termios.h or
linux_types.h. There are very few differences between most of the i386
version and the PowerPC version of the Linux includes we need to define,
but a careful check of every value will avoid lots of trouble finding
out what went wrong later.
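One way to make such checks systematic is to let the compiler verify them. The sketch below is only an illustration: struct linux_example_timeval and the helper functions are hypothetical names, not definitions from the NetBSD tree, and the layout shown is an assumption for a 32-bit Linux port. It uses C11 _Static_assert so that a wrong size or offset breaks the build instead of breaking an emulated program at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Linux-side structure for a 32-bit port: both fields are
 * 32-bit, whatever the size of the native types happens to be. */
struct linux_example_timeval {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* Compile-time checks: the build fails if the layout drifts. */
_Static_assert(sizeof(struct linux_example_timeval) == 8,
               "linux_example_timeval must be 8 bytes");
_Static_assert(offsetof(struct linux_example_timeval, tv_usec) == 4,
               "tv_usec must start at byte 4");

/* Size the emulated program expects to exchange with the kernel. */
size_t linux_example_timeval_size(void)
{
    return sizeof(struct linux_example_timeval);
}

/* Return 1 if a 64-bit native value can be handed back in a 32-bit
 * Linux long without truncation, 0 otherwise. */
int fits_linux_long(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```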
After adding a lot of header files, the sys/compat/linux/arch/powerpc
directory starts looking like its i386 counterpart. There are only a few
.c files missing. We then have to define a few functions that are
defined in i386/linux_machdep.c and i386/linux_ptrace.c, else the kernel will not build. Most of the linux_machdep.c file holds functions related to signal delivery, whereas the linux_ptrace.c file holds functions that
enable Linux's gdb to be used on emulated binaries. Obviously, we don't need
most of this now. So the idea is to write empty functions that just
return zero without actually doing anything. The goal is to have a
kernel that builds, and to add the missing code later.
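Such a placeholder might look like the following. This is a sketch only: the function name, struct proc_stub, and register_stub_t are made up so that the example compiles outside the kernel, and they do not reproduce the real prototypes found in the i386 files.

```c
#include <stddef.h>

/* Stand-ins for kernel types, so the sketch compiles in user space. */
struct proc_stub {
    int unused;
};
typedef long register_stub_t;

/* Placeholder system call: it reports success without doing anything.
 * The real work is to be filled in once the kernel builds. */
int linux_example_stub(struct proc_stub *p, void *uap, register_stub_t *retval)
{
    (void)p;                  /* calling process, ignored for now */
    (void)uap;                /* system call arguments, ignored for now */
    if (retval != NULL)
        *retval = 0;          /* value returned to the emulated program */
    return 0;                 /* 0 means "no error" */
}
```

A stub like this keeps the link step happy, but any program that really depends on the missing feature will of course misbehave until the function is written.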
Remember that each time a .c file is added to the
sys/compat/linux/arch/powerpc directory, it has to be added to
sys/compat/linux/arch/powerpc/files.linux_powerpc, and then, the
config(8) utility must be rerun. This integrates the new file into the
kernel build process. Otherwise, the new file will be ignored.
Once we have a working kernel, we can try our first Linux binary on it. To
do this, we go to a LinuxPPC machine and compile the following
program, linked as a static binary. This is done using the -static flag
with gcc.
/*
* hello.c -- A hello world test
* Build with gcc -static -o hello hello.c
*/
#include <stdio.h>
int main (int argc, char **argv) {
        printf ("Hello world!\n");
        return 0;
}
Then we try to run the compiled binary on the NetBSD system. At this stage, it is not expected to work. Most likely, we get a strange message explaining that a syntax error occurred after a "(", which suggests that the kernel took the file for a shell script and handed it to the shell to execute. The dynamic version should just crash, but we will take care of it later.
Our problem is that the kernel was not able to recognise the executable
as a Linux binary. This can be confirmed by running ktrace(1) and
kdump(1) on the executable. If the kernel had matched the
executable as a Linux binary, then the kernel trace would contain an
EMUL "linux" record.
So we have to get a working Linux binary-matching mechanism. When
starting a new binary (on execve() calls), the NetBSD kernel performs
some probe tests to find out what to do. Practically, the kernel
maintains a list of struct execsw (struct execsw is defined in
sys/sys/exec.h) describing the available ways of executing a program:
native ELF, native a.out, shell scripts, Linux emulation, and so on.
This list is initialized from sys/kern/exec_conf.c, and is used in
sys/kern/kern_exec.c. A member of the struct execsw is a pointer to a
probe function, whose job is to return 0 if it matches the executable.
For Linux ELF32 emulation, this function is linux_elf32_probe(), which
is implemented in sys/compat/linux/common/linux_exec_elf32.c.
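The probe mechanism can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's code: struct execsw_sketch, struct exec_image, the two probe functions, and the error value are all hypothetical. It only illustrates the principle described above, where each entry of a list carries a probe function and the first probe returning 0 wins.

```c
#include <stddef.h>
#include <string.h>

#define NOEXEC_SKETCH 8       /* illustrative "not my format" error value */

/* The first bytes of the file being executed. */
struct exec_image {
    const unsigned char *data;
    size_t len;
};

typedef int (*probe_fn)(const struct exec_image *);

/* Reduced list entry: a name and a probe function. */
struct execsw_sketch {
    const char *name;
    probe_fn probe;
};

/* Matches files starting with the ELF magic number. */
static int probe_elf(const struct exec_image *img)
{
    if (img->len >= 4 && memcmp(img->data, "\177ELF", 4) == 0)
        return 0;
    return NOEXEC_SKETCH;
}

/* Matches files starting with "#!". */
static int probe_script(const struct exec_image *img)
{
    if (img->len >= 2 && img->data[0] == '#' && img->data[1] == '!')
        return 0;
    return NOEXEC_SKETCH;
}

static const struct execsw_sketch execsw_list[] = {
    { "elf",    probe_elf },
    { "script", probe_script },
};

/* Walk the list; the first entry whose probe returns 0 handles the file.
 * Returns NULL when nothing matches. */
const char *match_image(const struct exec_image *img)
{
    size_t i;

    for (i = 0; i < sizeof(execsw_list) / sizeof(execsw_list[0]); i++) {
        if (execsw_list[i].probe(img) == 0)
            return execsw_list[i].name;
    }
    return NULL;
}
```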
This function performs several tests. The first test is the
linux_elf32_signature(), which looks for an interpreter name specific to
Linux. The interpreter is a helper program used to run the executable.
This is the ld.so program used to launch dynamically linked programs.
The linux_elf32_signature() looks in the ELF headers for an interpreter
like /lib/ld.so or /lib/ld-linux.so, which is really Linux-specific. For instance, a NetBSD ELF program uses /usr/libexec/ld.elf_so, and a
System V Release 4 system should use /usr/lib/ld.so.
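As an illustration of this kind of test, here is a minimal sketch that compares an interpreter path against the Linux loader names mentioned above. It is an assumption about how such a check could be written, not the actual linux_elf32_signature() code, and interp_looks_linux() is a hypothetical name. Like a probe function, it returns 0 on a match.

```c
#include <stddef.h>
#include <string.h>

/* Return 0 if the interpreter path starts with a Linux loader name,
 * nonzero otherwise. A NULL path stands for a statically linked binary,
 * which has no interpreter and so cannot be matched by this test. */
int interp_looks_linux(const char *interp)
{
    static const char *const linux_interps[] = {
        "/lib/ld.so",
        "/lib/ld-linux.so",
    };
    size_t i;

    if (interp == NULL)
        return 1;
    for (i = 0; i < sizeof(linux_interps) / sizeof(linux_interps[0]); i++) {
        /* Prefix comparison, so version suffixes such as ".1" still match. */
        if (strncmp(interp, linux_interps[i], strlen(linux_interps[i])) == 0)
            return 0;
    }
    return 1;
}
```

With this sketch, /lib/ld.so.1 is accepted, whereas /usr/libexec/ld.elf_so and /usr/lib/ld.so are rejected.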
This test is good for dynamically linked binaries, but it fails for
statically linked binaries, for which there is no interpreter name in
the ELF header. To fix this flaw, there is a second test, enabled by the
LINUX_GCC_SIGNATURE macro, linux_elf32_gcc_signature(), which looks for
a GCC signature in the .comment ELF section of the executable. This is
not a very good test, since this GCC signature is specific to GCC but
not to Linux. Anyway, for some unknown reason, this test failed on the
PowerPC.
We therefore have to find an alternative way of matching statically
linked Linux binaries. The objdump(1) command is useful when looking
for such a method: objdump -h program will dump the ELF section
headers of the program, and objdump -j .name -s program will dump the
content of named section .name. Here is an example of objdump -h output
for a statically linked Linux binary:
$ objdump -h hello
hello: file format elf32-powerpc
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00030930 018000a0 018000a0 000000a0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .init 00000080 018309d0 018309d0 000309d0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .fini 00000028 01830a50 01830a50 00030a50 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
3 .rodata 00003f8c 01830a78 01830a78 00030a78 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 __libc_atexit 00000004 01834a04 01834a04 00034a04 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .sdata2 00000000 01834a08 01834a08 00034a08 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .data 00000cb8 01874a08 01874a08 00034a08 2**2
CONTENTS, ALLOC, LOAD, DATA
7 .got2 00000010 018756c0 018756c0 000356c0 2**0
CONTENTS, ALLOC, LOAD, DATA
8 .ctors 00000010 018756d0 018756d0 000356d0 2**2
CONTENTS, ALLOC, LOAD, DATA
9 .dtors 00000008 018756e0 018756e0 000356e0 2**2
CONTENTS, ALLOC, LOAD, DATA
10 .got 00000010 018756e8 018756e8 000356e8 2**2
CONTENTS, ALLOC, LOAD, DATA
11 .sdata 0000011c 018756f8 018756f8 000356f8 2**2
CONTENTS, ALLOC, LOAD, DATA
12 .sbss 00000024 01875814 01875814 00035814 2**2
ALLOC
13 .bss 000008b8 01875838 01875838 00035814 2**2
ALLOC
14 .stab 00000cfc 00000000 00000000 00035814 2**2
CONTENTS, READONLY, DEBUGGING
15 .stabstr 00000fba 00000000 00000000 00036510 2**0
CONTENTS, READONLY, DEBUGGING
16 .comment 00002060 00000fba 00000fba 000374ca 2**0
CONTENTS, READONLY
Dumping the ELF section header, we can see that all statically linked Linux
programs have a section named __libc_atexit. This is specific to Linux,
and as far as we know, it does not occur on any other operating system.
A good point is that this __libc_atexit section does not seem to be
Linux/PowerPC specific: We can find it in Linux/i386 static binaries as
well.
We therefore have to write a new test in
sys/compat/linux/common/linux_exec_elf32.c, enabled by the
LINUX_ATEXIT_SIGNATURE macro. This test just checks if there is a
__libc_atexit section in the ELF header. With this test, statically
linked Linux binaries are matched. We can check this by enabling the
DEBUG_LINUX macro and looking at what the kernel outputs when we try to
run the binary. With this new test, it is very likely that the hello
world program now runs in compatibility.
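The principle of the section-name check can be sketched in user space as follows. This is a simplified model and not the code added to linux_exec_elf32.c: struct shdr_sketch keeps only the name offset of a real ELF section header, has_section() is a hypothetical helper, and byte order and file reading are left out entirely. As in ELF, each section header stores an offset into a string table holding the section names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced section header: only the offset of the section name inside
 * the section-name string table. */
struct shdr_sketch {
    uint32_t sh_name;
};

/* Return 1 if one of the nsh sections is named exactly `wanted`,
 * 0 otherwise. Offsets pointing outside the string table are skipped. */
int has_section(const struct shdr_sketch *sh, size_t nsh,
                const char *strtab, size_t strtab_len, const char *wanted)
{
    size_t wlen = strlen(wanted) + 1;   /* compare the final NUL as well */
    size_t i;

    for (i = 0; i < nsh; i++) {
        size_t off = sh[i].sh_name;

        if (off >= strtab_len || strtab_len - off < wlen)
            continue;                   /* name would run past the table */
        if (memcmp(strtab + off, wanted, wlen) == 0)
            return 1;
    }
    return 0;
}
```

Calling has_section() with "__libc_atexit" as the wanted name gives the kind of yes/no answer a probe function needs.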
In the event it does not work, the way to solve the problem is to run
ktrace(1) on the program on the NetBSD box, and the Linux equivalent
(which is strace(1)) on a Linux box, and to see what is going wrong.
Possible issues are badly translated syscalls. For instance, if we
incorrectly translated mmap() to dup2(), this shows up immediately on a
kernel trace, because we see that dup2() is called instead of mmap().
We need to rebuild kdump(1) if we want it to display the system
call names and arguments when running emulated binaries. Generally
speaking, we need to recompile kdump(1) each time we modify any
syscalls.master file.
Now that statically linked binaries work, we can try dynamically linked
binaries. Note that you need to download a set of Linux libraries from a
PowerPC Linux box in order to run dynamically linked programs. For the
hello world program, you need at least ld.so.1 and libc.so.6.
On the PowerPC, dynamically linked programs were immediately matched by
the linux_elf32_signature() test, but running them did not work, either
because it crashed, or because we got a Linux ld.so message saying
that we invoked ld.so without arguments. We will focus on these issues,
which are specific to dynamic binaries, in part 2.
Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.
Copyright © 2007 O'Reilly Media, Inc.