IRIX Binary Compatibility, Part 1
by Emmanuel Dreyfus
Author's Note: This article details the IRIX binary compatibility implementation for the NetBSD operating system. This includes the creation of a new emulation subsystem inside the NetBSD kernel and a lot of reverse engineering to understand and reproduce how IRIX internals work.
Because this article includes an introduction to all the kernel subsystems involved in IRIX binary compatibility, we only assume that the reader has some experience with user-land Unix programming.
An Introduction to Binary Compatibility
Kernel and User-Mode Overview
Unix systems have two distinct modes of operation, known as user mode and kernel (or system) mode. In user mode, the operating system (OS) executes code provided by users. It could be a Web browser, a computer science student's project, a Web server (in this case, the user running the program is usually the system administrator), and so on. This code runs with limited privileges: it has limited access to the computer's memory, and usually no access at all to the hardware.
When running in kernel mode, the OS executes only trusted code, which was loaded at boot time. This code is known as the OS kernel. The kernel has full access to the memory and hardware. Its role is to provide services to user programs:
- It gives user programs access to the hardware. It provides an abstraction layer, presenting files and terminals to user programs where, in fact, only zeros and ones exist on hard disks and in display I/O controllers.
- It periodically switches execution between several user programs (which are called processes), maintaining the illusion of multitasking.
- It ensures that a user can only access the resources permitted by the user's privileges.
User processes call kernel code by issuing a trap. A trap is a hardware or software exception that suspends user process execution and gives control to kernel code. The kernel handles the exception, after which it may return to user mode and resume the execution of the user process, or it may destroy the user process. Examples of traps are division by zero, memory faults (accessing a virtual address where no physical memory is mapped), timer interrupts (which are used to switch between user processes), and requests by the user process to access some resource controlled by the kernel.
These requests can be opening a file, reading from the network, or creating a new process. The process does this by issuing a system call, such as fork(2) to create a new process. A system call is in fact a CPU instruction that causes a trap.
Here is an example of MIPS assembly that calls the fork(2) system call:
    li      $v0, 2      # 2 is the system call number for fork()
                        # v0 is the register holding the system call number
    syscall             # syscall is the CPU instruction to do a system call
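The same "request by number" mechanism is visible from C through the syscall() library function, which loads a system call number and traps into the kernel much like the assembly above. This is a minimal sketch assuming a libc that provides syscall() and the SYS_getpid constant from <sys/syscall.h> (glibc on Linux, or a BSD libc); it uses getpid instead of fork so that calling it has no side effects, and the function name getpid_by_number is hypothetical.

```c
#define _GNU_SOURCE          /* glibc only declares syscall() with this defined */
#include <sys/syscall.h>     /* SYS_getpid: the system call number */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* syscall(), getpid() */

/* Ask the kernel for our process ID by system call number, bypassing
 * the getpid() library wrapper. Assumes a libc that provides syscall(). */
pid_t getpid_by_number(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both routes end in the same kernel function, so getpid_by_number() and getpid() return the same value.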
When the syscall instruction is executed, the kernel runs a particular trap handler, known as the system call handler. For NetBSD/mips, it can be found in sys/arch/mips/mips/syscall.c:syscall_plain(). The system call handler expects an argument, the system call number (the value loaded into $v0 in the MIPS assembly example). It uses a table, called the system call table, to look up the kernel function that will be called in order to complete the system call. On NetBSD, the system call table for native processes is generated from
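To make the table lookup concrete, here is a toy dispatcher in C. The names toy_sysent, toy_sysent_table and toy_syscall are hypothetical: NetBSD's real system call table entries and handler carry more information (argument sizes, flags, how results are copied back to registers). Only the idea that the system call number indexes a table of kernel functions comes from the text above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified system call table entry. */
struct toy_sysent {
    int (*call)(const long *args, long *retval);
};

/* Fallback for numbers that have no implementation. */
static int toy_sys_nosys(const long *args, long *retval)
{
    (void)args;
    (void)retval;
    return ENOSYS;
}

/* Stand-in for the kernel's fork implementation. */
static int toy_sys_fork(const long *args, long *retval)
{
    (void)args;
    *retval = 42;               /* pretend this is the child's PID */
    return 0;
}

/* The index into the table is the system call number: entry 2 is fork,
 * matching the number loaded into $v0 in the assembly example. */
static const struct toy_sysent toy_sysent_table[] = {
    { toy_sys_nosys },          /* 0 */
    { toy_sys_nosys },          /* 1 */
    { toy_sys_fork },           /* 2 */
};

#define TOY_NSYSENT (sizeof(toy_sysent_table) / sizeof(toy_sysent_table[0]))

/* Toy system call handler: bounds-check the number, then call the
 * function found in the table. Returns 0 or an errno value. */
int toy_syscall(unsigned long code, const long *args, long *retval)
{
    if (code >= TOY_NSYSENT)
        return ENOSYS;
    return toy_sysent_table[code].call(args, retval);
}
```

Because the handler only ever goes through the table, changing which table it consults changes the system call interface without touching the handler itself.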
System calls are the way a user process requests action from the kernel, but there is also a mechanism used by the kernel to notify the user process of unusual conditions: signals. Signals are issued by various traps and system calls, to notify the process that it raised an exception: memory fault (the famous segmentation fault, well known to students learning C), division by zero and so on.
For each signal, the user process can choose to take the default action (by default, some signals terminate the program, while others are simply ignored), to ignore the signal, or to execute a function called a signal handler. This choice is made using the signal(3) library call or the sigaction(2) system call.
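Here is a minimal sketch of the third choice, installing a handler with sigaction(2). It uses SIGUSR1 sent by the process to itself with raise(3) rather than a real fault, so that it is safe to run; the names got_usr1, on_usr1 and demo_sigaction are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and sigemptyset() */
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;               /* keep handlers to async-signal-safe work */
}

/* Install on_usr1() as the SIGUSR1 handler, send SIGUSR1 to ourselves
 * and report whether the handler ran (1) or not (0).
 * Returns -1 if sigaction() fails. */
int demo_sigaction(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;    /* SIG_DFL or SIG_IGN select the other choices */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    got_usr1 = 0;
    raise(SIGUSR1);             /* the handler runs before raise() returns */
    return got_usr1;
}
```

Replacing on_usr1 with SIG_IGN or SIG_DFL in sa.sa_handler selects the "ignore" and "default action" choices instead.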
Binary Compatibility at a Glance
There is a clean separation between user mode and kernel mode. User processes run on top of the kernel with very little knowledge of what is inside a system call. All they do is issue system calls, expecting the behavior documented by kernel developers in a set of man pages. Most programs do not care about kernel internals and will just work if you change the kernel, as long as the system call behavior is left unchanged.
This is how NetBSD binary compatibility works. When launching a new program, the kernel is able to distinguish between native NetBSD binaries and, for example, Linux or FreeBSD binaries on NetBSD/i386. It then uses an alternative system call table for this program, which contains the appropriate entries for the emulated OS. For instance, NetBSD/i386 uses sys/compat/linux/arch/i386/syscalls.master to provide the system call table for Linux binaries.
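The per-program table selection can be sketched as follows. Everything here is hypothetical: the toy_emul and toy_proc structures, the "foreign" ABI tag, and the system call numbers are invented for illustration. NetBSD keeps comparable information in its own emulation structure, but with different and richer fields, and it identifies binaries from their executable headers rather than from a string.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_syscall_fn)(const long *args, long *retval);

/* Hypothetical emulation descriptor: a name and a system call table. */
struct toy_emul {
    const char *name;
    const toy_syscall_fn *sysent;
    size_t nsysent;
};

/* Hypothetical per-process state: the emulation the process runs under. */
struct toy_proc {
    const struct toy_emul *emul;
};

static int toy_nosys(const long *args, long *retval)
{
    (void)args;
    (void)retval;
    return ENOSYS;
}

static int toy_getpid(const long *args, long *retval)
{
    (void)args;
    *retval = 100;              /* pretend PID */
    return 0;
}

/* Same kernel function, different numbers: getpid is 1 for native
 * binaries and 2 for the emulated OS (toy numbers, not real ones). */
static const toy_syscall_fn native_sysent[] = { toy_nosys, toy_getpid };
static const toy_syscall_fn foreign_sysent[] = { toy_nosys, toy_nosys, toy_getpid };

#define TOY_NELEM(a) (sizeof(a) / sizeof((a)[0]))

static const struct toy_emul toy_emul_native = {
    "native", native_sysent, TOY_NELEM(native_sysent)
};
static const struct toy_emul toy_emul_foreign = {
    "foreign", foreign_sysent, TOY_NELEM(foreign_sysent)
};

/* At exec time, pick the table from a (hypothetical) ABI tag found in
 * the binary; anything unrecognized is treated as native. */
void toy_exec(struct toy_proc *p, const char *abi_tag)
{
    if (abi_tag != NULL && strcmp(abi_tag, "foreign") == 0)
        p->emul = &toy_emul_foreign;
    else
        p->emul = &toy_emul_native;
}

/* The system call handler consults the process's table, not a global one. */
int toy_emul_syscall(const struct toy_proc *p, unsigned long code,
                     const long *args, long *retval)
{
    if (code >= p->emul->nsysent)
        return ENOSYS;
    return p->emul->sysent[code](args, retval);
}
```

The same number can thus mean different things for different processes, while both tables can point at one shared kernel function.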
When a Linux binary running on NetBSD does a system call, the NetBSD kernel will run the appropriate function in the Linux system call table. This function emulates the behavior of the Linux system call so that the user program is fooled into thinking that it is running on the Linux kernel whereas it is in fact running on the NetBSD kernel.
Some system calls have the same behavior in NetBSD and in the emulated OS; in this case, the emulation system call table just uses the same corresponding function. Sometimes the behavior is a bit different: for instance, some flags have different values, or the system call semantics differ. In this case, the system call table references an emulation function, which calls the native function after adapting the arguments and/or behavior. This is done, for instance, in sys/compat/linux/common/linux_misc.c:linux_sys_uname() for Linux uname(2) emulation. Last but not least, the emulated system call may have no native equivalent. The emulation function that implements the system call must then do all the work, or just act as if the work had been done and return, hoping that the user process will not notice the broken behavior (yes, sometimes it works).
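The "different flag values" case and the "pretend it worked" case can be sketched as below. The FOREIGN_O_* values are hypothetical and do not come from any real OS; the native values are whatever the host's <fcntl.h> defines. In a real emulation function, the translated flags would then be handed to the native open implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical open() flag values used by the emulated OS. */
#define FOREIGN_O_RDONLY  0x0000
#define FOREIGN_O_WRONLY  0x0001
#define FOREIGN_O_RDWR    0x0002
#define FOREIGN_O_ACCMODE 0x0003
#define FOREIGN_O_CREAT   0x0100
#define FOREIGN_O_TRUNC   0x0200
#define FOREIGN_O_APPEND  0x0400

static const struct { int foreign; int native; } flag_map[] = {
    { FOREIGN_O_CREAT,  O_CREAT },
    { FOREIGN_O_TRUNC,  O_TRUNC },
    { FOREIGN_O_APPEND, O_APPEND },
};

/* Different encoding, same meaning: convert emulated open() flags into
 * native ones. Returns 0 on success, or EINVAL if a bit is unknown. */
int toy_translate_open_flags(int foreign, int *native)
{
    int out;
    size_t i;

    switch (foreign & FOREIGN_O_ACCMODE) {
    case FOREIGN_O_RDONLY: out = O_RDONLY; break;
    case FOREIGN_O_WRONLY: out = O_WRONLY; break;
    case FOREIGN_O_RDWR:   out = O_RDWR;   break;
    default:               return EINVAL;
    }
    foreign &= ~FOREIGN_O_ACCMODE;

    for (i = 0; i < sizeof(flag_map) / sizeof(flag_map[0]); i++) {
        if (foreign & flag_map[i].foreign) {
            out |= flag_map[i].native;
            foreign &= ~flag_map[i].foreign;
        }
    }
    if (foreign != 0)
        return EINVAL;          /* a bit we do not know how to emulate */
    *native = out;
    return 0;
}

/* No native equivalent: act as if the work had been done. */
int toy_emul_sys_stub(const long *args, long *retval)
{
    (void)args;
    *retval = 0;
    return 0;
}
```

Rejecting unknown bits with EINVAL is one design choice; an emulation could also silently drop them, at the risk of the program misbehaving later.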
The other part of the job is implementing signal emulation. Care must be taken to ensure that the signal handler is called in the same way the emulated OS would have called it. This job involves manipulating machine registers and writing assembly language, and hence it is quite machine dependent.