 Published on ONLamp.com (http://www.onlamp.com/)


IRIX Binary Compatibility, Part 1

by Emmanuel Dreyfus
08/08/2002

Author's Note: This article details the IRIX binary compatibility implementation for the NetBSD operating system. This includes the creation of a new emulation subsystem inside the NetBSD kernel and a lot of reverse engineering to understand and reproduce how IRIX internals work.

Because this article includes an introduction to all kernel subsystems involved with IRIX binary compatibility, we assume the reader has some experience in user-land Unix programming.

An Introduction to Binary Compatibility

References

Throughout this article, we reference various NetBSD kernel source files and NetBSD manual pages.

Kernel and User-Mode Overview

Unix systems have two distinct modes of operation, known as user mode and kernel (or system) mode. In user mode, the operating system (OS) executes code provided by users. It could be a Web browser, a computer-science-student's project, a Web server (in this case, the user running the program is usually the system administrator), and so on. This code is run with limited privileges. It has limited access to the computer's memory, and usually no access at all to the hardware.

When running in kernel mode, the OS executes only trusted code, which was loaded at boot time. This code is known as the OS kernel. The kernel has full access to memory and hardware, and it is there to provide services to user programs.

User processes call kernel code by issuing a trap. A trap is a hardware or software exception that suspends execution of the user process and gives control to kernel code. The kernel handles the exception, after which it may return to user mode and resume the user process, or it may destroy the user process. Examples of traps are division by zero, memory faults (accessing a virtual address where no physical memory is mapped), timer interrupts (which are used to switch between user processes), and requests by the user process to access some resource controlled by the kernel.

These requests can be opening a file, reading from a network connection, or creating a new process. The process does this by issuing a system call, like open(2), read(2), or fork(2). The system call is in fact a CPU instruction that causes a trap.

Here is an example of MIPS assembly to call the fork(2) system call on NetBSD:

li  $v0,2   # 2 is the system call number for fork()
            # v0 is the register holding the system call number
syscall     # syscall is the CPU instruction to do a system call

When the syscall instruction executes, the kernel runs a particular trap handler, known as the system call handler. For NetBSD/mips, it can be found in sys/arch/mips/mips/syscall.c:syscall_plain(). The system call handler expects one argument, the system call number. It uses a table, called the system call table, to look up the kernel function that will be called to complete the system call. On NetBSD, the system call table for native processes is generated from sys/kern/syscalls.master.
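The table lookup described above can be sketched in user-space C. This is only a model, not NetBSD code: struct sysent_model, model_syscall(), the two stub functions, and the ENOSYS_MODEL error value are all names invented for this illustration. The only details taken from the text are that the system call number indexes the table and that fork is number 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a system call table entry; not the kernel's type. */
typedef int (*syscall_fn)(const int *args, int *retval);

struct sysent_model {
    int        narg;   /* number of arguments the call expects */
    syscall_fn call;   /* function implementing the call, NULL if unused */
};

#define ENOSYS_MODEL 78   /* stand-in error code for "no such system call" */

/* Stub implementations so that the dispatch can be observed. */
static int model_exit(const int *args, int *retval)
{
    *retval = args[0];
    return 0;
}

static int model_fork(const int *args, int *retval)
{
    (void)args;
    *retval = 1234;       /* pretend child process ID */
    return 0;
}

/* The array index is the system call number (held in v0 on MIPS). */
static const struct sysent_model model_table[] = {
    { 0, NULL },          /* 0: unused in this model */
    { 1, model_exit },    /* 1: exit */
    { 0, model_fork },    /* 2: fork, as in the assembly example */
};

/* Model of the system call handler: bounds-check, look up, call. */
int model_syscall(size_t code, const int *args, int *retval)
{
    size_t n = sizeof(model_table) / sizeof(model_table[0]);

    assert(retval != NULL);
    if (code >= n || model_table[code].call == NULL)
        return ENOSYS_MODEL;
    return model_table[code].call(args, retval);
}
```

Calling model_syscall(2, args, &rv) runs the fork stub, while an out-of-range number is rejected before any function pointer is followed, which is the check any table-driven handler needs.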

System calls are the way a user process requests action from the kernel, but there is also a mechanism used by the kernel to notify the user process of unusual conditions: signals. Signals are issued by various traps and system calls, to notify the process that it raised an exception: memory fault (the famous segmentation fault, well known to students learning C), division by zero and so on.

For each signal, the user process can choose to take the default action (by default, some signals abort the program and others are simply ignored), to ignore the signal, or to execute a function called a signal handler. This choice is made using the signal(3) library call or the sigaction(2) system call.
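As a minimal sketch of two of these choices, the following C code uses sigaction(2) to install a handler for SIGUSR1 and then to ignore the same signal. The function names and the use of raise(3) to send the signal to the current process are choices made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main flow of the program. */
static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int signo)
{
    got_signal = signo;
}

/* Set the disposition of SIGUSR1, send the signal to ourselves, and
 * return what the handler recorded (0 if no handler ran, -1 on error). */
static int raise_usr1_with(void (*disposition)(int))
{
    struct sigaction sa;

    /* The default action for SIGUSR1 terminates the process. */
    assert(disposition != SIG_DFL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = disposition;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    if (raise(SIGUSR1) != 0)
        return -1;
    return got_signal;
}

/* Choice: run a signal handler. */
int deliver_sigusr1(void)
{
    return raise_usr1_with(on_usr1);
}

/* Choice: ignore the signal. */
int ignore_sigusr1(void)
{
    return raise_usr1_with(SIG_IGN);
}
```

deliver_sigusr1() returns SIGUSR1 because the handler ran before raise() returned, and ignore_sigusr1() returns 0 because the signal was discarded.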

Binary Compatibility at a Glance

There is a clean separation between user mode and kernel mode. User processes run on top of the kernel with very little knowledge of what is inside a system call. All they do is issue system calls, expecting the behavior documented by kernel developers in a set of man pages. Most programs do not care about kernel internals and will just work if you change the kernel, as long as the system call behavior is left unchanged.

This is how NetBSD binary compatibility works. When launching a new program, the kernel is able to distinguish between native NetBSD binaries and, for example, Linux or FreeBSD binaries on NetBSD/i386. It then chooses an alternative system call table for this program, which contains the appropriate entries for the emulated OS. For instance, NetBSD/i386 uses sys/compat/linux/arch/i386/syscalls.master to provide the system call table for Linux binaries.

When a Linux binary running on NetBSD does a system call, the NetBSD kernel will run the appropriate function in the Linux system call table. This function emulates the behavior of the Linux system call so that the user program is fooled into thinking that it is running on the Linux kernel whereas it is in fact running on the NetBSD kernel.

Some system calls have the same behavior in NetBSD and in the emulated OS; in this case, the emulation system call table just uses the corresponding native function. Sometimes the behavior is a bit different: for instance, some flags have different values, or the system call semantics differ. In this case, the system call table references an emulation function, which calls the native function after adapting the arguments and/or behavior. This is done, for instance, in sys/compat/linux/common/linux_misc.c:linux_sys_uname() for Linux uname(2) emulation. Last but not least, the emulated system call may have no native equivalent. The emulation function that implements the system call must then do all the work, or just act as if the work had been done and return, hoping that the user process will not notice the broken behavior (yes, sometimes it works).
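The three cases can be sketched as a small emulation table in C. Everything here is hypothetical: the function names, the one-argument calling convention, and the flag values are made up for the illustration and are not taken from the NetBSD sources.

```c
#include <assert.h>

/* Illustrative flag values only: the same flag is encoded differently
 * by the native OS and by the emulated OS. */
#define NATIVE_O_CREAT 0x0200
#define NATIVE_O_EXCL  0x0800
#define EMUL_O_CREAT   0x0040
#define EMUL_O_EXCL    0x0080

typedef int (*sys_fn)(int arg);

/* Stand-ins for native kernel functions. */
static int native_getpid(int unused)
{
    (void)unused;
    return 42;
}

static int native_open(int flags)
{
    return flags;   /* echoes its flags so the translation is visible */
}

/* Case 2: same service, different encoding. Adapt the arguments,
 * then call the native function. */
static int emul_open(int flags)
{
    int nflags = 0;

    if (flags & EMUL_O_CREAT)
        nflags |= NATIVE_O_CREAT;
    if (flags & EMUL_O_EXCL)
        nflags |= NATIVE_O_EXCL;
    return native_open(nflags);
}

/* Case 3: no native equivalent. Pretend the work was done. */
static int emul_nosys_stub(int unused)
{
    (void)unused;
    return 0;
}

static const sys_fn emul_table[] = {
    native_getpid,     /* case 1: identical behavior, reuse native code */
    emul_open,         /* case 2: wrapper around the native function */
    emul_nosys_stub,   /* case 3: stub that reports success */
};

int emul_syscall(unsigned code, int arg)
{
    assert(code < sizeof(emul_table) / sizeof(emul_table[0]));
    return emul_table[code](arg);
}
```

The wrapper in the second slot is the common pattern: the emulated binary passes its own flag values, and the native function only ever sees native ones.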

The other part of the job is implementing signal emulation. Care must be taken to ensure that the signal handler is called in the same way the emulated OS would have called it. This job involves manipulating machine registers and assembly language, and hence it is quite machine dependent.

Implementation Plans

IRIX 6.5 is known to be a System V Release 4 (SVR4) derived operating system, and thanks to Christos Zoulas, NetBSD already contains an SVR4 binary compatibility option. The code for this SVR4 emulation can be found in sys/compat/svr4 in the NetBSD kernel sources.

NetBSD already has binary compatibility with major OSes such as Solaris 2 or SCO OpenServer through this SVR4 compatibility option. The first problem was to decide whether IRIX compatibility would be implemented by improving the SVR4 compatibility, or by introducing an IRIX-specific compatibility option.

The answer to this first question is obvious once you compare the system call tables for plain SVR4 and for IRIX 6.5. The table for SVR4 can be found in NetBSD kernel source in sys/compat/svr4/syscalls.master. The table for IRIX can be found on an IRIX system in /usr/include/sys.s, and in NetBSD kernel sources, it can be found in sys/compat/irix/syscalls.master.

In the IRIX 6.5 system call table, only the first 88 system calls are plain SVR4. They are followed by 147 system calls that are either IRIX specific, or are just SVR4 system calls with different system call numbers. This strongly suggests that IRIX binary compatibility in NetBSD should have its own syscalls.master, since it would be a pain to add dozens of #ifdef's in SVR4's syscalls.master.

Thus, we needed a sys/compat/irix directory in the NetBSD kernel sources, with an IRIX-specific syscalls.master file. However, there are a lot of plain SVR4 system calls in IRIX, and therefore a lot of code in sys/compat/svr4 is used by the IRIX binary compatibility. This code is built when the kernel is configured with the IRIX binary compatibility option (COMPAT_IRIX), even if the SVR4 binary compatibility option (COMPAT_SVR4) is not set.

Setting Up the New Compatibility Option

Overview

In order to create a new compatibility option in NetBSD, we need

Most of the work can be done incrementally, starting from code duplicated from the NetBSD native version and modifying it until IRIX binaries work. The only place where the NetBSD version is not very relevant is the syscalls.master file, because we know that everything in it will be changed later. A null syscalls.master file, which defines no system calls at all, is a good start.

Registering Our New Emulation

Now let us see how an emulation option is made visible to the NetBSD kernel. Everything is done in sys/kern/exec_conf.c. In this file, an array called execsw_builtin is defined. Each entry in this array is a struct execsw, as defined in sys/sys/exec.h.

The struct execsw describes a particular execution environment. This includes foreign OS emulations as well as native situations: there are entries in execsw_builtin for shell scripts, a.out native binaries, ELF native binaries, and 32-bit ELF binaries running on 64-bit NetBSD systems.

The information held by the struct execsw includes pointers to a function responsible for identifying the executable format (a.out, ELF, ECOFF, Mach-O...), a probe function that should be able to tell whether this exec switch is able to handle a particular binary, and functions for setting up the program's initial stack and CPU registers, and for writing a core dump to the disk.

The struct execsw also holds a pointer to a struct emul, which is defined in sys/sys/proc.h. Whereas the fields of struct execsw are used at program creation and termination, the struct emul is used during the program's normal operation. It contains pointers to the system call table, and to various functions used to handle traps and signals.

The distinction between struct execsw and struct emul is there because some OSes support several executable formats. For instance, NetBSD itself supports native a.out and ELF binaries. Both kinds of binaries share the same system call table and signal handlers, and therefore they have the same struct emul. But the binary loading is different, hence they have two distinct struct execsw entries in execsw_builtin.

The first job is to create the struct emul for IRIX binary compatibility. This uses the IRIX system call table (which is empty so far), and all other fields are copied from the NetBSD native struct emul, which is found in sys/kern/kern_exec.c. It is named emul_netbsd. The struct emul for IRIX is naturally named emul_irix.

Then we can add the entry for IRIX in the execsw_builtin array. IRIX uses ELF binaries, so this entails little more than copying the NetBSD native ELF entry, and replacing the struct emul emul_netbsd with our emul_irix.

Matching the Binaries We Can Run

Now we have registered a new execution environment with the kernel. The next step is to have it actually run something. The struct execsw includes a probe function whose purpose is to tell the kernel if the execution environment described by this entry is able to handle a given binary.

In order to use our new execution environment, we must therefore write a probe function. Usually, this kind of function tries to find a signature specific to an OS in an ELF section, or a magic number in an a.out header that would identify the binary.
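The relationship between the exec switch entries, their probe functions, and the shared struct emul can be modeled in a few lines of C. This is a deliberately simplified sketch: struct emul_model and struct execsw_model are hypothetical stand-ins that keep only the fields discussed here, and the probes match on made-up text markers rather than on real a.out or ELF headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for struct emul and struct execsw. */
struct emul_model {
    const char *e_name;   /* name of the run-time environment */
};

struct execsw_model {
    const char *format;                                  /* executable format */
    int (*probe)(const unsigned char *hdr, size_t len);  /* 0 if accepted */
    const struct emul_model *emul;                       /* used while running */
};

static const struct emul_model emul_netbsd_model = { "netbsd" };
static const struct emul_model emul_irix_model   = { "irix" };

/* Toy probes: accept a binary if it starts with a made-up marker. */
static int has_prefix(const unsigned char *hdr, size_t len, const char *p)
{
    size_t n = strlen(p);

    return (len >= n && memcmp(hdr, p, n) == 0) ? 0 : -1;
}

static int probe_aout(const unsigned char *h, size_t l)
{
    return has_prefix(h, l, "AOUT");
}

static int probe_elf_irix(const unsigned char *h, size_t l)
{
    return has_prefix(h, l, "ELF-IRIX");
}

static int probe_elf_native(const unsigned char *h, size_t l)
{
    return has_prefix(h, l, "ELF");
}

/* Two native entries share one emul; the IRIX entry has its own. */
static const struct execsw_model execsw_table[] = {
    { "a.out", probe_aout,       &emul_netbsd_model },
    { "ELF",   probe_elf_irix,   &emul_irix_model   },
    { "ELF",   probe_elf_native, &emul_netbsd_model },
};

/* Return the first entry whose probe accepts the binary, or NULL. */
const struct execsw_model *find_execsw(const unsigned char *hdr, size_t len)
{
    size_t n = sizeof(execsw_table) / sizeof(execsw_table[0]);

    assert(hdr != NULL);
    for (size_t i = 0; i < n; i++)
        if (execsw_table[i].probe(hdr, len) == 0)
            return &execsw_table[i];
    return NULL;
}
```

In this model, the a.out and native ELF entries are distinct exec switch entries pointing at the same emul, which is the sharing described earlier, and the more specific IRIX probe is listed before the generic ELF one so that it gets the first chance to claim a binary.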

For IRIX, things are a bit complicated, since IRIX uses no fewer than three different kinds of ELF executables. These correspond to the three Application Binary Interfaces (ABIs) supported in IRIX: o32, n32 and n64. The ABI is the set of conventions that explains how the stack and registers should be used when calling a function or doing a system call.

o32 is the traditional 32-bit SVR4 ABI for MIPS processors. It is described extensively in the SVR4 ABI MIPS processor supplement. n64 is the 64-bit ABI, used for 64-bit ELF binaries. Finally, n32 is a hybrid ABI used to increase the performance of applications using a 32-bit address space on 64-bit processors. The difference between o32 and n32 is that 64-bit registers are used instead of 32-bit ones, where relevant, and more function arguments are passed in registers instead of on the stack. The goal behind n32 is to improve the performance of 32-bit applications that cannot necessarily be built as 64-bit, because some assumptions were made about pointer size, for instance.

At the time of writing, the NetBSD/mips kernel is only able to run o32 binaries. The goal is hence to match IRIX o32 binaries. o32 binaries are themselves divided into two families: static o32 and dynamic o32. Dynamic o32 binaries are the easy part of the job; therefore we will start with them.

ELF binaries are divided into ELF sections. The sections can be inspected using the objdump(1) command:

objdump -h file                 will list the ELF sections of file, and 
objdump -j .section -s file     will dump .section from file.

All dynamic ELF executables have an .interp section that contains the name of the ELF interpreter, which is also known as the dynamic linker. On NetBSD, this is /usr/libexec/ld.elf_so. See ld.elf_so(1) for more information about ELF dynamic linking.

On program startup, the kernel loads the executable and the interpreter, and then transfers control to the interpreter. The interpreter loads the shared objects into memory, and then transfers control to the dynamically linked program.

On IRIX, things are a bit strange: the interpreter is libc itself. Libc loads the dynamic linker (/usr/lib/rld), which in turn loads all the shared objects and then executes the program by calling its main() function.

This IRIX particularity is good for matching IRIX binaries: the interpreter name is /lib/libc.so.1, which is quite unusual. Another useful point is that, because it maintains three different ABIs, IRIX has three different sets of libraries, and therefore three different interpreters: /lib/libc.so.1 for o32 binaries, /lib32/libc.so.1 for n32, and /lib64/libc.so.1 for n64. It is therefore quite easy to check whether a dynamic executable is an IRIX o32 binary: we just have to peek at the .interp section and see if the interpreter is /lib/libc.so.1.
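A minimal sketch of such a check, written as ordinary user-space C, is shown here. It is not the NetBSD probe function. It assumes a 32-bit big-endian ELF image already loaded in memory, and it locates the interpreter path through the PT_INTERP program header, which points at the contents of the .interp section, instead of walking the section headers. The function name and the hard-coded header offsets (taken from the standard ELF32 layout) are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_INTERP_TYPE 3u   /* program header type holding the interpreter path */

/* Read big-endian integers byte by byte, so the host byte order is irrelevant. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Return 1 if img is a 32-bit big-endian ELF image whose interpreter
 * path is exactly `want`, and 0 otherwise. */
int elf32be_interp_is(const unsigned char *img, size_t len, const char *want)
{
    assert(img != NULL && want != NULL);

    if (len < 52 || memcmp(img, "\177ELF", 4) != 0)
        return 0;
    if (img[4] != 1 || img[5] != 2)       /* ELFCLASS32, ELFDATA2MSB */
        return 0;

    uint32_t phoff     = be32(img + 28);  /* e_phoff */
    uint16_t phentsize = be16(img + 42);  /* e_phentsize */
    uint16_t phnum     = be16(img + 44);  /* e_phnum */
    if (phentsize < 32)
        return 0;

    for (uint16_t i = 0; i < phnum; i++) {
        size_t off = (size_t)phoff + (size_t)i * phentsize;
        if (off > len || len - off < 32)
            return 0;                     /* header table runs past the image */

        const unsigned char *ph = img + off;
        if (be32(ph) != PT_INTERP_TYPE)   /* p_type */
            continue;

        uint32_t p_offset = be32(ph + 4);
        uint32_t p_filesz = be32(ph + 16);
        size_t wlen = strlen(want) + 1;   /* stored path includes the NUL */
        if (p_offset > len || len - p_offset < p_filesz || p_filesz != wlen)
            return 0;
        return memcmp(img + p_offset, want, wlen) == 0;
    }
    return 0;                             /* no interpreter: static binary */
}
```

With this helper, a dynamic IRIX o32 candidate is an image for which elf32be_interp_is(img, len, "/lib/libc.so.1") returns 1; the n32 and n64 interpreters fail the comparison because their paths differ.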

Static binaries are trickier. At a glance, the only difference between o32 and n32 static binaries is the presence of a .MIPS.options section in o32 static binaries. This could have been a good test, but unfortunately, it has some false negatives. One can find a few static o32 binaries in IRIX 5 that do not have this ELF section (the expr(1) command, for instance).

But since IRIX's file(1) command is able to distinguish o32 and n32, there must be a reliable difference. The answer is in an IRIX header file: /usr/include/sys/elf.h. This file defines the ELF header, in which we can find an e_flags field. Two bits in this field are used for IRIX binaries to distinguish between the three ABIs. In order to tell if an IRIX binary is o32, n32, or n64, we just have to check these two bits in the ELF header.

This is quite a robust test to distinguish between o32, n32, and n64 IRIX binaries; however, it can still have some trouble when it is used to distinguish between IRIX and non-IRIX static binaries. Fortunately, there are not a lot of static binaries on an IRIX 6.5 system, and it is not trivial to build static o32 programs. (In fact, I was not able to find out how to do it.) Hence, we can afford to use a weak matching scheme; as long as we can match the few static binaries from IRIX 6.5 correctly, we are safe.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.




Copyright © 2009 O'Reilly Media, Inc.