Previously in this series: Linux Compatibility on BSD for the PPC Platform -- The Linux compatibility layer allows BSD to run Linux binary applications. Emmanuel Dreyfus explains how he implemented this on NetBSD for the PowerPC platform.
In this article, we'll take a closer look at the problems that prevent dynamic Linux binaries from working in compatibility mode on the NetBSD/PowerPC platform. This includes the way the arguments are passed to the Linux program, and ELF auxiliary table handling.
The first problem was that Linux's ld.so did not get its command-line arguments. In fact, no program running in Linux emulation -- either statically linked or dynamically linked -- was actually able to get its arguments. This can be demonstrated by building this sample program on a Linux box (statically, of course), and trying to run it on the NetBSD box:
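/*
 * arg.c -- An argument printer
 */
#include <stdio.h>
#include <stdlib.h>	/* atoi() */
int main (int argc, char **argv) {
	int i;
	/* Print every argument the kernel handed to the program */
	for (i = 0; i < argc; i++) {
		printf ("argv[%d]=%s\n", i, argv[i]);
	}
	/* Use the first argument as exit status, to test return values */
	if (argc > 1)
		return atoi (argv[1]);
	return 0;
}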
This program tests argument and return value passing between
the kernel and the emulated executable. When running it, we get no
output at all. The program got an argc of zero, which demonstrated the
problem with passing command-line arguments.
The arguments are passed to the program using the stack. When preparing
the program launch, the kernel sets up the stack so the program
will be able to find argc, argv, and envp. To inspect this
mechanism a bit deeper, we can use a stack dumper, like the following
piece of code:
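/*
 * sd.c -- A stack dumper
 */
#include <stdio.h>
#include <sys/types.h>
#include <ctype.h>
/* Linker-provided symbols: only their addresses are meaningful */
extern char end;
extern char etext;
extern char edata;
extern char **environ;
void stackdump (long, char **);
int main (int argc, char **argv) {
	long sign = 0x89abcdef;
	/* Print where main() keeps its own copies of argc and argv */
	printf ("argc=0x%lx\n", (unsigned long)&argc);
	printf ("argv=0x%lx\n", (unsigned long)&argv);
	printf ("environ=0x%lx\n", (unsigned long)&environ);
	stackdump (sign, argv);
	return 0;
}
void stackdump (long arg, char **argv) {
	unsigned long i, j;
	long signature = 0x01234567;
	/* Never executed: it only keeps arg and signature referenced */
	if (0)
		printf ("%lx %lx\n", arg, signature);
	printf ("etext=0x%lx\nedata=0x%lx\nend=0x%lx\n",
	    (unsigned long)&etext, (unsigned long)&edata, (unsigned long)&end);
	/*
	 * Start 0x400 bytes below argv, rounded down to a 16-byte line.
	 * The upper bound assumes a 32-bit stack ending below 0x80000000.
	 */
	for (i = (((unsigned long)argv-0x400)/16)*16; i <= 0x7fffffff; i=i+16) {
		printf ("%08lx ", i);
		/* Hex column: eight groups of two bytes */
		for (j = 0; j <= 15; j = j+2) {
			printf ("%02x", *(unsigned char*)(i+j));
			printf ("%02x ", *(unsigned char*)(i+j+1));
		}
		/* ASCII column: non-printable bytes are shown as dots */
		for (j = 0; j <= 15; j++) {
			if (isprint (*(unsigned char*)(i+j)))
				printf ("%c", *(unsigned char*)(i+j));
			else
				printf (".");
		}
		printf ("\n");
	}
}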
This program also uses global and local variables to help study argument
passing. It dumps the stack from an arbitrary address until it
reaches the end of the stack and crashes, because pages after the stack
are not accessible when running in user mode. We do not really care
about this crash because it displays what we are looking for. However,
this can be a problem if you are working on a terminal that is unable to scroll back and want to pipe the stack dump's output to more(1) or
less(1).
If you want to do this, you will have to modify the program so
it catches the SIGSEGV signal. You will also have to ensure that
linux_sendsig() in linux_machdep.c does not crash anything. Most likely,
you will keep that function empty. The easy solution is certainly
to get a terminal that has a scrollback feature.
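One way to make that modification is sketched below. It assumes a POSIX environment where sigaction(), sigsetjmp(), and siglongjmp() are available, and where signal delivery actually works for the running binary (which, under emulation, depends on linux_sendsig() as noted above). The helper name probe_readable() is made up for this illustration: it walks memory upward and stops cleanly at the first unreadable byte instead of crashing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf fault_jmp;	/* where to resume after a fault */

/* SIGSEGV handler: abandon the faulting read and jump back. */
static void
on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_jmp, 1);
}

/*
 * Hypothetical helper: count how many bytes starting at 'start' can
 * be read, up to 'max', stopping at the first access that raises
 * SIGSEGV.
 */
static size_t
probe_readable(const unsigned char *start, size_t max)
{
	struct sigaction sa, old;
	volatile size_t n = 0;	/* volatile: still needed after siglongjmp() */

	assert(start != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	/* Saving the signal mask (second argument) unblocks SIGSEGV after the jump. */
	if (sigsetjmp(fault_jmp, 1) == 0) {
		while (n < max) {
			(void)*(const volatile unsigned char *)(start + n);
			n++;
		}
	}

	sigaction(SIGSEGV, &old, NULL);	/* restore the previous handler */
	return n;
}
```

In sd.c, the dump loop could use the returned count as its upper bound rather than running until it faults.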
Dumping the stack, you can see the parameters you give to the program
and its environment. Stackdump also gives you the address of argc, which
is the place where the program stores argc on the stack. In fact, the
program copied that value from an upper address on the stack before
entering main(). If we do not get the appropriate value for argc, we must find out where the program gets its argc, and fix the way
the NetBSD kernel sets up the stack so that argc gets written where the
emulated binary expects it.
Note: This is a stack dump with the desired stack layout, not the original one.
argc=0x7fffe8a8
argv=0x7fffe8ac
7fffe8a0 7fff e8c0 0180 0744 0000 0001 7fff e904 ................
7fffe8b0 7fff e8b0 0000 0006 0184 0000 0184 0000 ................
7fffe8c0 7fff e8e0 0180 05cc 0000 0000 0000 0000 ................
7fffe8d0 7fff e8e0 4186 65e0 7fff e9e0 4186 5d60 ....A.e.....A.]'
7fffe8e0 7fff e8f0 4188 9580 7fff e9e0 4186 5d60 ....A.......A.]'
7fffe8f0 0000 0000 0000 0000 0000 0000 0000 0000 ................
7fffe900 0000 0001 7fff eab0 0000 0000 7fff eab5 ................
Next to this copied argc, here at 0x7fffe8a8, stands a pointer to
**argv, at 0x7fffe8ac. This is more interesting because looking at the
address it points to, 0x7fffe904, we can find the **argv pointer that was
set up by the kernel. Next to it, at 0x7fffe900, we have the argc value
set up by the kernel. In this example, everything is fine, but if the
kernel does not set up argc at the place the executable expects it,
searching around the location pointed to by the pointer to **argv (here at
0x7fffe8ac) is a good option.
When searching for the argc value set up by the kernel, the idea is to
look for an integer value (4 bytes on the PowerPC) equal to the actual
number of arguments given to the program (the program name itself being
the first argument, so that number is at least 1). Next to argc we
have **argv, which points to the *argv array. Each element of this array
is a pointer to a null terminated argument string, so it is easy to
identify.
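The search described above can also be automated once a dump has been captured. The sketch below is only an illustration: find_argc() is a hypothetical helper, and it assumes a layout in which the kernel writes argc immediately followed by the argc argument pointers and a terminating null word, using 32-bit words as on the PowerPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scan a captured stack image (an array of
 * 32-bit words) for the pattern argc, argv[0] .. argv[argc-1], NULL.
 * Returns the index of the argc word, or -1 if nothing matches.
 */
static long
find_argc(const uint32_t *words, size_t nwords, uint32_t argc)
{
	size_t i, k;

	assert(words != NULL && argc >= 1);
	for (i = 0; i + argc + 1 < nwords; i++) {
		if (words[i] != argc)
			continue;
		/* Every argument pointer must be non-null... */
		for (k = 1; k <= argc; k++)
			if (words[i + k] == 0)
				break;
		/* ...and the pointer array must end with a null word. */
		if (k > argc && words[i + argc + 1] == 0)
			return (long)i;
	}
	return -1;
}
```

In the dump shown earlier, the copy of argc at 0x7fffe8a8 would be rejected because the word after its pointer (at 0x7fffe8b0) is not null, while the kernel's argc at 0x7fffe900 fits the pattern.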
We can figure out what the problem is by trying stackdump with various
arguments. On the PowerPC, the problem was that argc had to be set up
on a 16-byte boundary. There was an additional twist: if argc
already fell on a 16-byte boundary, the emulated binary
expected it to be 16 bytes lower on the stack.
To fix this problem, and get arguments passed to the program, we
need to modify the stack pointer before writing argc, **argv
and **envp on the stack. Setting up the stack is normally done by the
copyargs() function, which lives in sys/kern/kern_exec.c. But it is
possible to supply a customized copyargs() function by filling the
appropriate field of COMPAT_LINUX's struct execsw. This is done in
sys/kern/exec_conf.c, using the linux_copyargs_function macro. That
macro should be defined in sys/compat/linux/arch/powerpc/linux_exec.h.
Thus, by modifying this macro, we can use a customized copyargs()
function. The Alpha port of COMPAT_LINUX already did this. The
customized function is linux_copyargs(), and it is in the
sys/compat/linux/arch/alpha/linux_exec_alpha.c file.
This file cannot be called linux_exec.c, because there is already a
linux_exec.c in sys/compat/linux/common, and when you build the kernel, all
object files end up in the same build directory. Having the same name twice
would result in the second object file overwriting the first one, and
this would lead to a link error. The code is intended to be architecture-independent, so we reuse the Alpha version
with some PowerPC add-ons. The result is the
sys/compat/linux/arch/powerpc/linux_exec_powerpc.c file, whose code is common
to the Alpha and the PowerPC platforms. It should be moved to the architecture-independent sys/compat/linux/common/linux_exec.c file later.
linux_copyargs() first calls the standard copyargs() function to set up the argv and envp arrays. It leaves a gap of linux_elf_aux_argsize bytes for the ELF auxiliary table (we will take a look at this later), and then it attempts to write argc and the **argv and **envp pointers. The PowerPC-specific alignment is done by this code section:
#ifdef LINUX_SHIFT
	/*
	 * Seems that PowerPC Linux binaries expect argc to start on a
	 * 16-byte aligned address. And we need one more 16-byte shift
	 * if it was already 16-byte aligned.
	 */
	stack = (void *)(((unsigned long)stack - 1) & ~LINUX_SHIFT);
#endif
LINUX_SHIFT is a macro, defined as 0x0000000fUL in
sys/compat/linux/arch/powerpc/linux_exec.h, and we use an ifdef test to prevent the Alpha version from applying this PowerPC-specific fix, which
would break NetBSD/Alpha Linux emulation. The file remains
architecture-independent.
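The arithmetic in that adjustment is easy to check in isolation. The following sketch uses a made-up helper name, align_below(), and plain integers rather than kernel pointers; only the LINUX_SHIFT value comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the LINUX_SHIFT macro described in the article. */
#define LINUX_SHIFT 0x0000000fUL

/*
 * Hypothetical helper (not a NetBSD function): return the highest
 * 16-byte aligned address strictly below sp. Subtracting 1 first is
 * what pushes an already aligned address a full 16 bytes down.
 */
static uintptr_t
align_below(uintptr_t sp)
{
	assert(sp != 0);	/* sp - 1 must not wrap around */
	return (sp - 1) & ~(uintptr_t)LINUX_SHIFT;
}
```

For example, 0x7fffe904 becomes 0x7fffe900, while 0x7fffe900 itself becomes 0x7fffe8f0.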
With this fix, we managed to get statically linked executables to get their
arguments. However, a dynamically linked program will still fail because
ld.so does not find the ELF auxiliary table.
The ELF dynamic linker (ld.so) needs more information than just argc, **argv, and **envp to actually link a program. It must be able to locate the ELF section where the list of shared libraries needed by the program is located. This kind of information is transmitted to ld.so by setting up the ELF auxiliary table on the stack. This table contains a few
entries, each containing two fields: type and value. The details of
each field are specified in the System V Release 4 PowerPC ABI.
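To make the type/value structure concrete, here is a minimal sketch of such a table and a lookup over it. The struct and the aux_lookup() function are simplified stand-ins invented for this example, not the ABI's own declarations; the AT_* numbers are the standard ELF values.

```c
#include <assert.h>
#include <stddef.h>

/* Standard ELF auxiliary entry types. */
#define AT_NULL   0	/* marks the end of the table */
#define AT_PHDR   3	/* address of the program headers */
#define AT_BASE   7	/* base address of the interpreter (ld.so) */
#define AT_ENTRY  9	/* entry point of the program */

/* Simplified entry: one type word followed by one value word. */
struct aux_entry {
	unsigned long type;
	unsigned long value;
};

/*
 * Hypothetical lookup: walk the table until AT_NULL and return the
 * first entry of the requested type, or NULL if it is absent.
 */
static const struct aux_entry *
aux_lookup(const struct aux_entry *table, unsigned long type)
{
	assert(table != NULL);
	for (; table->type != AT_NULL; table++)
		if (table->type == type)
			return table;
	return NULL;
}
```

The dynamic linker performs the same kind of walk to find, among other things, where the program should start executing.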
By looking at the Linux kernel source file linux/fs/binfmt_elf.c, in the
create_elf_tables() function, we can learn how the table should be laid
out so Linux's ld.so works. The job is nearly the same on
the PowerPC and Alpha platforms, so we can use the NetBSD/Alpha version again. The
PowerPC platform just has a special trick: The ELF auxiliary table must also be
aligned on a 16-byte boundary. This is a bit difficult to understand in
the Linux kernel sources, but we can see comments about this in
linux/fs/binfmt_elf.c, and also in the shove_aux_table() function, which is in linux/arch/ppc/kernel/process.c.
We therefore have to add another LINUX_SHIFT conditional before writing the ELF auxiliary table:
#ifdef LINUX_SHIFT
	/*
	 * From Linux's arch/ppc/kernel/process.c:shove_aux_table().
	 * GNU ld.so expects the ELF auxiliary table to start on a
	 * 16-byte boundary on the PowerPC.
	 */
	stack = (void *)(((unsigned long)stack + LINUX_SHIFT) & ~LINUX_SHIFT);
#endif
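In contrast with the earlier adjustment for argc, this one rounds upward and leaves an already aligned address untouched. A small sketch of the arithmetic, again with a made-up helper name (align_up()) and plain integers instead of kernel pointers:

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the LINUX_SHIFT macro described in the article. */
#define LINUX_SHIFT 0x0000000fUL

/*
 * Hypothetical helper: round addr up to the next 16-byte boundary.
 * Adding 15 and clearing the low four bits keeps aligned addresses
 * where they are and moves everything else up.
 */
static uintptr_t
align_up(uintptr_t addr)
{
	assert(addr <= UINTPTR_MAX - LINUX_SHIFT);	/* no overflow */
	return (addr + LINUX_SHIFT) & ~(uintptr_t)LINUX_SHIFT;
}
```

Here 0x7fffe8f1 becomes 0x7fffe900, and 0x7fffe900 stays at 0x7fffe900.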
Finding out where ld.so really expects the table was fairly difficult:
When dynamic linking does not work, it is impossible to even output a
string from the program, so stack-dumping a dynamically linked
program is not an option. I had to blindly try a few different
alignments and test the result before I managed to get it to work.
When the ELF auxiliary table is correctly set up on the stack,
dynamically linked Linux binaries should link and run. Using GNU
ld-1.7.0.so, everything was fine: ld.so got its arguments, and the
program was able to run (and then crash, but this was actually caused by
another bug we will study in the next section). However, when
upgrading to GNU ld-2.1.3.so, we discovered a new problem: Dynamically
linked executables did not get their arguments anymore. This problem
will be studied in a later section. In the next section, we will focus on another bug that crashes Linux binaries dynamically linked with GNU
ld-1.7.0.so.
At this point, it is obvious that ld.so was successfully launched: The kernel trace did show attempts to open() and mmap() files such as /emul/lib/libc.so.6. But the mmap() call failed.
mmap() is used to remap physical memory and files into a process's virtual address space. It is widely used when linking shared libraries because
the library code doesn't have to be loaded into the process
memory. mmap() is used to map the shared library file from the disk to the
virtual address space of several user processes. When a process uses the
library, it is loaded into physical memory by the virtual memory
subsystem, but it will never be loaded twice, because other processes
share the library through their virtual memory mappings. The library is
loaded once and used several times. If you need more information
about the mmap() system call, take a look at the mmap(2) man page.
To debug this kind of problem, it is useful to make a small test program
that exercises the faulty system call. Here is a simple mmap() tester:
/*
 * mmap.c -- mmap() tester
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
int main (int argc, char **argv) {
	int fd;
	char *ptr;
	fd = open ("/etc/passwd", O_RDONLY, 0);
	if (fd < 0) {
		printf ("open failed\n");
		exit(-1);
	}
	/* Map the first 512 bytes of the file, read-only */
	ptr = mmap (NULL, 512, PROT_READ, MAP_PRIVATE|MAP_FILE, fd, 0);
	/* mmap() reports failure with MAP_FAILED, not NULL */
	if (ptr == MAP_FAILED) {
		perror ("mmap failed");
		exit(-1);
	}
	/* Reading through the mapping shows whether it really works */
	printf ("%c-%c-%c-%c\n", ptr[0], ptr[1], ptr[2], ptr[3]);
	return 0;
}
Using this program, it became clear that the problem was caused by our
mmap() emulation, and nothing else. After some investigation, we
found the problem was caused by the size of the offset
argument to mmap(). This argument is 32 bits long on a PowerPC Linux
system, and it is 64 bits long on a PowerPC NetBSD system. The result was
that when a Linux executable made an mmap() system call, NetBSD took as its
64-bit offset the actual 32-bit argument given by the Linux executable, plus the next
32 bits of data on the stack.
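The effect of that size mismatch can be modeled in a few lines. This is only a model of the description above, not kernel code: the function name is invented, and it assumes the big-endian word order of the PowerPC, where the first of two 32-bit words is the most significant half of a 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the bug: fetch a 64-bit offset from a place
 * where the caller only stored a 32-bit one. words[0] is the real
 * 32-bit offset, words[1] is whatever happens to follow it.
 */
static uint64_t
offset_as_seen_by_netbsd(const uint32_t *words)
{
	assert(words != NULL);
	return ((uint64_t)words[0] << 32) | words[1];
}
```

Under these assumptions, a Linux offset of 0 followed by an unrelated word such as 0x7fffe9e0 is read as the offset 0x7fffe9e0, and any non-zero Linux offset ends up shifted into the upper 32 bits.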
Adding a wrapper function that correctly handles the offset argument and
transfers control to linux_sys_mmap() fixes the problem. This
wrapper function is defined in sys/compat/linux/arch/powerpc/linux_mmap_powerpc.c. Obviously, this is
not a very clean design, and it would be better to define a linux_off_t in
architecture-dependent linux_mmap.h files, and then use it in the
architecture-independent linux_sys_mmap() function.
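The general shape of such a wrapper might look like the sketch below. Both structures and the widen_mmap_args() function are hypothetical and are not the real NetBSD definitions; the sketch also assumes that the Linux offset is a signed 32-bit quantity, which the text does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument layouts; not the real NetBSD structures. */
struct linux32_mmap_args {
	uint32_t addr;
	uint32_t len;
	int32_t  prot;
	int32_t  flags;
	int32_t  fd;
	int32_t  offset;	/* 32 bits, as passed by the Linux binary */
};

struct native_mmap_args {
	uint32_t addr;
	uint32_t len;
	int32_t  prot;
	int32_t  flags;
	int32_t  fd;
	int64_t  offset;	/* 64 bits, as the native code expects */
};

/*
 * Copy the arguments field by field, widening only the offset, so
 * that no neighboring stack word can leak into it.
 */
static void
widen_mmap_args(const struct linux32_mmap_args *in,
    struct native_mmap_args *out)
{
	assert(in != NULL && out != NULL);
	out->addr = in->addr;
	out->len = in->len;
	out->prot = in->prot;
	out->flags = in->flags;
	out->fd = in->fd;
	out->offset = (int64_t)in->offset;	/* sign-extending conversion */
}
```

A wrapper would build the widened structure this way and then hand it to the existing mmap() emulation.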
After this mmap() fix, we are able to run dynamically linked programs
such as the stack dumper or the argument printer. Everything is fine
with ld-1.7.0.so (which is available with Linux's glibc-1), but
upgrading to ld-2.1.3.so, which comes with glibc-2, breaks argument passing for dynamic executables.
When using ld-2.1.3.so, the argument-passing problem was a bit weird:
ld.so was able to link the program, and this meant that it was able to
find the program's arguments (if ld.so does not get the arguments, it
complains by displaying an error message). That suggested the stack
layout for arguments was good. But on the other hand, the program itself
wasn't able to retrieve its arguments anymore: When running the
argument printer, the program displayed a null **argv. This suggested
the stack layout for the arguments was bad.
Running the stack dumper, it was obvious that the program expected its
arguments 16 bytes lower than the place they actually were. Modifying
the stack layout or the stack pointer did not fix the problem, because
if the arguments were set up where the program expected them, then ld.so
did not find them, and it was not able to link the program.
In fact, the problem was that ld.so and the executable expected the
arguments to be in two different places. Duplicating the arguments was
therefore a possible workaround to the problem. With such a duplication,
here is the stack layout the kernel produced before transferring
control to ld.so:
7fffe9b0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................
7fffe9c0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................
You can recognize on each line argc (here 0000 0001), the **argv pointer, a null pointer, and the **envp pointer. When the kernel transferred control to ld.so, the stack pointer was at 0x7fffe9c0. Ld.so
was able to find its arguments at 0x7fffe9c0, and the idea was that the
program would find its arguments 16 bytes lower, at 0x7fffe9b0.
Unfortunately, this does not work, because ld.so makes use of the stack. It
uses the space between 0x7fffe9b0 and 0x7fffe9bf, and when it transfers
control to the program, the stack layout is like this:
7fffe9b0 0000 0000 0000 0000 0000 0000 0000 0000 ................
7fffe9c0 0000 0001 7fff eab0 0000 0000 7fff eab5 ................
And again, the program was not able to find the arguments, because the
place where it expected them had been overwritten by ld.so.
A good solution here would be to understand why ld.so gives a
stack pointer that is 16 bytes too low to the program. It was not
possible to achieve this, so I had to hack a bad solution. The idea here
is that ld.so gives the program a stack pointer which is 16 bytes too
low. So if we can regain control after ld.so has done its job, and
before the program is actually started, we can adjust the stack
pointer so that the program can find its arguments.
The problem is how to get control between ld.so and the program. Because ld.so does not return to kernel mode before launching the program, we have to fool ld.so into thinking it is launching the program, whereas it is actually running our code.
This can be done through the entry
in the ELF auxiliary table that describes where the program entry point
is. Ld.so uses that entry to launch the program. We can modify
this entry in the ELF auxiliary table so that ld.so will transfer
control to a small piece of code we uploaded onto the process stack.
This code adjusts the stack pointer and then jumps to the real
program entry point. This approach is an ugly hack, but at least it
worked. Here is the stack pointer adjustment code (thanks to Wolfgang
Solfrank for helping me write it):
#include <machine/asm.h>
#define LINUX_SP_WRAP_OFFSET 0x10
.globl _C_LABEL(linux_sp_wrap_start)
.globl _C_LABEL(linux_sp_wrap_end)
.globl _C_LABEL(linux_sp_wrap_entry)
_C_LABEL(linux_sp_wrap_start):
addi 1,1,LINUX_SP_WRAP_OFFSET
mflr 12
bl 1f
1:
mflr 11
mtlr 12
lwz 12, _C_LABEL(linux_sp_wrap_entry)-1b(11)
mtctr 12
bctr
_C_LABEL(linux_sp_wrap_entry):
.long 0 /* original program entry point, set up by the kernel */
_C_LABEL(linux_sp_wrap_end):
Its use is triggered by the LINUX_SP_WRAP macro, which is defined in
the PowerPC-specific linux_exec.h, just like the LINUX_SHIFT macro. The
kernel just copies this code from kernel space to the user stack, stores
the program's real entry point at the linux_sp_wrap_entry location, and sets
the entry point in the ELF auxiliary table to the location on the stack
where the code was just uploaded.
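Those steps can be illustrated with an ordinary buffer standing in for the user stack. Everything named here (struct sp_wrap_image, install_sp_wrap()) is hypothetical, and memcpy() replaces whatever kernel-to-user copy primitive the real code uses (in NetBSD that would presumably be copyout(9)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the wrapper code held in the kernel. */
struct sp_wrap_image {
	const unsigned char *start;	/* linux_sp_wrap_start */
	size_t size;			/* linux_sp_wrap_end - linux_sp_wrap_start */
	size_t entry_off;		/* linux_sp_wrap_entry - linux_sp_wrap_start */
};

/*
 * Copy the wrapper to dst (standing in for a spot on the user stack)
 * and store the program's real entry point in its entry slot.
 */
static void
install_sp_wrap(unsigned char *dst, const struct sp_wrap_image *img,
    uint32_t real_entry)
{
	assert(dst != NULL && img != NULL);
	assert(img->entry_off + sizeof(real_entry) <= img->size);
	memcpy(dst, img->start, img->size);
	memcpy(dst + img->entry_off, &real_entry, sizeof(real_entry));
}
```

After this step, the caller would rewrite the entry-point value in the ELF auxiliary table so that it holds the address of dst instead of the real entry point.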
We can have a closer look at what the assembly instructions actually do.
First, we adjust the stack pointer, which is GPR1, by adding 16 to it.
This is done by the addi 1,1,LINUX_SP_WRAP_OFFSET instruction.
Then we load into GPR12 the value at the linux_sp_wrap_entry location. To
do this, we have to tamper with the Link Register, so it is saved prior to that operation and then restored. This is done with the
mflr 12 instruction, which saves the Link Register to GPR12, and by the
mtlr 12 instruction, which restores the Link Register to the value contained in
GPR12.
The next goal is to get the value stored at the linux_sp_wrap_entry address into
GPR12. With the bl 1f instruction (the f stands for the next label 1), we
branch to label 1, and the address of the instruction following the bl,
which is label 1 itself, is saved into the Link
Register. mflr 11 copies the value contained in the Link Register into
GPR11. We now have the address of label 1 in GPR11.
The difficult part is the lwz 12, _C_LABEL(linux_sp_wrap_entry)-1b(11)
instruction, which adds the difference between the address of
linux_sp_wrap_entry and the address of label 1 (the 1b stands for the
previous label 1) to GPR11 and loads the word located at the resulting
address into GPR12. We end up with the word stored at linux_sp_wrap_entry,
that is, the original program entry point, in GPR12.
We copy the value of GPR12 to the CTR register, using
the mtctr 12 instruction. Then we can use the bctr instruction,
which branches to the address contained in CTR.
This may look a bit complicated, but this is caused by two problems we need to address: First, we want the code to be relocatable, because it runs from wherever it was copied on the stack
(hence the use of the Link Register), and second, we want to do a long
branch to the program entry. We must use the CTR to do this long branch.
This hack was rather inelegant, but it fixed the problem. Using this method, it
was possible to get arguments in programs linked with ld-2.1.3.so. What
is surprising is that it did not break linking with ld-1.7.0.so.
Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.
Copyright © 2007 O'Reilly Media, Inc.