IRIX Binary Compatibility, Part 5
Pages: 1, 2, 3
Lists, Locks and Reference Counts
But things are not that simple. For various reasons, we need to be able to walk through the threads list in a share group. Therefore, we need a chained list of processes in the share group.
There are handy macros in the NetBSD headers
<sys/queue.h> that are really helpful here. Documentation
can be found in the queue(3) manual page. There are macros to
declare the list, to walk through it, to insert elements in it, and so on. All
the pointer arithmetic is done in the macro, and the programmer can focus on the
actual use of the list. This is wonderful.
Creating the list is easy. To store the pointers to next/previous elements
in the list, we can use a field of the emulation specific part of struc proc
(it is pointed to by the p_emuldata field, and we created a struct
irix_emuldata to describe it. Check the previous article about signal emulation if you need details about this). The only problem is
deciding where to store the head of the list. Storing it in the
irix_emuldata structure is not a good idea, since we have no
guarantee at all that the processes holding the head of the queue will not
terminate before the others.
The solution is to allocate a structure, which describes the share group. It
is defined in sys/compat/irix/irix_prctl.h and is called struct
irix_share_group.
/* IRIX share group structure */
struct irix_share_group {
LIST_HEAD(isg_head, irix_emuldata) isg_head; /* list head */
struct lock isg_lock; /* list lock */
int isg_refcount;
};
Here we have the list head, defined using the LIST_HEAD macro,
one lock on the list and a reference counter.
The List
The list declaration will expand in the declaration of a structure. The
first argument to LIST_HEAD is the structure name. This name does
not really matter; we use the name of the field in struct
irix_share_group, as this is more convenient. The second argument
does matter; it is the type of the list elements. Here we specify that our list
is made of struct irix_emuldata elements. (Note that 'struct' is
omitted. The macro will add it.)
In the struct irix_emuldata, we have one field that holds the
list pointers. The declaration is done with another macro:
LIST_ENTRY(irix_emuldata) ied_sglist;
The Lock
The lock is used because several processes could try to manipulate the list at once. This can of course happen on a multiprocessor system, where the two processors may be in kernel mode at the same time, trying to manipulate the same kernel structure. It can also happen on a uniprocessor system, too. Imagine we have to walk the share group list to apply some collective operation. For some reason this operation may cause the process to sleep, awaiting some particular event. During the sleep, other processes will be scheduled, and they may try to modify the list too. When our first process wakes up, it may be working on a list element that no longer exists, which is quite likely to cause a panic.
The solution to this is to use a lock. Locks are manipulated through the
lockmgr(9) function. Simplest uses are to get a shared or
exclusive lock and to release a lock. Here is an example of
lockmgr(9) use:
(void)lockmgr(&lock, LK_SHARED, NULL);
/* Do something */
(void)lockmgr(&lock, LK_RELEASE, NULL);
We use shared locks when we want to read the list. Several processes may be doing this at once. They will all ask for a shared lock and get it.
Exclusive locks are requested when we want a process to modify the list.
Upon exclusive lock request, lockmgr(9) will sleep until all
shared locks are released, which guarantees that no reading process is still
reading. Once exclusive lock is acquired by one process, any other process
calling lockmgr(9) for a shared or exlcusive lock will go to
sleep, thus ensuring that no process can read while our process is writing.
The Reference Count
The reference count enables us to track the irix_share_group
structure usage. When a process is added to the list, we increase the count.
When a process is released, we decrease the count. When the count hits zero, we
can free the structure, since nobody is using it anymore.
One of the first jobs of irix_sproc() is to check if the
irix_share_group structure has already been allocated. If this is
not the case, the structure is allocated and the calling process is added to
the list. The child process will be added later, as we cannot add it before it
is actually created.
Parent and Child Life in the Kernel
fork1() has an argument named func which is used
to specify the child entry point. This can be a NULL pointer if we want to
return immediately to userland, or it can be the address of a kernel function
used to perform some custom tasks before returning to userland. Since we need
to arrange the child environment and to add it to the share group list, we use
this feature. The child entry function is called
irix_sproc_child() and is defined in
sys/compat/irix/irix_prctl.c.
It is worth mentioning that the child entry function must be a kernel
function. Passing a userland address here will cause a panic. To return to a
custom userland address after fork1(9), the solution is to specify
a kernel child entry function and to modify the child registers saved onto the
stack in that function so that, on return to userland, the process will execute
our userland function.
While executing irix_sproc_child(), we need some information
from the parent. (The kernel stores all fundamental information about a
process in the proc structure, so we read that.) This can lead to some
problems if we just let the parent exit irix_sproc() while
irix_sproc_child() is executing. If the parent returns to
userland and catches a deadly signal, our child can be left manipulating a
stale proc structure. This is very likely to cause a panic.
This situation can arise easily in a multiprocessor system where the second CPU will execute the child while the first CPU executes the parent. But it can happen on a uniprocessor system as well. If the child sleeps awaiting some resource, the kernel may schedule the parent to run and resume the child later.
The solution is to force the parent to sleep on a flag and make the child
awaken the parent when it will no longer be fatal for the child to be orphaned.
The address of the flag is passed to the child using the arg
argument to fork1(9). This argument is a pointer that gets passed
to the child entry function. For irix_sproc_child(), we defined it
as a pointer to the irix_sproc_child_args structure (defined in sys/compat/irix/irix_prctl.c):
struct irix_sproc_child_args {
struct proc **isc_proc;
void *isc_entry;
void *isc_arg;
size_t isc_len;
int isc_inh;
struct proc *isc_parent;
struct irix_share_group *isc_share_group;
int isc_child_done;
};
The structure is allocated in irix_sproc() and is used to
transmit a lot of information between the parent and the child: the child proc
structure pointer (isc_proc), the entry,
arg, len, and inh arguments that were
given to the sproc(2) system call (isc_entry,
isc_arg, isc_len, and isc_inh), the
parent proc structure address (isc_parent), the share group
structure address (isc_share_group), and the flag on which the
parent is sleeping, awaiting for the child to finish building itself
(isc_child_done).
Now that you have the whole picture, let us move to something even more
interesting. irix_sproc_child() needs to set up the stack and
registers just like IRIX does. Of course this is not documented anywhere, and
this is why things are getting more interesting. Time has come to reverse
engineer IRIX's sproc(2) implementation. But this will come in the
next part, so stay tuned.
Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.
Return to the BSD DevCenter.