 Published on ONLamp.com (http://www.onlamp.com/)


chrooted ntpd in NetBSD

by Emmanuel Dreyfus
02/13/2003

chrooting ntpd

As we explained in Securing Systems with chroot, Part One, a daemon must run with an unprivileged user ID (UID) in order to be safely chrooted. This is a problem, since many daemons need some superuser privileges in order to operate. In some situations, superuser privileges are only necessary during initialization, and it is possible to switch to an unprivileged UID later. This is the case for named, the Domain Name System (DNS) server from the Internet Software Consortium (ISC). named needs superuser privileges in order to bind to UDP port 53 (superuser privileges are needed on almost all Unix systems to bind to ports lower than 1024). Once this is done, named is able to chroot to a directory where the zone files are stored, and it can operate under an unprivileged UID, typically the user named.

ntpd needs superuser privileges for two operations: binding to UDP port 123 (at initialization time) and using time control system calls such as adjtime(2) and ntp_adjtime(2), which are restricted to the superuser.

For the first operation, we could proceed as named does, first binding to UDP port 123, then calling chroot(2) and setuid(2). The problem is the second operation. To be able to chroot ntpd after initialization, we need a way to enable an unprivileged user to control the system clock. Such a feature was introduced in NetBSD 1.6, with the clockctl device.
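The order of these operations matters: the privileged work comes first, then chroot(2), and the UID is dropped last. A minimal sketch of the last two steps follows, assuming a made-up helper called enter_jail(); it is not taken from named or ntpd, and it leaves out the socket setup that would precede it.

```c
#include <sys/types.h>
#include <grp.h>
#include <unistd.h>

/*
 * enter_jail() is a hypothetical helper, not code from named or ntpd.
 * Call it as root, after every privileged resource (such as a socket
 * bound to a port below 1024) has been acquired.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
enter_jail(const char *dir, uid_t uid, gid_t gid)
{
	/* Move into the jail first, so the working directory is inside it. */
	if (chdir(dir) == -1)
		return -1;
	if (chroot(".") == -1)
		return -1;

	/* Drop groups before the UID: after setuid() we could not change them. */
	if (setgroups(1, &gid) == -1)
		return -1;
	if (setgid(gid) == -1)
		return -1;
	if (setuid(uid) == -1)
		return -1;
	return 0;
}
```

Once setuid(2) has succeeded, the process can no longer call anything that requires superuser privileges, which is exactly why the clock-related system calls become a problem for ntpd.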


The clockctl device

On NetBSD, the system clock can be affected through four different system calls: adjtime(2), settimeofday(2), clock_settime(2), and ntp_adjtime(2), the last available only if the kernel was compiled with the NTP option.

The clockctl device introduces alternative entry points to these system calls, through a special device file typically named /dev/clockctl. These entry points are reached through ioctl(2) system calls on the device file. ioctl(2) is a general-purpose system call that enables the user to perform a custom action on a file object. We will see this in more depth in the next part of this article.

If a user has write access to /dev/clockctl, then he can use the alternative entry points and can control the system clock. In order to chroot ntpd, we therefore just need to build a kernel with the clockctl device driver and ensure that the unprivileged user under which ntpd is running in the chroot jail has write access to /dev/clockctl.
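Whether a given user meets this condition can be checked by simply trying to open the file for writing. The helper below is an illustrative sketch with a made-up name; it takes the path as a parameter, so it works on any file, and for clockctl the path would be /dev/clockctl.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * can_open_for_writing() is a hypothetical helper.
 * Returns 1 if the calling process may open `path` for writing,
 * 0 otherwise.
 */
int
can_open_for_writing(const char *path)
{
	int fd = open(path, O_WRONLY);

	if (fd == -1)
		return 0;
	close(fd);
	return 1;
}
```

Run under the UID that ntpd will use in the jail, a result of 0 for /dev/clockctl would indicate that the device file's ownership or mode still needs adjusting.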

In order to be administrator-friendly, NetBSD 1.6 comes with clockctl enabled in GENERIC kernels--the /dev/clockctl file is installed by default, and the startup scripts already know about clockctl. Therefore, the system administrator just has to add one line to /etc/rc.conf. Here are the relevant lines from /etc/defaults/rc.conf:

# To run the ntpd(8) NTP server as an unprivileged user under a
# chroot(2) cage, uncomment the following, after ensuring that:
#       - The kernel has "pseudo-device clockctl" compiled in
#       - /dev/clockctl is present
#
#ntpd_chrootdir="/var/chroot/ntpd"

The next part of this article is more developer-oriented. It deals with the implementation details of the chrooted ntpd. In the next two sections, we will focus on the userland modifications that were required in order to provide a chrootable ntpd, and we will discuss the implementation details of the clockctl device driver.

Userland Modifications: libc

Our goal was to make modifications as minor as possible in the NTP daemon. We especially did not want to introduce a new Application Programming Interface (API). This goal was achieved at the expense of introducing some magic into NetBSD's libc.

When a user program is built, each system call is turned into a library call to a function in libc known as the system call stub. The function does the actual system call, and may do some additional handling for backward compatibility. The stubs that do more than just the system call have a C source file associated with them. They are listed in the SRCS variable in src/lib/libc/sys/Makefile.inc. For an example of a system call stub that does additional handling, see src/lib/libc/sys/lseek.c.

On the other hand, some system call stubs are trivial; they only do the system call. In this case, the source file for the system call stub is automatically generated. These are listed in the ASM variable in src/lib/libc/sys/Makefile.inc. An autogenerated stub looks like this:

#include "SYS.h"
RSYSCALL(chdir)

Once generated, this file is src/lib/libc/chdir.S. The curious reader will look for the definition of the RSYSCALL macro, which is contained in src/lib/libc/arch/powerpc/SYS.h for PowerPC ports, for instance. The macro provides the few assembly language instructions needed for the system call to set errno on error.
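In C terms, a trivial stub does no more than the following. This is only a sketch built on the generic syscall(2) wrapper, not the assembly that RSYSCALL expands to, and stub_chdir() is a made-up name.

```c
#include <sys/syscall.h>
#include <unistd.h>

/*
 * stub_chdir() is a hypothetical C rendering of a trivial stub.
 * syscall(2) traps into the kernel with the given system call number;
 * on failure it returns -1 and leaves the error code in errno.
 */
int
stub_chdir(const char *path)
{
	return (int)syscall(SYS_chdir, path);
}
```

A stub with its own C source file wraps extra logic around such a call, which is what the time-related stubs described next had to become.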

Before the clockctl implementation, adjtime(2), clock_settime(2), settimeofday(2), and ntp_adjtime(2) were implemented as simple system call stubs. This has been changed in order to check for the existence and accessibility of /dev/clockctl.

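The code is nearly identical for the four system calls. It can be found for settimeofday(2) in src/lib/libc/sys/settimeofday.c. Roughly, it checks whether the caller is root and, if not, whether /dev/clockctl can be opened and used; only then does it go through clockctl instead of the real system call.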

This turns each call to settimeofday(2) into several system calls: getuid(2), open(2), and ioctl(2). For the sake of performance, we have a keep-state feature, so that libc can remember if a process has already used clockctl. This is done using the __clockctl_fd variable. This variable is carried by libc but it behaves exactly like a global variable for the process. Of course, each process has its own __clockctl_fd.

__clockctl_fd describes the state of the process regarding clockctl:

On the first call to one of our four system call stubs, if UID is root, __clockctl_fd is immediately set to -1. Otherwise, we attempt to open and use /dev/clockctl. Should this attempt fail, __clockctl_fd is set to -1. If it succeeds, then __clockctl_fd keeps the file descriptor returned by open(2). Future calls to the stub will use clockctl.

When __clockctl_fd is -1, the real system call is always used.
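The keep-state decision described above can be sketched as follows. This is not the libc source: the names are invented, the -2 "undecided" marker is an assumption made for the sketch, and the UID and device path are passed in as parameters only so that the logic can be tried without root or a real /dev/clockctl. The ioctl(2) attempt itself is left out.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define STATE_UNDECIDED	(-2)	/* hypothetical marker: no stub has run yet */
#define STATE_SYSCALL	(-1)	/* always use the real system call */

/* Stands in for libc's __clockctl_fd; one copy per process. */
int clockctl_state = STATE_UNDECIDED;

/*
 * Decide, once per process, which path the stubs should take.
 * Returns STATE_SYSCALL, or a descriptor on which to issue ioctl(2).
 */
int
clockctl_select(uid_t uid, const char *path)
{
	if (clockctl_state != STATE_UNDECIDED)
		return clockctl_state;		/* decision already cached */

	if (uid == 0)
		clockctl_state = STATE_SYSCALL;	/* root needs no help */
	else
		/* open(2) returns -1 on failure, which is STATE_SYSCALL */
		clockctl_state = open(path, O_WRONLY);

	return clockctl_state;
}
```

After the first call, every later call costs a single comparison, which is the point of keeping the state around.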

We end up with an implementation where the API for ntpd and other processes did not change. When the user process attempts to do a system call, we intercept it at the libc level and use either clockctl or the actual system call. This is nice, but the drawback is that we introduce some black magic in libc, which is not a nice solution. The good point is that since we did not change anything in the API, we can replace this black magic with anything else without disturbing user processes. For instance, if we ever introduce capabilities in NetBSD, we can revert to a void system call stub without ntpd being affected.

A word on the ioctl(2) system call and its use in clockctl: before talking about the changes to ntpd, it is worth explaining what the ioctl(2) system call does. On Unix systems, all objects are seen as files. This includes device files, terminals, and so on. Of course there are some object-specific operations that cannot be done through a file interface (read, write, lseek, etc.). These operations include, for instance, getting terminal characteristics when the file is the standard output, or ejecting the disk when the file is a removable disk's device. Nearly everything that cannot be done through the standard file-related system calls is done using ioctl(2) calls.

Here is ioctl(2)'s prototype:

int ioctl(int d, unsigned long request, void *argp);

d is the file descriptor on which we operate, request is a value indicating which command we want to perform, and argp is an optional argument pointer. The structure of the argument itself depends on the request. Here is an example of ioctl use in a userland program, for getting the terminal width in columns:

/* col.c -- print the terminal width */
#include <stdio.h>
#include <err.h>
#include <sys/ioctl.h>

int 
main(void) {
	struct winsize ws;

	if (ioctl(1, TIOCGWINSZ, (void *)&ws) == 0) 
		printf("terminal width = %d\n", ws.ws_col);
	else
		err(1, "ioctl failed");
	return 0;
}

The first argument to ioctl() is 1, the standard output, which is attached to the controlling terminal. TIOCGWINSZ is a macro defined in <sys/ttycom.h> for getting window information on terminals. The third argument here is a pointer to a struct winsize where ioctl(TIOCGWINSZ) will write its data.

Of course, TIOCGWINSZ will only work if the standard output is attached to a terminal. It's possible to check this:

$ cc -o col col.c
$ ./col
terminal width = 80
$ ./col > toto
col: ioctl failed: Inappropriate ioctl for device

For the clockctl device, we use four ioctl commands, one for each of our system calls. All are defined in <sys/clockctl.h>: CLOCKCTL_SETTIMEOFDAY, CLOCKCTL_ADJTIME, CLOCKCTL_CLOCK_SETTIME, and CLOCKCTL_NTP_ADJTIME. Each command takes a pointer to a structure holding the arguments of the corresponding system call.

To keep things simple, we use exactly the same structures as the kernel when passing arguments to system calls. They are defined in <sys/syscallargs.h>:

struct sys_settimeofday_args {
	syscallarg(const struct timeval *) tv;
	syscallarg(const struct timezone *) tzp;
};

struct sys_adjtime_args {
	syscallarg(const struct timeval *) delta;
	syscallarg(struct timeval *) olddelta;
};

struct sys_clock_settime_args {
	syscallarg(clockid_t) clock_id; 
	syscallarg(const struct timespec *) tp;
};

syscallarg() is a macro that deals with machine-dependent alignment and endianness issues. It enables us to deal with machine-independent, system call argument structure declarations, whereas in fact these are really machine-dependent.
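One way to picture what such a macro does is to pad every argument out to a full register-sized slot. The sketch below is a simplification made up for illustration, not NetBSD's definition: the SKETCH_ names and the sketch_register_t type are invented, and the big-endian case, which the real macro has to handle, is ignored.

```c
#include <stdint.h>

typedef intptr_t sketch_register_t;	/* assumed register-sized integer */

/* Pad every argument out to a full register slot. */
#define SKETCH_ARG(type)	union { sketch_register_t pad; type datum; }

/* Read or write the value stored in a slot. */
#define SKETCH_GET(p, field)	((p)->field.datum)

/* Same shape as a two-argument system call structure. */
struct sketch_clock_settime_args {
	SKETCH_ARG(int) clock_id;
	SKETCH_ARG(const void *) tp;
};
```

With this layout, each member occupies exactly one slot whatever its declared type, so the structure's size is the number of arguments times the slot size.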

There is a special case for ntp_adjtime, which needs to set the value returned to userland. Since ioctl(2) already uses its return value to indicate error conditions, it is not possible for an ioctl command to set ioctl's return value. We work around this by including the return value in the ioctl argument (this is from <sys/clockctl.h>):

struct clockctl_ntp_adjtime_args {
	struct sys_ntp_adjtime_args uas;
	register_t retval; 
};

The sys_ntp_adjtime_args struct is defined in <sys/syscallargs.h>. The kernel uses it to store ntp_adjtime(2) arguments:

struct sys_ntp_adjtime_args {
	syscallarg(struct timex *) tp;
};

Now that we have a precise idea of how the alternate entry points to the time-related kernel functions are made, let us move to kernel changes.

Kernel Changes: clockctl Device Driver Implementation

The clockctl device driver is a plain pseudodevice driver. There is some pseudodevice documentation explaining how to introduce such a driver into the NetBSD kernel. Since the kernel registration process is well described in the document, I will not cover it here. Let us focus on the driver structure itself. It can be found within the NetBSD sources in src/sys/dev/clockctl.c.

Each driver provides a set of functions, known as methods. The kernel calls the driver methods to execute operations such as open, read, write, ioctl, and so on. In NetBSD, the method names must be the name of the actual operation prefixed with the driver name. For clockctl, we have clockctlopen(), clockctlread(), and so on.

When the user does an open(2) system call on the clockctl device file, the kernel will use the device major number to identify that the operation must be serviced by the clockctl driver. For character devices such as clockctl, this is done by reading the cdevsw array, which is defined in a machine-dependent file. (Unfortunately, driver major numbers are not unified on different NetBSD ports.) For the i386 port, the array is defined in src/sys/arch/i386/conf/majors.i386.

Once the kernel knows which driver is to service the open request, it just calls the driver's open method. For read, write, ioctl, poll, and other operations, the process is the same. The actual code path is a bit complicated, because there are two abstraction layers before reaching the driver methods: the first makes any object appear to be a file to userland (this is done with struct file, defined in <sys/file.h>), and the second, known as the Virtual File System or VFS, enables the transparent use of different filesystem types (this is done using struct vnode, as defined in <sys/vnode.h>). An in-depth explanation of what happens exactly is out of the scope of this article, but it might pop up in an upcoming part of my series on IRIX binary compatibility on NetBSD.
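Leaving the two abstraction layers aside, the lookup by major number amounts to indexing a table of function pointers. The following userland mock is a sketch of that idea only: the demo_ names, the reduced method table, the major number 2, and the command value 1 are all invented and do not come from the kernel sources.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much-reduced method table; the kernel's has more entries. */
struct demo_cdevsw {
	int (*d_open)(int minor);
	int (*d_ioctl)(int minor, unsigned long cmd, void *data);
};

static int
demo_clockctlopen(int minor)
{
	(void)minor;
	return 0;			/* nothing to set up */
}

static int
demo_clockctlioctl(int minor, unsigned long cmd, void *data)
{
	(void)minor;
	(void)data;
	return cmd == 1 ? 0 : EINVAL;	/* 1 is the only command known here */
}

/* The index is the major number; slot 2 is an arbitrary choice. */
static const struct demo_cdevsw demo_table[] = {
	{ NULL, NULL },
	{ NULL, NULL },
	{ demo_clockctlopen, demo_clockctlioctl },
};

#define DEMO_NMAJOR	(sizeof(demo_table) / sizeof(demo_table[0]))

/* Route an open request to whichever driver owns the major number. */
int
demo_dev_open(unsigned int major, int minor)
{
	if (major >= DEMO_NMAJOR || demo_table[major].d_open == NULL)
		return ENXIO;		/* no driver configured */
	return demo_table[major].d_open(minor);
}

/* Same routing for ioctl requests. */
int
demo_dev_ioctl(unsigned int major, int minor, unsigned long cmd, void *data)
{
	if (major >= DEMO_NMAJOR || demo_table[major].d_ioctl == NULL)
		return ENXIO;
	return demo_table[major].d_ioctl(minor, cmd, data);
}
```

The naming convention shows up here too: the entries for the mock driver are the operation names prefixed with the driver name.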

For clockctl, most methods are meaningless; only ioctl actually contains more than just return 0;. The ioctl method understands four commands, which we described in the section about libc.

The job of the driver is really simple. A code snippet might say more than an explanation:

case CLOCKCTL_SETTIMEOFDAY: {
	struct sys_settimeofday_args *args = (struct sys_settimeofday_args *)data;

	error = settimeofday1(SCARG(args, tv), SCARG(args, tzp), p);
	if (error)
		return (error);
	break;
}

SCARG() is another macro which deals with machine-dependent differences in the way system call arguments are structured. The clockctl driver just calls the function the settimeofday(2) system call normally would have called. The only difference is that clockctl does not check if the user is root, since the permissions are enforced at the filesystem level. To request this ioctl(2) command, you must have opened /dev/clockctl for writing.

Userland Modification: ntpd

As we said, we wanted to make as few modifications to ntpd as possible. This goal was achieved, since we only added command-line options to specify the UID/GID to run the process as, and the directory to chroot to after initialization, using the following two new flags: ntpd [-u user[:group]] [-i /path/to/jail].
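Handling an argument of the form user[:group] takes little code. The helper below is a hypothetical sketch, not ntpd's own parser; it only splits the string, and the resulting names would still have to be resolved to numeric IDs, for instance with getpwnam(3) and getgrnam(3).

```c
#include <stddef.h>
#include <string.h>

/*
 * split_user_group() is a hypothetical helper, not ntpd's own parser.
 * It splits "user[:group]" in place: *user points at the user name and
 * *group at the group name, or is NULL when no group was given.
 * Returns 0 on success, -1 if the user part is empty.
 */
int
split_user_group(char *spec, char **user, char **group)
{
	char *colon = strchr(spec, ':');

	*user = spec;
	*group = NULL;
	if (colon != NULL) {
		*colon = '\0';		/* terminate the user name */
		*group = colon + 1;
	}
	return (**user == '\0') ? -1 : 0;
}
```

Because the string is modified in place, the caller must pass a writable buffer, such as a copy of the command-line argument.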

There is very little to tell about these changes. There are probably some OSes with ACLified system calls where it was already possible for a non-root user to set the time. Therefore the changes are not really NetBSD-specific. This is why they have been sent to the NTP team in order to be included in the next NTP release. Propagating this change to the NTP team also ensures that the -i and -u flags will not be used for something else in the future. Having to face a conflict between new ntpd flags from the NTP distribution and the NetBSD locally-patched ntpd would be quite uncomfortable.

Conclusions

This work enables ntpd to be chrooted. The method we chose to do this is not perfect. One could argue (and in fact some have) that it is bad to introduce magic into libc. The best solution to chroot ntpd would indeed be to introduce capabilities, which are a kind of ACLified system call. Some Linux distributions now ship an ntpd daemon running under a non-root UID, and they do this using capabilities. This is a much better approach.

However, capabilities alone are a huge project. TrustedBSD is a subproject of FreeBSD that is aimed toward the implementation of filesystem access lists, capabilities, and other security features. The project was started years ago and is not finished yet. On NetBSD, nobody is working on capabilities, and in fact, people are waiting for TrustedBSD to settle before importing some code. It could be a very long time before NetBSD would have capabilities available. In the meantime, clockctl appears to be a good solution for chrooting any time-related daemon.

The advantages of clockctl are simplicity and the fact that we do not modify any existing APIs. The ntpd modifications are only about chrooting, not about the way time is controlled. The day we want to replace clockctl by capabilities, there is nothing to change in ntpd; it will work immediately.

Finally, it is worth mentioning orthogonal efforts to improve general daemon security. systrace was introduced by Matthieu Herrb and Niels Provos from the OpenBSD project, and was integrated into NetBSD by Christos Zoulas. It is now maintained by Niels Provos, who joined the NetBSD team in the meantime. systrace enables the system administrator to write a list of allowed system calls for a given daemon. The kernel will ensure the daemon does not do any other operations. That way, if the daemon gets compromised, it will not be able to execute things like system("/bin/sh"), even if it runs as root.

In another orthogonal direction, Jason Thorpe has made changes to NetBSD-current in order to remove the need for an executable stack. On processors that support it (which is not the case for the old 80x86), the stack can therefore be set non-executable, thus making impossible the whole class of exploits that use stack buffer overflows. The non-executable stack is not a new idea; it can be found in various OSes, but it is extremely effective at reducing security holes, at least on machines with a processor modern enough to be able to set memory as non-executable.

Acknowledgements

Thanks to John Klos, Simon Burge, and Christos Zoulas for reviewing this article.

Emmanuel Dreyfus is a system and network administrator in Paris, France, and is currently a developer for NetBSD.



Copyright © 2009 O'Reilly Media, Inc.