Big Scary Daemons: Systrace Policies

by Michael W. Lucas
01/30/2003

One of the more exciting new features in NetBSD and OpenBSD is systrace(1), a system call access manager. With systrace, a system administrator can specify which system calls each program may make and how it may make them. Proper use of systrace can greatly reduce the risks inherent in running poorly written or exploitable programs. systrace policies can confine users in a manner completely independent of Unix permissions. You can even define the errors that system calls return when access is denied, so that programs fail more gracefully. Using systrace well requires a practical understanding of system calls, of what programs need in order to work properly, and of how these things interact with security.

First off, what are system calls? Sysadmins fling that term around a lot, but many of them don't know exactly what it means. A system call is a function that lets you talk to the operating system kernel. If you want to allocate memory, open a TCP/IP port, or perform input/output on the disk, that's a system call. System calls are documented in section 2 of the online manual.

Unix also supports a wide variety of C library calls. These are often confused with system calls but are actually just standardized routines for things that could be written within a program. You could easily write a function to compute square roots within a program, for example, but you could not write a function to allocate memory without using a system call. If you're in doubt whether a particular function is a system call or a C library function, check the online manual.
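To make the distinction concrete, here is a minimal sketch of a square-root routine written entirely inside a program, using Newton's method. The function name newton_sqrt and its tolerance parameter are illustrative choices, not part of any library. Nothing in it needs the kernel, which is why such routines live in a library rather than in the system call table.

```python
def newton_sqrt(x, tolerance=1e-12):
    """Approximate the square root of x with Newton's method (pure arithmetic)."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    # Start from a guess that is at least as large as the true root.
    guess = x if x >= 1 else 1.0
    while True:
        better = 0.5 * (guess + x / guess)
        # Stop once successive guesses agree to the requested relative tolerance.
        if abs(better - guess) <= tolerance * better:
            return better
        guess = better
```

Allocating memory or opening a socket, by contrast, cannot be done with arithmetic alone; the program has to ask the kernel.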

You may find an occasional system call that is not documented in the online manual, such as break(). You'll need to dig into other resources to identify these calls. (break() in particular is a very old system call used within libc, but not by programmers, so it seems to have escaped being documented in the man pages.)

systrace denies all actions that are not explicitly permitted and logs the rejection to syslog. If a program running under systrace has a problem, you can find out what system call the program wants and decide if you want to add it to your policy, reconfigure the program, or live with the error.
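The default-deny behavior can be sketched in a few lines. This is only a model of the idea, not systrace's implementation: the function check_call, the set of permitted calls, and the log message format are all hypothetical.

```python
def check_call(permitted, syscall, log):
    """Return 'permit' if syscall is explicitly listed; otherwise deny and log it."""
    if syscall in permitted:
        return "permit"
    # Anything not explicitly permitted is rejected, and the rejection is recorded.
    log.append(f"deny {syscall}")
    return "deny"


permitted = {"accept", "bind", "chroot"}  # illustrative subset, not a real policy
log = []
results = [check_call(permitted, name, log) for name in ("accept", "execve")]
```

The log is what makes the model workable in practice: each rejected call tells you exactly what the program wanted, so you can decide whether the policy or the program should change.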

systrace has several important pieces: policies, the policy generation tools, the runtime access management tool, and the sysadmin real-time interface. This article gives a brief overview of policies. Next time, we'll learn about the systrace tools.

Reading systrace Policies

The systrace(1) manual page includes a full description of the syntax used for policy descriptions, but I generally find it easier to look at some examples of a working policy and then go over the syntax in detail. Since named has been a subject of recent security discussions, let's look at the policy that OpenBSD 3.2 provides for named.

Before reviewing the named policy, let's review some commonly-known facts about the name server daemon's system access requirements. Zone transfers occur on port 53/TCP, while basic lookup services are provided on port 53/UDP. OpenBSD chroots named into /var/named by default and logs everything to /var/log/messages. We might expect system calls to allow this access.

Each systrace policy is stored in a file named after the full path of the program, with the slashes replaced by underscores. The policy file usr_sbin_named contains quite a few entries that allow access beyond this, however. The file starts with:
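The naming rule is simple enough to express as a one-line function. This is a sketch of the convention as described here; policy_filename is a hypothetical helper, and it assumes the leading slash is dropped, as the name usr_sbin_named suggests.

```python
def policy_filename(program_path):
    """Map a program's full path to the name of its policy file."""
    # Drop the leading slash, then turn the remaining slashes into underscores.
    return program_path.lstrip("/").replace("/", "_")
```

Calling policy_filename("/usr/sbin/named") gives usr_sbin_named, the file discussed in this article.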

# Policy for named that uses named user and chroots to /var/named
# This policy works for the default configuration of named.
Policy: /usr/sbin/named, Emulation: native

The "Policy" statement gives the full path to the program this policy is for. You can't fool systrace(1) by giving the same name to a program elsewhere on the system. The "Emulation" entry shows which ABI this policy is for. Remember, BSD systems expose ABIs for a variety of operating systems. systrace can theoretically manage system call access for any ABI, although only native and Linux binaries are supported at the moment.
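As a rough illustration, the header line can be split into its two fields like this. The parser below is a hypothetical helper that handles only the header shape shown above, and it assumes the program path contains no commas.

```python
def parse_policy_header(line):
    """Split a 'Policy: ..., Emulation: ...' header into a dict of its fields."""
    fields = {}
    for part in line.split(","):
        # Each part looks like 'Key: value'; split on the first colon only.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


header = parse_policy_header("Policy: /usr/sbin/named, Emulation: native")
```

The resulting dict carries the full program path under "Policy" and the ABI name under "Emulation".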

The remaining lines define a variety of system calls that the program may or may not use. The sample policy for named includes 73 lines of system call rules. The most basic rules look like this:

native-accept: permit

When /usr/sbin/named tries to use the accept() system call, under the native ABI, it is allowed. What is accept()? Run man 2 accept and you'll see that this accepts connections on a socket. A nameserver will obviously have to accept connections on a network socket!

Other rules are far more restrictive. Here's a rule for bind(), the system call that lets a program request a TCP/IP port to attach to.

native-bind: sockaddr match "inet-*:53" then permit

sockaddr is the name of an argument taken by the bind() system call. The match keyword tells systrace to compare the given variable with the string inet-*:53, according to the standard shell pattern-matching (globbing) rules. So, if the variable sockaddr matches the string inet-*:53, the bind is permitted. This program can bind to port 53, over both TCP and UDP protocols. If an attacker had an exploit to make named(8) attach a command prompt on a high-numbered port, this systrace policy would prevent that exploit from working, without changing a single line of named(8) code!
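Shell-style matching against inet-*:53 can be tried out with Python's fnmatch module. This is a sketch of the globbing idea only: bind_allowed is a hypothetical helper, and the address strings such as inet-[0.0.0.0]:53 are an assumption about how a socket address is rendered as text.

```python
from fnmatch import fnmatchcase

PATTERN = "inet-*:53"  # the pattern from the bind rule above


def bind_allowed(sockaddr):
    """Return True if the textual socket address matches the glob pattern."""
    # fnmatchcase applies shell globbing rules and is case-sensitive on every platform.
    return fnmatchcase(sockaddr, PATTERN)


on_port_53 = bind_allowed("inet-[0.0.0.0]:53")
on_high_port = bind_allowed("inet-[0.0.0.0]:31337")
```

The asterisk absorbs whatever address is being bound, while the literal :53 at the end pins the port, so an attempt to bind a shell to a high-numbered port does not match.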

native-chdir: filename eq "/" then permit
native-chdir: filename eq "/namedb" then permit

At first glance, this seems wrong. The eq keyword compares one string to another and requires an exact match. If the program tries to go to the root directory, or to the directory /namedb, systrace will allow it. Why would you possibly want to allow named access to the root directory, however? The next entry explains why.

native-chroot: filename eq "/var/named" then permit

We can use the native chroot() system call to change our root directory to /var/named, but to no other directory. At this point, the /namedb directory is actually /var/named/namedb, which is a sensible location for a chrooted named(8) to access. We also know that named(8) logs to /var/log/messages, however. How does that work, if the program is chrooted to /var/named?
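The path arithmetic behind a changed root can be sketched as follows. path_outside_chroot is a hypothetical helper that only handles plain absolute paths; it does not model how the kernel stops ".." from climbing out of the new root.

```python
import posixpath


def path_outside_chroot(new_root, path_inside):
    """Translate a path seen by a chrooted process into the host's view of it."""
    # The process's "/" is really new_root, so graft the inner path onto it.
    joined = posixpath.join(new_root, path_inside.lstrip("/"))
    return posixpath.normpath(joined)


namedb_on_host = path_outside_chroot("/var/named", "/namedb")
root_on_host = path_outside_chroot("/var/named", "/")
```

This is why the chdir rules for "/" and "/namedb" are less alarming than they look: after the chroot() call they refer to /var/named and /var/named/namedb.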

native-connect: sockaddr eq "/dev/log" then permit

This program can use the native connect(2) system call to talk to /dev/log and only /dev/log. That device hands the connections off elsewhere. If you didn't know that this was how the program logged, however, you'd be confused. Although the program is running in a changed root, /dev/log is opened before the chroot happens and chroot(2) does not revoke access to open files outside the chrooted area.

We'll also see some entries for system calls that do not exist.

native-fsread: filename eq "/" then permit
native-fsread: filename eq "/dev/arandom" then permit
native-fsread: filename eq "/etc/group" then permit

systrace aliases certain system calls with very similar functions into groups. You can disable this functionality with a command-line switch and only use the exact system calls you specify, but in most cases these aliases are quite useful and shrink your policies considerably. The two aliases are fsread and fswrite. fsread is an alias for stat(), lstat(), readlink(), and access(), under the native and Linux ABIs. fswrite is an alias for unlink(), mkdir(), and rmdir(), in both the native and Linux ABIs. As open() can be used to either read or write a file, it is aliased by both fsread and fswrite depending on how it is called. So named(8) can read certain /etc files, it can list the contents of the root directory, and it can access the groups file.
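The grouping described above can be modeled as a small lookup. The sets below list only the calls named in this article, and alias_for is a hypothetical helper. How systrace classifies open() internally is not spelled out here, so treating a read-only open as fsread and anything else as fswrite is an assumption.

```python
import os

FSREAD_CALLS = {"stat", "lstat", "readlink", "access"}
FSWRITE_CALLS = {"unlink", "mkdir", "rmdir"}
# Mask covering the three access modes a caller can pass to open().
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def alias_for(syscall, open_flags=os.O_RDONLY):
    """Return the policy alias a call falls under, or the call's own name."""
    if syscall in FSREAD_CALLS:
        return "fsread"
    if syscall in FSWRITE_CALLS:
        return "fswrite"
    if syscall == "open":
        # Assumption: only a read-only open counts as a read.
        read_only = (open_flags & ACCESS_MODE_MASK) == os.O_RDONLY
        return "fsread" if read_only else "fswrite"
    return syscall
```

With this model, one fsread rule for /etc/group covers a stat(), an access(), and a read-only open() of that file, which is how the aliases shrink a policy.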

systrace supports two optional keywords at the end of a policy statement, errorcode and log.

The errorcode is the error returned to the program when a rule denies the system call. Programs behave differently depending on the error they receive; named will react differently to a "permission denied" error than it will to an "out of memory" error. You can get a complete list of error codes from errno(2). Use the error name, not the error number. For example, here we return an error for non-existent files.

filename sub "<non-existent filename>" then deny[enoent]
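If you want to see which number and message stand behind an error name, Python's errno module exposes the same symbolic names. This is only a lookup aid; errno_for is a hypothetical helper, and the policy itself still takes the name, as noted above.

```python
import errno
import os


def errno_for(name):
    """Look up the numeric error code for a name such as 'enoent'."""
    # The errno module defines the names in upper case (ENOENT, EPERM, ...).
    return getattr(errno, name.upper())


code = errno_for("enoent")
message = os.strerror(code)  # the platform's human-readable text for that error
```

The message text varies by platform, which is one more reason to stick to the symbolic name when writing a policy.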

If you put the word log at the end of your rule, successful system calls will be logged. For example, if we wanted to log each time named(8) attached to port 53, we could edit the policy statement for the bind() call to read:

native-bind: sockaddr match "inet-*:53" then permit log
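The rule shapes seen so far are regular enough to pick apart with a small parser. The sketch below handles only the forms shown in this article (an optional argument test, a permit or deny action, an optional bracketed error code, and an optional trailing log); it is not the full systrace grammar, and parse_rule and its field names are hypothetical.

```python
import re

RULE_RE = re.compile(
    r'^(?P<abi>\w+)-(?P<call>\w+):\s*'
    r'(?:(?P<arg>\w+)\s+(?P<op>\w+)\s+"(?P<value>[^"]*)"\s+then\s+)?'
    r'(?P<action>permit|deny)(?:\[(?P<errorcode>\w+)\])?'
    r'(?P<log>\s+log)?\s*$'
)


def parse_rule(line):
    """Split one policy rule into its parts; raise ValueError if it does not fit."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized rule: {line!r}")
    rule = m.groupdict()
    # Turn the optional trailing keyword into a plain boolean.
    rule["log"] = rule["log"] is not None
    return rule


logged_bind = parse_rule('native-bind: sockaddr match "inet-*:53" then permit log')
plain_accept = parse_rule("native-accept: permit")
```

Fields that a rule does not use, such as the argument test in the accept rule, come back as None.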

You can also write rules that match on the user ID or group ID passed to a system call, as the example here demonstrates.

native-setgid: gid eq "70" then permit

This very brief overview covers the vast majority of the rules you will see. As in so many things in computing, systrace does 90% of its work with 10% of its features. For full details on the systrace grammar, read systrace(1). Now that you can recognize a systrace policy when you see one, next time we'll look at some of the tools you can use to create your own policies.

Michael W. Lucas

