One of the more exciting new features in NetBSD and OpenBSD is
systrace(1), a system call access manager. With
systrace, a system administrator can say which system calls
can be made by which programs and how those calls can be made. Proper use
of systrace can greatly reduce the risks inherent in running
poorly written or exploitable programs.
systrace policies can
confine users in a manner completely independent of Unix permissions. You
can even define the errors that the system calls return when access is
denied, so that programs fail more gracefully. Proper use of
systrace requires a practical understanding of system calls,
what programs must have to work properly, and how these things interact.
First off, what are system calls? Sysadmins fling that term around a lot, but many of them don't know exactly what it means. A system call is a function that lets you talk to the operating system kernel. If you want to allocate memory, open a TCP/IP port, or perform input/output on the disk, that's a system call. System calls are documented in section 2 of the online manual.
Unix also supports a wide variety of C library calls. These are often confused with system calls but are actually just standardized routines for things that could be written within a program. You could easily write a function to compute square roots within a program, for example, but you could not write a function to allocate memory without using a system call. If you're in doubt whether a particular function is a system call or a C library function, check the online manual.
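To make the distinction concrete, here is a minimal sketch in Python, used here only for illustration. The helper name my_sqrt is hypothetical. Computing a square root is plain arithmetic that a program can do for itself, while asking for the process ID or moving bytes through a pipe requires the kernel; the os functions shown are thin wrappers around the corresponding system calls.

```python
import math
import os


def my_sqrt(x, iterations=30):
    """Newton's method: a square root needs no help from the kernel."""
    if x < 0:
        raise ValueError("cannot take the square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = (guess + x / guess) / 2.0
    return guess


# Library-style work: pure arithmetic, no system call required.
root = my_sqrt(2.0)

# Kernel services: only the operating system can report the process ID
# or move bytes through a file descriptor.
pid = os.getpid()                  # wraps getpid(2)
read_fd, write_fd = os.pipe()      # wraps pipe(2)
os.write(write_fd, b"hello")       # wraps write(2)
data = os.read(read_fd, 5)         # wraps read(2)
os.close(read_fd)                  # wraps close(2)
os.close(write_fd)
```

A policy that restricts system calls would affect the second half of this sketch but would have nothing to say about my_sqrt.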
You may find an occasional system call that is not documented in the
online manual, such as
break(). You'll need to dig into
other resources to identify these calls. (break() in
particular is a very old system call used within libc, but
not by programmers, so it seems to have escaped being documented in the
online manual.)
systrace denies all actions that are not explicitly
permitted and logs the rejection to
syslog. If a program running under
systrace has a problem, you can find out what
system call the program wants and decide if you want to add it to your
policy, reconfigure the program, or live with the error.
systrace has several important pieces: policies, the
policy generation tools, the runtime access management tool, and the
sysadmin real-time interface. This article gives a brief overview of
policies. Next time, we'll learn about the tools.
The systrace(1) manual page includes a full description
of the syntax used for policy descriptions, but I generally find it easier
to look at some examples of a working policy and then go over the syntax
in detail. Since
named has been a subject of recent security
discussions, let's look at the policy that OpenBSD 3.2 provides for
named(8).
Before reviewing the
named policy, let's review some
commonly known facts about the name server daemon's system access
requirements. Zone transfers occur on port 53/TCP, while basic lookup
services are provided on port 53/UDP. OpenBSD chroots named into
/var/named by default and logs everything to
/var/log/messages. We might expect system calls to allow
this access.
Each systrace policy is stored in a file named after the
full path of the program, with slashes replaced by underscores. The policy
file usr_sbin_named contains quite a few entries that allow
access beyond this, however. The file starts with:
# Policy for named that uses named user and chroots to /var/named
# This policy works for the default configuration of named.
Policy: /usr/sbin/named, Emulation: native
The "Policy" statement gives the full path to the program
this policy is for. You can't fool
systrace(1) by giving the
same name to a program elsewhere on the system. The "Emulation" entry
shows which ABI this policy is for. Remember, BSD systems expose ABIs for
a variety of operating systems.
systrace can theoretically
manage system call access for any ABI, although only native and Linux
binaries are supported at the moment.
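The naming convention and the header line are simple enough to sketch in a few lines of Python. This is only an illustration of the convention described above, not how systrace itself reads policies; the function names policy_filename and parse_policy_header are hypothetical. Note that the leading slash is dropped, which is why the file is called usr_sbin_named rather than _usr_sbin_named.

```python
def policy_filename(program_path):
    """Map a program path to its policy file name.

    The leading slash is dropped and the remaining slashes
    become underscores.
    """
    return program_path.lstrip("/").replace("/", "_")


def parse_policy_header(line):
    """Split 'Policy: <path>, Emulation: <abi>' into (path, abi)."""
    fields = {}
    for part in line.split(","):
        key, _, value = part.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields["policy"], fields["emulation"]


header = "Policy: /usr/sbin/named, Emulation: native"
program, abi = parse_policy_header(header)
filename = policy_filename(program)
```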
The remaining lines define a variety of system calls that the program may or may not use. The sample policy for named includes 73 lines of system call rules. The most basic look like this:
native-accept: permit
When /usr/sbin/named tries to use the
accept() system call, under the native ABI, it is allowed.
Run man 2 accept and you'll
see that this accepts connections on a socket. A nameserver will
obviously have to accept connections on a network socket!
Other rules are far more restrictive. Here's a rule for
bind(), the system call that lets a program request a TCP/IP
port to attach to.
native-bind: sockaddr match "inet-*:53" then permit
sockaddr is the name of an argument taken by the
bind() system call. The
match keyword tells
systrace to compare the given variable with the string
inet-*:53, according to the standard shell pattern-matching
(globbing) rules. So, if the variable
sockaddr matches the string
inet-*:53, the call is permitted. This program
can bind to port 53, over both TCP and UDP protocols. If an attacker had
an exploit to make
named(8) attach a command prompt on a
high-numbered port, this
systrace policy would prevent that
exploit from working, without changing a single line of
named(8) code.
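The effect of the bind rule's pattern can be tried out with Python's fnmatch module, which implements the same shell-style globbing. This is a sketch under one loud assumption: the address strings below (inet-[address]:port) are my guess at how a socket address is rendered for comparison, chosen only so that they fit the inet-*:53 pattern. The function name bind_allowed is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the bind rule in the policy.
PATTERN = "inet-*:53"


def bind_allowed(sockaddr):
    """Return True when the rendered address matches the policy pattern."""
    return fnmatchcase(sockaddr, PATTERN)


# Assumed renderings of socket addresses, for illustration only.
dns_any = bind_allowed("inet-[0.0.0.0]:53")         # the name service port
dns_local = bind_allowed("inet-[127.0.0.1]:53")     # same port, other address
backdoor = bind_allowed("inet-[0.0.0.0]:31337")     # a high-numbered port
lookalike = bind_allowed("inet-[0.0.0.0]:1053")     # ends in 53, but is not :53
```

The pattern fixes the port and leaves the address free, so any local address on port 53 matches and every other port falls through to the default denial.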
native-chdir: filename eq "/" then permit
native-chdir: filename eq "/namedb" then permit
At first glance, this seems wrong. The
eq keyword
compares one string to another and requires an exact match. If the
program tries to go to the root directory, or to the directory
/namedb, systrace will allow it. Why would you
possibly want to allow named access to the root directory, however?
The next entry explains why.
native-chroot: filename eq "/var/named" then permit
We can use the native
chroot() system call to change our
root directory to
/var/named, but to no other directory. At
this point, the
/namedb directory is actually
/var/named/namedb, which is a sensible location for
named(8) to access. We also know that
named(8) logs to
/var/log/messages.
How does that work, if the program is chrooted to
/var/named?
native-connect: sockaddr eq "/dev/log" then permit
This program can use the native
connect(2) system call to talk to
/dev/log and only
/dev/log. That device
hands the connections off elsewhere. If you didn't know that this was how
the program logged, however, you'd be confused. Although the program is
running in a changed root,
/dev/log is opened before the
chroot happens and
chroot(2) does not revoke
access to open files outside the chrooted area.
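The rules seen so far, combined with the deny-by-default behavior described earlier, can be modeled in a few lines of Python. This is a toy model for reasoning about a policy, not a description of how systrace is implemented; the RULES table layout and the function name decide are hypothetical.

```python
from fnmatch import fnmatchcase

# (system call, argument name, operator, value), copied from the policy rules.
RULES = [
    ("native-bind", "sockaddr", "match", "inet-*:53"),
    ("native-chdir", "filename", "eq", "/"),
    ("native-chdir", "filename", "eq", "/namedb"),
    ("native-chroot", "filename", "eq", "/var/named"),
    ("native-connect", "sockaddr", "eq", "/dev/log"),
]


def decide(syscall, argname, value):
    """Return 'permit' if some rule allows the call, otherwise 'deny'."""
    for rule_call, rule_arg, op, expected in RULES:
        if rule_call != syscall or rule_arg != argname:
            continue
        if op == "eq" and value == expected:
            return "permit"          # exact string comparison
        if op == "match" and fnmatchcase(value, expected):
            return "permit"          # shell-style globbing
    return "deny"                    # anything not explicitly permitted


chroot_ok = decide("native-chroot", "filename", "/var/named")
chroot_bad = decide("native-chroot", "filename", "/tmp")
shell = decide("native-execve", "filename", "/bin/sh")
```

Because the function falls through to "deny", a call with no matching rule is rejected without needing a rule of its own, which is what keeps a policy short.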
We'll also see some entries for system calls that do not exist.
native-fsread: filename eq "/" then permit
native-fsread: filename eq "/dev/arandom" then permit
native-fsread: filename eq "/etc/group" then permit
systrace aliases certain system calls with very similar
functions into groups. You can disable this functionality with a
command-line switch and only use the exact system calls you specify, but
in most cases these aliases are quite useful and shrink your policies
considerably. The two aliases are
fsread and fswrite.
fsread is an alias for read-type calls such as
access(), under the native and Linux ABIs.
fswrite is an alias for write-type calls such as
rmdir(), in both the native and
Linux ABIs. As
open() can be used to either read or write a
file, it is aliased by both
fsread and fswrite,
depending on how it is called. So
named(8) can read certain
/etc files, it can list the contents of the root directory,
and it can access the group file.
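The aliasing idea can be sketched as a small lookup in Python. Two things here are assumptions: the member sets list only the calls named in the text (the real groups are larger), and the way open() is classified by its access-mode flags is my reading of "depending on how it is called", not a statement of how systrace decides. The function name alias_for is hypothetical.

```python
import os

# Only the members named in the text; the real alias groups are larger.
FSREAD = {"access"}
FSWRITE = {"rmdir"}


def alias_for(syscall, open_flags=os.O_RDONLY):
    """Return the alias a call would be filed under, or the call itself."""
    if syscall == "open":
        # Assumption: any write access mode puts open() under fswrite.
        wants_write = open_flags & (os.O_WRONLY | os.O_RDWR)
        return "fswrite" if wants_write else "fsread"
    if syscall in FSREAD:
        return "fsread"
    if syscall in FSWRITE:
        return "fswrite"
    return syscall  # not aliased; the rule must name the call directly


read_open = alias_for("open", os.O_RDONLY)
write_open = alias_for("open", os.O_WRONLY | os.O_CREAT)
```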
systrace supports two optional keywords at the end of a
policy statement: errorcode and log.
The errorcode is the error that is returned when the program is denied
access to this system call. Programs will behave differently depending on
the error that they receive; named will react differently to a "permission
denied" error than it will to an "out of memory" error. You can get a
complete list of error codes from
errno(2). Use the error
name, not the error number. For example, here we return an error for
accessing a nonexistent file:
native-fsread: filename sub "<non-existent filename>" then deny[enoent]
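To see what an error name such as enoent stands for, Python's errno module exposes the same symbolic names as errno(2). The sketch below splits an action like deny[enoent] into its verb and numeric error code; the function name parse_action is hypothetical, and the sketch says nothing about how systrace itself parses policies.

```python
import errno
import os
import re


def parse_action(action):
    """Split 'deny[enoent]' into ('deny', errno number).

    A bare 'permit' or 'deny' yields None for the error code.
    """
    m = re.fullmatch(r"(permit|deny)(?:\[(\w+)\])?", action)
    if m is None:
        raise ValueError("unrecognized action: %r" % action)
    verb, name = m.groups()
    # Policies use the lowercase error name; the errno module uses uppercase.
    code = getattr(errno, name.upper()) if name else None
    return verb, code


verb, code = parse_action("deny[enoent]")
message = os.strerror(code)   # the human-readable text for this error
```

Using the name rather than the number keeps the policy readable, since the numeric values are not guaranteed to be the same on every system.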
If you put the word
log at the end of your rule,
successful system calls will be logged. For example, if we wanted to log
each time named(8) attached to port 53, we could edit the
policy statement for the
bind() call to read:
native-bind: sockaddr match "inet-*:53" then permit log
You can also choose to filter rules based on user ID and group ID, as the example here demonstrates.
native-setgid: gid eq "70" then permit
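Putting the pieces together, the rule forms shown in this article can be taken apart with one regular expression. This Python sketch covers only those forms (an optional "argument operator value then" clause, an action with an optional error code, and an optional trailing log); it is not the full grammar from systrace(1), and the names RULE_RE and parse_rule are hypothetical.

```python
import re

RULE_RE = re.compile(
    r'^(?P<abi>[a-z]+)-(?P<call>\w+):\s*'                       # native-bind:
    r'(?:(?P<arg>\w+)\s+(?P<op>\w+)\s+"(?P<value>[^"]*)"\s+then\s+)?'
    r'(?P<action>permit|deny)(?:\[(?P<error>\w+)\])?'           # deny[enoent]
    r'(?P<log>\s+log)?\s*$'                                     # trailing log
)


def parse_rule(line):
    """Return the parts of a policy rule as a dictionary."""
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError("unrecognized rule: %r" % line)
    rule = m.groupdict()
    rule["log"] = rule["log"] is not None   # turn the match into a boolean
    return rule


bind_rule = parse_rule('native-bind: sockaddr match "inet-*:53" then permit log')
gid_rule = parse_rule('native-setgid: gid eq "70" then permit')
```

Here bind_rule["log"] is True and gid_rule["arg"] is "gid", which is enough to feed a rule table like the ones a policy reviewer might build by hand.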
This very brief overview covers the vast majority of the rules you
will see. As in so many things in computing,
systrace does
90% of its work with 10% of its features. For full details on the
systrace grammar, read
systrace(1). Now that
you can recognize a
systrace policy when you see one, next
time we'll look at some of the tools you can use to create your own
policies.
Michael W. Lucas
Copyright © 2009 O'Reilly Media, Inc.