

OpenBSD 4.0: Pufferix's Adventures

by Federico Biancuzzi
10/26/2006

On October 18th, OpenBSD celebrated its 11th birthday and ten years of punctual biannual releases. Now it's time for OpenBSD version 4.0, which includes tons of new drivers for wireless, network, and storage chips. Discover what's new and what battles developers must face daily to access documentation and support new hardware.

Warning: Federico Biancuzzi interviewed nearly 20 developers and assembled this long interview under the influence of Humppa-style music!

What is new in OpenBSD 4.0 for wireless drivers?

Damien Bergamini: Five new drivers for WLAN devices have been committed in OpenBSD 4.0. These drivers support the following chipsets:

acx(4) : TI ACX100/ACX111
pgt(4) : Conexant/Intersil Prism GT Full-MAC
rum(4) : Ralink Technology RT2501USB
wpi(4) : Intel PRO/Wireless 3945ABG
uath(4): Atheros AR5005UG/AR5005UX USB2.0

All these drivers require firmware that is not freely redistributable, with the exception of rum(4), for which Ralink Technology has allowed us to redistribute the necessary firmware files under a BSD-style license.

uath(4) was imported just before the release and is pretty much a work in progress, so don't expect too much from it for the moment. Work on it is slow, as it is based on reverse-engineering efforts (there is absolutely no documentation for it, not even in the form of a Linux driver).

OpenBSD is the first open-source operating system to have support for the Intel PRO/Wireless 3945ABG and the Atheros USB2.0 adapters without the need for blobs.

The zyd(4) driver (for ZyDAS ZD1211 chipset) didn't make it into 4.0 due to some remaining issues in the TX path that we were unable to fix in time for the release. These issues are now fully understood so we'll have a working zyd(4) driver very soon now.

A lot of changes have been made to our generic 802.11 layer and to the existing drivers (e.g., ral(4)) to improve interoperability when operating as an access point with a mix of 802.11b and 802.11g client stations.

The rate control algorithm that was used in ural(4) (AMRR) has been made generic and is now part of net80211. It is used by other drivers like wpi(4), rum(4), acx(4), and the upcoming zyd(4).

ifconfig(8) can now report the received signal strength as a percentage thanks to work done by Reyk Floeter.

New developers are getting involved in wireless development now, which is very encouraging for the future of wireless support in OpenBSD.

I read that this release includes some new Gigabit Ethernet drivers for chips made by Marvell/SysKonnect and Broadcom. I thought these vendors didn't give away documentation, so what happened?

Mark Kettenis: OpenBSD 4.0 has a new msk(4) driver for the Marvell Yukon 2 Gigabit NICs. These are not radically different from the older SysKonnect NICs supported by the sk(4) driver; Marvell basically replaced the DMA engine while keeping other parts of the chip the same. Still it took me quite a bit of time to produce a working driver because Marvell doesn't give us documentation. It took me hours of staring at the Linux sky2 driver before I grasped how the new bits worked. Figuring out how interrupts worked was especially hard since Linux does its interrupt handling in a completely different way now. An open source driver really isn't a substitute for proper documentation of the hardware. I can see why Marvell is reluctant to release documentation; the hardware is full of bugs and the list of errata would be embarrassingly long.

Brad Smith: These vendors do not give away documentation and the situation has not changed. For the Broadcom chipsets, what has changed is that about six months ago Broadcom provided FreeBSD with a bce(4) driver for its new NetXtreme II family of Gigabit chipsets.

Within days of the driver being committed, Theo de Raadt made contact with the Broadcom engineer who wrote the driver, David Christensen; two weeks later he was able to provide Theo and me with engineering samples. With assistance from Reyk Floeter, the driver was ported over to OpenBSD as bnx(4). The bus_dma code was still somewhat rough and there were some known issues at that point, but it was a work in progress.

Two weeks later Marco Peereboom managed to get hold of a new Dell server in his lab, which happened to have a set of the PCI Express NetXtreme II chipsets. This, in combination with the fact that the chipset will be very common soon, piqued his interest in assisting me with resolving the major remaining issue with the bus_dma code in the TX path. We had two testing and debugging sessions, and Marco was able to come up with the minimal set of bus_dma changes to get the driver going. There were still known issues at this point, but it provided us with a driver that was very much usable by the time the release rolled around.

Since the release, improvements have been made to the TX path code to make the driver more robust under heavy UDP transmit load, and the driver now uses the loadfirmware(9) framework so that the firmware bloat can be removed from the kernel.

Although this is a fairly decent vendor driver for a change, it is no substitute for having the proper hardware documentation.

pf(4) now supports Unicast Reverse Path Forwarding (uRPF) checks for simplified ingress filtering. What does this mean, in concrete terms?

Damien Miller: uRPF verifies that the source address of packets received on a network interface matches the routing table. It can be used to filter packets that arrive from unexpected directions, such as ones with spoofed source addresses.

It is similar to the "antispoof" keyword that exists in pf.conf(5) already, but antispoof only works for directly connected networks whereas uRPF works for networks one or more hops away, at the cost of being a little more permissive.

A good description of uRPF can be found in RFC3704. Our uRPF implementation is what they call "Strict RPF" and suffers from the main limitation that they describe: it does not work properly when asymmetric routes are present. It would be cool to have an implementation of "Feasible Path RPF", but that would require a higher degree of cooperation with the routing daemons than presently exists.
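In pf.conf(5), the check shows up as the urpf-failed keyword; a minimal sketch (the interface name is just an example) would be:

block in quick on em0 from urpf-failed label uRPF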

You have enabled adaptive timeouts by default in pf. Why?

Henning Brauer: We have had that feature--adaptive timeouts--in pf for a long time. The closer the state table grows to its limit, the shorter the timeouts become, i.e., the more aggressively we time out old states. Since no new states can be established when the state table is full, you really don't want your state table to fill up. Adaptive timeouts help a lot here, and timing out old states in that case is way better than preventing new connections. Thus, following the "sane defaults" paradigm, we have enabled adaptive timeouts by default. The parameters for adaptive timeouts are calculated relative to the state table limit.
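As a rough pf.conf(5) sketch (the numbers are purely illustrative), the knobs look like this:

set limit states 10000
# start shortening timeouts once 6000 states exist, scaling toward zero by 12000
set timeout { adaptive.start 6000, adaptive.end 12000 }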

It seems that you developed some features that let dhcpd(8) interact with pf. Could you tell us more?

Chris Kuethe: PF/dhcpd integration was motivated by the fact that we have an open wireless network at the University of Alberta that was suffering from users camping on addresses, and ill-maintained machines spreading viruses. It was the spread of worms that we most wanted to control. Infected machines easily generate several thousand states and several hundred complaints at the abuse desk, and often slow the network to a crawl. And it's just rude to allow an infected machine off your net.

PF's "overload" table is a partial solution to this. Excessively chatty machines would have their states torn down and would be placed into a table whose members were denied further network access. Unfortunately there was no easy mechanism to remove an address from the table automatically, which would lead to a fairly quick denial of service. I remedied this by making it possible for dhcpd to remove an address from the "overload" table when it was leased to a new hardware device.

Second, we found that dhcpd was abandoning a significant part of our address space because machines were somehow camping on an address--using that address without properly leasing it. To discourage this behaviour, I made dhcpd add abandoned addresses to a table and remove them from that table when the address was properly leased. Machines in the "campers" table can be redirected to a web page instructing the user to use DHCP, and have further connectivity denied until they do use DHCP. As soon as an address is leased, it is removed from the "campers" table.
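A sketch of the campers handling in pf.conf(5), reusing the int_if macro from the previous example (dhcpd(8) maintains the table; the port and redirect target are illustrative):

table <campers> persist
block in on $int_if from <campers>
# send campers' web traffic to a local "please use DHCP" page instead
rdr pass on $int_if proto tcp from <campers> to any port www -> 127.0.0.1 port 80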

Users of these features are cautioned against placing too much trust in hardware or IP addresses as they can be easily changed with ifconfig. PF/dhcpd should be treated as a nuisance mitigation technique; it doesn't completely solve the problem of infected machines, but it does help in keeping you from getting completely swamped when the next worm comes racing through.

Can you explain the new carp(4) group demotion feature? How can it improve reliability, and how does it interact with applications?

Marco Pfatschbacher: If you are running carp(4) on multiple interfaces and one of the interfaces fails, you want the remaining interfaces to fail over to the backup host as well, which avoids routing one part of your traffic into a black hole. Initially we just bumped the advertisement skew on all carp(4) interfaces to a value of 240 in case of an error. In response to this, a backup host running in preempt mode would take over all carp(4) interfaces of the failed master.

In more complex setups, however, this all-or-nothing behaviour is not always optimal. To allow more control over which carp(4) interfaces fail over together, we converted the global demotion variable into an interface group attribute. Thus one can move interfaces that together provide one service into a separate group.

Additionally, the value of the demotion counter has been added to a previously unused field of the carp(4) protocol header. This allows us to act smarter in cases of multiple errors: each error condition (e.g., a link failure) increases the demotion counter, and the host with the lowest error count will become master.

We also made the group demotion counter accessible to userland, so that system daemons can control carp(4) demotion. bgpd(8) can now hold back a carp(4) takeover until it has synced its routing table. sasyncd(8) [similarly] prevents carp(4) from preempting until it has received the complete SAs from the current master.

The current value of the demotion counter can be read or set via ifconfig -g group-name. To look at the default group "carp", for example:

$ ifconfig -g carp 
carp: carp demote count 0
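To adjust the counter by hand, something along these lines should work (the value is arbitrary; see ifconfig(8) for the exact carpdemote syntax):

$ sudo ifconfig -g carp carpdemote 50     # raise the group demotion counter by 50
$ sudo ifconfig -g carp -carpdemote 50    # lower it again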

A lot of work has been done on ipsecctl, and it now completely supersedes ipsecadm. What features does it offer?

Christian Weisgerber: ipsecctl(8) features an intuitive configuration syntax, similar to pf(4), and sensible defaults. Our motto was that "IPsec should not be an enigma that is only usable by the ultra elite." In addition to ESP tunnels, ipsecctl(8) now allows setting up AH, transport mode, and IPsec over IPv6--the full gamut of IPsec, everything you could do with ipsecadm(8), but in a more accessible manner. ipsecctl(8) will take care of configuring isakmpd(8) for you, making dynamic keying as simple as static setups and eliminating in virtually all cases the need for a cumbersome isakmpd.conf(5) file. Dynamic IKE support and USER_FQDN IDs have been added for the benefit of roaming users. The documentation has been overhauled with an eye towards explaining the most common setups at the beginning. It has been our goal to make setting up an IPsec gateway with OpenBSD as painless as a filtering firewall. sasyncd(8) has been better integrated with isakmpd(8) and carp(4), providing for robust IPsec failover.
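To give a flavour of the syntax, a minimal /etc/ipsec.conf for an IKE-keyed tunnel between two gateways might look roughly like this (all addresses are placeholders), loaded with ipsecctl -f /etc/ipsec.conf:

# ESP tunnel between two networks; ipsecctl hands the result to isakmpd(8) for keying
ike esp from 192.168.1.0/24 to 192.168.2.0/24 peer 203.0.113.1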

What's new in bioctl and storage drivers?

David Gwynne: We've actually had a surprisingly large amount of work on storage in this release. For starters, the behemoth that is pciide has had support added for a number of new chipsets (from NVIDIA, ATI, VIA, Promise, and others, covered in more detail below).

One of the more fun additions in this release (in my opinion) is the new sdmmc support that Uwe Stuehler did. I think it's fun because he's completely emulating SCSI in the driver for the flash cards you plug in. Because of that he didn't have to write his own block device support, since it just piggybacks on the /dev/sd devices. It also means that the flash cards automatically take advantage of the work that has been done in the SCSI midlayer for hotplugging. It only took about 500 lines of code to make all that work. My guess is he'd need several thousand lines of his own code to do the same thing without the SCSI emulation.

For those of you who are in need of some more serious hardware for your storage needs, we have three brand new SCSI/RAID drivers: mfi(4), arc(4), and mpi(4).

mfi(4) is the successor to ami(4), and to be honest the new controllers do improve on the older generation quite a lot. Also, from what I can tell, Marco Peereboom managed to write the driver before any hardware was realistically available for purchase.

The arc(4) driver was written after I had asked for ciss controllers to work on and someone sent me one in case I got bored with ciss. Instead of getting bored with ciss, I got a bit distracted with arc. This HBA has the simplest code path for performing I/O that I have ever seen. If anyone is looking for hardware that performs well and has a reliable driver, I have to recommend this one because of its simplicity.

mpi(4) is actually a replacement for the mpt(4) driver. The mpt driver suffered several problems. The most annoying was its size and complexity. It was about 10,000 lines of code spread over half a dozen files, which in turn made it hard to deal with its other problems. For example, the old mpt driver doesn't support SAS hardware (only SCSI and Fibre Channel, and the FC stuff extremely poorly), it wasn't 64-bit or endian clean, its bus_dma usage was borked, and it was lacking support for proper use of RAID volumes. I decided it would be easier to write a driver from scratch than to pull mpt apart, clean it up, put it back together again in a sane fashion, and then add support for the SAS hardware and fix RAID on it. So I spent most of c2k6 working on mpi, and now we have a driver that is about a third of the size of mpt, runs perfectly on any machine with working PCI, supports more hardware, and works with RAID volumes.

bioctl itself hasn't changed that much in this release, i.e., the userland tool looks the same as it did in the last release. What's new is the fact that there are now four drivers in 4.0 that support bioctl. In 3.9 and 3.8 we only had support in ami(4) but now we add mfi(4), arc(4), and ciss(4) to the list. This is cool because it validates what we've been saying all along: RAID controllers don't need magical vendor tools to be manageable. For the tasks that people care about it is possible to provide a generic tool to do them with, and we prove it in 4.0.
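For example, querying a supported controller should be about as simple as this (the device name is just an example; see bioctl(8) for the exact options):

# list the controller's volumes and the physical disks behind them
$ sudo bioctl ami0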

pciide(4) now supports a lot of new chipsets made by NVIDIA, ATI, VIA, Promise, and others. How did you develop these drivers? Did you have any problem getting documentation?

Jonathan Gray: For most of the chips they act in a rather similar manner to either older revisions or other chips so it is mainly a matter of matching on new devices. The additional Promise support came from NetBSD with some further changes; it still needs quite a bit of additional work before it is really solid though. NVIDIA and ATI don't release standalone SATA/IDE chips, they release them as part of other south bridge chips that integrate a variety of slower IO devices. Getting documentation on NVIDIA/ATI/SiS/ULI south bridge chips is near impossible from what I can make out. Promise/Highpoint do not release documentation on their chips either, but at least the motherboard chipset manufacturers are slowly moving to a common interface (AHCI).

What advantages does the use of PKCS #5 PBKDF2 provide while generating keys with vnconfig?

Ted Unangst: PKCS #5 PBKDF2 is a method to turn a user's passphrase into a secure key. Previously, whatever passphrase the user entered was used directly as the key. This opened the door for a number of attacks. For instance, someone could precompute how a block of all zeroes would encrypt with every word in the dictionary.

The new -K option fixes this in two ways. First, the passphrase is combined with a random salt. This prevents precomputation. Further, the salt and the passphrase are essentially hashed together many times. This slows down a cracking attempt (after the disk and salt are stolen) by increasing the effort required to guess each password. For example, assume an attacker did not precompute anything, but could guess 100 passphrases per second against vnconfig -k. That's about 40 minutes to guess all of /usr/share/dict/words. If we use an appropriate value of -K that requires 5 seconds to compute the key, it would take closer to two weeks.
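In use it looks something like this (the rounds value and file names are only illustrative; see vnconfig(8) for the details on your release):

# create a 64 MB backing file and attach it with a passphrase run through PBKDF2
$ dd if=/dev/zero of=cryptfile bs=1m count=64
$ sudo vnconfig -K 10000 svnd0 cryptfile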

A lot of modern systems don't include a serial port anymore, so we have to use USB-to-serial adapters. This release includes three new drivers for various chips. What should we keep in mind when we buy such an adaptor and want to use it with OpenBSD?

Jonathan Gray: Basically whatever you buy is going to work now; there are a few minor exceptions, such as the Keyspan adapters and the MosChip hardware, but they aren't really that common. I've only heard of one device with the MosChip serial chip so far, and someone is sending me one so I can write a driver. Not all of the drivers support all types of flow control, though most support sending a break; generally that is because the vendor provides no documentation, as in the case of the Arkmicro driver.

A USB serial port does not replace the function of a normal serial port; you can't use them for kernel debugging for example. It is generally the sort of thing people use to manage groups of machines from central points. It is nearly impossible to know which chip you're getting when buying a USB serial device, so normally you buy/test one from a supplier, then get as many as you need, typically going for the cheapest one you can find.

It seems that the installer now supports host-specific files for easy customization. Can you tell us more?

Chris Kuethe: For some time now, the installer would search for a siteXX tarball along with the standard sets (baseXX, etcXX, ...) and if it was found, would allow the user to extract the contents of the tarball into the root of the newly installed filesystem. If this tarball installs an executable named /install.site or /upgrade.site (depending on whether an install or an upgrade is being done) the installer will chroot into the destination filesystem and run that executable. This can be used to pre-populate the system with packages, non-packaged applications, configuration information, etc.
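A trivial install.site could be as small as this (the contents are just an example):

#!/bin/sh
# runs chrooted inside the freshly installed system at the end of the install
echo "site tarball applied on $(date)" >> /etc/motd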

Under certain circumstances that is insufficient. Or put another way, there are times when installed systems need host-specific changes. If these changes can be placed into a tarball named something like siteXX-hostname it becomes very easy to replicate or restore configurations for a large number of systems, over the network.

Kenneth R. Westerback: Alex Holst has been working on a system for maintaining large numbers of customized OpenBSD installs for a while. During c2k6 Michael Knudsen (as I recall) brought his work to my attention again and I finally took a close look at his toolset. When I got back from c2k6 I realized that it would be trivial to greatly simplify his efforts with a simple extension to the install scripts. This was the addition of a new set we eventually decided to call siteXY-<hostname>.tgz. Since the install scripts will always know the hostname early in the install or upgrade process, this was basically a free way to find a customization file for any host. As with all the existing sets, the install and upgrade scripts will list this set if found in the directory specified to hold sets. The sets for other hosts will not appear to clutter up the display. siteXY-<hostname>.tgz will be the last set installed and is marked for installation by default.

Now, if someone installs OpenBSD, then customizes their installation, those changes can be captured as siteXY-<hostname>.tgz and uploaded to the set directory for easy application in the case of disaster recovery. Fancier tricks are possible, like giving your host a "role" as a hostname for initial installs, e.g., "newdnsserver" which would find a customization file for that role. In this case you would have to manually change a few bits to change to a real hostname and then create a new siteXY-<hostname>.tgz to preserve the final configuration for future use.
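Capturing such a host-specific set can be as simple as the following (the hostname and file list are only examples):

# collect local customizations relative to / and place the result with the other sets
$ cd / && sudo tar czf /tmp/site40-www1.example.org.tgz ./etc/rc.conf.local ./etc/pf.conf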

I'm sure our user community will come up with even more imaginative uses as time passes. I would encourage those interested in creating and maintaining large numbers of customized OpenBSD installs to check out Alex's ongoing work with siteXYtools. Google will find it.

You developed prebind, a secure implementation of prelinking that is compatible with address space randomization. How does it work, and how does it compare with prelinking?

Dale Rahn: Prebind was developed because someone at my work had pointed out how much time a Qt application on an embedded device spent in the dynamic linker at startup. I have maintained the ELF dynamic linker for quite a while and had implemented a symbol cache previously to reduce this startup time, but realised that it still didn't compare to prelinking.

One of the significant security features in OpenBSD is address randomisation (aka ASLR). Prelinking as implemented in Linux removes the randomisation feature so it would not be compatible with OpenBSD's security goals.

Since I was also using a Zaurus quite a bit, I realized how long some large applications took to start up. With some research I found that the data stored by the symbol cache in the existing code could be saved as a library and symbol index. As long as the library hasn't changed, the index information can be used to resolve a symbol without ever having to perform a full symbol search.

Much as with prelinking, there are "common" relocations to most libraries and "fixup" relocations necessary for some binaries, e.g., binaries that override specific functions like malloc, or that link with a library such as pthreads that causes overrides. However, those fixups tend to be rather limited in number.

I have been intending to write a formal paper and present it at some conference, but have not had the time to do that so far. This means that I don't have handy speed comparison numbers to show how much faster it is; however, the amount of time spent performing relocations dropped by a factor of at least 10.

To configure prebind it is necessary to be able to write to all of the binaries to be set up, as well as all of the libraries those binaries reference; in practice that means running as root.

The default mode is to replace the prebind data found on any programs or libraries touched during the processing; this means it must be run in a single invocation on all binaries in the system which are to be configured for prebind. Prebinding can also be performed in "merge mode," where existing prebind data on libraries is not modified, so that if a single binary is added to the system it is not necessary to rerun prebind on the entire system; only the new files will be touched.

An application may use LD_LIBRARY_PATH to locate its libraries, either because the user sets it or because the application sets it in a wrapper script before running the main binary, as Firefox does. If prebind cannot locate all of the necessary libraries, it will not prebind the binary. This can be worked around for Firefox by adding Firefox's extra directory to the library search path using ldconfig (it can be removed again for normal operation).

Like prelinking, prebind does not improve the dlopen speed; the symbol lookup cache still exists, but it is not possible to predict which symbols will be present in the loading application to precalculate the prebind data.

The prebind code appears to be fairly robust at the current time; only a couple of concerning issues remain.

If a single library that a prebound binary touches is modified (prebind data stripped, library replaced) then the binary will disable the prebind optimisation and perform normal symbol loading. This would then require all binaries to have prebind run on them again after any library change.

Currently there is no provision to configure the prebind data on a subtree, e.g., a system tree that has not yet been installed. Hopefully this will be fixed in the near future. At that time the possibility of OpenBSD shipping with prebind enabled by default will exist.

The prebind functionality has been incorporated into ldconfig(8).

What's new in the ports framework and in pkg_* tools?

Marc Espie: In 3.9, pkg_add over scp had big issues: pkg_add was starting one separate scp per package, and due to some legacy issues with the way scp works, those processes tended not to die and to gobble up all the available process slots.

So, instead, we used the same technique that rsync uses: pkg_add opens one single communication channel over SSH, and then uses it to transfer all the information it needs. No more rogue processes. Added bonus: it's currently the fastest way to update packages over the network, because there is one single connection, instead of fleeting connections you need to tear down and restart. With a decent network, it's as if you were updating packages locally.
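In practice, pointing PKG_PATH at an scp URL is enough to get the single-connection behaviour (user, host, and path are placeholders):

$ export PKG_PATH=scp://user@pkghost.example.org/pub/OpenBSD/4.0/packages/i386/
$ sudo pkg_add -u    # update installed packages; see pkg_add(1) for the exact update flags on your release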

Nikolay Sturm: With OpenBSD 4.0 we will improve our support for stable packages. First, there's a policy change. Until now I only backported security fixes to stable; after some discussions I was convinced that it is desirable to backport more changes, so that our users see even fewer reasons for mixing stable and current packages.

Second, with 4.0 we will provide stable packages for amd64 as well. Fabio Cazzin of NS3 kindly grants us access to a stable build machine. This is, however, an experiment and we'll have to see if it works out.

GNU RCS has been replaced with OpenRCS. Who worked on it? What advantages does it provide over GNU RCS?

Ray Lai: Joris Vink, Niall O'Higgins, Xavier Santolaria, and I worked on OpenRCS. It is compatible with GNU RCS, minus some missing functionality such as branch checkins. We hope to complete the missing functionality soon.

The main advantage OpenRCS provides is security. Throughout its development we have kept security in mind, identifying insecure patterns and eliminating them. As a side effect, our code is clearer and simpler.

Aside from that, our goal has mainly been compatibility with GNU RCS. Once this has been achieved, we may enhance OpenRCS with features, though we are more likely to work on enhancing OpenCVS.

Why doesn't this release come with X.Org 7.0?

Matthieu Herrb: Because it's a pretty big change, and we were not ready in time to include the new modular X in 4.0. Building and testing ports takes time, to find and fix the problems that can show up. We'll switch to the new modular X shortly, so that there's a full release cycle to find and fix remaining issues.

Do you like the way the X.org developers split the system into smaller modules?

Matthieu Herrb: I'm not fond of it, but it's done now and we will work with that. The modular X has some advantages that should make development easier. In particular you won't need to rebuild all of X to try a patch or to apply an errata.

I noticed two interesting log entries: "Widen the aperture used for legacy vga space on macppc, needed for Mac Mini ATI graphics cards" and "Add sysctl_int_lower() API, consequence of which is that root can now lower the machdep.allowaperture variable without rebooting." Please tell us more.

Matthieu Herrb: Older OpenBSD/macppc releases did not limit the address space that was accessible through the aperture driver. It now enforces stricter limits, but this had to be done by trial and error, as the X drivers need to poke at various addresses to check whether there's an x86 BIOS available, among other assorted things. This is a step forward in protecting the hardware from malicious code that could be injected into the X server.

In 3.9 and before, the allowaperture variable was completely read-only if securelevel > 0. Now it can be decreased to restrict hardware access by X if the user decides that he doesn't need X after all.
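Concretely, root can now do something like this at runtime (raising the value again still requires a reboot):

# shut off the aperture driver once X is no longer needed
$ sudo sysctl machdep.allowaperture=0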

The new release allows X.org to run without privileges when using the wsfb driver. How does that work?

Matthieu Herrb: This is for hardware running wsfb where X doesn't need hardware access (Zaurus and Sparc only). There were a few stupid things requiring root during X startup. Now the only thing that needs root on these platforms is opening the default log file (/var/log/Xorg.0.log), but if X doesn't have any privileges you can use the -logfile option to specify a file in your home dir, e.g.:

chmod 755 /usr/X11R6/bin/Xorg
startx -- -logfile ~/X.log

Post 4.0, this will be extended to other platforms able to run using the wsfb driver (alpha, macppc, sparc64, and soon i386 and amd64, using VESA BIOS calls).

What is the status of sparc64 systems?

Jason L. Wright: The GENERIC kernel is capable of running on Ultra 1, Ultra 2, and Ultra 3 class processors now. There are still problems with some of the Ultra 3 based systems, however. schizo(4), the host bridge, appears to be fairly buggy, as evidenced by the Linux/OpenSolaris workarounds. Unlike Linux and Solaris, the only documentation or errata we have for these chips is the Linux/OpenSolaris source code. This is NOT documentation and has slowed progress. We tried getting documentation from Sun; we might as well have been yelling at a wall, so we do the best with what we have.

The Ultra 3 is still running without its L1 data cache enabled. The L2 (aka E-cache) and the L1 instruction cache are enabled. Performance isn't optimal, but the U3s, even with the L1 data cache disabled, are still the fastest sparc64s supported.

Besides dealing with the schizo(4) bugs and the disabled D-cache, the next major hurdle is support for the Cassini Ethernet controller found on many Ultra 3 systems. Work for this will begin when I finish my move to Idaho (most of my stuff is offline in anticipation of the move).

As far as Niagara goes, I'm not that worried about it. We need SMP support on sparc64 before Niagara is a concern. I would, however, love the opportunity to play with the Fujitsu processors... mmm, full register window set...

There have been improvements and new features supported in the drivers that manage CPU speed control, Intel SpeedStep, and AMD PowerNow in particular. What can we do now? And did you work on this using publicly accessible documentation?

Gordon Willem Klok: Well, one thing that we can do now is scale the processor frequency and voltage on AMD's eighth-generation processors such as the Athlon 64, Turion, and Opteron. This was added for 3.9 on the i386 architecture, and 4.0 extended the support to the amd64 architecture. There has also been a lot of progress made with Enhanced SpeedStep; Dimitry Andric has made some big strides in supporting newer Intel processors for which Intel no longer publishes the model-specific p-state data. Support for processor scaling on the Zaurus was added, and there were a lot of reliability fixes for the Pentium 4 clock control driver and the PowerNow variant found on the seventh-generation Athlon/Duron.
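From the user's side, the knob is the setperf sysctl, with values expressed as a percentage of full speed (a quick illustration):

$ sysctl hw.cpuspeed          # current clock speed in MHz
$ sudo sysctl hw.setperf=50   # run the CPU at roughly half speed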

For the most part AMD and Intel publicly document the mechanisms for frequency and voltage scaling. From the driver writing perspective these mechanisms are fairly straightforward; you plug values that correspond to the desired combination of frequency and voltage into a register specific to the model of the CPU. The rub is that increasingly ACPI is the "proper" mechanism for retrieving what values correspond to each state and what states each CPU supports.

While ACPI support is being worked on for OpenBSD, it isn't ready, so we rely on whatever legacy method is available. Unfortunately, in the case of AMD this method is deprecated; many BIOS vendors don't include the requisite table, so even though every modern amd64 processor supports PowerNow, in many cases we simply can't use it. In the case of Intel, for a given CPU model the supported states and the magic numbers that we need to plug into the register used to be gathered from a data sheet and written into the driver. These values are no longer spelled out explicitly in the data sheet and only seem to be available to BIOS writers under NDA.

How does OpenBSD 4.0 interact with VMWare, Xen, and other virtualizers? What about the VT features in recent AMD/Intel CPUs?

Anil Madhavapeddy: OpenBSD 4.0 will work normally using hardware virtualisation under Xen 3.0 and VMware, using VT/SVM on Intel and AMD CPUs. It does not have any special guest-tools support, however.

Para-virtualisation support for running OpenBSD as an "enlightened" guest OS under Xen 3.0 is currently under development (it was sponsored as a Google SoC project, and development continues). It boots multi-user using a ramdisk, and the sources are available online.

What is your opinion on these virtualization technologies from a security standpoint? I have already seen some talks about rootkits that bypass the OS and put themselves between the CPU and the OS. Is there anything that OpenBSD plans to do to fight it?

Otto Moerbeek: A large problem with virtualization is added complexity: more code and more configuration. This makes your setup harder to audit and maintain. I already heard of systems where admins were reluctant to apply patches to the host OS, since it ran so many guest systems and they were afraid to take the host OS down. Of course you cannot expect any virtualization layer to protect you from security bugs in the host OS.

It also adds an attack vector: with real hardware, bad guys can try to use bugs in hardware, kernel, OS provided userland, and applications to gain access. With virtualization, they get a whole new layer to attack.

Virtualization can be useful for test setups and can provide isolation between applications, but it is certainly not a magic way to increase overall security.

As OS developers we cannot do a lot to provide extra protection. From our point of view, virtualization just provides an alternative execution environment for our kernel. The kernel has to rely on certain mechanisms, like page protection and the distinction between supervisor and user mode. If the kernel cannot trust the execution environment, all is lost.

Reading the changelog, I found this note: "A large amount of memory leak plugging in various system utilities inspired by Coverity reports, as well as ruling out of hypothetical NULL dereferences," and I saw a lot of fixed memory leaks. Is this something common, or did you run a new checker that helped you spot them?

Otto Moerbeek: This work has been done using Coverity reports on other platforms. We share quite a bit of code with the other BSDs, so Coverity reports from those might apply to us as well. Based on the reports we saw, we hand-checked other parts of the source tree as well, to find similar patterns. That resulted in some more fixes. Coverity does not publish specific reports for OpenBSD (yet).

When auditing code, we use various simple tools like grep(1) to hunt for various bug patterns. We also use lint(1), which has been much improved lately, mostly by Chad Loder, Theo de Raadt, and myself. But the most important tools are eyes and brains.

Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.

