ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


OpenBSD 4.1: Puffy Strikes Again

by Federico Biancuzzi
05/03/2007

OpenBSD 4.1 has just been released. Federico Biancuzzi interviewed several developers to discuss some of the new features for networking, active porting efforts (landisk and UltraSPARC III), work on SMP, and the improvements in spam fighting.

How did you improve the support for UltraSPARC III machines?

Mark Kettenis: OpenBSD 4.0 disabled the D-cache on UltraSPARC III CPUs, since running with the D-cache enabled made the system very unstable. This had a serious impact on performance. We eventually found the cause of the instability, which is almost certainly related to a "bug" in the CPU. Unfortunately Sun doesn't publish errata for their CPUs (like Intel, AMD, and IBM do), so it took a lot of effort to track this down. This allowed us to ship OpenBSD 4.1 with the D-cache enabled, which made the machines more than twice as fast.

We also made some serious improvements to the schizo(4) driver for the PCI host bridge that's in most UltraSPARC III machines. This host bridge has a lot of interesting features for tracking errors on the bus. But it is crucial to configure it right, otherwise the kernel will panic on recoverable errors. It took some effort to get right, but now most hardware seems to work fine. These machines are actually very good for driver development, since the machine is able to give detailed information on failed bus transfers that would simply lock up your machine on normal i386 hardware.

Support for many VGA PCI frame buffers is still lacking. Most of these are rebranded cards from other vendors, and we have little hope of getting documentation for them.

Do you plan to port OpenBSD to UltraSPARC T1 too?

Mark Kettenis: Eventually, yes. However since these machines have multi-core CPUs we cannot fully support them until we have sparc64 SMP support. So getting multi-processor support is higher on the priority list right now. We've received some hardware donations that will help.

Support for the new PCIe-based machines is already working though, and will appear in OpenBSD 4.2.

Who worked on the landisk port? What are you using these little boxes for?

Otto Moerbeek: The port was mostly done by Michael Shalayeff, Miod Vallat, and Dale Rahn. Theo de Raadt and I were quite involved as well at various stages. The little box can be used in any place where low power and low noise are important. A wireless gateway is an obvious application. I myself like the little box because--due to its slowness and small memory--it makes you think twice about things like time versus memory use tradeoffs. It promotes a careful attitude when developing, since making a mistake could cost you a lot of time: a make build takes ages, even when using a USB disk instead of a flash memory card.

How is the development of rthreads going on? Is there any improvement in performance with SMP systems?

Artur Grabowski: We're right now in an evolutionary process of making the kernel more rthreads-safe. Rthreads proved to be a good starting point for making a more serious split of processes and threads within the kernel. They are still not very useful because of many limitations of the old model, but we're working on it.

I heard that there were some performance and stability improvements in pthreads. Would you like to tell us more?

Kurt Miller: Around the beginning of the release cycle the lockss.org project contacted me about some deadlocks they were experiencing using our JDK port. Some debugging revealed that one thread was blocked reading from a file descriptor and another thread was blocked on a close() call for the same fd.

It turns out that POSIX doesn't define this behavior and different operating systems handle this in different ways. However, many threaded applications rely on the close() call completing and the blocked read() call to unblock and return EBADF. So I set about to fix this and discovered a series of fd races that could end up in deadlock or incorrect file status flags. Correcting these issues took three rather large diffs; one to work on file status flags, another to close off the races and the last to correct the original problem I described above.

One nice side effect of the new design is reduced contention for the file descriptor table lock. Threaded applications that create and destroy many fd's may see better performance as a result. I've had reports that the improvements have helped the stability of at least Asterisk, a CORBA application, Ada Web Server, and, of course, Java on OpenBSD.

What was the problem with AMD64 time keeping and how did you improve it?

Artur Grabowski: Wasn't a problem as such, it's just that we're moving all architectures to a new time keeping infrastructure that by default tries to be more precise, stable, and MP safe. The time counter infrastructure (originally from FreeBSD) gives by default nanosecond precision (if the hardware can provide good enough timers) or even better and is significantly faster for some tricks we want to do later. The plan is to convert all architectures to use it eventually.

The fsck_ffs(8) command has been improved to be more robust to various forms of inode and superblock corruption. Was this corruption caused by a bug in FFS itself?

Otto Moerbeek: It is hard to tell if the one case of actual corruption that prompted me to work on fsck_ffs was due to a bug in the kernel or hardware failure. fsck_ffs must do its utmost best to not make things worse than they are. I solved a few cases where wrong bits in the superblock or in inodes could make fsck_ffs crash, and one case where an actual problem in a filesystem could make fsck_ffs do the wrong thing and make things worse. These bugs were found with a filesystem fuzzer I wrote, which is a little tool to corrupt filesystems and then check to see if fsck_ffs can fix the corrupt metadata. Some other bugs were found by code inspection: a signal handler race and some memory and file descriptor leaks.
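
The corruption step of such a fuzzer can be illustrated with a small Python sketch. This is not Otto's tool, which the interview does not show; the function names and the all-zero 4096-byte "image" are assumptions made for the example. It only flips random bits in an in-memory copy and counts the damage.

```python
import random


def flip_random_bits(image, n_flips, seed=0):
    """Return a copy of `image` (bytes) with `n_flips` distinct bits inverted."""
    rng = random.Random(seed)  # seeded, so a failing case can be replayed
    corrupted = bytearray(image)
    total_bits = len(corrupted) * 8
    # rng.sample picks distinct bit positions, so no flip cancels another.
    for bit in rng.sample(range(total_bits), n_flips):
        corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def count_differing_bits(a, b):
    """Count the bit positions at which two equal-length buffers differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    original = bytes(4096)  # hypothetical stand-in for a filesystem image
    damaged = flip_random_bits(original, n_flips=16, seed=42)
    print(count_differing_bits(original, damaged), "bits corrupted")
```

In a real run the corrupted copy would be written back to a scratch image and handed to fsck_ffs, and the interesting result is whether the checker crashes or makes things worse. That part is deliberately left out here.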

Anything new in pkg_* tools?

Marc Espie: Mostly nothing that users will see, but there's been a large number of small bug fixes and usability changes. It's much harder to crash the tools and leave the system in an unstable state.

Most of my changes this release have focused on the ports tree instead, giving developers better and faster tools to diagnose problems. We now make use of the fact that packing-lists are finally fully independent from compilation. Specifically, one can request the packing-list of a given package without needing to build it, so finding conflicts between packages, or building databases of all files in every port is much simpler.
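
To make the conflict check concrete, here is a rough Python sketch of what becomes possible once every packing-list is available without a build. It is not the ports tree's own tooling (which the interview does not show); the package names, file paths, and the `find_conflicts` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_conflicts(packing_lists):
    """Map each pair of packages to the set of paths that both would install.

    `packing_lists` maps a package name to the set of files it installs.
    """
    owners = defaultdict(set)
    for pkg, files in packing_lists.items():
        for path in files:
            owners[path].add(pkg)

    conflicts = defaultdict(set)
    for path, pkgs in owners.items():
        # Any path claimed by two or more packages is a conflict for each pair.
        for a, b in combinations(sorted(pkgs), 2):
            conflicts[(a, b)].add(path)
    return dict(conflicts)


if __name__ == "__main__":
    lists = {  # hypothetical packages and files
        "foo-1.0": {"bin/foo", "share/doc/README"},
        "bar-2.0": {"bin/bar", "share/doc/README"},
        "baz-0.1": {"bin/baz"},
    }
    for pair, paths in find_conflicts(lists).items():
        print(pair, sorted(paths))
```

The same path-to-owner index is also the "database of all files in every port" mentioned above, in miniature.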

Among other changes, the whole way MULTI_PACKAGES ports are built has been simplified a lot, which leads to much simpler maintenance and fewer packaging bugs. Plus, it's also faster.

These changes mostly derive from this year's ports hackathon, by the way. Just being able to see developers work is a great way to find out stumbling blocks and streamline the process.

There are also simpler ways to build Perl ports related to CPAN (the Comprehensive Perl Archive Network), so we're way more up-to-date this release than we ever were!

Who should we thank for porting OpenOffice?

Robert Nagy and Kurt Miller: Porting OpenOffice (OOo) was a group effort. It is a monster of an application and the largest port we have in the ports tree by far. In the end the majority of the porting and debugging work was performed by Robert Nagy and Kurt Miller.

Robert inherited Peter Valchev's initial work, set up a local CVS repository and build machine, and provided access to Kurt so we could collaborate on the porting effort. After fixing some initial problems the build would complete, but soffice wouldn't launch. Debugging this monster proved quite difficult. After some rather extensive debugging sessions Kurt found that soffice wouldn't launch because of a missing file that is created in the packaging phase of the build. After the final build problems in the packaging phase were corrected, OOo launched and the initial port was committed. However, getting the port into a fully working state proved to be just as difficult as getting it to launch.

From this point we had some pretty glaring problems to address. Several apps deadlocked upon startup, documents would not save in OOo format, and building with Java didn't work. The deadlocks proved to be very difficult to debug, since the code that was having the problem was in C++ templates and gdb doesn't handle this very well at all. Well, it turns out the boost headers detect if the application is threaded at compile time, and we were missing -pthread from CFLAGS. The Java build problems were caused by a libz conflict where the internal libz in the JDK conflicted with the system libz linked with OOo. The document save problem was a result of not building OOo with Java, so when Kurt fixed the Java build, the document save problem was fixed too.

This just describes what we went through to get i386 up and running. Robert spearheaded the effort to get AMD64 up and running and adding i18n support. He's written an article about it on undeadly too.

I saw the new BSD licensed pkg-config. What are the differences with the old code?

Marc Espie: Complete rewrite from scratch in an appropriate language, namely Perl. The project was initiated by Chris Kuethe, completed by me, and tested out by lots of people. The idea is that you should not be able to see the difference from the GNU version, but it's better. For starters, the code is about 10 times smaller, and more maintainable. It's also slightly more picky, and its internal representation can do very interesting stuff that we don't use yet.

Matthieu Herrb: pkg-config is required by the configure scripts in X.Org's modular tree. So in order to make it possible to build X in the base system, we needed an implementation of pkg-config in the base system (or in Xenocara at least).

What are the goals of Xenocara (an effort to port X.Org 7.2 to OpenBSD)?

Matthieu Herrb: I should say first that Xenocara is not part of OpenBSD 4.1, which still provides X.Org 6.9 using the "old" XF4 source tree. OpenBSD-current switched to Xenocara a bit after the code freeze for 4.1.

The main goal of Xenocara is, as you said, to provide a build infrastructure for recent modular X.Org releases (the current code in Xenocara is based on X.Org 7.2). The modular X.Org tree is split into hundreds of individual packages that need to be built and installed separately. And there are some third-party packages (Freetype, expat, Mesa, etc.) which are not provided by X.Org anymore. Xenocara is mostly a set of BSD makefiles that drive the build of these packages in the right order, passing their configure scripts the right options to provide a working X environment that is configured appropriately for OpenBSD.

There are also local patches to the X.Org sources in Xenocara, as in the past, that either fix bugs that were not fixed in X.Org or provide some extra functionality that X.Org is not interested in supporting (mostly support for legacy architectures). Our goal is still to get as many as possible of our local changes committed back to X.Org, in order to minimize the differences. Xenocara is not a fork of X.Org.

I hope that one of the benefits of the modular X.Org, the ability to rebuild just one package individually, will make it easier for OpenBSD users and developers to hack on X and help improve the code, both functionality-wise (DRI) and security-wise.

syslogd(8) can now pipe logs directly to other programs. How would you take advantage of it?

Henning Brauer: I added that for use with log analysis tools like logsurfer (in ports). Basically, logsurfer is a fast regex engine. I use it at bsws to find unusual messages on our central log host. With syslogd being able to log to logsurfer's stdin directly, we get near real-time analysis.
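
The consumer side of such a pipe is just a program that reads log lines on standard input and reacts as they arrive. The sketch below is a toy Python stand-in for that idea, not logsurfer itself; the two "known good" patterns and the sample messages are invented for illustration, and the syslogd configuration that would feed it is not shown.

```python
import re

# Hypothetical patterns for messages considered routine on this host.
KNOWN_PATTERNS = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for \w+"),
    re.compile(r"cron\[\d+\]: \(\w+\) CMD "),
]


def unusual_lines(lines, known=KNOWN_PATTERNS):
    """Yield every line that matches none of the known-good patterns."""
    for line in lines:
        if not any(p.search(line) for p in known):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # In real use the iterable would be sys.stdin, so each message is
    # examined as soon as syslogd writes it; a fixed sample is used here.
    sample = [
        "Mar  1 10:00:00 host sshd[123]: Accepted publickey for alice from 10.0.0.5\n",
        "Mar  1 10:00:01 host kernel: wd0: soft error\n",
    ]
    for line in unusual_lines(sample):
        print("UNUSUAL:", line)
```

Because `unusual_lines` is a generator over any iterable of lines, it works the same on a live stream and on a saved log file.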

Network guys will be happy to play with the new tool called hoststated. What features does it provide?

Henning Brauer: Well, initially, Pierre-Yves Ritschard finally implemented the missing userland bits for the long-present load balancer functionality in PF--host availability can now be checked and dead hosts can be removed from the pool. Reyk Floeter then added some layer 7 support, so that it now speaks HTTP and can balance to backend servers based on cookies, parts of the URL, etc., and can do the SSL crypto work too.
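
The basic loop described here, check each host and drop the dead ones from the pool, can be sketched in Python as follows. This is only an illustration of the idea, not hoststated's code: the real daemon updates PF, whereas this sketch merely filters a list, and the addresses and function names are assumptions.

```python
import socket


def tcp_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def refresh_pool(backends, check=tcp_alive):
    """Return only the backends that currently pass the health check."""
    return [(host, port) for host, port in backends if check(host, port)]


if __name__ == "__main__":
    # Hypothetical backends from the documentation address range.
    backends = [("192.0.2.10", 80), ("192.0.2.11", 80), ("192.0.2.12", 80)]
    down = {"192.0.2.11"}
    # A stub check keeps the demo off the network; pass nothing to use tcp_alive.
    print(refresh_pool(backends, check=lambda host, port: host not in down))
```

Passing the check as a parameter is a design choice for the sketch: a plain TCP connect, an HTTP request, or any other probe can be swapped in without touching the pool logic.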

The bridge driver now supports the Rapid Spanning Tree Protocol (RSTP). What advantages does it provide?

Henning Brauer: In one word? It's faster. That's really what it is about. Spanning Tree (STP) just takes quite some time to cope with a failed link. Rapid Spanning Tree (RSTP) does not need to re-evaluate and recalculate the entire topology in this case and can thus fail over to alternate links way faster.

What is the status of transparent IPless bridges? What can we do and what can't we do yet?

Henning Brauer: I have always considered transparent IP-less bridges a very stupid setup, and nothing changed in that.

Multiple routing tables. What does it mean for PF?

Henning Brauer: The kernel used to have one routing table per address family--one for inet, one for inet6, one for IPsec, usually. Now it can have multiple tables. From within PF, you can select which routing table should be used for the route lookup later--you can implement policy routing with this. But much more could be done--this is really only the groundwork. It could be possible, in future, to have overlapping address ranges on interfaces and place interfaces into different routing tables, forming a kind of virtual routers. And of course, the routing daemons will learn to make more use of alternate tables.

Claudio Jeker: PF is only a piece in the concept of multiple routing tables. PF is "just" used as a packet classifier, similar to altq. This makes it possible to classify the traffic and select different tables to route that traffic. Using multiple routing tables together with bgpd enables OpenBSD to implement customer VPNs in ISP networks--in Cizzz-coeee speak VRF-lite (virtual router forwarding). 4.1 only includes basic support--PF can be used as a classifier and it is possible to specify the table that bgpd should use. There is still a lot missing, but OpenBSD is about evolution and not revolution. Expect to get more and better support in the next release.
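
The two-step idea, classify first and then do the route lookup in the selected table, can be modelled in a few lines of Python. This is a conceptual sketch only and says nothing about how the kernel or PF implement it; the table ids, prefixes, next hops, and function names are made up for the example.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match of `dst` within a single routing table."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def classify(src, rules, default_table=0):
    """Return the table id chosen by the first rule whose prefix contains `src`."""
    addr = ipaddress.ip_address(src)
    for prefix, table_id in rules:
        if addr in ipaddress.ip_network(prefix):
            return table_id
    return default_table


def route(src, dst, rules, tables):
    """Policy routing: pick a table from the source, then route on the destination."""
    return lookup(tables[classify(src, rules)], dst)


if __name__ == "__main__":
    tables = {  # hypothetical routing tables keyed by table id
        0: {"0.0.0.0/0": "198.51.100.1"},
        1: {"0.0.0.0/0": "203.0.113.1", "10.0.0.0/8": "203.0.113.2"},
    }
    rules = [("192.0.2.0/25", 1)]  # traffic from this prefix uses table 1
    print(route("192.0.2.5", "10.1.2.3", rules, tables))
    print(route("192.0.2.200", "10.1.2.3", rules, tables))
```

The same destination ends up with different next hops depending on where the packet came from, which is the essence of policy routing.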

PF in 4.1 comes with two new default settings: keep state and flags S/SA. Why?

Claudio Jeker: More and more systems use scaled TCP windows. PF's state tracking needs to match on the initial SYN packet to get the correct window scaling. When the state is created on a different packet (e.g., a SYN/ACK) the window scaling is wrong and after a few packets the connection stalls. Using keep state and flags S/SA by default should reduce misconfiguration and is therefore a saner default. It is still possible to do stateless filtering by using the no state option.

Henning Brauer: Stateless filtering just doesn't make all that much sense. Stateful is faster, and it is better. On top of that, most of the time we saw stateless filters, the "keep state" was missing by accident, which then led to all kinds of very hard to track down errors. So in the end this is really just the "good defaults" mantra. You can use "no state" if you really want to filter statelessly.

The flags S/SA is just there so that states are only ever created by the initial SYN packet so that we see all options for the connection like window scaling--missing that can lead to failures that are very hard to find.
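
Some arithmetic shows why a missed scale factor stalls a connection. The TCP window scale option is only carried on SYN segments, so a filter that starts tracking later has to assume a shift of 0. The Python sketch below is a deliberate simplification, not PF's actual sequence tracking; the numbers and function names are illustrative assumptions.

```python
def effective_window(advertised, shift):
    """Real window in bytes: the 16-bit header field shifted left by the scale."""
    return advertised << shift


def in_window(bytes_in_flight, advertised, shift_known_to_filter):
    """Would a filter believing `shift_known_to_filter` accept this much data?"""
    return bytes_in_flight <= effective_window(advertised, shift_known_to_filter)


if __name__ == "__main__":
    advertised, real_shift = 8192, 7  # hypothetical values from one endpoint
    in_flight = 100_000               # bytes the peer legitimately has outstanding

    print("real window:", effective_window(advertised, real_shift))
    print("filter saw the SYN:   ", in_window(in_flight, advertised, real_shift))
    print("filter missed the SYN:", in_window(in_flight, advertised, 0))
```

With the scale known, 100,000 bytes fit comfortably in the window. Without it, the filter believes the window is only 8192 bytes and treats the rest as out of window, which matches the "after a few packets the connection stalls" symptom.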

You introduced two socket options to play with the IP TTL field. I am wondering if we are going towards a future where each application will try to bypass the OS stack to make its own security check on its own network connections...

Claudio Jeker: The IP_MINTTL option was specially implemented for bgpd. It reduces the risk of attacks by dropping all packets whose TTL is too small. For directly connected routers the TTL can be set to 255, and so packets sent from further away are automatically dropped. This protects the long-lived bgpd TCP connections from RST and other TCP window attacks. It is a simple mechanism invented to protect underpowered BGP routers that are unable to use real crypto for protection.

Henning Brauer: There were two TTL related socket options added. IP_RECVTTL, when set on UDP or raw sockets, allows an application to receive the TTL of incoming packets. Nothing in the tree uses it so far.

IP_MINTTL, on the other hand, only works for TCP sockets so far (it could be implemented for UDP and raw sockets too). Applications set the minimal acceptable TTL there, and packets with a smaller TTL are discarded. This is intended to be used to implement the TTL security hack in BGP, later generalized and standardized as the "Generalized TTL Security Mechanism" in RFC 3682. It is very simple: basically, the sending host sets the TTL to its maximum value, 255. The receiving host checks that the TTL, which gets decreased by each router on the way, is not smaller than it expects from the distance to the other host. So for hosts on the same IP network, the receiving host checks the TTL to be 255. This implies that the packet has not been routed at all, so an attacker would have to be on the same network segment as the target. This makes an attack much harder, up to infeasible on point-to-point links. bgpd implements this technique.
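
The receiver-side check is simple enough to write down. The Python sketch below models only the decision described above; it does not touch real sockets or the IP_MINTTL option, and the function names and the `max_hops` parameter are assumptions for the example.

```python
MAX_TTL = 255  # the sender sets the TTL to the maximum value


def min_acceptable_ttl(max_hops):
    """Lowest TTL a packet may arrive with if the peer is `max_hops` routers away."""
    return MAX_TTL - max_hops


def accept_packet(received_ttl, max_hops=0):
    """Accept only packets that crossed at most `max_hops` routers."""
    return received_ttl >= min_acceptable_ttl(max_hops)


if __name__ == "__main__":
    # Directly connected peer: nothing has decremented the TTL.
    print(accept_packet(255))
    # Spoofed packet sent with TTL 255 from three routers away arrives as 252.
    print(accept_packet(252))
    # A peer known to be one router away is allowed a TTL of 254.
    print(accept_packet(254, max_hops=1))
```

An off-link attacker cannot forge a higher TTL than 255, and every router on the way lowers it, so the check cannot be bypassed from further away.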

Is IPv6 still enabled by default?

Henning Brauer: Of course.

Why did you make pflog a clonable interface?

Henning Brauer: Biggest incentive: spamlogd.

spamlogd needs a pflog interface to track which mail servers you send mail to. The interference between logging and spamlogd's SMTP watching was annoying--as in, all that SMTP traffic in your logs that you need to filter out. Aside from that, separate logging for, say, debugging, or per VLAN, or whatever, on a per-rule basis is a good idea anyway. So: make pflog clonable, and teach PF to send the log traffic to specific pflog interfaces (i.e., for spamlogd).

  ifconfig pflog5 create
  spamlogd -i pflog5
  in pf: pass out log(to pflog5) to port smtp

spamlogd and pflog have been modified so they can work with alternate pflog interfaces of course, instead of hardcoded pflog0 before. And of course pflog0 no longer clutters ifconfig output on machines without pf (same for pfsync, but after 4.1) :)

Oh Bob, please tell us everything about your improvements in spam fighting!

Bob Beck: Well, there's been a number of large-ish changes to spamd.

First of all, greylisting is now the default mode of operation, and we've moved the config files to /etc/mail--so this release does show a bit of a flag day for people running spamd. Other things that changed are:

We've also updated the default /etc/mail/spamd.conf examples to include a couple of useful new short-term blacklists, for those of us not running big sites.

What is the status of WEP/WPA/WPA2 support in OpenBSD 4.1?

Jonathan Gray: Most if not all drivers support some kind of hardware or software WEP. There is currently no working WPA support. WPA builds on 802.1X, which in turn builds on EAP, which came about due to PPP. Developers using wireless networks tend to prefer using authpf(8) for SSH-based access control and IPsec if they require encryption.

From what I've heard, WPA is a compatibility nightmare: for instance, to authenticate to a Cisco RADIUS server from a Windows machine you have to manually download a hotfix from Microsoft. No conference I've been to has ever required WPA/802.1X for network access; they don't want to deal with the pain of having to debug it.

So there are a few problems: one is that no one is terribly interested in developing the required code for it, and the other is that all the freely available 802.1X supplicants seem to be vastly overengineered. The focus is more towards having as much hardware as possible just working out of the box than dealing with the pain of yet another IEEE state machine.

Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.



Copyright © 2009 O'Reilly Media, Inc.