OpenBSD 4.1: Puffy Strikes Again
by Federico Biancuzzi
OpenBSD 4.1 has just been released. Federico Biancuzzi interviewed several developers to discuss some of the new features for networking, active porting efforts (landisk and UltraSPARC III), work on SMP, and the improvements in spam fighting.
How did you improve the support for UltraSPARC III machines?
Mark Kettenis: OpenBSD 4.0 disabled the D-cache on UltraSPARC III CPUs, since running with the D-cache enabled made the system very unstable. This had a serious impact on performance. We eventually found the cause of the instability, which is almost certainly related to a "bug" in the CPU. Unfortunately Sun doesn't publish errata for their CPUs (like Intel, AMD, and IBM do), so it took a lot of effort to track this down. Finding it allowed us to ship OpenBSD 4.1 with the D-cache enabled, which made the machines more than twice as fast.
We also made some serious improvements to the schizo(4) driver for the PCI host bridge that's in most UltraSPARC III machines. This host bridge has a lot of interesting features for tracking errors on the bus. But it is crucial to configure it right, otherwise the kernel will panic on recoverable errors. It took some effort to get this right, but now most hardware seems to work fine. These machines are actually very good for driver development, since they can give detailed information on failed bus transfers that would simply lock up your machine on normal i386 hardware.
Support for many VGA PCI frame buffers is still lacking. Most of these are rebranded cards from other vendors, and we have little hope of getting documentation for them.
Do you plan to port OpenBSD to UltraSPARC T1 too?
Mark Kettenis: Eventually, yes. However, since these machines have multi-core CPUs, we cannot fully support them until we have sparc64 SMP support. So getting multi-processor support is higher on the priority list right now. We've received some hardware donations that will help.
Support for the new PCIe-based machines is already working though, and will appear in OpenBSD 4.2.
Who worked on the landisk port? What are you using these little boxes for?
Otto Moerbeek: The port was mostly done by Michael Shalayeff, Miod Vallat, and Dale Rahn. Theo de Raadt and I were quite involved as well at various stages. The little box can be used in any place where low power and low noise are important. A wireless gateway is an obvious application. I myself like the little box because, due to its slowness and small memory, it makes you think twice about things like time versus memory use tradeoffs. It promotes a careful attitude when developing, since making a mistake could cost you a lot of time: a make build takes ages, even when using a USB disk instead of a flash memory card.
How is the development of rthreads going? Is there any improvement in performance with SMP systems?
Artur Grabowski: We're right now in an evolutionary process of making the kernel more rthreads-safe. Rthreads proved to be a good starting point for making a more serious split of processes and threads within the kernel. They are still not very useful because of many limitations of the old model, but we're working on it.
I heard that there were some performance and stability improvements in pthreads. Would you like to tell us more?
Kurt Miller: Around the beginning of the release cycle the lockss.org project contacted me about some deadlocks they were experiencing using our JDK port. Some debugging revealed that one thread was blocked reading from a file descriptor and another thread was blocked on a close() call for the same fd.
It turns out that POSIX doesn't define this behavior and different operating systems handle this in different ways. However, many threaded applications rely on the close() call completing and the blocked read() call unblocking and returning EBADF. So I set about fixing this and discovered a series of fd races that could end up in deadlock or incorrect file status flags. Correcting these issues took three rather large diffs: one to work on file status flags, another to close off the races, and the last to correct the original problem I described above.
One nice side effect of the new design is reduced contention for the file descriptor table lock. Threaded applications that create and destroy many fd's may see better performance as a result. I've had reports that the improvements have helped the stability of at least Asterisk, a CORBA application, Ada Web Server, and, of course, Java on OpenBSD.
What was the problem with AMD64 time keeping and how did you improve it?
Artur Grabowski: It wasn't a problem as such; it's just that we're moving all architectures to a new time keeping infrastructure that by default tries to be more precise, stable, and MP safe. The time counter infrastructure (originally from FreeBSD) gives nanosecond precision or even better by default (if the hardware can provide good enough timers) and is significantly faster for some tricks we want to do later. The plan is to convert all architectures to use it eventually.
The fsck_ffs(8) command has been improved to be more robust against various forms of inode and superblock corruption. Was this corruption caused by a bug in FFS itself?
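As a rough illustration of the idea behind a time counter, the sketch below turns readings of a free-running hardware counter into elapsed nanoseconds while tolerating counter wraparound. It is a simplified model, not the OpenBSD or FreeBSD code: the class name, its fields, and the use of plain integer division for scaling are all assumptions made for the example.

```python
NSEC_PER_SEC = 1_000_000_000


class TimeCounter:
    """Hypothetical, simplified model of a free-running hardware counter."""

    def __init__(self, frequency_hz, bits, start_count=0):
        self.frequency_hz = frequency_hz      # counter ticks per second
        self.mask = (1 << bits) - 1           # counter width, e.g. 16 or 32 bits
        self.last_count = start_count & self.mask
        self.elapsed_ns = 0
        self._remainder = 0                   # sub-nanosecond carry between updates

    def update(self, raw_count):
        """Fold a new raw counter reading into the elapsed time, in ns."""
        # Masking the difference keeps the delta correct across one wraparound.
        delta = (raw_count - self.last_count) & self.mask
        self.last_count = raw_count & self.mask
        # Carry the division remainder forward so rounding errors do not pile up.
        scaled = delta * NSEC_PER_SEC + self._remainder
        self.elapsed_ns += scaled // self.frequency_hz
        self._remainder = scaled % self.frequency_hz
        return self.elapsed_ns
```

The masked subtraction is the part worth noticing: as long as the counter is read at least once per wrap period, the elapsed time stays correct even on narrow hardware counters.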
Otto Moerbeek: It is hard to tell if the one case of actual corruption that prompted me to work on fsck_ffs was due to a bug in the kernel or hardware failure. fsck_ffs must do its utmost not to make things worse than they are. I solved a few cases where wrong bits in the superblock or in inodes could make fsck_ffs crash, and one case where an actual problem in a filesystem could make fsck_ffs do the wrong thing and make things worse. These bugs were found with a filesystem fuzzer I wrote, which is a little tool to corrupt filesystems and then check to see if fsck_ffs can fix the corrupt metadata. Some other bugs were found by code inspection: a signal handler race and some memory and file descriptor leaks.
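The corruption step of such a fuzzer can be sketched in a few lines. This is not Otto's tool, whose design is not described here; it is a minimal illustration that flips random bits in a chosen byte range of a filesystem image held in memory. The function name and its parameters are hypothetical.

```python
import random


def flip_random_bits(image, flips, seed, start=0, end=None):
    """Return a copy of `image` with `flips` distinct bits inverted in [start, end)."""
    if end is None:
        end = len(image)
    rng = random.Random(seed)          # seeded, so a failing case can be replayed
    damaged = bytearray(image)
    total_bits = (end - start) * 8
    # sample() picks distinct bit positions, so exactly `flips` bits change.
    for bit in rng.sample(range(total_bits), flips):
        offset = start + bit // 8
        damaged[offset] ^= 1 << (bit % 8)
    return bytes(damaged)
```

In practice one would write the damaged bytes to a scratch copy of the image, run fsck_ffs against that copy, and record the seed whenever the checker crashes or leaves the filesystem worse off. Restricting the range to the area holding the superblock or inodes concentrates the damage on metadata.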
Anything new in pkg_* tools?
Marc Espie: Mostly nothing that users will see, but there's been a large number of small bug fixes and usability changes. It's much harder to crash the tools and leave the system in an unstable state.
Most of my changes this release have focused on the ports tree instead, giving developers better and faster tools to diagnose problems. We now make use of the fact that packing-lists are finally fully independent from compilation. Specifically, one can request the packing-list of a given package without needing to build it, so finding conflicts between packages, or building databases of all files in every port is much simpler.
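To illustrate why having packing-lists available without a build makes conflict detection simple, here is a minimal sketch. It assumes the file lists have already been extracted into a dictionary keyed by package name; the function name and the input layout are hypothetical and say nothing about the real packing-list format or the actual ports tooling.

```python
from collections import defaultdict


def find_conflicts(packing_lists):
    """Map each path claimed by more than one package to the sorted package names."""
    owners = defaultdict(set)
    for package, files in packing_lists.items():
        for path in files:
            owners[path].add(package)
    # Only paths with two or more owners are conflicts.
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}
```

The same inverted index, path to owning packages, also serves as a database of all files in every port.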
Among other changes, the whole way MULTI_PACKAGES ports are built has been simplified a lot, which leads to much simpler maintenance and fewer packaging bugs. Plus, it's also faster.
These changes mostly derive from this year's ports hackathon, by the way. Just being able to see developers work is a great way to find stumbling blocks and streamline the process.
There are also simpler ways to build perl ports related to CPAN (the Comprehensive Perl Archive Network), so we're way more up-to-date this release than we ever were!
Who should we thank for porting OpenOffice?
Robert Nagy and Kurt Miller: Porting OpenOffice (OOo) was a group effort. It is a monster of an application and the largest port we have in the ports tree by far. In the end the majority of the porting and debugging work was performed by Robert Nagy and Kurt Miller.
Robert inherited Peter Valchev's initial work, set up a local CVS repository and build machine, and provided access to Kurt so we could collaborate on the porting effort. After fixing some initial problems the build would complete, but soffice wouldn't launch. Debugging this monster proved quite difficult. After some rather extensive debugging sessions Kurt found that soffice wouldn't launch because of a missing file that is created in the packaging phase of the build. After the final build problems in the packaging phase were corrected, OOo launched and the initial port was committed. However, getting the port into a fully working state proved to be just as difficult as getting it to launch.
From this point we had some pretty glaring problems to address. Several apps deadlocked upon startup, documents would not save in OOo format, and building with Java didn't work. The deadlocks proved to be very difficult to debug since the code that was having the problem was in C++ templates, and gdb doesn't handle this very well at all. Well, it turns out the boost headers detect at compile time whether the application is threaded, and we were missing -pthread from CFLAGS. The Java build problems were caused by a libz conflict where the internal libz in the JDK conflicted with the system libz linked with OOo. The document save problem was a result of not building OOo with Java, so when Kurt fixed the Java build the document save problem was fixed too.
This just describes what we went through to get i386 up and running. Robert spearheaded the effort to get AMD64 up and running and to add i18n support. He's written an article about it on undeadly too.