ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


Puffy and the Cryptonauts: What's New in OpenBSD 4.3

by Federico Biancuzzi
04/29/2008

The OpenBSD project is ready to announce its new release, OpenBSD 4.3, which will be officially available on May 1st (the only way to get it earlier is to order the CD package).

As usual there are a lot of improvements and new tools and features, and it is impressive that the project keeps delivering these results on a six-month release cycle.

Federico Biancuzzi interviewed a large group of developers to talk about the new networking tools (snmpd and snmpctl), the new features and scope of relayd (previously known as hoststated), how the configuration of carp was simplified, improvements in wireless drivers, storage limits and speed-ups, SMP support in sparc64, bug fixes and audits for some tricky coding practices, and much more!


Networking


I read that dhcpd(8) was working by luck, using overflow buffers to store options... would you like to tell us more?

Kenneth Westerback: In October 2007, shortly before 4.2 was released, Nahuel Riva and Gera Richart discovered that a carefully crafted client request could cause dhcpd to crash. A fix was developed by millert@ for 4.3 and led to the first errata for the about-to-be-released 4.2. In essence, a client could request a specific size for the response generated by dhcpd, which violated assumptions within the code and resulted in stack corruption.

In January Peter Hessler discovered that another carefully crafted request could cause the option storing logic to write data in memory it shouldn't be using and thus crash dhcpd. As with the October bug, the fault was straightforward to discover once someone encountered a failure.

In between these two discoveries the option processing logic in dhcpd got a thorough going-over as I tried to make the logic clearer and more understandable. As a result, several bugs in handling option storage into the two overflow buffers were fixed, e.g. actually using the second of the overflow buffers! Safer initialization was introduced and more care was taken to ensure all option buffers were correctly utilized.

The end result is a much more robust dhcpd for 4.3, which can return more options in each response by correctly utilizing all the option space available to it.

None of these changes should introduce interoperability issues. If anything it should reduce them by more correctly implementing the standards.

What changed in the way TCP responses to highly fragmented packets are constructed?

Markus Friedl: It's just a bug fix. When creating a response to a TCP packet, the stack made some assumptions about the layout of the original mbuf chain. After some IPv6-related changes these assumptions were no longer true, so now we create a packet from scratch when sending TCP responses (usually for TCP resets).

You developed snmpd(8), an implementation of the Simple Network Management Protocol, and snmpctl(8), its control tool. What is the status of the implementation?

Reyk Floeter: I started working on snmpd(8) because I needed an alternative to net-snmp that is more secure, less complicated, reduced to basic functionality, and designed for OpenBSD. Many people picked it up and started using it even when it was still in a very early development stage. It has been tested with net-snmp, Nagios and some other free and commercial SNMP implementations or Network Monitoring Systems (NMS).

I would also like to thank the user community and some developers for the very good feedback with usable bug reports, code review, and testing. We also tested it against the PROTOS test suite, which helped to find some remaining issues; snmpd(8) is running very stably now.

For the future I neither plan to implement every existing MIB nor any exotic SNMP extensions like Agent-X, but there will be further work to add more MIBs related to TCP/IP networking and OpenBSD monitoring. It currently supports most of the SNMPv1/v2c MIBs, IP-MIB, BRIDGE-MIB, HOST-RESOURCES-MIB, IF-MIB, and the OPENBSD-SENSORS-MIB. It is also possible to send SNMPv2 traps via snmpctl(8) or from relayd(8).

hoststated(8)/hoststatectl(8) were renamed to relayd(8)/relayctl(8)... What is the new scope of the tool and what features have you added in this release?

Reyk Floeter: hoststated(8) was started as a daemon for health checks on load-balanced hosts - it was the "Host State Daemon", a helper to extend pf's load balancing capabilities. The layer 7 relaying code I wrote extended the daemon in a significant way, and the old name had become a little bit misleading.

relayd(8) is a fully-featured TCP/IP relay, or Application Layer Gateway (ALG), where the health checking of hosts is just a part of the functionality. It currently supports TCP, HTTP, and DNS relaying, SSL "acceleration" or termination, and the traditional layer 3 redirections. The grammar of the new relayd.conf(5) configuration file has been redesigned, which will need some attention when migrating from hoststated. The grammar is more obvious, "services" became "redirections" because they're using the rdr functionality in pf, and tables look more like in the pf.conf(5) grammar.

relayd(8) is now also able to send SNMP traps via snmpd(8) when the state of a monitored host changes. This is a very nice feature to monitor load balancers in an existing NMS. I also like the interface to make this happen; external daemons can open the /var/run/snmpd.sock and send TLV-based IMSG to snmpd(8) - there is no need to link relayd(8) against an SNMP library or to handle any ASN.1/BER encoding outside of snmpd(8) itself.

I heard you simplified the configuration of carp(4) load balancing. Please tell us more...

Marco Pfatschbacher: It has always been a minor inconvenience to set up CARP load balancing. You'd have to create multiple interfaces with the same address, manage those hostname.carp* files, and make sure to get the advskew and the link flags just right. To get rid of the need for multiple interfaces, I had to factor out the virtual host portion of carp into a separate struct that is kept in a list per carp interface. This makes it possible that one carp interface can now contain up to 32 virtual host instances. Rather than creating multiple interfaces with the same address, we can now just create a single carp interface and assign it multiple carpnodes with their respective advskews. This is a time-saver and should ease troubleshooting across CARP members.

Furthermore I replaced the link flags with more descriptive ifconfig balancing options.

Setting up an IP balanced cluster with two hosts now becomes as simple as:

 host-A# ifconfig carp0 192.168.1.10 carpnodes 1:0,2:100 balancing ip
 host-B# ifconfig carp0 192.168.1.10 carpnodes 1:100,2:0 balancing ip

The resulting state on host-A should be something like:

carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:00:5e:00:01:01
        carp: carpdev sis0 advbase 1 balancing ip
                state MASTER vhid 1 advskew 0
                state BACKUP vhid 2 advskew 100
        groups: carp
        inet 192.168.1.10 netmask 0xffffff00 broadcast 192.168.1.255

To have things more consistent there's now also an "ARP balancing" equivalent for IPv6: "NDP balancing".

Would you like to talk about the new wireless drivers you worked on?

Damien Bergamini: There will be four new drivers for 802.11 wireless devices in 4.3:

Special thanks go to Jonathan Gray and Theo de Raadt for their efforts to free the firmware for the ral(4) RT2860 devices. Thanks to their determination, we were able to ship the RT2860 firmware under the MIT license just in time for the 4.3 release! bwi(4), upgt(4) and iwn(4) require non-free firmware to operate.

Although iwn(4) and RT2860 are 802.11n devices, they only work in 802.11g mode for the moment. More work needs to be done in our generic 802.11 layer (net80211) to fully support these devices. I plan to work on 802.11n support after WPA support is integrated which is something I'm actively working on.


Storage


Is it true that you improved the speed of flash drives? How?

Stuart Henderson: We noticed that Sandisk CompactFlash cards were a lot faster than most others and wondered why. Naddy pointed out that the cards which performed slowly all needed single-sector I/O and that DMA transfer was being disabled for these. I looked at other OS and found a simple change made in NetBSD that looked promising, ported it across and did some testing - it didn't cause any negative effects, and improved performance a great deal in some cases, so into the snapshots it went. After a little while with no reported problems it seemed pretty safe, so it's now committed.

Looking at the dmesg submitted since then, I noticed that some of the new machines with solid-state hard drives (like the Eee PC) are also affected, so it turns out it was a really good time to make this change.

What limits does this release have when dealing with storage?

Otto Moerbeek: We finished very large parts of large disk support. Large disks are disks that have more than 2TB capacity; the sector count of such disks overflows a 32-bit integer variable. We now use 64-bit integers to address disk blocks in all layers of disk-related code (disklabel, buffer cache, drivers, ffs and ffs2). The disklabel format has also changed to support large disks and partitions (up to 128PB, though the current code limits it at 64PB).

The actual largest file you can create did not change: with the default block size, the maximum file size FFS can store is 1PB. But the kernel limits file sizes to a maximum of 2^31 pages, so it ends up being 8TB on most platforms. You'll probably need to create a sparse file to actually create one like that ;-)
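
The 2TB and 8TB figures above follow from simple arithmetic. A minimal sketch in C, assuming 512-byte sectors and 4KB pages (neither value is stated in the interview, and the function names are made up for illustration):

```c
#include <stdint.h>

/* Assumed values, not taken from the interview. */
#define SECTOR_BYTES 512ULL	/* assumed sector size */
#define PAGE_BYTES   4096ULL	/* assumed page size */

/* Capacity addressable with a 32-bit sector count: 2^32 sectors. */
uint64_t
max_bytes_32bit_sectors(void)
{
	return (1ULL << 32) * SECTOR_BYTES;	/* 2^41 bytes = 2TB */
}

/* Largest file when the kernel allows at most 2^31 pages per file. */
uint64_t
max_file_bytes(void)
{
	return (1ULL << 31) * PAGE_BYTES;	/* 2^43 bytes = 8TB */
}
```

On a platform with a different page size the second figure changes accordingly, which would explain the "on most platforms" qualifier.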

The original version of FFS supports filesystems of up to 1TB; due to its larger inode, FFS2 can support up to the largest partition we can handle. There are important issues though: a filesystem check of a large filesystem takes a lot of memory. That is, it may need more memory than a user process can allocate. To be able to actually use a large filesystem you'll need to newfs(8) it with larger fragment and block sizes than the defaults normally used. Solving this is high on the wanted features list, of course. Also, using FFS2 for any filesystem used in the install or upgrade process is not supported.

As for other filesystems we support, I do not know the limits of either the format itself or our implementation.


Platforms and Drivers


What's new in OpenBSD/sparc64?

Mark Kettenis: The most exciting new feature in OpenBSD/sparc64 is SMP support. All supported systems should work, with the exception of the Enterprise 10000 (for which support was added after the 4.3 release was made). The release still has some stability issues on systems with more than 4 CPUs (most of these are already fixed in -current), but in general the sparc64 SMP kernel is remarkably stable. Snapshots have been built on a dual UltraSPARC-IIIi machine since November, less than half a year after Theo complained that his shiny new Sun Fire V215 only "half" worked.

Adding SMP support to OpenBSD/sparc64 was fairly easy after Art Grabowski (art@) changed the scheduler code to be more machine independent. The fact that Sun obviously designed the UltraSPARC CPUs with the intention to build SMP machines also helped a lot.

Device support has been extended, and more of the onboard hardware such as temperature sensors and gigabit ethernet devices now work. Some of this work has been made possible because Sun opened up lots of documentation. David Gwynne (dlg@) played a major role in making this happen.

With SMP working on sparc64 adding support for Sun's new UltraSPARC T1 and T2 CPUs makes a lot more sense. And I'm happy to announce that OpenBSD 4.4 will include support for at least the UltraSPARC T1 machines.

What features have you added to eeprom(8)?

Mark Kettenis: One of the nice things about sparc and sparc64 systems is that they have Open Firmware. Open Firmware includes a full description of what hardware is present in the system. This information is used by OpenBSD to attach the right device drivers. Solaris has a tool to get a nice printout of the device tree, and I had been collecting the output for sparc64 machines. These dumps proved to be an invaluable tool for supporting new systems and new hardware. I really wanted a similar tool for OpenBSD. Federico G. Schwindt (fgsch@) saw that he could do this without too much effort and integrated it into eeprom(8). Since macppc systems are also based on Open Firmware, I later added support for these systems to eeprom(8) too.

Would you like to talk about the new audio drivers you worked on?

Alexandre Ratchov: The new envy(4) driver is for certain "professional" audio interfaces like the M-Audio "Delta" series. It plays and records in full-duplex 10/12 channel audio streams with 24-bit precision at up to 96kHz sample frequency. Basically, such cards are intended for multi-tracking, that's why they do not support the usual stereo 16-bit linear format.

Unfortunately, most audio ports expect the device to support stereo 16-bit format, which makes envy(4) devices nearly unusable with them. However, recent ports by jakemsr@ will make such cards much more usable in the future (some are already in -current).

Having devices that support a large number of channels and "unusual" encodings provides a basis for working on interesting userland code. It also constrains the generic audio(4) driver, which helps us spot (and fix) bugs in the code common to all audio drivers.

Jacob Meuser: At the beginning of the release cycle, work was still focused on fixing existing issues in audio(4) and ossaudio(3), the userland OSS emulation layer. Once the high level audio interfaces were working to a satisfactory level, focus switched to the low level drivers.

Changes for auich(4), eso(4), auvia(4) and ac97(4) were brought in from NetBSD, which fixed most of the unresolved bugs in the gnats database.

The ac97(4) and auvia(4) changes allowed for multichannel playback if the codec supported it. It just so happens that I have such devices. I kind of liked that, so I added support for multichannel playback to cmpci(4) as well.


Kernel-land and Development


I read that you improved the lkm(4) subsystem on amd64. What has changed?

Mike Belopuhov: Actually, I've fixed the lkm(4) subsystem on amd64 :-)

The problem was that code loaded by lkm(4) is placed into the kernel_map (the main kernel memory allocation area), and this map is located outside of the kernel's jumpable area on the amd64 platform, i.e. the kernel can't execute instructions placed in the kernel_map.

The kernel on amd64 uses only relative 32-bit jumps and is loaded into the upper 2 GB of the virtual address space. This is caused by certain restrictions in the gcc compiler. Therefore kernel_map is located below the kernel load address, and although the kernel can read and write any memory, it can jump only to addresses within the upper 2 GB.

So I've set up a separate lkm_map that is located in the jumpable area and taught the kernel to reserve space for lkm's there. This effectively allows the kernel to call any function within a loadable module.
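
The "jumpable area" constraint can be illustrated with a small check: a relative 32-bit jump encodes a signed 32-bit displacement, so the target must lie within roughly 2 GB of the jump. The following is only a sketch of that arithmetic; the function name is hypothetical and it is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if `target` can be reached from `next_insn` (the address of the
 * instruction following the jump) with a signed 32-bit displacement.
 * Hypothetical helper for illustration only.
 */
bool
rel32_reachable(uint64_t next_insn, uint64_t target)
{
	/* Two's-complement difference, interpreted as a signed distance. */
	int64_t disp = (int64_t)(target - next_insn);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With illustrative addresses (assumed, not taken from the interview), a jump from 0xffffffff80100000 to 0xffffffff90000000 is reachable, while a jump from the same place to 0xffff800000000000 is not. That is the situation a module loaded far below the kernel would be in.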

This solution is based on the work by NetBSD developers.

Would you like to describe the new M_ZERO flag for malloc(9)? What are the advantages?

Artur Grabowski: Kernel size and simpler code. Since zeroing memory after allocating it is such a common idiom, being able to do both operations in one call saved a significant amount of code in the kernel.
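
The idiom being folded into one call can be shown with a userland analogy. This is a sketch only: xmalloc_flags() and XM_ZERO are hypothetical names modeled on the idea of M_ZERO, not the kernel's malloc(9) interface.

```c
#include <stdlib.h>
#include <string.h>

#define XM_ZERO 0x1	/* hypothetical flag, modeled on the idea of M_ZERO */

/*
 * Hypothetical userland stand-in for a flag-taking allocator.
 * Without the flag, callers write: p = malloc(n); memset(p, 0, n);
 * With the flag, the zeroing happens inside the allocator.
 */
void *
xmalloc_flags(size_t size, int flags)
{
	void *p = malloc(size);

	if (p != NULL && (flags & XM_ZERO))
		memset(p, 0, size);
	return p;
}
```

Every call site that used to allocate and then zero shrinks to a single call, which is where the code size saving would come from.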

"Many dangerous unsigned comparisons with -1 when checking the results of read and write calls have been eliminated". What was the problem and how did you fix it?

Ray Lai: read(2) and write(2) return the number of bytes read or written on success, or -1 on error. A lot of people try to read from or write to a buffer like this: read(..., ..., sizeof(buf)). They then check if there is either an error (returns -1) or an incomplete read or write (returns a value less than sizeof(buf)) like this: read(..., ..., sizeof(buf)) < sizeof(buf).

The problem is that read(2) and write(2) return signed integers (ssize_t) so that they can return -1, but sizeof() returns an unsigned integer of the same size (size_t). When a signed and an unsigned integer of the same size are compared, the signed integer is first converted to its unsigned equivalent. This results in the -1 becoming the largest possible size_t (SIZE_MAX), so the check becomes: SIZE_MAX < sizeof(buf). This comparison is never true, so the error is never detected. In a loop, this can cause an infinite loop.

Kenneth Westerback fixed many of these by simply converting the check from "read(..., ..., sizeof(buf)) < sizeof(buf)" to "read(..., ..., sizeof(buf)) != sizeof(buf)". A signed to unsigned conversion still happens, but the check works for the majority of the cases (unless sizeof(buf) equals SIZE_MAX).
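
The difference between the two checks can be isolated in a pair of small functions. This is a minimal sketch, assuming ssize_t and size_t have the same width as described above; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Buggy check: n is converted to size_t before the comparison, so -1
 * becomes SIZE_MAX and "n < want" is false. The error goes unnoticed.
 */
bool
short_or_failed_buggy(ssize_t n, size_t want)
{
	return n < want;
}

/*
 * Fixed check: the same conversion happens, but SIZE_MAX differs from
 * want, so both errors and short transfers are reported.
 */
bool
short_or_failed_fixed(ssize_t n, size_t want)
{
	return n != want;
}
```

Passing -1 and 512 to the buggy version yields false (no problem detected), while the fixed version yields true. For a genuine short transfer, such as 100 of 512 bytes, both versions report the problem.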

atomicio() was created as an elegant wrapper around the clunky read(2) and write(2) API. It was first introduced in OpenSSH and has been copied to a number of other programs (netcat, OpenCVS, sendbug). atomicio() is called like read(2) and write(2), but has an additional argument in front, either "read" or "vwrite", to specify which action you would like to perform. ("vwrite" is used instead of "write" because the prototype for write(2) requires a "const void *" for the second argument; this allows atomicio() to use read(2) and write(2) interchangeably.) It takes a size_t as the buffer size and also returns a size_t to show how many bytes it has processed. This makes it possible to check for both error and underflow by simply doing: atomicio(read, ..., ..., sizeof(buf)) != sizeof(buf).
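
A simplified wrapper in the spirit of atomicio() might look like the sketch below. It is not the OpenSSH implementation: the name atomicio_sketch() is hypothetical, and the handling of errors and end-of-file (return the number of bytes processed so far) is an assumption made to keep the example short.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Cast so that write(2), which takes a const buffer, fits read(2)'s pointer type. */
#define vwrite (ssize_t (*)(int, void *, size_t))write

/*
 * Keep calling f (read or vwrite) until n bytes have been processed,
 * end-of-file is reached, or a real error occurs. Returns the number
 * of bytes processed, so "!= n" catches both errors and short transfers.
 */
size_t
atomicio_sketch(ssize_t (*f)(int, void *, size_t), int fd, void *buf, size_t n)
{
	char *p = buf;
	size_t pos = 0;
	ssize_t res;

	while (pos < n) {
		res = f(fd, p + pos, n - pos);
		if (res == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return pos;		/* real error: stop here */
		}
		if (res == 0)
			return pos;		/* end-of-file */
		pos += (size_t)res;
	}
	return pos;
}
```

A caller would write atomicio_sketch(read, fd, buf, sizeof(buf)) != sizeof(buf) to detect failure, with no signed value ever entering the comparison.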


Management and Ports


Would you like to talk about the work you did on serial console automagic configuration?

Kenneth Westerback: Prior to 4.3 there was limited support for the automatic configuration of serial consoles during the install process. The install script for i386 and amd64 offered the user the chance to automatically enable the first serial device found in the dmesg as a serial port. This had been added in 3.6 or 3.7 to allow easy installation on devices like Soekris that were intended to be used in production with serial consoles. But the support hadn't been extended to other architectures or really completed.

Then David Gwynne (dlg@) got a newish Sun sparc64 box whose serial console speed was 115200 and thus did not work with the default /etc/ttys 'console' entry which specified a speed of 9600. After this bit him, dlg@ expressed an interest in having the speed of the serial console automatically detected and set in the installed /etc/ttys. Theo pointed out that stty(1) was on the install media and 'stty speed' would reveal the speed of the console being used to install. As a result the install script (install.sh) was changed so that it used 'stty speed' to detect the speed of the console being used to install, and modified the 'console' entry of the installed /etc/ttys appropriately. The modification is only done if the 'console' entry is 'on' in /etc/ttys.

This modification made dlg@ happy.

After some thought it was realized that this meant serial consoles 'just worked' on all architectures that make use of the 'console' entry, i.e. all of our architectures except i386, amd64, macppc, alpha and zaurus.

Theo then pointed out that the dmesg on all architectures should have a line that specifies the serial console device, e.g. 'pccom0: console' on i386. It turned out that this wasn't quite true but some quick work by Miod Vallat (miod@) and Mark Kettenis (kettenis@) among others made it so. Then it was noticed that the /etc/ttys entries had unnecessarily diverged in their 'getty' fields. miod@ modified them to all use '... std.9600', give or take a speed or two.

With this done, it was possible to re-do the serial console logic used for i386/amd64 and extend it to alpha, macppc, and zaurus. This generalization even shrank the install scripts.

As a result, on 4.3 most architectures have serial consoles that 'just work'. The others now offer the chance to configure a device as a serial console if a serial device is discovered in the dmesg. The default device will be the current console or the first serial device found, and the default speed will be the current console speed, or '9600'.

I saw that your work on ldattach(8) is used to attach a line discipline to a serial line to allow for in-kernel processing of the received/sent data. How does it work?

Marc Balmer: Line disciplines, in order to become active, need a program to open a file descriptor on a tty(4) device and attach the line discipline to it using the TIOCSETD ioctl(2) call. Traditionally there was one such attachment program per line discipline, which would be started either from the command line or in /etc/rc.local. With the introduction of new line disciplines I introduced ldattach(8) as a single program that can be used to attach any of the supported line disciplines. So instead of adding a new attachment program for a new line discipline, ldattach(8) can be extended. ldattach(8) has a slightly different syntax than the older slattach(8) and nmeaattach(8) programs to allow it to be used from the /etc/ttys file as well. For details, please see the manual page.
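
The core of such an attachment program is small. The sketch below shows the open-then-TIOCSETD step only; it is not ldattach(8) itself, the function name is hypothetical, and the discipline number is left to the caller because the available constants are platform-specific.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open the tty device at `path` and attach line discipline `ldisc`.
 * Returns the open descriptor, or -1 on failure. The caller is expected
 * to keep the descriptor open while the discipline is in use; closing
 * the tty typically resets it to the default discipline.
 */
int
open_with_ldisc(const char *path, int ldisc)
{
	int fd = open(path, O_RDWR | O_NOCTTY);

	if (fd == -1)
		return -1;
	if (ioctl(fd, TIOCSETD, &ldisc) == -1) {
		close(fd);	/* not a tty, or discipline not supported */
		return -1;
	}
	return fd;
}
```

A real attachment program would also set line speed and other terminal parameters, and would normally stay running for as long as the discipline is needed.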

What has changed in the way sendbug(1) handles comments?

Ray Lai: sendbug(1) has two types of comments: lines that start with "SENDBUG: " and text between angle brackets "<...>". Not many people type lines starting with "SENDBUG: ", but angle brackets are used quite often, notably in C files, which sometimes have lines such as "#include <stdio.h>". Combine the two, and you have mangled bug reports.

This issue was initially discovered by Deanna Phillips, who noticed that dmesgs included in bug reports were being mangled (notably lines such as "azalia0: <blahblahblah>"). My initial fix was to avoid parsing for angle bracket comments in the dmesg. It wasn't a very elegant fix, and added a bunch of code to determine where the dmesg began, but it worked okay for dmesgs. At the time I couldn't think of a better solution.

One day, Mickey included a diff in a bug report and noticed that the include headers were being mangled. This time I thought about the problem some more, and realized that comments are only added by the initial bug report template, never by the bug reporter. This meant that if I could just remember the text in the comments, I could strip those out and ignore everything else.

In the latest and greatest version of sendbug(1), that is exactly what it does. It stores the comments in an array and strips out any text that matches. So unless someone includes lines such as "<PR category (one line)>", there should be no more false positives.
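
The approach of remembering the template's comments and stripping only exact matches can be sketched as follows. This is an illustration, not the sendbug(1) source: the function name is hypothetical, and of the two template strings only "<PR category (one line)>" appears in the interview; the second is made up.

```c
#include <string.h>

/* Known template comments; the second entry is a made-up example. */
static const char *template_comments[] = {
	"<PR category (one line)>",
	"<hypothetical template comment>",
};

/* Remove every occurrence of a known template comment from `text`, in place. */
void
strip_template_comments(char *text)
{
	size_t i, len;
	char *hit;

	for (i = 0; i < sizeof(template_comments) / sizeof(template_comments[0]); i++) {
		len = strlen(template_comments[i]);
		while ((hit = strstr(text, template_comments[i])) != NULL)
			memmove(hit, hit + len, strlen(hit + len) + 1);
	}
}
```

Because only exact template text is removed, a line such as "#include <stdio.h>" or "azalia0: <blahblahblah>" passes through untouched.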

What are the major new features and changes included in the recent versions of OpenSSH?

Damien Miller: OpenSSH 4.8 is what was shipped on the OpenBSD 4.3 CD. 4.9 and 5.0 were security releases to fix a bad policy (executing ~/.ssh/rc for forced commands) and an X11 hijacking problem respectively. Both these fixes were backported for OpenBSD 4.3. Otherwise, all three versions are pretty much identical.

The big new feature is chroot support for sshd(8). This has been requested by many users, but we have been reticent to do it because it can lead to vulnerabilities if either the sshd-side implementation or the configuration of the chroot environment is incorrect.

Most of the requests for chroot support have been by administrators who want to set up restricted file servers, so to avoid most of the common mistakes Markus Friedl implemented an in-process sftp server. This effectively links sftp-server(8) into sshd(8), rather than forking and executing it as a separate process. This particularly helps chroot setups because no special support files (e.g. /dev nodes) are required.

Users who need to chroot interactive sessions naturally need to configure a full chroot environment including binaries, /dev nodes, libraries and support files. This also applies to file serving via scp(1) too unfortunately, as it isn't as neatly designed or contained as sftp-server(8).

The other changes in this release are minor improvements and bugfixes.

Is development on cwm(1) still going on?

Okan Demirmen: cwm(1) has undergone many important changes in 4.3. In addition to a large code clean-up by oga@, license issues have been resolved. The remaining 9wm code was removed and rewritten by oga@, and with permission of the original author, cwm(1) is now ISC licensed.

With many new cwm(1) users, a few novel features have been added, such as the ability to resize windows and move the pointer with keyboard bindings. Additionally, default keybindings can now be overridden by user defined ones, while also allowing one to 'unmap' a keybinding. This can be useful to resolve a conflict with an application's key mapping.

One very convenient and valuable new feature, 'exec window manager', allows one to either restart cwm(1), or switch to another window manager, namely another version of cwm(1), without restarting the X server. Stay tuned as cwm(1) develops.

What has changed in the ports system?

Marc Espie: Not much is new in the infrastructure this release. Small bug fixes. New ports at a normal rate... 4.3 is close to 5000 packages, and we just crossed that barrier post 4.3.

We are slowly recognizing some weird update scenarios and fixing them.

As far as new ports/significant updates go, things are in ways better shape than in the past. We have a crack team of porters these days. As far as 4.3 goes, we are catching up on GTK and Gnome ports, for instance, thanks to Jasper and friends. Simon keeps Perl ports up-to-date, Deanna and Ajacoutot ensure we have a lot of nice games (highlight on Micropolis, the freed Simcity that Deanna ported), well, there are a lot of others I'm not citing by name who are doing an excellent job in keeping us up-to-date and improving a lot of things: Kurt is doing solid work on Java and foundation libraries, for instance. Our German quality insurance team makes sure nothing unseemly passes through, and the Swiss take care of our database needs.

Together with Mark Kettenis, we spent a long time tracking an obscure segfault in KDE4. Turns out that, with pthreads, every thread but the first had an incorrectly aligned stack on i386. Basically, there was a mix-up in the code between the stack and the frame pointer, with the result that the stack was always in the middle of a 16 byte block. This is usually of little consequence, except for vector operations on mmx and the like which expect a very stringent alignment. By the way, qt4 rocks: they have specific code in there to speed up most graphics operations depending on the processor...

You might think this is so unlikely to happen that it is just an obscure bug fix, but in reality, this pattern (multi-threaded app, accelerated vector operation) is at the core of most video players. Turns out this is a big reliability fix for vlc, xine, and friends.

In OpenBSD tradition, there's also a lot of background work that you won't see, as a user, in 4.3, but which is getting stuff prepped for the future...

As far as I'm concerned, I've spent a lot of time on make, recently. This is work that is not yet finished, but I've fixed 90% of the issues that prevented the use of parallel make to build anything non-trivial (and I really hope to fix the remaining 10% before 4.4). As a result, in 4.3, you can try to run make -j on the src tree, and there's a good chance it will work (some races still are not fixed, so depending on your machine, it may still fail). As far as I know, it works all the time on the xenocara tree, and it works on quite a few ports.

This is a huge boon for various work, including ports, since we can use a lot of machines to their full potential. You'll probably notice that SMP support across a wide variety of platforms has improved tremendously recently.

Yes, this is something that does not concern users directly. You can use packages, you don't have to recompile everything all the time. But for development, having a compile-test-tweak cycle that goes n times as fast is a tremendous help.

Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.



Copyright © 2009 O'Reilly Media, Inc.