IP balancing support has been added to carp(4). Can you explain in which scenarios it can be used and how it works?
Marco Pfatschbacher: IP balancing provides the same functionality as ARP load balancing, but without its limitations. It can also share traffic that comes across routers and it works with IPv6.
IP balancing can be used to build highly available, load-balanced systems for servers, VPN gateways, firewalls, or to make OpenBSD-based load balancers load balance themselves.
The basic concept is that we use the shared medium properties of the Ethernet to direct the traffic to all carp nodes. Each one hashes the source and destination IP at ip_input() and decides based on the status of the carp load balancing group whether a packet should be accepted or just dropped silently.
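The accept-or-drop decision described above can be sketched in a few lines of Python. This is only an illustration: the CRC32 hash, the node numbering, and the failover rule (a dead node's traffic moves to the next live node) are assumptions, not the algorithm carp(4) actually uses.

```python
import ipaddress
import zlib
from typing import Sequence


def bucket(src: str, dst: str, nodes: int) -> int:
    """Map a (source, destination) address pair to one of `nodes` buckets."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return zlib.crc32(key) % nodes


def accepts(node: int, alive: Sequence[bool], src: str, dst: str) -> bool:
    """Return True if `node` should accept the packet, False to drop it silently.

    Every node sees every packet and runs the same computation, so exactly
    one live node ends up accepting each flow.
    """
    owner = bucket(src, dst, len(alive))
    # Hypothetical failover rule: walk forward to the first live node.
    for step in range(len(alive)):
        candidate = (owner + step) % len(alive)
        if alive[candidate]:
            return candidate == node
    return False  # no node in the group is alive
```

Because the hash depends only on the address pair, all packets of one flow land on the same node, and the same code works for IPv4 and IPv6 addresses.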
The scalability mainly depends on the ratio between network traffic and the load that it causes on the server side. This is because each node has to cope with the incoming traffic of the entire cluster up to ip_input(). For example, IP balancing will not help as much to scale plain routers. A CVS server, however, could be perfectly scaled up to 8 or more nodes.
Currently IP balancing is a little complicated to configure, since each load balancing group has to be built out of multiple carp interfaces. I'm working on a change to integrate the load balancing into a single carp interface.
You also fixed a problem that affected carp...
Henning Brauer: I basically fixed a bug I found the hard way... on a production router. :(
To add/change/delete/get a route, you send a message on the routing socket. That message is echoed to all open routing sockets. So by opening a routing socket and listening to the messages, you can keep track of all changes to the routing tables; the routing daemons use that to keep their views of the kernel routing table in sync, and "route monitor" allows you to see these messages in real time. When something inside the kernel changes the routing table, it must make sure a message indicating so is generated and sent on the routing sockets. Everything does. Carp did not. Now it does. :)
Carp plays with routes for the IP addresses on the carp interface when changing from master to backup and vice versa. The effect of the missing routing messages was that when the bgpd process was started and the carp interface was in backup state, bgpd got the wrong nexthop for connections over the carp interface, and thus, once a failover happened, bgpd blackholed traffic that was supposed to go over the carp interface.
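A toy model of the problem, in Python: a routing table that echoes every change to all listeners, and a daemon that keeps its own view by replaying those messages. The class, the `announce` flag, and the message tuple are invented for illustration; this is not the kernel's routing socket interface.

```python
class RoutingTable:
    """Toy kernel routing table with 'routing socket' listeners."""

    def __init__(self):
        self.routes = {}      # destination -> nexthop
        self.listeners = []   # one inbox (list of messages) per open socket

    def open_socket(self):
        inbox = []
        self.listeners.append(inbox)
        return inbox

    def change(self, dest, nexthop, announce=True):
        self.routes[dest] = nexthop
        if announce:  # the step carp used to skip
            for inbox in self.listeners:
                inbox.append(("RTM_CHANGE", dest, nexthop))


def replay(view, inbox):
    """Apply all pending messages to a daemon's private view of the table."""
    while inbox:
        _, dest, nexthop = inbox.pop(0)
        view[dest] = nexthop
```

Calling `change(..., announce=False)` leaves every listener with a stale nexthop, which is the situation bgpd ended up in after a failover.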
There are a lot of new features in hoststated(8), would you like to describe the most interesting ones?
Pierre-Yves Ritschard: Hoststated has had a lot of improvements and new features between 4.1 and 4.2.
First and foremost, hoststated now has layer 7 support, which means it is not only able to load balance at the packet level (layer 3), but at the application level. Our layer 7 support includes HTTP SSL termination, generic SSL termination, HTTP header manipulation and more.
Hoststated is also now able to gracefully reload for layer 3 configurations, while layer 7 configuration reload will follow shortly. Additional reporting has been added and hoststatectl can now show host reliability.
As always, we've done our best to provide a clean and consistent configuration syntax, and have more plans to improve it for the next releases!
ftp-proxy(8) is now able to automatically tag packets passing through the pf(4) rule with a supplied name. How does it work?
Henning Brauer: Well, it is really simple. ftp-proxy just adds "tag foo" to the rules it inserts for tracking the data connection, where foo is the name you supplied. That makes it way easier later on to match packets which were handled by these ftp-proxy-inserted rules, be it for filtering or queueing or whatever else pf allows.
What's changed in ftp(1)?
Pierre-Yves Ritschard: There are three new things to note in ftp(1). First, it is now able to go through HTTP proxies requiring passwords, just like it was able to send a password to an FTP server. Many environments provide access through authenticated proxies; it's nice to be able to still use ftp(1) and pkg_add(1) there.
ftp(1) is now also able to parse Netscape-like cookie jars; all Netscape and Mozilla browsers use this format. While it won't store cookies, it will allow you to read the ones created by your everyday browser. This is especially useful when you need to download a file which requires HTTP authentication through cookies but do not want to rely on your browser's download manager.
Last, Marc Espie provided a way to keep FTP control connections alive even in environments where the TCP session ripping is overly aggressive. ftp(1) can now send NOOP commands every once in a while to maintain a flow of data on the control connection and keep it from being timed out before a transfer on a data channel is done. This will keep users of pkg_add who rely on an FTP server from seeing timeouts when downloading big packages.
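The effect of the periodic NOOP can be shown with a small simulation in Python. It steps through a transfer one second at a time and models a middlebox that drops a control connection once it has been idle for too long. The function name, the one-second granularity, and the timeout model are assumptions for illustration, not how ftp(1) is implemented.

```python
def simulate_transfer(duration, keepalive, idle_timeout):
    """Simulate a data transfer of `duration` seconds.

    keepalive:    seconds between NOOPs on the control connection (0 = off)
    idle_timeout: seconds of control-connection silence before it is dropped
    Returns (noops_sent, control_connection_survived).
    """
    last_activity = 0
    noops = 0
    for now in range(1, duration + 1):
        if keepalive and now - last_activity >= keepalive:
            noops += 1            # stand-in for sending NOOP
            last_activity = now
        if now - last_activity >= idle_timeout:
            return noops, False   # control connection was dropped
    return noops, True
```

With a keepalive interval shorter than the idle timeout the control connection survives the whole transfer; with keepalive disabled it is dropped as soon as the timeout elapses.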
What's new in the ports framework and in pkg_* tools?
Marc Espie: Users shouldn't notice much, but a lot of stuff has changed internally. There's been a large number of internal clean-ups in pkg_add, and a few changes to related tools such as ftp(1). Some of them are not really very visible, since they're mostly preparation for further things to come.
The most useful change is probably the addition of FTP_KEEPALIVE. People who live behind a firewall that drops connections are going to love this. If you set FTP_KEEPALIVE to a duration (say 60 seconds), then ftp will try to ensure an inactive connection doesn't get dropped. This makes a big difference to the reliability of pkg_add over
The second interesting change is that pkg_add now stops at the first location in the PKG_PATH that has suitable candidates for addition/updates. Thus, you can now fill up PKG_PATH with suitable back-up mirrors, and not have to confirm each choice through 10 package candidates.
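As a rough sketch of that lookup order in Python: walk the locations in order and stop at the first one that offers any candidate. The `catalog` mapping and the prefix match on the package stem are simplifications made up for this example; pkg_add's real matching is more involved.

```python
def first_location_with_candidates(locations, stem, catalog):
    """Return (location, candidates) for the first location offering `stem`.

    locations: ordered list of package locations (the PKG_PATH entries)
    catalog:   hypothetical mapping of location -> list of package names
    """
    for location in locations:
        candidates = [name for name in catalog.get(location, [])
                      if name.startswith(stem + "-")]
        if candidates:
            return location, candidates   # later mirrors are never consulted
    return None, []
```

Later entries only come into play when earlier ones have nothing suitable, which is what makes them usable as back-up mirrors.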
A few minor issues were fixed as well. pkg_add will yield better diagnostics when, for instance, it can't find a library that matches a dependency. It will also deal with some fringe cases better... hopefully, you won't notice any of this. It just means updates will work transparently.
We now have enough experience to say pkg_add -u rocks. It works as advertised, and more, and should be able to let you update your system through two or more releases.
As far as ports go, there's more stuff, as usual. And it works better. Most software has been updated to newer versions. If, by any chance, you still build stuff from source (though discouraged), you'll notice that distfiles checksums are now using SHA256, to satisfy paranoid people.
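For readers who want to check a downloaded file by hand, here is a minimal Python sketch of SHA256 verification. It compares against a plain hex digest supplied by the caller; the function names are made up, and it does not parse the ports tree's own checksum files.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA256 digest of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large distfiles need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()
```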
There haven't been a lot of changes to the ports infrastructure; it's obviously fairly stable these days. A few tweaks like STARTDIR have come in. You can use STARTDIR to start a build (or anything) at a given place in the ports tree.
As far as new ports go, the most noticeable one this release is probably apache2. We finally added it to help porting more stuff to apache1, the one and true apache in the OpenBSD tree. Oh yes, and there's been a big gnome update.
In short, there's nothing really exciting for the end user. Internally, we're very happy to see things ever get more robust. We see fewer and fewer bugs in package building, while the number of ports still grows at the same rate.
I noticed that the Gnome desktop received an update after some years of inactivity. Can you tell something more about that?
Jasper Lievisse Adriaanse: When I was working on porting Workrave I noted that first of all Gtk and its dependencies were badly outdated. Once I had them updated, Workrave started complaining about a lot of missing C++ bindings for Gnome. At that moment I had some spare time from school and decided to give updating Gnome a go. In a rather short period of time martynas@ and I updated most of the Gnome ports. This was really needed as most components of the Gnome desktop were at version 2.12. So, OpenBSD 4.2 ships with Gnome 2.18.
This is the first release that includes Xenocara, a port of XOrg 7.2. What are the differences with the past (XFree/XOrg 6.x) and what do you see as improvements/advantages?
Matthieu Herrb: There are not too many differences from the user point of view. The main difference is for developers or people interested in rebuilding some parts of X after applying a patch: X.Org 7.2 now is built in a modular way, each module using GNU autotools as its build system. Xenocara adds a BSD-style Makefile wrapper over this to drive the build in the correct order. In this new world it's no longer required to rebuild all of the X tree to recompile just a little driver or library. You can change to the directory holding the bits you want to recompile and run make -f Makefile.bsd-wrapper build to build and install it. I hope that will help get more developers involved with X.Org.
For users the main change will probably be that more video drivers now use the monitor information returned by DDC probes to auto-configure them. This means that more people can run X without bothering about a configuration file.
cwm(1) has replaced wm2 as a simple-looking low-resource window manager. Why?
Matthieu Herrb: Because we are interested in having such a window manager in the tree, with modern features. Unfortunately wm2 (not to be confused with wmii) is not actively developed anymore. Cwm matches some of the criteria and has attracted the attention of enough OpenBSD developers to provide a good ground to implement the missing features (mostly the NETWM protocol support, so that Gnome/KDE applications behave correctly) in the near future.
I saw this note in the changelog: "Bring in GLw from XF4 to xenocara to replace the Mesa version." Is it something you would like to talk about?
Matthieu Herrb: Not really. The version in the XFree86 tree is using some linker tricks to make it compatible with the Motif toolkit (in addition to Xaw), while Mesa has not picked up this code. To my knowledge, there is only one OpenBSD user who actually uses libGLw with Motif.
But this gives me the occasion to say that the OpenMotif port has been updated to version 2.3 in the ports. This is good news for people using Motif, since this version adds support for Xft fonts (client-side, anti-aliased). This may draw some more attention to the Motif toolkit, which unlike Gtk or Qt has been standardized by IEEE.
sendbug(1) has been rewritten, why?
Ray Lai: Largely maintainability. The old sendbug was a shell script that had a lot of problems, but nobody wanted to touch it because it was just a giant, convoluted shell script. I am fine with shell scripts, as long as they remain small. Once they grow to the size of sendbug, they are really difficult to maintain. Rewriting it in C made it a lot easier to deal with, since the C environment is a lot more controlled and you don't need to worry about weird environment variables, filenames, or editors causing behavior changes.
While rewriting sendbug, Theo and I discovered that there were a lot of functions dedicated to calling an editor and waiting until the editor closed. Sounds like a simple function, but almost every implementation had bugs in it. So in sendbug I tried to write that function correctly, getting feedback from Theo and Todd Miller. I then copied that function to the other implementations, to eliminate any bugs introduced in variations of this code.
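To give an idea of what such a function has to do, here is a small Python sketch: pick the editor from the environment, split it into arguments (the editor setting may carry options), run it on the file, and wait for it to exit. This is an illustration only, not the C function from sendbug; the lookup order VISUAL, EDITOR, then a default is an assumption.

```python
import os
import shlex
import subprocess


def edit_file(path, default_editor="vi"):
    """Run the user's editor on `path` and wait until it exits.

    Returns True if the editor exited with status 0, False otherwise.
    """
    editor = os.environ.get("VISUAL") or os.environ.get("EDITOR") or default_editor
    # The setting may contain arguments, e.g. "emacs -nw", so split it
    # instead of treating the whole string as a program name.
    argv = shlex.split(editor) + [path]
    # No shell is involved, so odd characters in `path` are passed through as-is.
    return subprocess.call(argv) == 0
```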
Any change in the way OpenBSD handles bug reports?
Ray Lai: No. Sendbug is only used for sending bug reports; the backend that handles receiving reports remains the same. However, we did add some details about the users' systems to the bug reports themselves, such as the dmesg. People forget to add their system's dmesg to reports all the time, even when it is relevant. This saves some work for the reporter, now that it is done automatically.
Did you see the paper presented by Robert Watson at USENIX WOOT07? I am wondering what users of OpenBSD 4.2 should do about systrace and sudo.
Todd C. Miller: Robert contacted me when he was writing his paper and I reviewed an early draft. There seems to be some confusion with respect to sudo and systrace. The paper describes an experimental version of sudo that was enhanced to use the systrace device directly for the purpose of intercepting the execve system call. This code does not exist in any released version of sudo, though it can still be found in the sudo cvs repository. I had intended this to be part of sudo 1.7 but abandoned work on it when it became apparent that a user could work around the restrictions. If systrace were to be modified to use a look-aside buffer for the kernel arguments I may revisit the sudo systrace support.
Browsing the changelog I found "Fix a 10-year old bug in make(1) which was causing it to spin if given large -j values." Does this mean that we can finally use -j when building the src tree? And what about ports? Did you run any benchmark?
Constantine A. Murenin: It was a pointer arithmetic bug that was corrupting internal data structures of make(1). It all started with my hardware upgrade, when I decided on using make(1)'s -j option for compiling the kernel. I soon noticed how unreliable it was: when a high value was given to -j, say 16 or 24, make(1) would often stop building anything and would start consuming one hundred percent of one of the CPUs until ^C.
I then turned on some of our malloc.conf(5) options, and was able to reliably crash make(1) on a regular basis. The debugging revealed multiple problems with the memory allocation code in make/job.c, all of which have now been fixed. Additional details are available in an undeadly article.
I did do some benchmarking on building the kernel with various -j options, and results are available in my LiveJournal. In short: the difference with building the kernel is quite substantial, and there are no stalls anymore. As for the userland and ports, espie@ is now working on making make(1) do a better job there—stay tuned for the next release.
Big changes to libc and libpthread, various fixes and cleanups. Would you like to tell us more?
Kurt Miller: This release I focused on code cleanups in libpthread which spilled into libc and librthread a bit. Initially I worked on some basic cleanup of libpthread like dead code removal and data type corrections noted by lint. After that I worked on removal of libpthread specific file descriptor locking from libc. I also added non-static mutex support to libc to address thread-safety in the directory operations functions (opendir(), readdir(), etc) in a way that both thread libraries could support.
The end result is that libc is now thread library agnostic or in other words libc's locking needs can be supported by either libpthread or librthread. It also sets the stage for the removal of libpthread from the system when rthreads is finished.
What is the status of your rthread implementation? I saw these: (1) Fixes in the signal handling code when waking up. This fixes the majority of the rthreads lockings and hangups. (2) Provide hook in ld.so(1) so rthreads can spinlock to protect from races.
Ted Unangst: rthreads are still incomplete, but there's slow progress being made. Because rthreads provide real preemptive multithreading, code that previously worked without locking doesn't. ld.so tries to resolve symbols on the fly, but it needs to be careful about the condition when two threads are both resolving symbols at the same time.
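The race can be illustrated with a lazy lookup cache in Python. Without the lock, two threads that hit an unresolved symbol at the same time could both run the slow lookup and both update the shared table. The class below is a made-up model, and it uses an ordinary mutex rather than the spinlock hook mentioned in the question.

```python
import threading


class LazyResolver:
    """Resolve symbol names on first use and cache the result."""

    def __init__(self, lookup):
        self._lookup = lookup            # slow function: name -> address
        self._cache = {}
        self._lock = threading.Lock()

    def resolve(self, name):
        # Holding the lock across check-and-fill means only one thread
        # performs the lookup; the others wait and then read the cache.
        with self._lock:
            if name not in self._cache:
                self._cache[name] = self._lookup(name)
            return self._cache[name]
```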
"Kernel work queues, workq_add_task(9), workq_create(9), workq_destroy(9) provides a mechanism to defer tasks to a process context when it is impossible to run such a task in the current context." Translation?
Ted Unangst: Many times, a device driver will receive an interrupt from the hardware, and in response to the interrupt do some work. However, interrupt handlers aren't a good place to do work. The whole kernel is locked up, so to speak, if the work requires completing some blocking action. Previously, drivers would deal with this by creating a kernel thread. The interrupt handler adds a task to a queue and wakes up the thread. Later, the thread can take as long as necessary to complete the task. But this means every driver needs its own thread.
workq is a generic version of that code, so that each device driver can benefit from a more complete implementation.
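The pattern Ted describes, one shared worker thread draining a queue of deferred tasks, looks roughly like this in Python. The class and method names only echo the workq(9) function names; this is a userland sketch of the idea, not the kernel API.

```python
import queue
import threading


class WorkQueue:
    """One worker thread that runs tasks deferred from other contexts."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add_task(self, func, *args):
        """Called from the 'interrupt handler': only queues, never does the work."""
        self._tasks.put((func, args))

    def _run(self):
        while True:
            func, args = self._tasks.get()   # sleeps until a task arrives
            if func is None:                 # sentinel queued by destroy()
                break
            func(*args)                      # may take as long as it needs

    def destroy(self):
        """Finish all queued tasks, then stop the worker thread."""
        self._tasks.put((None, ()))
        self._thread.join()
```

Tasks run in the order they were queued, and the caller of `add_task` returns immediately regardless of how long the task itself takes.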
"Removed unused strcpy and strcat calls from kernel." Why?
Artur Grabowski: Dead code. Not used. Removed. We actually shrunk the kernel by a lot this release by removing functions that nothing was using. I wrote some half magic script that pulled out all the symbols from the kernel object files then pulled out all the used symbols and we just went through the kernel with a big axe killing everything that wasn't used. It even found real bugs (functions that should have been used, but weren't because of some ifdef typos).
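A guess at what such a script could look like, in Python: collect the functions each object file defines and the symbols each one references, then report the difference. It assumes nm-style listings as input ("T" for defined text symbols, "U" for undefined references); the actual script was not published in this interview. Entry points and functions called only from within their own object file show up as false positives, so the output is a list of suspects to check by hand.

```python
def unused_functions(nm_output_by_object):
    """Return global functions that no object file references.

    nm_output_by_object: mapping of object file name -> nm-style text.
    """
    defined, referenced = set(), set()
    for text in nm_output_by_object.values():
        for line in text.splitlines():
            fields = line.split()
            if len(fields) < 2:
                continue
            kind, name = fields[-2], fields[-1]
            if kind == "T":        # global function defined here
                defined.add(name)
            elif kind == "U":      # symbol needed from elsewhere
                referenced.add(name)
    return sorted(defined - referenced)
```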
Move i386 to new timecounter code. Again?!
Artur Grabowski: Last release was amd64. We've been doing more architectures now. Still not all done, but the ones that could benefit the most from it have been done this release.
What is the story about the Y2K hack in date(1) that you just removed?
Todd C. Miller: As we were approaching the year 2000 a number of hacks were put in place to attempt to deal with ambiguous dates where the century was not specified. For instance, in 1997 a year specified by 02 could be either 1902 if we assume the current century or 2002 if we assume the following one. Now that we are well into the 21st century there is no need for such hacks. Hopefully, by the time the 22nd century approaches people will have gotten used to the idea of four digit years.
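The ambiguity is easy to reproduce in Python. The first function lists the two readings from the example above; the second applies a "closest to the current year" rule, which is one common way to guess and is shown here only as an assumption, not as the logic that was removed from date(1).

```python
def candidate_years(yy, current_year):
    """Return the two-digit year read in the current and the following century."""
    century = current_year // 100 * 100
    return century + yy, century + 100 + yy


def closest_year(yy, current_year):
    """Hypothetical heuristic: pick the full year nearest to the current one."""
    century = current_year // 100 * 100
    options = (century - 100 + yy, century + yy, century + 100 + yy)
    return min(options, key=lambda year: abs(year - current_year))
```

In 1997, `candidate_years(2, 1997)` gives the pair 1902 and 2002 from the example, and the closest-year rule settles on 2002.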
I saw various fixes for code handling i386 CPUs... for example: (1) Fixes in the vframe handling for i386 trap code. (2) Fix in the i386 pmap code for a possible AMD bug, which slightly speeds up TLB misses. (3) Fix for Intel errata AI91 in the i386 pmap handler code. (4) i386 TLB handling improved to avoid possible corruption on Core2Duo processors. Would you like to give us an idea of how you handle hardware errata (especially CPU errata) and implement workarounds?
Artur Grabowski: Lots of this is related to just hacking I've been doing in the pmap to make it faster on SMP. And then we started finding all those bugs in there. About the same time people got new machines that showed very strange crashes related to the pmap. We started chasing things and found that the bug we've been seeing, mainly on Core 2 machines, could in many cases be explained by CPU errata. It wasn't fixed by that but we learned a lot of new things about how the MMU and caches were working in the hairy details and fixed problems with that. The bug is not fixed yet, it's just hidden very well and we still don't know what causes it, but at least we learned a lot and fixed a lot of related things.
Reworking the TLB shootdown code for i386 and amd64 gave you a good speed improvement. How was the situation and what type of changes did you make?
Artur Grabowski: The old framework we had for TLB shootdowns (to keep the MMUs on different CPUs in sync) was very complicated. It used locks and slow path IPI (inter-processor interrupt) handling, it had weird race conditions that could cause an IPI handler to wait for the biglock (very bad), and it simply took a lot of work to do such a seemingly simple thing.
I replaced it by something that I jokingly say takes about as many instructions per shootdown as the old code had function calls (it's not exactly true, but it's closer to the truth than one can imagine). Instead of having a huge infrastructure for doing smart guesses about when we need to do TLB shootdowns and when we can avoid them, we now just shoot much more often, but each shootdown costs almost nothing compared to before. If I recall correctly this cut down the time to do a full package build by 20 percent (mostly because all the forks in configure scripts have become much, much cheaper).