oreilly.com


Behind DragonFly BSD
Pages: 1, 2, 3

Would DragonFly use a branch for ports like OpenBSD does?

MD: Probably not. We have been in rigorous discussion over what kind of ports/packaging system we want to have. We have already agreed that the core ports/packaging system visible to end-users should be a binary system rather than a source build system. This isn't to say that sources would not be made available for customization purposes, just that most users want to get a port/package installed as quickly as possible. While the BSD ports/packaging system does have a binary install capability, it is insufficient for our needs.

The other thing we've decided on is to give the ports/packaging system the ability to isolate installations. One of the biggest problems in maintaining a large multi-use system arises when upgrading a package for one particular subsystem. Upgrades to the packages for one subsystem can interfere with packages already installed for another, and sysops cannot afford to have upgrades break unrelated subsystems. So, for example, we want there to be isolation between the packages associated with, say, the mail subsystem, and packages associated with, say, a workstation user. The two subsystems might install the same package or might install different versions of the same package... we want that to work.

JS: No. I want to reduce interactions between the ports and the base system; FreeBSD partly succeeded there. OpenBSD has a lot of stuff in the base system and needs such branches, e.g., to support updates of the package management tools.
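MD's isolation requirement can be illustrated with a small Python sketch. It assumes a hypothetical layout in which every subsystem installs into its own private tree under `/usr/pkg/subsys`; the path, the `Subsystem` class and the package name `libfoo` are invented for illustration and do not describe DragonFly's actual packaging design, which the interview says was still undecided.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical layout: every subsystem gets its own private install tree.
ROOT = PurePosixPath("/usr/pkg/subsys")


@dataclass
class Subsystem:
    name: str
    # package name -> installed version, visible only to this subsystem
    packages: dict = field(default_factory=dict)

    @property
    def prefix(self) -> PurePosixPath:
        return ROOT / self.name

    def install(self, pkg: str, version: str) -> PurePosixPath:
        """Install or upgrade pkg inside this subsystem's tree only."""
        self.packages[pkg] = version
        return self.prefix / f"{pkg}-{version}"


mail = Subsystem("mail")
workstation = Subsystem("workstation")

# Both subsystems start with the same version of a shared library.
mail.install("libfoo", "1.0")
workstation.install("libfoo", "1.0")

# Upgrading it for the workstation user does not touch the mail subsystem.
path = workstation.install("libfoo", "2.0")
```

Because each subsystem records its versions under its own prefix, the mail subsystem keeps `libfoo-1.0` while the workstation user moves to `libfoo-2.0`. A real system would also have to deal with shared dependencies and disk usage, which this sketch ignores.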

Had you considered using NetBSD pkgsrc for external software instead of maintaining your own ports collection?

MD: We are considering everything. Nothing is set in stone yet.

JS: Yes, we have. I like pkgsrc, because it would save us an enormous amount of effort in setting up a few thousand apps. But pkgsrc also has the problem of supporting too much and being slowed down by that. If a FreeBSD port is outdated, the pkgsrc port is even more likely to be.

Someone sent a fake email on April 1, 2004, in which Theo de Raadt appeared to take an interest in LWKT multiprocessor technology. That was funny, but have you had any contact with the real de Raadt now that OpenBSD is working on introducing SMP?

MD: For some reason I usually respond to good April Fool's jokes before I realize that they are April Fool's jokes, so I actually responded to Theo on that one. We did have an interesting discussion but Theo is totally focused on security, and while security is important to us, it isn't our focus.

How does LWKT compare with FreeBSD 4.x and 5.x, NetBSD 2.0 and OpenBSD SMP technologies?

MD: Well, neither NetBSD nor OpenBSD are really designed for SMP. I don't say that in a bad way ... It's simply that with so many platforms to support it's difficult to do a proper ground-up SMP implementation in OpenBSD or NetBSD. FreeBSD was able to do a ground-up SMP implementation primarily because they only had support for two or three platforms (but of course we in the DragonFly project feel that FreeBSD went off in the wrong direction when they did that).

IMHO, LWKT is a far superior environment for SMP (and UP as well) compared with the traditional process model that the other BSDs use and the mutex-centric model that FreeBSD-5 uses. That's one of the primary reasons why we forked the DragonFly project off. LWKT provides an extremely efficient and scalable programming environment on both UP and SMP systems. FreeBSD-5 has pretty much abandoned UP performance.

JS: It is simpler to use and less error-prone. The token abstraction often simplifies code compared to the mutex model used by FreeBSD 5 and NetBSD. The thread handling itself is very neat too, because it doesn't try to be smart, but simple. No fancy preemption mechanism other than the well-known soft interrupt handling, no ping-pong of kernel threads between CPUs. This is important for performance and makes the system more deterministic.
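The token semantics JS contrasts with mutexes can be modeled in a few lines. In DragonFly, a thread holding LWKT tokens implicitly loses them whenever it blocks and must reacquire all of them before it runs again, so tokens cannot deadlock the way held mutexes can. The Python below is a toy, single-threaded model of that rule only; the `Token` and `Thread` classes and their method names are hypothetical, not the kernel API.

```python
class Token:
    """Toy model of an LWKT serializing token; not DragonFly kernel code."""

    def __init__(self, name):
        self.name = name
        self.owner = None  # thread currently holding the token, if any


class Thread:
    def __init__(self, name):
        self.name = name
        self.held = []  # tokens this thread logically owns

    def get(self, tok):
        """Try to acquire tok; a real kernel thread would spin or yield on failure."""
        if tok.owner not in (None, self):
            return False
        tok.owner = self
        if tok not in self.held:
            self.held.append(tok)
        return True

    def release(self, tok):
        """Explicitly give up a token."""
        tok.owner = None
        self.held.remove(tok)

    def block(self):
        """Going to sleep implicitly releases every token the thread holds."""
        for tok in self.held:
            tok.owner = None

    def resume(self):
        """All held tokens must be reacquired before the thread may run again."""
        if any(tok.owner not in (None, self) for tok in self.held):
            return False
        for tok in self.held:
            tok.owner = self
        return True
```

For example, if thread `a` holds token `vm` and then sleeps, thread `b` can take `vm`, and `a` cannot resume until `b` releases it. The trade-off is that code holding a token must re-check any shared state after a call that might block.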

AMD and Intel have announced some plans for dual core CPUs. Until they become available we can play with Intel HyperThreading technology. What type of interaction is there with LWKT?

MD: Well, we inherited the hyperthreading capable code from FreeBSD-4 when we forked so DragonFly does run hyperthreaded. I have never liked hyperthreading, though, because it results in a nonsymmetrical set of CPUs, which greatly complicates the scheduler's job. FreeBSD-5 has been trying to schedule for hyperthreaded CPUs for over a year now with very little success (i.e. threads still get scheduled inefficiently). DragonFly does not try to schedule specifically for the hyperthreading case but LWKT does a pretty good job and we are at least on par with FreeBSD. We are not going to focus on hyperthreading, though, because it involves an enormous amount of scheduler work for very little gain.

Hyperthreading was a good idea at the time, but it is clear that the future is going to be multi-core.

JS: At the moment, HTT is handled just like another real CPU. We can avoid some pitfalls, because we are not tempted to use spinlocks. But there isn't any preference in the scheduler yet and we know of some problems with NICs, so HTT support is in a bit of flux right now.

Would multithreaded network stacks like that of FreeBSD 5 be possible with LWKT?

MD: FreeBSD-5 multithreads its network stack using a reentrant mutex model. We are taking an entirely different approach. Instead of making the network stack reentrant, we are partitioning the work on a connection-by-connection basis and distributing it across multiple threads. Each thread operates independently of the others and without needing mutexes or network-specific locking of any sort. Jeffrey Hsu has been doing most of this work, and we feel that it is already far superior to the FreeBSD network stack model. Among other things, we pay very careful attention to the L1/L2 CPU caches, and our model partitions the work between CPUs and focuses more on maintaining locality of reference for the PCBs and less on load-balancing. The result is far less duplication of data across CPU caches on an SMP system. We believe that this will result in superior performance in heavily loaded environments.

JS: Well, why would you want to do that? I'm following the changes in the FreeBSD 5 network stack to allow fine-grained locking, and those are huge. You could do something similar with tokens, but that would be expensive and slower than the mutex system. The work Jeff put into our tree to partition work across the CPUs is much more promising, and I'm investigating adopting the model for other parts of the kernel.

JH: Sure. Multithreading our networking stack in the form of running netisr interrupts on multiple CPUs was the first thing we did. To this day, FreeBSD still does not have multiple TCP threads running on multiple CPUs. So we're actually ahead here.
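The connection-partitioning approach MD and JH describe can be sketched as follows: hash each connection's address/port 4-tuple to a CPU, and let only that CPU's protocol thread touch the connection's PCB. The hash function, `NCPUS = 4` and the `NetThread` and `dispatch` names below are assumptions for illustration; DragonFly's actual hash and message-passing code differ.

```python
import ipaddress

NCPUS = 4  # assumption for the example


def cpu_for_connection(src, sport, dst, dport, ncpus=NCPUS):
    """Map a TCP 4-tuple to a CPU. Hypothetical hash, not DragonFly's."""
    h = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    h ^= sport ^ dport  # XOR is symmetric: both directions hash alike
    h ^= h >> 16        # fold high bits into the low ones
    return h % ncpus


class NetThread:
    """One protocol thread per CPU. Only this thread ever touches its PCBs,
    so its table needs no mutex."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.pcbs = {}  # 4-tuple -> per-connection state

    def input(self, conn):
        pcb = self.pcbs.setdefault(conn, {"packets": 0})
        pcb["packets"] += 1


threads = [NetThread(cpu) for cpu in range(NCPUS)]


def dispatch(src, sport, dst, dport):
    """Hand a packet to the thread that owns its connection.
    (A kernel would queue a message to that thread; here we just call it.)"""
    cpu = cpu_for_connection(src, sport, dst, dport)
    threads[cpu].input((src, sport, dst, dport))
    return cpu
```

Because every packet of a connection lands on the same thread, the PCB stays in one CPU's cache and needs no lock. XOR is used so that both directions of a connection hash to the same CPU; the cost of the design is that load is balanced per connection rather than per packet.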

Polling vs. Interrupts: With FreeBSD you can use polling to improve network performance on busy interfaces. What about DragonFly? Do you plan to introduce polling support in other areas?

MD: We inherited the polling code from FreeBSD, so we have it. Polling is not really needed for more modern NICs, though, because the cards have their own burst/batch/idle timers and can generate a far lower interrupt rate than the actual packet rate.

JS: We already inherited the polling support for the network interfaces, yes. I'm not sure whether it is really worth the effort. The high interrupt rate is reduced by smarter hardware. Most Gigabit-Ethernet devices support that and it greatly reduces the need for polling. Other parts of the system are similar and the overhead of polling has to be carefully evaluated. ATA polling sounds interesting though :)

JH: DragonFly has the same device polling that comes with FreeBSD.
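The interrupt moderation that MD and JS credit to modern NICs can be illustrated with a small simulation: the card raises an interrupt when either a batch of packets has accumulated or a timer started by the first pending packet expires. The `max_batch` and `timer_us` parameters are hypothetical; real hardware exposes comparable knobs through driver-specific registers.

```python
def count_interrupts(arrivals_us, max_batch, timer_us):
    """Count interrupts for packets arriving at the given times (microseconds),
    assuming the NIC fires when max_batch packets are pending or when
    timer_us has elapsed since the first pending packet."""
    interrupts = 0
    pending = 0
    deadline = None
    for t in arrivals_us:
        # Timer expired before this packet arrived: flush what is pending.
        if pending and t >= deadline:
            interrupts += 1
            pending = 0
        # First packet of a new batch starts the timer.
        if pending == 0:
            deadline = t + timer_us
        pending += 1
        # Batch full: interrupt immediately.
        if pending == max_batch:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the last timer expiry flushes the remainder
    return interrupts
```

With 100 packets arriving 10 µs apart, a 32-packet batch and a 100 µs timer, the function counts 10 interrupts instead of the 100 produced with `max_batch=1`; that kind of reduction is what makes polling less attractive on such hardware.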

Is the DragonFly BSD TCP/IP stack hardened enough to resist Paul Watson's RST attack?

MD: I believe that Jeff committed some work recently to narrow the possibility of that attack.
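Paul Watson's attack relies on the classic rule that an RST is accepted if its sequence number falls anywhere inside the receive window, so a blind attacker who knows the address/port 4-tuple needs only about 2^32 / window guesses. The Python sketch below contrasts that check with a stricter exact-match rule (the approach later standardized in RFC 5961, where an inexact in-window RST triggers a challenge ACK instead). It is not the change Jeff committed, whose details the interview does not give.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2**32


def seq_in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def accept_rst_loose(seq, rcv_nxt, rcv_wnd):
    # Classic check: any in-window RST tears the connection down.
    return seq_in_window(seq, rcv_nxt, rcv_wnd)


def accept_rst_strict(seq, rcv_nxt, rcv_wnd):
    # Narrower check: only an exact match resets; an in-window but inexact
    # RST would instead trigger a challenge ACK (not modeled here).
    return seq == rcv_nxt


def blind_guesses_needed(rcv_wnd):
    """Worst-case spoofed RSTs needed to hit the window under the loose check."""
    return -(-SEQ_SPACE // rcv_wnd)  # ceiling division
```

With a 65536-byte window, the loose check falls to at most 65536 spoofed packets, while the exact-match rule forces the attacker to cover the full 2^32 sequence space.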

What about more randomness for numbers like dynamically allocated TCP ports and process IDs (PIDs)?

MD: I don't see much point in doing that, though Theo would probably disagree with me. There are only 65535 ports and it takes very little time to try them all. We have a randompid sysctl option (inherited from FreeBSD), and I have no objection to adding a random-port option. But it isn't a priority.

JS: I don't believe randomizing PIDs gives us any advantage in security. Hiten committed a patch for the randomization of TCP ports.
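A random-port option of the kind MD and JS mention could look like the sketch below: pick a random starting point in the ephemeral range and probe upward until a free port is found, similar to the simple randomization algorithm described in RFC 6056. The 49152 to 65535 range is the IANA dynamic range and is an assumption here; it is not necessarily the range DragonFly uses, and this is not Hiten's patch.

```python
import random

# Assumed ephemeral range (IANA dynamic/private ports).
EPHEMERAL_LO, EPHEMERAL_HI = 49152, 65535


def pick_ephemeral_port(in_use, rng=random):
    """Start at a random offset, then probe linearly for a free port.
    Returns None if every port in the range is taken."""
    n = EPHEMERAL_HI - EPHEMERAL_LO + 1
    offset = rng.randrange(n)
    for i in range(n):
        port = EPHEMERAL_LO + (offset + i) % n
        if port not in in_use:
            return port
    return None
```

As MD points out, randomization does not enlarge the port space: this range holds only 16,384 ports, so an attacker can still try them all. The gain is only that the next port is no longer trivially predictable.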

Do you have any plan to add memory protection systems?

MD: We will be adding support for the NX (no-execute) page table bit, but I'd rather not try to use segmentation tricks to get non-executable stack support for earlier CPUs. I am on the fence, actually. I could probably be persuaded to hack the segmentation for earlier CPUs but I would prefer not to. The real problem is the inherent insecurity of C code. If it comes down to it I'd rather modify the compiler and libraries (even if it makes the result inefficient) than modify the operating system. We already have the propolice patches integrated into our GCC and it is turned on by default in gcc3.x (DragonFly installs both gcc2.95.x and gcc3.x in the base system). Our compiler currently defaults to gcc2.95.x and will remain defaulted to it for the first release, but by the second release we will have moved entirely over to gcc3.x. There are other tricks one can play, such as randomizing malloc() addresses and such. I am perfectly willing to implement library tricks to reduce the possibility of a hack.

JS: We already ship with a propolice-enabled compiler and base system. Separation of write and execute bits on a per-page basis might be added later, but it is not a priority. It is easy to support on AMD64, but not on IA32.
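ProPolice, the GCC stack-smashing protector MD and JS refer to, places a guard value (a canary) between a function's local buffers and its saved return address and checks it before the function returns. The Python below is a simplified byte-level model of that idea; the frame layout, the `Frame` class and the 8-byte canary are assumptions for illustration, and the model omits ProPolice's reordering of local variables.

```python
import os

CANARY_SIZE = 8
RET_ADDR = b"RETADDR!"  # stand-in for the saved return address


class Frame:
    """Toy stack frame laid out as [buffer][canary][return address]."""

    def __init__(self, bufsize, canary):
        assert len(canary) == CANARY_SIZE
        self.bufsize = bufsize
        self.canary = bytes(canary)
        self.mem = bytearray(bufsize) + bytearray(self.canary) + bytearray(RET_ADDR)

    def unsafe_copy(self, data):
        """Like strcpy(): copies with no bounds check, clobbering what follows."""
        n = min(len(data), len(self.mem))
        self.mem[:n] = data[:n]

    def canary_intact(self):
        """The check the compiler inserts before the function returns."""
        start = self.bufsize
        return bytes(self.mem[start:start + CANARY_SIZE]) == self.canary


frame = Frame(16, os.urandom(CANARY_SIZE))  # guard value chosen at random
```

A copy that stays within the 16-byte buffer leaves the canary intact; a longer copy overwrites the canary before it can reach the return address, so the check fails and the program can abort instead of returning to an attacker-chosen address.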

Is anyone working on virtualization extensions to run multiple independent systems at the same time? And what about support for VMware or free virtual machines like Xen?

MD: There are one or two people that have been playing with Xen and DragonFly.

JS: Supporting Xen is a very nice project on its own. As far as I know, nobody is currently working on it. This might change after the system API has stabilized a bit and the native support for AMD64 / x86-64 is ready. Supporting extensions like User-mode Linux is not a priority and I don't think we would add it, because it is far too special.

HP: The idea of porting DragonFly to run over Xen was Kip Macy's. He and I discussed the idea for a good four hours or so and had pretty much figured things out. He has done the hard work of porting and adjusting a lot of the drivers, but there are still a few rough edges that need to be sorted out. I have not been able to do much there apart from getting it running, because I got tied up with more important things with regard to DragonFly.

DragonFly has run fine on VMware since day one, apart from a few hiccups: two were due to VMware errata items, and one was an ATA timeout issue that disappeared with the latest version (4.5.1 or something). I have successfully run DragonFly on VMware with both Linux and Windows as the host operating system.

As for VMware running on DragonFly, that's a totally different ball game.

There are various ways to create a cluster. You can use special software, as SETI@home does; you can use external software that interacts with the kernel, like openMosix; or you can modify the way the kernel manages resources, removing the differences between local and remote. How would DragonFly clustering work?

MD: One of our major goals is to implement SSI (single-system image) clustering. Our goals go far beyond Mosix.
