Running Zebra on a Unix Machine: An Alternative to a Real Router?
As I became more familiar with Zebra, I grew more impressed by the software. Even though version 1.0 hasn't been released yet (only beta versions are available), I can't remember a single time when the software broke down. I can't say the same for the hardware: I had a hard disk crash on me. Fortunately, I had the foresight to install two hard disks and use disk mirroring with the vinum volume manager. Hard disks have always been my biggest problem with host-based routers. Even if a disk doesn't crash (which they all do eventually), having one inside a router means problems when the power fails: before you know it, your router is doing a file system check.
"Real" routers aren't bothered by the power going away, since all of their software and configuration data is stored in flash or non-volatile memory. They don't even have a shutdown command, just a power switch. Just recently, I found out that it's fairly simple to have a Unix machine boot and run from flash memory. As it turns out, CompactFlash memory cards use an IDE interface. With the right converter, they can be attached to an IDE interface on the motherboard of a PC. The BIOS will then happily recognize the card as a hard disk and boot from it.
Another thing that always used to worry me about host-based routers is IP forwarding performance. But some tests I did with Gigabit Ethernet cards in FreeBSD boxes convinced me that a PC-based system can handle several hundred megabits per second of traffic coming in or going out. Unfortunately, I was unable to fully test the routing performance because I didn't have enough machines to act as source and sink for the necessary amounts of traffic. However, a good result in such a test between two boxes doesn't necessarily translate to good real-world performance as a BGP router, which has to forward packets using a much larger routing table. Currently, a full BGP feed is about 110,000 routes.
Whether or not a system can achieve good forwarding performance with so many routes in its routing table is highly dependent on the route-lookup algorithm it uses for the majority of the forwarded packets. Cisco routers implement several ways to do this. In "process switching," a regular process reads a packet from the buffer where packets are stored as they come in, and then looks up the destination in the main routing table and schedules the packet for transmission on the right output interface. On a Cisco, this is slow. On a Unix machine, this would hardly work at all: the forwarding process would have to contend with other user processes for CPU time, and may even be swapped out to disk!
Fast Switching Methods
To increase forwarding performance, IOS implements "fast switching." This forwarding algorithm uses a route cache that stores the most recently used routes in a data structure that can be searched more efficiently than the main routing table. With fast switching, packets aren't stored in a buffer for further processing, but the forwarding algorithm is executed immediately as the CPU tends to the interrupt caused by the arrival of a packet. When the packet can't be fast switched because the destination can't be found in the route cache (or for another reason), it is handed over to regular process switching. As the packet is then process switched, a route cache entry is created so subsequent packets can be fast switched.
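The interplay between the route cache and process switching described above can be sketched in a few lines of Python. This is only a toy model under loudly stated assumptions, not how IOS is implemented: the class name `FastSwitchSketch`, the next-hop labels, and the use of a plain dictionary keyed by destination address are all invented for illustration.

```python
import ipaddress

class FastSwitchSketch:
    """Toy model: a full routing table plus a per-destination route cache."""

    def __init__(self, routes):
        # routes maps a prefix string to a next-hop label (hypothetical data).
        self.table = [(ipaddress.ip_network(p), nh) for p, nh in routes.items()]
        # Longest prefixes first, so the first match is the most specific one.
        self.table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
        self.cache = {}          # destination address -> next hop
        self.slow_lookups = 0    # how often the slow path was taken

    def _process_switch(self, dst):
        """Slow path: linear longest-prefix match over the full table."""
        self.slow_lookups += 1
        for net, next_hop in self.table:
            if dst in net:
                return next_hop
        return None

    def forward(self, dst_str):
        dst = ipaddress.ip_address(dst_str)
        if dst in self.cache:
            return self.cache[dst]          # fast path: cache hit
        next_hop = self._process_switch(dst)
        if next_hop is not None:
            self.cache[dst] = next_hop      # later packets take the fast path
        return next_hop

router = FastSwitchSketch({
    "0.0.0.0/0": "upstream",
    "192.0.2.0/24": "eth1",
    "192.0.2.128/25": "eth2",
})
first = router.forward("192.0.2.200")    # cache miss, slow path
second = router.forward("192.0.2.200")   # cache hit, fast path
```

Only the first packet towards a destination pays for the full table lookup. The second call is answered from the cache, which is the effect the route cache is meant to achieve.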
An even faster switching method is Cisco Express Forwarding (CEF). CEF also operates at the interrupt level, but unlike fast switching, it employs a dedicated process for building the CEF data structures in memory. This CEF table holds a copy of the entire routing table, so there is no need to process switch the first packet towards any given destination.
The fast switching route cache uses a radix tree structure to store next hop information (MAC address and output interface). Since an IP address has 32 bits, the radix tree has a depth of 32 levels, and looking up a route requires a maximum of 32 steps, assuming the right route is present. CEF, on the other hand, uses a 256-way trie structure. This makes it possible to search the tree in only four steps that each evaluate 8 bits in one go. And since the next hop information is no longer stored in the tree structure itself, there is additional flexibility. For instance, the CEF table can encode recursive routing information.
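To make the difference in lookup steps concrete, here is a rough Python sketch that compares a one-bit-per-level trie with an eight-bits-per-level (256-way) trie. It rests on several assumptions of my own: the function names are invented, the binary trie is an uncompressed stand-in for a real radix tree, the 256-way trie expands shorter prefixes over several slots so that longest-prefix matching still works, and next hops are stored directly in the slots for brevity even though CEF keeps them outside the tree.

```python
import ipaddress

def binary_insert(root, prefix, next_hop):
    """Insert a prefix into a trie that consumes one address bit per level."""
    net = ipaddress.ip_network(prefix)
    addr, node = int(net.network_address), root
    for i in range(net.prefixlen):
        node = node.setdefault((addr >> (31 - i)) & 1, {})
    node["nh"] = next_hop

def binary_lookup(root, dst):
    """Longest-prefix match; returns (next_hop, levels_descended)."""
    addr = int(ipaddress.ip_address(dst))
    node, best, steps = root, root.get("nh"), 0
    for i in range(32):
        bit = (addr >> (31 - i)) & 1
        if bit not in node:
            break
        node = node[bit]
        steps += 1
        best = node.get("nh", best)
    return best, steps

def new_node():
    # Each of the 256 slots holds [next_hop, prefix_length, child_node].
    return [[None, -1, None] for _ in range(256)]

def stride_insert(root, prefix, next_hop):
    """Insert a prefix into a 256-way trie, expanding it over the slots it covers."""
    net = ipaddress.ip_network(prefix)
    addr, plen = int(net.network_address), net.prefixlen
    level = max(plen - 1, 0) // 8            # level where the prefix ends
    node = root
    for i in range(level):
        slot = node[(addr >> (24 - 8 * i)) & 0xFF]
        if slot[2] is None:
            slot[2] = new_node()
        node = slot[2]
    bits = plen - 8 * level                  # significant bits at this level
    base = (addr >> (24 - 8 * level)) & 0xFF
    for idx in range(base, base + (1 << (8 - bits))):
        slot = node[idx]
        if plen >= slot[1]:                  # the more specific prefix wins
            slot[0], slot[1] = next_hop, plen

def stride_lookup(root, dst):
    """Longest-prefix match; returns (next_hop, levels_visited), at most 4."""
    addr = int(ipaddress.ip_address(dst))
    node, best, steps = root, None, 0
    for i in range(4):
        slot = node[(addr >> (24 - 8 * i)) & 0xFF]
        steps += 1
        if slot[0] is not None:
            best = slot[0]
        if slot[2] is None:
            break
        node = slot[2]
    return best, steps

routes = {"0.0.0.0/0": "upstream", "192.0.2.0/24": "eth1", "192.0.2.128/25": "eth2"}
btrie, strie = {}, new_node()
for prefix, hop in routes.items():
    binary_insert(btrie, prefix, hop)
    stride_insert(strie, prefix, hop)

print(binary_lookup(btrie, "192.0.2.200"))
print(stride_lookup(strie, "192.0.2.200"))
```

Both lookups return the same next hop, but the binary trie has to descend one level per prefix bit, while the 256-way trie never visits more than four nodes. The price is memory: every node in the wider trie reserves 256 slots whether they are used or not.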
So how is this done under Unix? Not all that differently: the 1990 4.3BSD-Reno interim release introduced a radix tree as the data structure for the kernel routing table. Thus, Unix IP forwarding is not quite as advanced as Cisco's CEF, but it improves on fast switching. This is because the radix tree holds the full routing table, so there is no route cache that has to be filled by process switching the first packet to each destination. A Cisco running CEF will therefore be somewhat faster than a Unix system with a similar CPU, but in practice, Unix systems have much faster CPUs than Cisco routers, which more than makes up for the difference.
A somewhat unfortunate similarity between Cisco and Unix is the size of these tables. On a Cisco, entries in the BGP table, the main routing table, and the CEF table all take roughly 100 to 300 bytes of memory per route. The FreeBSD kernel also uses nearly 300 bytes per route, as do the Zebra main routing table and the BGP table. (Note that on some systems, the kernel has a limit on the amount of memory that the routing table may use.)
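A quick back-of-the-envelope calculation shows what these figures add up to for the full feed of about 110,000 routes mentioned earlier. The sketch below assumes the worst-case 300 bytes per route for every table and a single BGP feed. The list of three tables is my reading of the Zebra setup described here, not a measured result.

```python
ROUTES = 110_000          # approximate size of a full BGP feed, per the article
BYTES_PER_ROUTE = 300     # upper end of the 100 to 300 byte range (assumption)
TABLES = ("BGP table", "Zebra main routing table", "kernel routing table")

def table_bytes(routes, bytes_per_route):
    """Memory needed for one table holding the given number of routes."""
    return routes * bytes_per_route

per_table = table_bytes(ROUTES, BYTES_PER_ROUTE)
total = per_table * len(TABLES)

print(f"per table: {per_table / 1e6:.1f} MB")
print(f"all {len(TABLES)} tables: {total / 1e6:.1f} MB")
```

Under these assumptions each table needs about 33 MB, or roughly 99 MB for all three together. Receiving full feeds from several BGP peers would enlarge the BGP table further, since it keeps the routes learned from each peer.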
All in all, I have to admit a Unix box running Zebra is a decent alternative to a "real" router. On the other hand, I still prefer the tight integration between hardware and software that router vendors offer, as long as their products aren't overpriced and underpowered. It's good to have choices.
Iljitsch van Beijnum has been working with BGP in ISP and end-user networks since 1996.
O'Reilly & Associates recently released (September 2002) BGP.