ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)


OpenBSD PF Developer Interview

by Federico Biancuzzi
04/15/2004

OpenBSD's PF packet filter has grown in power and appeal since its introduction in OpenBSD 3.0. With the imminent release of OpenBSD 3.5, Federico Biancuzzi interviewed several leading OpenBSD developers for their thoughts on PF and new features.

This is Part 1 of a two-part series. BSD PF Developers Interview, part two is also online.

Federico Biancuzzi: Can you give us a short introduction about yourself and your role as open source developer?

Daniel Hartmeier: I'm 29 years old and live near Zug in Switzerland. I dropped out of university after the first year and have been working as a programmer ever since. Unmarried, no kids, two cats.

It was around November 1999 when I installed OpenBSD (2.5 by then) for the first time. I was working at a small company and we were looking for an OS to use as an Internet gateway. I didn't have much prior experience with Unix, but after a while, that gateway handled most network services we needed. Since then, I run OpenBSD on all servers and desktops I use.

Henning Brauer: Well, I am 25, running an ISP here in Hamburg/DE, mostly doing infrastructure, server- and backend-stuff, and, of course, and unfortunately, some administrivials.

I've been an OpenBSD developer since sometime early in 2002, doing a lot of stuff in pf (the biggest single thing was the altq merge, which I led and did big parts of myself); Apache maintenance in our tree, where I wrote the chroot extensions as well; various stuff in the network area from NIC drivers to the IP layer, and so on, as well as various userland stuff.

Mike Frantzen: I live outside Washington D.C. where I enjoy the biking and hiking trails. I'm an engineer at NFR Security working on its intrusion detection system, where I tell my boss that I browse the web all day long. I hack on OpenBSD in my copious lack of spare time.

Cedric Berger: I'm 34 years old and live in the French-speaking part of Switzerland, near the city of Neuchatel. Five years ago, I went to California with my wife in the hopes of improving my English skills, and after nine months there I started to work for Wireless Networks, Inc., a startup company that currently provides wireless B2B services for the banking industry.

There we developed a little wireless router, and we decided to use the OpenBSD TCP/IP stack in it because it was, at that time, the only free OS that had decent IPSec support. So, besides doing hardware design, network/database design, and everything else you'd have to do in a startup, I extracted part of the OpenBSD kernel and ported it to a real-time kernel on an ARM processor. This is how I got involved with the Unix world.

Two years ago I came back to Switzerland, when my wife was expecting. I still work for Wireless Networks as a consultant, and when I'm not hacking or babysitting, I like hiking and skiing, like the average Swiss person. :)

Ryan McBride: When I wear a suit, I'm an information security consultant; nowadays, I'm just an open source software addict trying to get my daily fix. I currently live in the woods near a small town north of Vancouver, Canada, so when I'm not working on OpenBSD things I'm likely outside chasing deer from my garden or chopping firewood.

Can Erkin Acar: I am 32 years old and I live in Ankara, Turkey with my wife Zeynep. I (finally) completed my PhD work last June and got married in July. I was the network administrator of the department during my graduate studies, and now I am working for a network security company.

Federico: How did you join OpenBSD?

CB: I've used PF since 3.0 in an environment where I need to filter thousands of IP addresses individually, and that configuration was not handled very efficiently by early versions of PF. So, two years ago I started to think about the table extension and wrote a working prototype of the "table" feature in December 2002. I submitted it to the PF mailing list, most people liked it, and I got invited to join the OpenBSD developer team by Daniel and Theo.

HB: Well, I was using OpenBSD for quite some time at work, and at a certain point the time had come to get more involved. In my case that is strongly related to pf; I apparently was the first one who tried to use pf in a major setup after 3.0 was released. It didn't quite work out, so I had a longish debugging session with Daniel. Turned out I had triggered a bug in an error path in pf, but it took us weeks to find out. After posting my results to misc@ I got this mail from Theo...

MF: Dug Song tricked me! Bad Dug. I wrote a firewall called mongoose for fun during a break from university. A year later OpenBSD dropped IPF in a copyright dispute and Dug invited me up to the MIT hackathon. We rightly chose Daniel's PF as the replacement firewall: it was more complete than mongoose, and mine had a bunch of portability cruft to let it work under both OpenBSD and Solaris (hey, I like sparc gear).

RM: I've been involved with OpenBSD as a user since 2.3 or so. As far as development goes... When PF was first added to the tree, I was not working full-time and had been playing around with IPv6. I decided that PF should support IPv6, and, in keeping with the "shut up and hack" motto of OpenBSD development, I sat down and wrote it.

CEA: While searching for an operating system that would replace our Netware servers at the department, I installed OpenBSD 2.7 and instantly felt at home. I was thrilled by pf and the subsequent pace of development and started using it immediately. I felt the need for pflogd and wrote it, and it was imported into the tree. I also wrote pftop, a state visualization tool, and contributed some minor fixes and improvements to pf. I was invited to the last hackathon by Henning and became a developer.

I am mostly working on userland tools these days, especially privilege separation. There is too much to explore/learn and hack and very little time.

DH: In short, there was a dispute over IPFilter's license (we assumed it had a BSD license, Darren Reed said he never granted the right to distribute modified versions, which is what OpenBSD wanted), and Theo removed IPFilter from OpenBSD after 2.9 was released. I had some spare time and wanted to give it a try. I think it came as a surprise to everyone involved that pf was working so soon and got imported into CVS. Looking back, I'm grateful for the license argument, as I doubt there would have been enough motivation to replace and, eventually, surpass the existing code otherwise.

Federico: What PF parts have you worked on from 3.0 to 3.4?

HB: All.

For a lot of stuff in pf, there is a leading developer, the one who had the idea, or who started working on what was obvious. The result however is in almost all cases a joint effort.

That said, the biggest subproject I did in pf was the altq merge. Again, that was a joint effort: without Kenjiro laying the groundwork from altq's side, Theo's visions, and quite some help from Daniel, this would not have worked. At the last hackathon I did the tagging stuff, which is very powerful. Over time I did a lot of work in the parser, bringing more flexibility to the language, and learned to love yacc while doing so. And much more, but I don't remember everything offhand ... it's been some time and a lot of changes.

MF: The TCP state engine (shamelessly based on Guido van Rooij's). The scrubber/normalizer which fits right in with the IDS I work on professionally (err, all the web browsing I do at work). The passive fingerprinting. Making fun of Henning and how little beer he can drink.

RM: After the IPv6 work, I added support for multiple translation or routing target addresses, which allows PF to do various types of load balancing of connections; it makes possible things such as using a PF firewall with two different ISP's, or balancing incoming http traffic against multiple web servers. But besides adding new features, I've done a fair bit of work cleaning up and simplifying the internals. I always feel best about my commits which result in a net reduction in the pf codebase.

CEA: I wrote pflogd before 3.0 and improved the original (binary) format of pf logs. What started as a project to safely parse pf logs for generating ASCII logs resulted in security extensions to bpf (the Berkeley Packet Filter, which is used for capturing packets from the net and reading the pf logs) and privilege separation of pflogd and tcpdump.

Federico: The 3.4 release page shows five new features. How does each one work?

  1. Packet tagging (filtering on tags added by a bridge based on MAC address).

    HB: Well, packet tagging allows you — as the name says — to add a "tag" to a packet, and read it out later on. Basically, that's it. It is so simple, and that is what makes it so powerful. You can use tags to express trust relations between interfaces (tag int1 on interface int1, and allow all packets with this tag out on int2 unconditionally), you can use tags to split classification and policy (tag packets so you end up with several "groups", where each group is one tag, and do the pass/block decision based on tags only), and you can even tag from outside pf — that's what i did for the bridge filters, we can tag packets from there as well, based on mac address for example.
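The split between classification and policy can be sketched as a toy model in Python. This is not pf code: the `Packet` class, the two functions, and the tag name `INT1` are hypothetical names chosen for illustration, and the interfaces `int1` and `int2` are taken from the example in the answer above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    iface: str                 # interface the packet is currently seen on
    direction: str             # "in" or "out"
    tag: Optional[str] = None  # tag attached by an earlier rule, if any


def classify(pkt: Packet) -> None:
    """Classification step: tag everything that enters on int1."""
    if pkt.direction == "in" and pkt.iface == "int1":
        pkt.tag = "INT1"


def policy(pkt: Packet) -> str:
    """Policy step: decide on the tag alone, not on addresses or ports."""
    if pkt.direction == "out" and pkt.iface == "int2" and pkt.tag == "INT1":
        return "pass"
    return "block"


def forward(pkt: Packet, out_iface: str) -> str:
    """Run a packet through the inbound classifier and the outbound policy."""
    classify(pkt)
    # The tag travels with the packet from the inbound to the outbound side.
    outgoing = Packet(iface=out_iface, direction="out", tag=pkt.tag)
    return policy(outgoing)
```

In this model, `forward(Packet("int1", "in"), "int2")` passes, while a packet that entered on any other interface is blocked on its way out of `int2` because it never received the tag.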

  2. Stateful TCP normalization (prevent uptime calculation and NAT detection).

    MF: Stateful TCP normalization is a set of techniques to remove or resolve ambiguities in network traffic. One of the techniques most important to the average user is TCP timestamp modulation. Most operating systems with high performance networking include a timestamp in every TCP packet.

    Since that timer starts ticking when the machine was booted, a server (or anyone in between) can look at a packet and know the machine's uptime. An attacker could look at a machine's responses to know it hasn't been rebooted since the last patch came out, so it is probably still vulnerable. Alternatively, a stingy internet service provider that charges extra for home networks can look at all of the timestamps coming from a link and count the number of NATted machines by the number of unique timestamps. The PF firewall can defeat both uptime calculation and NAT detection by modulating the timestamps with a random number. There are a variety of other normalization techniques done and others still in development.
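A minimal Python sketch of timestamp modulation follows. It assumes one random 32-bit offset per connection, added on the way out and subtracted from the echoed value on the way back; the interview only says the timestamps are modulated "with a random number", so the per-connection offset, the class name, and the method names are assumptions, not a description of pf's internals.

```python
import secrets

MASK32 = 0xFFFFFFFF  # TCP timestamp values are 32-bit quantities


class TimestampModulator:
    """Hypothetical per-connection timestamp modulator."""

    def __init__(self):
        self.offsets = {}  # connection key -> random 32-bit offset

    def modulate(self, conn, tsval):
        """Rewrite an outgoing timestamp so the real clock is hidden."""
        if conn not in self.offsets:
            self.offsets[conn] = secrets.randbits(32)
        return (tsval + self.offsets[conn]) & MASK32

    def demodulate(self, conn, tsecr):
        """Restore an echoed timestamp before it reaches the original host."""
        return (tsecr - self.offsets[conn]) & MASK32
```

Because the offset is constant for a connection, the difference between two timestamps on that connection is preserved (so round-trip estimation still works), while the absolute value no longer reveals when the host booted.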

  3. Passive OS detection (filter or redirect connections based on source OS).

    MF: Passive operating system fingerprinting has been a surprisingly controversial feature. The underlying premise is that various operating systems' TCP stacks have evolved and diverged in different ways over the years. The firewall can look at packets and determine which operating system they came from by looking at those differences. Not only can it differentiate between Linux and Windows, but it can tell between Linux 2.2 and 2.4; it can even determine if you're using Opera. The integration into the firewall allows the administrator to filter or redirect connections based on the operating system of the client. Your enterprise has phased out Windows 98? Redirect them to a web site telling people to upgrade to a newer version (or a free Unix *hint* *hint*). Hate SCO? Redirect all SCO Unixware/OpenServer connections to a web page rant. Find email worms annoying? Block mail that came directly from Windows machines instead of going through a UNIX mail server. Getting slashdotted? Give Unix web browsers a larger share of the bandwidth than non-Unix users. Passive OS fingerprinting in PF has been my gift to BoFHs everywhere ;-)

  4. SYN proxy (protect servers against SYN flood attacks).

    DH: When a TCP connection is established, the so-called TCP handshake takes place. The source of the connection sends a SYN packet, the destination replies with a SYN+ACK, and the source replies with an ACK. After that, the TCP connection is established, and both peers can send data.

    A malicious party can cause a denial of service (DoS) attack by sending large amounts of TCP SYN packets to a server, while picking random fake source addresses (spoofing). The server will send SYN+ACK replies to the random source addresses and wait for the handshake to get completed by an ACK. But the recipients of the SYN+ACKs have never sent the SYN which initiated the handshake, so they will either reply with an RST or not reply at all. In both cases, resources are allocated on the server. Though this is only temporary, the server can suffer to varying degrees from the resource starvation caused by a significant flood of SYNs. Not all operating systems deal with this optimally; some may even become completely unresponsive.

    pf's synproxy sits in between the vulnerable server and potential attackers (usually in form of a border firewall). Instead of forwarding TCP handshake packets as they are seen (when they are valid and allowed), the synproxy intercepts the SYN packet and first completes the TCP handshake itself with the source peer. Afterwards, it replays the SYN with the destination peer, completing the handshake itself again. Once the handshake is completed with both peers, further packets are forwarded as usual.

    This has little impact on legitimate connections, but in case of the attack described above, the first handshake will never complete (as the true sender of the spoofed packet will not see the SYN+ACK, and the recipient of the SYN+ACK does not expect a connection). Before the first handshake is completed, no packet reaches the protected server. The attack can no longer allocate resources on the server. Instead, the resources are allocated on the firewall, which is built with this case in mind and can withstand such attacks more gracefully. In short, synproxy takes the burden of a SYN flood attack off the real server.

    The way synproxy is implemented in pf causes no additional resource allocation when filtering statefully. A state entry (created on the initial SYN packet) already contains sequence number modulators (used to randomize initial sequence numbers, another feature of pf), which is all the synproxy needs. No additional connection tracking or SYN cookies are needed. And all existing features dealing with state table size (adaptive timeouts, limiting state entries created from specific rules, etc.) can be used to address attempts to starve resources on the firewall.
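The two-handshake sequence described above can be modelled in a few lines of Python. This is a deliberately simplified sketch: sequence numbers, timeouts, and the server's own SYN+ACK are left out, and the class and attribute names are hypothetical. Its only purpose is to show that nothing reaches the server until the client has completed the first handshake.

```python
class SynProxy:
    """Toy model of a SYN proxy sitting in front of one server."""

    def __init__(self):
        self.half_open = set()    # client sent SYN, proxy answered SYN+ACK
        self.established = set()  # handshake finished with both peers
        self.seen_by_server = []  # every packet the server actually receives

    def from_client(self, conn, flags):
        """Handle one packet arriving from the client side."""
        if conn in self.established:
            # Both handshakes are done: forward as usual.
            self.seen_by_server.append((conn, flags))
            return "forwarded"
        if flags == "SYN":
            # The proxy answers the SYN itself; the server sees nothing yet.
            self.half_open.add(conn)
            return "proxy replied SYN+ACK"
        if flags == "ACK" and conn in self.half_open:
            # The client is real: replay the handshake towards the server.
            self.half_open.remove(conn)
            self.seen_by_server.append((conn, "SYN"))
            self.seen_by_server.append((conn, "ACK"))
            self.established.add(conn)
            return "proxy completed handshake with server"
        return "dropped"
```

Feeding this model a thousand SYNs from spoofed sources leaves `seen_by_server` empty; only a client that answers the proxy's SYN+ACK with an ACK causes any traffic towards the server.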

  5. Adaptive state timeouts (prevent state table overflows under attack)

    HB: That is easy: we can scale the state timeouts down based on the number of current states. It helps fight state table exhaustion.

    When your state table is completely full (i.e., you hit the states limit), no new connections are possible. Thus, you really want to prevent this. Now, when you don't have many states in the table, you might want to run with the "normal" timeouts (set optimization normal). The closer you get to the state table limit, the more aggressive you want to be about timing out old states to prevent the said state table exhaustion. This is exactly what adaptive state timeouts are for. Once you hit the adaptive.start number of states, pf starts to scale down timeouts; the closer you get to adaptive.end, the more.
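The scaling can be written down as a small Python function. The linear factor (adaptive.end - states) / (adaptive.end - adaptive.start) is how pf.conf(5) describes the behaviour as far as I recall; it is not stated in the interview, so treat the exact formula, the function name, and the example numbers as assumptions.

```python
def scaled_timeout(timeout, states, adaptive_start, adaptive_end):
    """Return the effective timeout (in seconds) for the current table size.

    Below adaptive_start the configured timeout applies unchanged; between
    adaptive_start and adaptive_end it shrinks linearly; at adaptive_end
    and beyond it is zero.
    """
    if states <= adaptive_start:
        return timeout
    if states >= adaptive_end:
        return 0
    return timeout * (adaptive_end - states) // (adaptive_end - adaptive_start)
```

With a hypothetical adaptive.start of 6000 and adaptive.end of 12000, a 24-hour timeout (86400 seconds) is untouched at 6000 states, halved at 9000 states, and drops to zero at 12000 states.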

Federico: Mike, how did you get the idea to include p0f features in PF?

MF: One of my coworkers, Greg Taleck, added p0f features to NFR's IDS to resolve traffic ambiguities. And then a damn SMTP worm hit. That annoyed me and I wanted to filter all Windows boxes from connecting to my mail server for the duration of the worm. So I talked to Michal Zalewski, who wrote p0f v1, and he was cool with integrating p0f into PF, but the guy who had been maintaining p0f never responded to relicense the fingerprints. Michal then started p0f v2, which was not encumbered with the maintainer's copyright; I got it working in PF; and then blocked all Windows boxes from connecting to my mail server for the duration of the worm. Hurray! Never underestimate the annoyed developer.

Federico: What are you working on for 3.5?

MF: Working on brewing my own beer. Made a pretty good Boston-style ale, needed a little more hopping though. A golden ale is next. Theo has been calling me a nasty hobbittsesss lately too, so I've been working on growing hair on my feet. Hopefully, I'll get back to TCP scrubbing and normalization in time for 3.5.

HB: well, the focus this time is obviously bgpd.

bgp, the Border Gateway Protocol, is what ISPs speak to each other to announce reachability of their networks through certain paths. A bgp daemon announces its own networks to its neighbors, and its neighbors announce their networks and all networks including the paths to reach them they learned from their respective neighbors. In the usual so-called full-mesh setup that results in bgpd having a table of about 130 thousand networks (prefixes), and multiple paths to reach each. Of those it picks the "best" path (the algorithm for that decision is actually rather easy), and enters the resulting route into the kernel routing table.

Now, that is a bit more complicated than described here, and it is quite obvious that keeping these huge tables and working on them with reasonable performance is not that easy.

There are a few more or less free bgp implementations, but they all have major design flaws, and the resulting runtime problems. As I've been bitten by those I was considering doing a bgpd for some time, but was a bit scared by the project's size. When I was in Calgary in September I finally talked to theo about it, who tricked me into starting coding. Back in Germany I finally did, in mid-November, and much to my surprise I had a fully working bgp session engine, fully implementing the Finite State Machine described in RFC 1771 as core, within 9 days, and had sessions established and held up to other bgp speakers. We found a few bugs later, but it is basically still what I had then. I talked to a few people and showed code, and fortunately, Claudio Jeker joined. He did an incredible amount of work implementing what we call the RDE, Route Decision Engine, that holds the tables of prefixes and paths. At the same time I started working on the code to interface the kernel routing table, which includes holding an internal view of it.

Well, nowadays we are feature complete for the basics.

We have no showstopper bugs we are aware of, heck, I am not aware of any bug right now (tho', let me assure you, there are a few). We learn routes, sync the one picked as best into the kernel routing table, can send them to our neighbors, and can announce our own networks. We have a control utility, bgpctl, too, which can be used to gather and show run-time data, take single sessions up/down, reload configuration, etc. And we have something that I have not seen anywhere before: we can couple and decouple the internal view from the kernel routing table.

So you can start up decoupled, adjust your settings while evaluating the internal view of the routing table, and then, after you are satisfied, you can issue a bgpctl fib couple and the routes enter the kernel. In the same vein a bgpctl fib decouple removes them again, leaving the kernel routing table as it was before coupling. Oh, and as opposed to the other implementations, bgpd notices when you statically enter routes to the kernel routing tables and doesn't mess with them. It even tracks interfaces showing up and being removed at runtime, as is possible with PCMCIA and USB-based ones, and cloneable devices like tun and vlan. For most Ethernet devices it can even notice when you pull the cable (or the link gets lost for other reasons) and react accordingly.
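The couple/decouple idea can be sketched in Python as an internal table that is only mirrored into the kernel table on request. This is a toy model under stated assumptions: the class `Bgpd`, the owner labels, and the rule "never overwrite a route that bgpd did not install" are my reading of "doesn't mess with them", not a description of the actual bgpd code.

```python
class Bgpd:
    """Toy model: an internal routing view that can be (de)coupled."""

    def __init__(self, kernel_table):
        self.kernel = kernel_table  # prefix -> (nexthop, owner)
        self.rib = {}               # internal view: prefix -> nexthop
        self.coupled = False

    def learn(self, prefix, nexthop):
        """Record a best route; push it to the kernel only when coupled."""
        self.rib[prefix] = nexthop
        if self.coupled:
            self._install(prefix)

    def _install(self, prefix):
        existing = self.kernel.get(prefix)
        if existing is not None and existing[1] != "bgpd":
            return  # statically entered route: leave it alone
        self.kernel[prefix] = (self.rib[prefix], "bgpd")

    def couple(self):
        """Models 'bgpctl fib couple': routes enter the kernel table."""
        self.coupled = True
        for prefix in self.rib:
            self._install(prefix)

    def decouple(self):
        """Models 'bgpctl fib decouple': remove only what bgpd installed."""
        self.coupled = False
        ours = [p for p, (_, owner) in self.kernel.items() if owner == "bgpd"]
        for prefix in ours:
            del self.kernel[prefix]
```

Starting decoupled, any number of routes can be learned without the kernel table changing; after `couple()` they appear, and `decouple()` restores the table to exactly what it was before.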

bgpd is 11500 lines of code as of tonight, of which about 500 are manpages. And it is very fast...

CB: I'm working on many little enhancements in the way PF deals with interfaces.

That includes better support for dynamic/cloneable interfaces, the ability to lock states to a single interface or a group of interfaces, better handling of interface aliases, and other related things. I believe there were 12 little points in the commit message. :)

RM: I've been mainly working on the components necessary to deploy OpenBSD in high availability and load balancing configurations, including the Common Address Redundancy Protocol (CARP), which handles IP address failover, and pfsync enhancements which synchronise state between two or more firewalls. I also added source IP tracking, which keeps track of states by source IP address, but this work was actually done before 3.4, at the hackathon in Calgary.

CEA: As you may have noticed, I have moved away from pf to privilege separation and bpf. I already worked on privsep for named in 3.5, and now there are at least the DHCP tools waiting for privilege separation. Henning is already working on dhclient. If I can find some time, I want to design some kind of framework for developing userland proxies.

Federico: The OpenBSD 3.5 release page lists six new PF improvements. Could you each explain your own work?

  1. Atomic commits of ruleset changes (reduce the chance of ending up in an inconsistent state).

    CB: This change ensures that when you type pfctl -f pf.conf, then the entire content of pf.conf will be loaded into PF kernel memory, or nothing at all if there are errors. Before that change, it was possible in rare circumstances that only half of the pf.conf ruleset would be loaded inside the kernel.

    So for example, you could have the new RDR rules loaded, but not FILTER rules.

    Or, if your main pf.conf contains load anchor entries, and some of the anchor files had a syntax error, then only part of the anchors would be loaded.

    This change does not bring any new functionality to PF, but it makes pfctl -f more reliable in case of errors (syntax errors, pfctl gets kill(1)ed, not enough memory is available, ...).
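The all-or-nothing behaviour can be illustrated with a stage-then-swap pattern in Python. This is a sketch of the general technique, not of pfctl or the pf ioctl interface: `parse_rule`, `Kernel`, `RulesetError`, and the rule strings are hypothetical stand-ins, and the strings are not real pf.conf syntax.

```python
class RulesetError(Exception):
    """Raised when a rule line cannot be parsed."""


KNOWN_ACTIONS = ("pass", "block", "rdr", "nat")  # hypothetical subset


def parse_rule(line):
    """Stand-in parser: accept a line only if it starts with a known action."""
    words = line.split()
    if not words or words[0] not in KNOWN_ACTIONS:
        raise RulesetError("syntax error: %r" % line)
    return tuple(words)


class Kernel:
    """Toy model of the rulesets held in kernel memory."""

    def __init__(self):
        self.active = {"rdr": [], "filter": []}

    def load(self, sections):
        """Replace the given sections atomically.

        Everything is parsed into a staging copy first; the active rulesets
        are swapped only if every section parsed without error.
        """
        staged = {
            name: [parse_rule(line) for line in lines]
            for name, lines in sections.items()
        }
        # Reached only when nothing above raised an exception.
        self.active = dict(self.active, **staged)
```

If the filter section contains a typo, `load` raises before the swap, so the new rdr rules are not activated either and the previous configuration stays in force as a whole.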

  2. A 30% reduction in the size of state table entries.

    RM: Basically I found a little trick of storing the tree indexes inside the state structure, rather than having separate tree nodes that point to the state structure. It's actually a pretty obvious thing in retrospect, but nobody had really considered it. For the end user, all this means is that they can have more states in the same amount of memory.

  3. Source-tracking (limit the number of clients and states per client).

    RM: Source IP tracking allows you to create an entry for the source of connections and link states to it. This is useful for a number of reasons: first, it allows you to use a round-robin address allocation mechanism for translation or redirection, but ensure that the connections for a particular client are always mapped the same way. This functionality is important for some applications or protocols which rely on source address for identification, or in the case of server balancing, where the application keeps state across multiple connections, so the client must always connect to the same server.

    Second, it allows you to set limits on how many distinct sources can connect to a service, and how many simultaneous connections each source can have. This can be used to connection limit internal clients, or mitigate certain kinds of denial-of-service attacks.
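The two limits can be modelled with a small Python class. The parameter names follow the pf.conf options `max-src-nodes` and `max-src-states`; the class itself and its methods are a hypothetical illustration, and the configurable lifetime of idle source entries is left out.

```python
class SourceTracker:
    """Toy model of per-rule source tracking with two limits."""

    def __init__(self, max_src_nodes, max_src_states):
        self.max_src_nodes = max_src_nodes    # distinct sources allowed
        self.max_src_states = max_src_states  # simultaneous states per source
        self.states = {}                      # source IP -> open state count

    def new_state(self, src):
        """Return True if a new connection from src may create a state."""
        count = self.states.get(src)
        if count is None:
            if len(self.states) >= self.max_src_nodes:
                return False  # too many distinct sources already tracked
            count = 0
        if count >= self.max_src_states:
            return False      # this source has too many connections open
        self.states[src] = count + 1
        return True

    def close_state(self, src):
        """Drop one state; forget the source when its last state is gone."""
        self.states[src] -= 1
        if self.states[src] == 0:
            del self.states[src]
```

With `SourceTracker(max_src_nodes=2, max_src_states=3)`, a third distinct client is refused, and so is a fourth simultaneous connection from a client that is already tracked.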

  4. Sticky-address (the flexibility of round-robin with the benefits of source-hash).

    RM: When sticky-address is enabled, we create source-tracking entries for each source IP address, and states are associated with it. In this entry, we store the translation address that was selected by round robin, and the subsequent connection from this source, which hits the nat or rdr rule, will get this translation address rather than the next round-robin address. The source-tracking entries last at least as long as there are states associated with it, plus an additional configurable lifetime.

    So if you're redirecting traffic to a pool of web servers, and the first time a client connects, they get redirected to server 4, all connections afterward from that client will hit server 4, so long as the source-tracking entry exists.

    This is very similar in behaviour to source-hash, except it removes the restriction that the pool must be specified as a CIDR netblock; it can be a list of addresses, including network blocks, or more powerfully, it can be a table.
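A minimal Python sketch of round-robin selection with a sticky per-source mapping follows. The class `StickyPool` and its methods are hypothetical; expiry is reduced to an explicit `expire` call instead of the state-based lifetime described above.

```python
from itertools import cycle


class StickyPool:
    """Round-robin pool that remembers which server each source was given."""

    def __init__(self, servers):
        self._next = cycle(servers)  # plain round-robin over the pool
        self._by_source = {}         # source IP -> chosen server

    def pick(self, src):
        """Return the server for src, choosing one on first contact."""
        if src not in self._by_source:
            self._by_source[src] = next(self._next)
        return self._by_source[src]

    def expire(self, src):
        """Forget src, as happens when its source-tracking entry times out."""
        self._by_source.pop(src, None)
```

New clients are spread over the pool in order, while a returning client keeps its server for as long as its entry exists. Because the pool is just a list here, it need not be a contiguous netblock, which is the flexibility the answer points out over source-hash.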

  5. Invert the socket match order when redirecting to localhost (prevents the potential security problem of mis-identifying remote connections as local).

    DH: It is common practice to redirect incoming TCP connections to local daemons using pf, for instance to force HTTP connections through a proxy, or to redirect spam to a tarpit.

    Often, the daemon was bound to 127.0.0.1 and the redirection used 127.0.0.1 as replacement destination. While using the loopback address is convenient in such cases (it's always present), that can have security implications.

    Many daemons assume that the loopback interface is isolated from the real network, i.e., that connections to sockets bound to 127.0.0.1 are local, and may grant some privileges based on this assumption.

    pf redirecting foreign connections to the loopback address violates that assumption: suddenly, foreign peers might be able to connect to daemons listening on loopback sockets.

    To deal with this potential risk, the network code has been changed so that foreign connections to loopback addresses are first matched against listeners on unbound sockets (listening on any address). Only if no such socket is found is the connection matched against a specific loopback listener.

    So, if you're running a daemon listening on both 127.0.0.1 and ANY, and use pf to redirect external connections to 127.0.0.1, these connections will now connect to the ANY socket, instead of the 127.0.0.1 one, where the daemon might wrongly assume a local connection.

    This problem only occurs with daemons that follow this pattern (listen on 127.0.0.1 in addition to other addresses, treat 127.0.0.1 as privileged local connections); many daemons don't.
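The inverted lookup order can be expressed as a short Python function. This is a model of the matching rule as described in the answer, not of the kernel's socket lookup code; the function name, the `foreign` flag, and the representation of listeners as `(address, port)` pairs are assumptions.

```python
LOOPBACK = "127.0.0.1"
ANY = "0.0.0.0"  # stands for a socket bound to no specific address


def match_listener(listeners, dst_addr, port, foreign):
    """Pick the listening socket for an incoming connection.

    listeners is a set of (address, port) pairs. Normally the specific
    address wins over the wildcard; for foreign connections redirected to
    loopback the order is inverted, so the wildcard socket is tried first.
    """
    specific = (dst_addr, port)
    wildcard = (ANY, port)
    if foreign and dst_addr == LOOPBACK:
        order = [wildcard, specific]
    else:
        order = [specific, wildcard]
    for candidate in order:
        if candidate in listeners:
            return candidate
    return None
```

For a daemon listening on both 127.0.0.1 and ANY, a redirected external connection lands on the ANY socket, while a genuinely local connection still reaches the 127.0.0.1 socket. A daemon that listens only on loopback is still reachable through the redirect, which matches the "only if no such socket is found" fallback.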

  6. Significant improvements to interface handling.

    CB: Let's look at the commit message, since it describes things pretty clearly:

    1) PF should do the right thing when unplugging/replugging or cloning/destroying NICs.

    2) Rules can be loaded in the kernel for not-yet-existing devices (USB, PCMCIA, Cardbus). For example, it is valid to write: "pass in on kue0" before kue USB is plugged in.

    3) It is possible to write rules that apply to group of interfaces (drivers), like "pass in on ppp all".

    4) There is a new ":peer" modifier that completes the ":broadcast" and ":network" modifiers.

    5) There is a new ":0" modifier that will filter out interface aliases. Can also be applied to DNS names to restore original PF behaviour.

    pass in from www.openbsd.org:0 will only select the first IP returned by the resolver, while pass in from www.openbsd.org will select all IPs. Similarly, pass in from fxp0:0 or pass in from (fxp0:0) will not take into account address aliases on fxp0.

    6) The dynamic interface syntax (foo) has been vastly improved, and now supports multiple addresses, v4 and v6 addresses, and all userland modifiers, like "pass in from (fxp0:network)".

    Specifying pass [...] from (ifspec) is now equivalent in all cases to pass [...] from ifspec, except that the ifspec -> IP address resolution is done in the kernel, i.e., will adapt automatically to interface address changes (dhcp, hot plug removal, whatever).

    7) Scrub rules now support the !if syntax.

    scrub in on !fxp0 now works.

    8) States can be bound to the specific interface that created them or to a group of interfaces, for example:

    pass all keep state (if-bound)
    pass all keep state (group-bound)
    pass all keep state (floating)

    9) The default value when only keep state is given can be selected by using the "set state-policy" statement.

    If you put set state-policy if-bound, then all rules declared with keep state, like pass out on fxp0 keep state, will be if-bound.

    10) "pfctl -ss" will now print the interface scope of the state.

    Another piece I wrote on the pf@ mailing list gives a few more details about state binding.

Federico: The 3.5 presentation page says "Interface 'cloning', accessed by ifconfig(8) commands create and destroy. For example, `ifconfig vlan100 create'." How does it work?

HB: That's a very cool addition. Let's take vlan, for example.

Previously, you had a fixed number of vlan interfaces in your kernel config. If you needed more, you needed a new kernel and a reboot. Now, you don't have any vlan interface by default, but the kernel has a "template". You create the interfaces as needed on the fly. So, when you configure your first vlan, you could do something along these lines:

# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlandev fxp0 192.168.0.1 up

Of course, you can collapse those into one, but it is even nicer: ifconfig creates the interface for you when you configure it, without an explicit create:

# ifconfig vlan0 vlan 100 vlandev fxp0 192.168.0.1 up

is sufficient. When you don't need the interface any more, you just destroy it, and it is gone:

# ifconfig vlan0 destroy

Federico: The 3.5 presentation page says "authpf(8) now tags traffic in pflog(4) so that users may be associated with traffic through a NAT setup." How does it work?

DH: This is best explained with the example in the authpf(8) man page. You can use the following in authpf.rules (the ruleset which is loaded for each user who authenticates):

nat on $ext_if from $user_ip to any tag $user_ip -> $ext_addr
pass in quick on $int_if from $user_ip to any
pass out log quick on $ext_if tagged $user_ip keep state

Nothing special about the usage of tag/tagged here, except that we use a macro that gets expanded to the user's IP address; for instance, NATed connections from 10.1.2.3 get tag 10.1.2.3.

The point of adding a unique per-user tag on the internal interface is so that we can pass connections on the external interface, after translation, with a unique rule as well. Without tags, connections from different source addresses would all pass by the same rule on the external interface.

The reason for this construct is that tcpdump on pflog0 shows the anchor and ruleset name of the rule that created the matched state, and the ruleset name conveniently contains the user name and pid of the authpf process authenticating the user, for example:

# tcpdump -n -e -ttt -i pflog0
Oct 31 19:42:30.296553 rule 0.bbeck(20267).1/0(match): pass out on fxp1: \
129.128.11.10.60539 > 198.137.240.92.22: S 2131494121:2131494121(0) win \
16384 <mss 1460,nop,nop,sackOK> (DF)

The bbeck part is the name of the user that created the connection. This information can be used for logging, accounting or debugging.

Federico: Finally, OpenBSD introduced new tools for filtering gateway failover. Quoting from the 3.5 presentation page:

1) CARP (the Common Address Redundancy Protocol) carp(4) allows multiple machines to share responsibility for a given IP address or addresses. If the owner of the address fails, another member of the group will take over for it.

Ryan, could you explain the new Common Address Redundancy Protocol (CARP)?

RM: The Common Address Redundancy Protocol allows multiple hosts to transfer an IP address amongst each other, ensuring that this address is always available. CARP is much like VRRP, although it improves on it in many ways: it supports IPv6 addresses, provides strong authentication via a SHA1 HMAC, and supports a limited degree of load balancing via an "arp balancing" feature.

CARP is the direct result of our frustration with the current IETF standards process: Cisco maintains that they hold a patent which covers VRRP and none of the right people at the IETF are willing to stand up and tell them their patent is irrelevant. It's a specific case of the general problem of vendors involving themselves in the standards process, then producing patents after the standard is finalised. The same sort of thing is happening with the various IPSec standards. We'd like very much for the IETF to put an end to this, and use a non-RAND intellectual property policy, much as the w3c has done. An open standard is not really an open standard if you have to enter into licensing agreements to use it.

Author's note: The OpenBSD web site has some interesting commentary on Cisco Patents.

2) Additions to the pfsync(4) interface allow it to synchronise state table entries between two or more firewalls which are operating in parallel, allowing stateful connections to cross any of the firewalls regardless of where the state was initially created.

Federico: Ryan, how would state table synchronization work?

RM: The pfsync protocol works by sending out state creations, updates, and deletions via multicast on a specified interface. Other firewalls listen for such messages, and import the changes into their state table. There is some additional complexity, of course: we implement some methods for minimizing pfsync traffic, and a mechanism for recovering from missed messages.

The net benefit of all this is that you can have two firewalls running in parallel and have one firewall backup for the other. In many situations this will be combined with CARP.
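The basic message flow can be sketched in Python. This toy model replaces multicast with a plain list of peers and ignores the traffic reduction and recovery mechanisms mentioned above; the `Firewall` class, its methods, and the message tuples are hypothetical and do not reflect the pfsync wire format.

```python
class Firewall:
    """Toy model of a firewall that shares its state table with peers."""

    def __init__(self, name):
        self.name = name
        self.states = {}  # state id -> state data
        self.peers = []   # stands in for the multicast group

    def _announce(self, message):
        for peer in self.peers:
            peer.receive(message)

    def create(self, state_id, data):
        self.states[state_id] = data
        self._announce(("insert", state_id, data))

    def update(self, state_id, data):
        self.states[state_id] = data
        self._announce(("update", state_id, data))

    def delete(self, state_id):
        self.states.pop(state_id, None)
        self._announce(("delete", state_id, None))

    def receive(self, message):
        """Import a change announced by another firewall."""
        action, state_id, data = message
        if action == "delete":
            self.states.pop(state_id, None)
        else:
            self.states[state_id] = data
```

After `fw1.create(...)`, the same state exists on `fw2`, so a packet belonging to that connection would match a state on either firewall. That is what lets the backup take over established connections when it is combined with an address failover mechanism such as CARP.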

I've written an article that gives an overview on why pfsync and CARP are necessary, how they work, examples of how they can be used, and a sample configuration.

If you're considering building yourself a redundant firewall cluster, you'll probably want to read this.

Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.


Copyright © 2009 O'Reilly Media, Inc.