TCP Tuning and Network Troubleshooting
by Brian Tierney
The other day my friend Bob came to me with a question. He'd written a Java program to copy 100MB data files from his Windows XP computer at his office in Sunnyvale, California, to a Linux server at his company's East Coast office in Reston, Virginia. He knew both offices had 100Mbps Ethernet networks that connected over a 155Mbps Virtual Private Network (VPN). When he measured the speed of the transfers, he found out that his files were transferring at less than 4Mbps, and wondered if I had any idea why.
I wrote this article to explain why this is the case, and what Bob needs to do to achieve the maximum network throughput. This article is aimed mainly at software developers. All too often software developers blame the network for poor performance, when in fact the problem is untuned software. However, there are times when the network is the problem. This article also explains some network troubleshooting tools that can give software developers the evidence needed to make network engineers take them seriously.
How TCP Works
The most common network protocol used on the internet is the Transmission Control Protocol, or TCP. TCP uses a "congestion window" to determine how many packets it can send at one time. The larger the congestion window size, the higher the throughput. The TCP "slow start" and "congestion avoidance" algorithms determine the size of the congestion window. The maximum congestion window is related to the amount of buffer space that the kernel allocates for each socket. For each socket there is a default value for the buffer size, which programs can change by using a system library call just before opening the socket. For some operating systems there is also a kernel-enforced maximum buffer size. You can adjust the buffer size for both the sending and receiving ends of the socket.
To achieve maximum throughput, it is critical to use optimal TCP socket buffer sizes for the link you are using. If the buffers are too small, the TCP congestion window will never open up fully, so the sender will be throttled. If the buffers are too large, the sender can overrun the receiver, which will cause the receiver to drop packets and the TCP congestion window to shut down. This is more likely to happen if the sending host is faster than the receiving host. An overly large window on the sending side is not a big problem as long as you have excess memory.
Computing the TCP Buffer Size
Assuming there is no network congestion or packet loss, network throughput is directly related to TCP buffer size and the network latency. Network latency is the amount of time for a packet to traverse the network. To calculate maximum throughput:
Throughput = buffer size / latency
Typical network latency from Sunnyvale to Reston is about 40ms, and Windows XP has a default TCP buffer size of 17,520 bytes. Therefore, Bob's maximum possible throughput is:
17520 Bytes / .04 seconds = .44 MBytes/sec = 3.5 Mbits/second
The default TCP buffer size for Mac OS X is 64K, so with Mac OS X he would have done a bit better, but still nowhere near the 100Mbps that should be possible.
65536 Bytes / .04 seconds = 1.6 MBytes/sec = 13 Mbits/second
(Network people always use bits per second, but the rest of the computing world thinks in terms of bytes, not bits. This often leads to confusion.)
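Because the arithmetic above switches between bytes and bits, it is easy to slip by a factor of eight. Here is a minimal Java sketch of the same calculation. The class and method names (ThroughputEstimate, bytesPerSecond, toMbps) are illustrative only and not part of any library. The sketch assumes 1 Mbit = 1,000,000 bits and, like the formula above, ignores congestion and packet loss.

```java
/** Sketch of the buffer-limited throughput estimate: buffer size / latency. */
final class ThroughputEstimate {

    /** Upper bound on throughput in bytes per second. */
    static double bytesPerSecond(long bufferBytes, double latencySeconds) {
        return bufferBytes / latencySeconds;
    }

    /** Converts bytes per second to megabits per second (assumes 1 Mbit = 1,000,000 bits). */
    static double toMbps(double bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }
}
```

For example, ThroughputEstimate.toMbps(ThroughputEstimate.bytesPerSecond(17520, 0.04)) evaluates to roughly 3.5, which matches the Windows XP figure above.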
Most networking experts agree that the optimal TCP buffer size for a given network link is double the value for delay times bandwidth:
buffer size = 2 * delay * bandwidth
The ping program will give you the round-trip time (RTT) for the network link, which is twice the delay, so the formula simplifies to:
buffer size = RTT * bandwidth
For Bob's network, ping returned an RTT of 80ms. This means that his TCP buffer size should be:
.08 seconds * 100 Mbps / 8 = 1 MByte
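The same calculation can be written as a small helper, so that a program can size its buffers from a measured RTT and an assumed link speed. This is only a sketch: the class name BufferSizing and the method optimalBufferBytes are hypothetical, and the bandwidth is assumed to be given in bits per second.

```java
/** Sketch of the "buffer size = RTT * bandwidth" rule from the text. */
final class BufferSizing {

    /**
     * Returns the suggested TCP buffer size in bytes.
     *
     * @param rttSeconds             round-trip time, e.g. as reported by ping
     * @param bandwidthBitsPerSecond capacity of the slowest link on the path
     */
    static long optimalBufferBytes(double rttSeconds, double bandwidthBitsPerSecond) {
        // Divide by 8 to convert bits to bytes.
        return Math.round(rttSeconds * bandwidthBitsPerSecond / 8.0);
    }
}
```

For Bob's path, BufferSizing.optimalBufferBytes(0.08, 100_000_000.0) yields 1,000,000 bytes, the 1 MByte computed above.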
Bob knew the speed of his company's VPN, but often you will not know the capacity of the network path. Determining this can be difficult. These days, most wide area backbone links are at least 1Gbps (in the United States, Europe, and Japan anyway), so the bottleneck links are likely to be the local networks at each endpoint. In my experience, most office computers connect to 100Mbps Ethernet networks, so when in doubt, 100Mbps (12MBps) is a good value to use.
Tuning the buffer size will have no effect on networks that are 10Mbps or less; for example, with the hosts connected to a DSL link, cable modem, ISDN, or T1 line. There is a program called pathrate that does a good job of estimating network bandwidth. However, this program works on Linux only, and requires the ability to log in to both computers to start the program.
Setting the TCP Buffer Size
There are two TCP settings to consider: the default TCP buffer size and the maximum TCP buffer size. A user-level program can override the default buffer size for its own sockets, but raising the maximum buffer size requires administrator privileges. Note that most of today's Unix-based OSes by default have a maximum TCP buffer size of only 256K. Windows does not have a maximum buffer size by default, but the administrator may set one.
You need to change the buffers at both ends of the connection: the send buffer on the sender and the receive buffer on the receiver. Changing only the send buffer will have no effect, as TCP negotiates the buffer size to be the smaller of the two. For the same reason, you do not have to set both buffers to exactly the optimal value; it is enough that one end uses the optimal value and the other is at least as large. A common technique is to set the buffer in the server quite large (for example, 1,024K) and then let the client determine and set the correct "optimal" value for that network path. To set the TCP buffer, use the setSendBufferSize and setReceiveBufferSize methods in Java, or the setsockopt call in C. Here is an example of how to set TCP buffer sizes within your application using Java:
    java.net.Socket skt = new java.net.Socket();  /* not yet connected */
    int sndsize = 1024 * 1024;                    /* requested size in bytes */
    int sockbufsize;

    /* set send buffer */
    skt.setSendBufferSize(sndsize);
    /* check to make sure you received what you asked for */
    sockbufsize = skt.getSendBufferSize();

    /* set receive buffer */
    skt.setReceiveBufferSize(sndsize);
    /* check to make sure you received what you asked for */
    sockbufsize = skt.getReceiveBufferSize();
It is always a good idea to call getSendBufferSize (or getReceiveBufferSize) after setting the buffer size. This confirms whether the OS actually granted a buffer of that size. The setsockopt call will not return an error if you use a value larger than the maximum buffer size, but will just use the maximum size instead of the value you specify. Linux doubles whatever value you pass in for the buffer size (the kernel reserves the extra space for its own bookkeeping overhead), so when you do a getReceiveBufferSize you will see double what you asked for. Don't worry, as this is "normal" for Linux.
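The set-then-check pattern described above could be wrapped in a helper along these lines. This is a sketch under stated assumptions: the class SocketBuffers and its methods are hypothetical names, and only the standard java.net.Socket calls already mentioned in this article are used. Note that the check only catches the case where the reported size is below the request. Because Linux reports double the value it applied, a request that was only slightly above the system maximum can still report a size larger than what was asked for.

```java
import java.net.Socket;
import java.net.SocketException;

/** Sketch: request a buffer size, then read back what the OS reports. */
final class SocketBuffers {

    /**
     * Requests the same size for the send and receive buffers and returns
     * the sizes the OS reports afterwards as {send, receive}.
     */
    static int[] requestAndVerify(Socket skt, int requestedBytes) throws SocketException {
        skt.setSendBufferSize(requestedBytes);
        skt.setReceiveBufferSize(requestedBytes);
        return new int[] { skt.getSendBufferSize(), skt.getReceiveBufferSize() };
    }

    /**
     * True if either reported size is smaller than the request, which means
     * the OS substituted a lower value (typically its configured maximum).
     * A false result is not proof the request was honored on Linux, since
     * the reported value there is double the applied one.
     */
    static boolean wasClamped(int[] reported, int requestedBytes) {
        return reported[0] < requestedBytes || reported[1] < requestedBytes;
    }
}
```

A caller would typically invoke requestAndVerify on a socket before connecting it, then log a warning if wasClamped returns true so that an administrator knows to raise the system maximum.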
Here is the same example in C:
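    #include <sys/socket.h>

    int skt, sndsize, err;  /* skt: an open socket; sndsize: requested size in bytes */

    /* set send buffer */
    err = setsockopt(skt, SOL_SOCKET, SO_SNDBUF, (char *)&sndsize, (int)sizeof(sndsize));
    /* set receive buffer */
    err = setsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&sndsize, (int)sizeof(sndsize));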
Here is the sample C code for checking the current buffer size:
    int sockbufsize = 0;
    socklen_t size = sizeof(sockbufsize);

    /* on return, sockbufsize holds the receive buffer size in bytes */
    int err = getsockopt(skt, SOL_SOCKET, SO_RCVBUF, (char *)&sockbufsize, &size);
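Returning to the technique mentioned earlier, where the server uses a large buffer and the client picks the optimal value, the server side might look like the following in Java. This is a rough sketch, not a complete server: the class name LargeBufferServer and the constant SERVER_BUFFER_BYTES are made up for illustration, and the 1,024K figure is simply the example value from the text. The receive buffer is requested before the socket is bound, since the Java documentation for ServerSocket.setReceiveBufferSize states that sizes above 64K must be set before binding.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Sketch: a listening socket that asks for a large receive buffer. */
final class LargeBufferServer {

    /** The "1,024K" example value from the text, in bytes. */
    static final int SERVER_BUFFER_BYTES = 1024 * 1024;

    /** Creates an unbound server socket, requests the buffer, then binds. */
    static ServerSocket open(int port) throws IOException {
        ServerSocket server = new ServerSocket();          // unbound
        server.setReceiveBufferSize(SERVER_BUFFER_BYTES);  // must precede bind()
        server.bind(new InetSocketAddress(port));
        return server;
    }
}
```

Sockets returned by accept() on this server start from the requested receive buffer size. As with client sockets, the OS may grant less than was asked for, so it is worth calling getReceiveBufferSize on the result.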