After spending multiple hours trying to diagnose a performance issue with NFS in my lab, I've stumbled upon a situation that honestly has me completely bewildered.
When copying a 256MB test file to the NFS mount, the transfer takes ~70 seconds, which equates to roughly 3.6MB/s, i.e. very slow. The client and server hardware is modern and equipped with Gigabit NICs, connected through a Gigabit switch. With this hardware I'd expect average transfer speeds of 30-40MB/s, with the hard drives being the bottleneck.
So here's where it gets strange...
While taking network captures I discovered that if tcpdump is running on the server the file transfer completes in roughly 7 seconds! If I kill tcpdump and rerun the file transfer the elapsed time goes back up to 65s. At 7 seconds per transfer that calculates out to about 36.5MB/s, which I'd be completely happy with if it didn't require a network capture to be running at all times.
So let's step back in time a little and detail how I'm testing.
Test Details
First, create a 256MB file on the client to use during transfer tests:
dd if=/dev/zero of=test.dat bs=32k count=8192
Next, mount the NFS share:
mount -t nfs -o nolock 192.168.50.1:/share /mnt
Then run a quick transfer test:
$ time cp test.dat /mnt/1.dat
real 1m 10.06s
user 0m 0.04s
sys 0m 0.065s
Now, over on the server, start a tcpdump capture:
sudo tcpdump -ntttt -i eth0
And back on the client run another transfer test:
$ time cp test.dat /mnt/2.dat
real 0m 7.62s
user 0m 0.04s
sys 0m 0.64s
There's roughly a 62 second difference between the two test runs!
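To put the two timings on the same footing, here is a small sketch that turns a file size and an elapsed time into MB/s. The function name throughput_mbs is my own, and it assumes awk is available on the client.

```shell
# Hypothetical helper: convert a transfer size (MB) and an elapsed
# time (seconds) into MB/s, rounded to one decimal place.
throughput_mbs() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

throughput_mbs 256 70.06   # first run, no capture running on the server
throughput_mbs 256 7.62    # second run, tcpdump running on the server
```

For the two runs above this works out to roughly 3.7MB/s and 33.6MB/s.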
I would love to hear ideas or additional test scenarios from anyone who has an inkling of what the cause might be behind the significant difference in transfer times.
Additional Testing
Solution
I have to thank my co-workers for the tips they provided, which guided me to look into the NIC hardware drivers as the possible culprit. Sure enough, that was the exact issue.
The current version of Debian ships r8169 driver version "2.2LK-NAPI", which is the one I'm having the performance issues with. The "testing" release of Debian (Squeeze) has version 2.3LK-NAPI, which I tried. At first it appeared to succeed, but that was short-lived, as the performance issue reappeared upon further testing.
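If you want to check which driver and version your own NIC is using, ethtool -i reports both. The sketch below condenses that output to one line. The function name driver_summary is hypothetical, and it assumes ethtool is installed and the interface is eth0.

```shell
# Hypothetical helper: read "ethtool -i" output on stdin and print
# the driver name and version on a single line.
driver_summary() {
    awk -F': ' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END             { print d " " v }
    '
}

# Usage (assumes the interface is eth0):
#   ethtool -i eth0 | driver_summary
```

Running modinfo against the module name is another way to see the version of a driver that is installed but not necessarily loaded.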
I then compiled the driver available directly from RealTek, which at the time was version 6.013.00. It compiled and loaded fine, but I was unable to set its speed to anything other than 100Mb. If I manually set it to 1GbE, it would enter an odd renegotiation loop, switching speeds and duplexes until manually unloaded.
So the final solution (after immense frustration) was to physically remove the network card and replace it with an Intel PWLA8391GT PRO/1000GT. It was recognized by the OS without an issue, the e1000 driver loaded properly, and every test run produced transfer times back in the 7 second range.
The Intel NIC was $10 more than the TRENDnet card I had started with, but the minimal expense is definitely worth it in my opinion. I will be avoiding the r8169 chipset for the foreseeable future.
Testing a "Hubbed" environment
In an attempt to capture some network traffic without involving the client or server, I replaced the switch with a 10Mb hub to act as a "poor man's SPAN port".
Interestingly, there is no performance difference during transfer tests in the hubbed environment. With or without tcpdump running, the 256MB copy completed in 4m14s, about 1MB/s, which is to be expected across a 10Mb network.
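As a sanity check on "to be expected", the raw ceiling of a link in MB/s is just its rate in megabits divided by eight. This is a rough sketch with a helper name of my own (link_ceiling_mbs), and it ignores Ethernet, IP and NFS overhead, so real transfers land somewhat below it.

```shell
# Hypothetical helper: theoretical ceiling in MB/s for a link rate
# given in Mb/s (8 bits per byte, protocol overhead ignored).
link_ceiling_mbs() {
    awk -v mbit="$1" 'BEGIN { printf "%.2f\n", mbit / 8 }'
}

link_ceiling_mbs 10     # the 10Mb hub
link_ceiling_mbs 1000   # the Gigabit switch
```

The hub tops out at 1.25MB/s, so a measured rate of about 1MB/s is close to what the wire can carry.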
TFTP Transfer Tests
For a comparison UDP transfer, I created a 31MB file on the client and used TFTP to transfer it to the server. Two test transfers were conducted, the first without tcpdump running on the server and the second with a capture running.
dd if=/dev/zero of=small.tst bs=31M count=1
time tftp -pl small.tst 10.20.30.1
real 0m 6.13s
user 0m 0.10s
sys 0m 0.71s
Then the second test was run after starting tcpdump on the server:
time tftp -pl small.tst 10.20.30.1
real 0m 7.74s
user 0m 0.12s
sys 0m 0.65s
The second transfer was marginally slower with tcpdump running on the server, as you'd expect.
Promiscuous Mode Tests
I wondered if the NIC being placed in promiscuous mode might have anything to do with the performance differences, so I ran a couple of other tests.
The first test was to manually configure the server's NIC in promiscuous mode and run another file transfer.
sudo ifconfig eth0 promisc
Unfortunately that had no effect on performance:
$ time cp test.dat /mnt/3.dat
real 1m 14.57s
user 0m 0.03s
sys 0m 0.64s
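Before reading too much into that result, it's worth confirming the flag actually took effect. On Linux the interface flags are exposed as a hex value in /sys/class/net/eth0/flags, and IFF_PROMISC is bit 0x100. The following is a minimal sketch; the function name has_promisc is my own and the interface name eth0 is assumed.

```shell
# Hypothetical helper: succeed if the given interface flags value
# (hex, as printed by /sys/class/net/<iface>/flags) has IFF_PROMISC set.
has_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# Usage on the server (assumes the interface is eth0):
#   has_promisc "$(cat /sys/class/net/eth0/flags)" && echo "eth0 is promiscuous"
```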
The second test was to run tcpdump on the server, but in "no promiscuous" mode (with the -p flag), and transfer the file again.
sudo ifconfig eth0 -promisc
sudo tcpdump -pntttt -i eth0
Looks like promiscuous mode is ruled out as part of the equation, but tcpdump still holds the secret to the performance issues:
$ time cp test.dat /mnt/4.dat
real 0m 7.57s
user 0m 0.04s
sys 0m 0.66s
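Since every test in this post is the same copy repeated under different conditions on the server, a small loop makes it easier to collect several runs per condition rather than relying on a single timing. This is only a sketch: run_copies and the run-N.dat file names are hypothetical, and it measures whole seconds with date +%s, which is coarse but enough to tell 7 seconds from 70.

```shell
# Hypothetical helper: copy SRC into DEST_DIR COUNT times (default 3)
# and print the elapsed whole seconds for each run.
run_copies() {
    src=$1
    dest_dir=$2
    count=${3:-3}
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        cp "$src" "$dest_dir/run-$i.dat"
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Usage on the client, with the share mounted on /mnt:
#   run_copies test.dat /mnt 5
```

Running it once with the capture stopped and once with it running gives a set of timings for each case.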