Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
From: Eric Dumazet <hidden>
Date: 2012-07-16 06:18:54
On Sun, 2012-07-15 at 14:59 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 10:27:33PM -0700, Marc MERLIN wrote:
> > On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > > Please try following patch, as it solved the problem for me
> > > (no more order-1 allocations in tx path)
> >
> > I applied your patch to 3.3.1 and cannot reproduce the problem anymore.
> > I'll leave a big wireless copy running overnight just in case, but I
> > think you fixed it.
>
> Mmmh, so I'm running 3.4.4 and I had another full machine hang while
> copying big files (gigabytes) over wireless via NFS.
>
> The laptop self recovered after 5 minutes or so (mouse cursor would not
> even move) and I was able to kill -9 the process (midnight commander).
> mc did not actually stop for another 4 minutes or so (i.e. it took that
> long for the process to come out of kernel hung state), but the machine
> was usable during that time.
>
> Note that copying the same data with scp works fine.
>
> NFS mount looks like this:
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
>
> I didn't have anything like last time in the kernel logs, and more
> annoyingly, ps -elf does not show anything for any process in WCHAN,
> making pointing the finger a bit harder (procps-ng 3.3.3 does not show
> anything other than '-' in WCHAN for any process with 3.4.4).
>
> My understanding is that user space calling drivers that shut off all
> interrupts for extended periods of time (at least I think so since my
> mouse cursor would not move), is still a kernel bug.
>
> For what it's worth, copying 1GB of data in lots of small files does not
> cause problems, it seems that it's big files that cause a problem since
> they likely fill a buffer somewhere while interrupts are disabled.
>
> Do you have an idea of how I can find out where my mc process is stuck
> in the kernel?
> Should I reproduce with specific sysrq output?

Just to clarify, you get this freeze when transferring a big file from a
remote NFS server to your PC (i.e. a download), not the reverse way?

If so, you might hit an OOM condition because iwlwifi uses big/fat RX
buffers, I never understood why... (amsdu_size_8K = 1)

Storing an MTU=1500 frame in 8KB of memory sounds really bad.

diff --git a/drivers/net/wireless/iwlwifi/iwl-drv.c b/drivers/net/wireless/iwlwifi/iwl-drv.c
index cc41cfa..434b924 100644
--- a/drivers/net/wireless/iwlwifi/iwl-drv.c
+++ b/drivers/net/wireless/iwlwifi/iwl-drv.c
@@ -1006,7 +1006,7 @@ void iwl_drv_stop(struct iwl_drv *drv)
 /* shared module parameters */
 struct iwl_mod_params iwlwifi_mod_params = {
-	.amsdu_size_8K = 1,
+	.amsdu_size_8K = 0,
 	.restart_fw = 1,
 	.plcp_check = true,
 	.bt_coex_active = true,