Thread (30 messages), 8 authors, 2013-02-26

Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out

From: Eric Dumazet <hidden>
Date: 2013-02-19 05:17:16
Also in: linux-wireless

On Mon, 2013-02-18 at 20:05 -0800, Marc MERLIN wrote:
On Mon, Jul 16, 2012 at 06:21:57PM +0200, Eric Dumazet wrote:
No, it's actually when I'm 'uploading' from my laptop to my server.
One interesting thing is that my server is running lvm2 with snapshots,
which makes writes slower than my laptop can push data over the network, so
it's definitely causing buffers to fill up.
I just did a download test and got 4.5MB/s sustained without problems.
Hmm, nfs apparently is able to push a lot of data. Try reducing
rsize/wsize to sane values, like 32K instead of 512K?

gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
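
To make the sizes in that mount line easier to read, here is a minimal Python sketch that pulls rsize/wsize out of a /proc/mounts-style line and converts them to KiB. The helper names (parse_mount_options, nfs_io_sizes_kib) are made up for illustration and are not part of any NFS tooling; the sample line is the one quoted above.

```python
def parse_mount_options(mount_line):
    """Split one /proc/mounts-style line and return its options as a dict."""
    fields = mount_line.split()
    # Field order: device, mount point, fs type, options, dump, pass.
    options = {}
    for item in fields[3].split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (rw, hard, ...) are stored as True.
        options[key] = value if sep else True
    return options


def nfs_io_sizes_kib(mount_line):
    """Return (rsize, wsize) in KiB for an NFS mount line."""
    opts = parse_mount_options(mount_line)
    return int(opts["rsize"]) // 1024, int(opts["wsize"]) // 1024


line = ("gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 "
        "rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,"
        "namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,"
        "clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0")
print(nfs_io_sizes_kib(line))  # (512, 512)
```

So this mount is at 512K in both directions; the suggested 32K would show up as rsize=32768,wsize=32768.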

You could trace svc_sock_setbufsize() and check how large sk_sndbuf
is set
My apologies, I totally dropped the ball on this.

So, the problem was still there in more recent kernels.

TL;DR:
- reducing nfs buffers removes the full hang
- iwlwifi has a problem where lack of pages causes the whole machine to hang
- copying out over NFS, even with buffers down to 32K, is very wonky, and cp
  does not return until over 2 minutes after the copy has actually finished.
  (I have a trace of what's hung in cp/nfs when this happens)


Details:

It's still pretty severe because whatever blocks doesn't just end up
blocking disk IO, but actually blocking interrupts altogether since my mouse
can't move for a minute or more until some buffer flushes.

Below is the last trace I got during this (I can't do sysrq because I have a
broken Lenovo T530 without a sysrq key, and typing doesn't really work when
interrupts aren't firing).

Not sure if it's useful. First chrome had an issue, and then iwlwifi:

chrome: page allocation failure: order:1, mode:0x4020
Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
Call Trace:
 <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
 [<ffffffff810d8cfd>] __alloc_pages_nodemask+0x66a/0x702
 [<ffffffff8108a948>] ? arch_local_irq_save+0x15/0x1b
 [<ffffffff811064af>] alloc_pages_current+0xcd/0xee
 [<ffffffffa039b579>] iwl_rx_allocate+0x8c/0x271 [iwlwifi]
 [<ffffffffa039c24e>] iwl_irq_tasklet+0x7e5/0x91c [iwlwifi]
 [<ffffffff8104805e>] tasklet_action+0x80/0xd2
 [<ffffffff81047c99>] __do_softirq+0xdf/0x1c5
 [<ffffffff814c1ed6>] ? _raw_spin_lock+0x1b/0x1f
 [<ffffffff810a7f37>] ? handle_irq_event+0x4d/0x62
 [<ffffffff814c7f5c>] call_softirq+0x1c/0x30
 [<ffffffff8101104e>] do_softirq+0x41/0x7f
 [<ffffffff81047e52>] irq_exit+0x3f/0xa7
 [<ffffffff81010d40>] do_IRQ+0x88/0x9f
 [<ffffffff814c246d>] common_interrupt+0x6d/0x6d
 <EOI> Mem-Info:

You could try loading iwlwifi with amsdu_size_8K set to 0 (disabled).

It should then hopefully use order-0 pages.

Some drivers can't fall back to low-order page allocations.

mlx4 is another example (it uses order-2 pages).
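
For readers unfamiliar with the "order" in the allocation failure: an order-n request asks for 2**n physically contiguous pages. A small Python sketch of the arithmetic, assuming 4 KiB base pages (the x86-64 default); the function names are illustrative only. It also reflects my reading of the suggestion above, namely that an 8K receive buffer needs an order-1 block while a 4K one fits in a single order-0 page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the x86-64 default


def order_to_bytes(order, page_size=PAGE_SIZE):
    """An order-n allocation is 2**n physically contiguous pages."""
    return (1 << order) * page_size


def min_order_for(nbytes, page_size=PAGE_SIZE):
    """Smallest order whose contiguous block can hold nbytes."""
    order = 0
    while order_to_bytes(order, page_size) < nbytes:
        order += 1
    return order


for order in (0, 1, 2):
    print(order, order_to_bytes(order) // 1024, "KiB")
```

Under that assumption the order:1 failure in the trace is a request for 8 KiB of contiguous memory, and the order-2 pages mentioned for mlx4 are 16 KiB. Higher orders are harder to satisfy once memory is fragmented, which is why a fallback to order-0 matters.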