Thread: 3 messages, 3 authors, 2014-07-05

Re: Filesystem lockup with CONFIG_PREEMPT_RT

From: Austin Schuh <hidden>
Date: 2014-05-21 21:59:34
Also in: linux-xfs, lkml

On Wed, May 21, 2014 at 12:30 PM, John Blackwood
[off-list ref] wrote:
quoted
Date: Wed, 21 May 2014 03:33:49 -0400
From: Richard Weinberger <redacted>
To: Austin Schuh <redacted>
CC: LKML <redacted>, xfs <redacted>, rt-users
      [off-list ref]
Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
quoted
CC'ing RT folks

On Wed, May 21, 2014 at 8:23 AM, Austin Schuh [off-list ref]
wrote:
quoted
quoted
On Tue, May 13, 2014 at 7:29 PM, Austin Schuh
[off-list ref] wrote:
quoted
quoted
Hi,

I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
patched kernel.  I have currently only triggered it using dpkg.  Dave
Chinner on the XFS mailing list suggested that it was an rt-kernel
workqueue issue as opposed to an XFS problem after looking at the
kernel messages.

The only modification to the kernel besides the RT patch is that I
have applied tglx's "genirq: Sanitize spurious interrupt detection of
threaded irqs" patch.
I upgraded to 3.14.3-rt4, and the problem still persists.

I turned on event tracing and tracked it down further.  I'm able to
lock it up by scping a new kernel debian package to /tmp/ on the
machine.  scp is locking the inode, and then scheduling
xfs_bmapi_allocate_worker in the work queue.  The work then never gets
run.  The kworkers then lock up waiting for the inode lock.

Here are the relevant events from the trace.  ffff8803e9f10288
(blk_delay_work) gets run later on in the trace, but ffff8803b4c158d0
(xfs_bmapi_allocate_worker) never does.  The kernel then warns about
blocked tasks 120 seconds later.

Austin and Richard,

I'm not 100% sure that the patch below will fix your problem, but we
saw something that sounds pretty similar to your issue involving the
nvidia driver and the preempt-rt patch.  The nvidia driver uses the
completion support to create their own driver's notion of an internally
used semaphore.

Some tasks were failing to ever wake up from wait_for_completion() calls
due to a race in the underlying do_wait_for_common() routine.

Hi John,

Thanks for the suggestion and patch.  The issue is that the work never
gets run, not that the work finishes but the waiter never gets woken.
I applied it anyways to see if it helps, but I still get the lockup.

Thanks,
    Austin
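
The lockup pattern described in this thread (a task takes the inode lock,
queues work, and waits for it, while the workers sit blocked on that same
lock) can be illustrated with a small userspace analogy. This is only a
sketch and not kernel code: the thread does not establish why the work was
never run, so the fixed-size worker pool here is an assumption, and the
names simulate, needs_inode_lock, and alloc_worker are hypothetical
stand-ins rather than kernel symbols.

```python
import queue
import threading


def simulate(num_workers, wait_s=0.5):
    """Return True if the queued allocation work ran before the waiter gave up."""
    inode_lock = threading.Lock()    # stands in for the inode lock
    work_queue = queue.Queue()       # stands in for the workqueue (FIFO)
    alloc_done = threading.Event()   # stands in for the completion
    stop = threading.Event()         # lets the simulation shut down cleanly

    def needs_inode_lock():
        # A work item that cannot finish until the inode lock is free.
        # It polls with a timeout so the thread can exit at teardown.
        while not stop.is_set():
            if inode_lock.acquire(timeout=0.05):
                inode_lock.release()
                return

    def alloc_worker():
        # Hypothetical stand-in for the allocation work item.
        alloc_done.set()

    def worker():
        # Assumption: a fixed pool; no extra worker is started when one blocks.
        while not stop.is_set():
            try:
                item = work_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            item()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]

    inode_lock.acquire()               # the "scp" task takes the inode lock
    for t in threads:
        t.start()
    work_queue.put(needs_inode_lock)   # work that will block on the lock
    work_queue.put(alloc_worker)       # the work the lock holder waits for

    # The lock holder waits for the allocation work. A timeout replaces the
    # indefinite wait so the demonstration terminates.
    ran = alloc_done.wait(timeout=wait_s)

    inode_lock.release()
    stop.set()
    for t in threads:
        t.join()
    return ran
```

Calling simulate(1) models a pool whose only worker is stuck on the lock, so
the allocation work is starved and the call returns False; simulate(2)
leaves a free worker to run it and returns True. In the sketch, the lockup
goes away only when some worker remains available to run the queued work,
which mirrors the distinction drawn above between work that never runs and
a waiter that is never woken.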

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs