Re: Filesystem lockup with CONFIG_PREEMPT_RT
From: Austin Schuh <hidden>
Date: 2014-05-21 21:59:34
Also in:
linux-xfs, lkml
On Wed, May 21, 2014 at 12:30 PM, John Blackwood [off-list ref] wrote:
> Date: Wed, 21 May 2014 03:33:49 -0400
> From: Richard Weinberger <redacted>
> To: Austin Schuh <redacted>
> CC: LKML <redacted>, xfs <redacted>, rt-users [off-list ref]
> Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
>
> > CC'ing RT folks
> >
> > On Wed, May 21, 2014 at 8:23 AM, Austin Schuh [off-list ref] wrote:
> > > On Tue, May 13, 2014 at 7:29 PM, Austin Schuh [off-list ref] wrote:
> > > > Hi,
> > > >
> > > > I am observing a filesystem lockup with XFS on a
> > > > CONFIG_PREEMPT_RT patched kernel. I have currently only
> > > > triggered it using dpkg. Dave Chinner on the XFS mailing list
> > > > suggested that it was a rt-kernel workqueue issue as opposed
> > > > to a XFS problem after looking at the kernel messages.
> > > >
> > > > The only modification to the kernel besides the RT patch is
> > > > that I have applied tglx's "genirq: Sanitize spurious
> > > > interrupt detection of threaded irqs" patch.
> > >
> > > I upgraded to 3.14.3-rt4, and the problem still persists.
> > >
> > > I turned on event tracing and tracked it down further. I'm able
> > > to lock it up by scping a new kernel debian package to /tmp/ on
> > > the machine. scp is locking the inode, and then scheduling
> > > xfs_bmapi_allocate_worker in the work queue. The work then never
> > > gets run. The kworkers then lock up waiting for the inode lock.
> > >
> > > Here are the relevant events from the trace. ffff8803e9f10288
> > > (blk_delay_work) gets run later on in the trace, but
> > > ffff8803b4c158d0 (xfs_bmapi_allocate_worker) never does. The
> > > kernel then warns about blocked tasks 120 seconds later.
>
> Austin and Richard,
>
> I'm not 100% sure that the patch below will fix your problem, but we
> saw something that sounds pretty familiar to your issue involving the
> nvidia driver and the preempt-rt patch. The nvidia driver uses the
> completion support to create their own driver's notion of an
> internally used semaphore.
>
> Some tasks were failing to ever wakeup from wait_for_completion()
> calls due to a race in the underlying do_wait_for_common() routine.

Hi John,
Thanks for the suggestion and patch. The issue is that the work never
gets run, not that the work finishes but the waiter never gets woken.
I applied it anyways to see if it helps, but I still get the lockup.
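
To make the distinction concrete, here is a rough userspace sketch in
Python. It is only a toy model and not a claim about the actual root
cause in the kernel: inode_lock, work_queue, allocate_work and simulate
are hypothetical stand-ins, and the single worker thread is an
assumption made to keep the example small. The submitter holds a lock,
queues work, and waits for a completion, while the only worker is
blocked on that same lock. The queued work never starts, so there is no
wakeup to lose.

```python
import queue
import threading


def simulate(timeout=0.5):
    """Toy model: queued work never starts, so the waiter is never woken."""
    inode_lock = threading.Lock()   # stand-in for the inode lock
    work_queue = queue.Queue()      # stand-in for the workqueue
    ran = threading.Event()         # set only if the queued work executes
    done = threading.Event()        # stand-in for the completion

    def worker():
        # The only worker blocks on the lock before it services the queue,
        # so anything queued behind it cannot start.
        with inode_lock:
            pass
        while True:
            item = work_queue.get()
            if item is None:
                return
            item()

    def allocate_work():
        ran.set()
        done.set()                  # rough analogue of complete()

    inode_lock.acquire()            # submitter takes the lock first
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    work_queue.put(allocate_work)   # queue the work while holding the lock

    woke = done.wait(timeout)       # rough analogue of wait_for_completion()
    result = {"work_ran": ran.is_set(), "waiter_woke": woke}

    # Release the lock only so the demo thread can drain the queue and exit.
    inode_lock.release()
    work_queue.put(None)
    thread.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

In this model the wait times out with work_ran still false. A lost
wakeup would look different: work_ran would be true while the waiter
stayed asleep.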
Thanks,
Austin
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs