Thread (8 messages), 4 authors, 2018-03-30

Re: General protection fault with use_blk_mq=1.

From: Zephaniah E. Loss-Cutler-Hull <hidden>
Date: 2018-03-29 09:12:56
Also in: linux-scsi, lkml

On 03/28/2018 10:13 PM, Paolo Valente wrote:
quoted
On 29 Mar 2018, at 05:22, Jens Axboe [off-list ref] wrote:

On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
quoted
On 03/28/2018 06:02 PM, Jens Axboe wrote:
quoted
On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
quoted
I am not subscribed to any of the lists on the To list here, please CC
me on any replies.

I am encountering a fairly consistent crash anywhere from 15 minutes to
12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1

The crash looks like:
quoted
quoted
Looking through the code, I'd guess that this is dying inside
blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP
is pointing at.
Leaving the whole thing here for Paolo - it's crashing off insertion of
a request coming out of SG_IO. Don't think we've seen this BFQ failure
case before.

You can mitigate this by switching the scsi-mq devices to mq-deadline
instead.
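
For reference, the scheduler switch suggested above can be done per
device through sysfs. The following is only a minimal sketch: it
assumes the usual /sys/block/<dev>/queue/scheduler file, where the
active scheduler is shown in square brackets. The device name "sda" in
the usage note and the helper names are hypothetical, and writing the
file needs root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split the contents of a queue/scheduler file into
    (available schedulers, active scheduler).

    The active scheduler is the token wrapped in square brackets,
    e.g. "mq-deadline kyber [bfq] none".
    """
    available = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def set_scheduler(device, name, sysfs_root="/sys/block"):
    """Select scheduler `name` for `device`; return the previous one.

    `sysfs_root` is a parameter only so the function can be tried
    against a scratch directory instead of the real sysfs.
    """
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    available, active = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    if active != name:
        path.write_text(name)
    return active
```

Usage would look like set_scheduler("sda", "mq-deadline"), repeated for
each scsi-mq device. The setting does not persist across reboots.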
I'm thinking that I should also be able to mitigate it by disabling
CONFIG_DEBUG_BLK_CGROUP.

That should remove that entire chunk of code.
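
Before rebooting into the rebuilt kernel, it may help to confirm that
the option is really off in the config that was used. This is a rough
sketch that assumes the standard kernel .config line format
("CONFIG_FOO=y" or "# CONFIG_FOO is not set"). Where the config text
comes from (for example /boot/config-<version>, or /proc/config.gz on
kernels built with that support) is left to the caller, and the helper
name is hypothetical.

```python
import re


def kconfig_state(config_text, option):
    """Report how `option` appears in kernel config text.

    Returns the value after "=" (such as "y" or "m") when the option
    is set, "not set" when it is explicitly disabled, and None when
    the option does not appear at all.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "not set"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Made-up config fragment, for illustration only.
    sample = "CONFIG_BLK_CGROUP=y\n# CONFIG_DEBUG_BLK_CGROUP is not set\n"
    print(kconfig_state(sample, "CONFIG_DEBUG_BLK_CGROUP"))
```

A result of "not set" or None for CONFIG_DEBUG_BLK_CGROUP means the
debug statistics code is not compiled in.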

Of course, that won't help if this is actually a symptom of a bigger
problem.
Yes, it's not a given that it will fully mask the issue at hand. But
turning off BFQ has a much higher chance of working for you.

This time actually CC'ing Paolo.
Hi Zephaniah,
if you are actually interested in the benefits of BFQ (low latency,
high responsiveness, fairness, ...) then it may be worth trying what
you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP.  All the more
so because this option enables the heavy computation of debug cgroup
statistics, which you probably don't use.
I definitely am.
In addition, the outcome of your attempt without
CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
- if no failure occurs, then the issue is likely to be confined to
that debugging code (which, on the bright side, is likely to be of
occasional interest, and only to a handful of developers)
- if the issue still shows up, then we may have new hints on this odd
failure

Finally, consider that this issue has been reported to have disappeared
as of 4.16 [1], and, as a plus, that the service quality of BFQ got a
further boost in 4.16 as well.
I look forward to that either way then.
Looking forward to your feedback, in case you try BFQ without
CONFIG_DEBUG_BLK_CGROUP,
I'm running that now. Judging from past crashes, if it survives until
tomorrow evening then we're good, so I should hopefully know within the
next day.

Thank you,
Zephaniah E. Loss-Cutler-Hull.
Paolo

[1] https://www.spinics.net/lists/linux-block/msg21422.html
quoted
-- 
Jens Axboe
  
