Re: General protection fault with use_blk_mq=1.
From: Zephaniah E. Loss-Cutler-Hull <hidden>
Date: 2018-03-29 09:12:56
Also in:
linux-scsi, lkml
On 03/28/2018 10:13 PM, Paolo Valente wrote:
On 29 Mar 2018, at 05:22, Jens Axboe [off-list ref] wrote:
On 3/28/18 9:13 PM, Zephaniah E. Loss-Cutler-Hull wrote:
On 03/28/2018 06:02 PM, Jens Axboe wrote:
On 3/28/18 5:03 PM, Zephaniah E. Loss-Cutler-Hull wrote:
I am not subscribed to any of the lists on the To list here, please CC me on any replies.

I am encountering a fairly consistent crash anywhere from 15 minutes to 12 hours after boot with scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=1

The crash looks like:
[...]
Looking through the code, I'd guess that this is dying inside blkg_rwstat_add, which calls percpu_counter_add_batch, which is what RIP is pointing at.

Leaving the whole thing here for Paolo - it's crashing off insertion of a request coming out of SG_IO. Don't think we've seen this BFQ failure case before. You can mitigate this by switching the scsi-mq devices to mq-deadline instead.

I'm thinking that I should also be able to mitigate it by disabling CONFIG_DEBUG_BLK_CGROUP. That should remove that entire chunk of code. Of course, that won't help if this is actually a symptom of a bigger problem.

Yes, it's not a given that it will fully mask the issue at hand. But turning off BFQ has a much higher chance of working for you. This time actually CC'ing Paolo.

Hi Zephaniah,
if you are actually interested in the benefits of BFQ (low latency, high responsiveness, fairness, ...) then it may be worth trying what you yourself suggest: disabling CONFIG_DEBUG_BLK_CGROUP. All the more so because this option enables the heavy computation of debug cgroup statistics, which you probably don't use.
I definitely am.
In addition, the outcome of your attempt without CONFIG_DEBUG_BLK_CGROUP would give us useful bisection information:
- if no failure occurs, then the issue is likely to be confined to that debugging code (which, on the bright side, is likely to be of occasional interest, for only a handful of developers)
- if the issue still shows up, then we may have new hints on this odd failure

Finally, consider that this issue has been reported to disappear as of 4.16 [1], and, as a plus, that the service quality of BFQ got a further boost in 4.16 as well.
I look forward to that either way then.
Looking forward to your feedback, in case you try BFQ without CONFIG_DEBUG_BLK_CGROUP,
I'm running that now. Judging from past crashes, if it survives until tomorrow evening then we're good, so I should hopefully know within the next day.

Thank you,
Zephaniah E. Loss-Cutler-Hull.
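
For anyone following along, here is a minimal sketch of checking whether CONFIG_DEBUG_BLK_CGROUP is set in a kernel config before testing. The function name and the sample config text are illustrative assumptions, not taken from this thread. On a real system the text would typically come from a file such as /boot/config-<version>, or from /proc/config.gz where the kernel exposes it.

```python
import re


def kconfig_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent.

    Hypothetical helper for illustration; it only understands the two common
    line forms found in a kernel .config file.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == unset_line:
            return "not set"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


# Illustrative sample only, not the reporter's actual configuration.
SAMPLE = """\
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_IOSCHED_BFQ=y
CONFIG_BFQ_GROUP_IOSCHED=y
"""

if __name__ == "__main__":
    for name in ("CONFIG_DEBUG_BLK_CGROUP", "CONFIG_IOSCHED_BFQ"):
        print(name, "->", kconfig_option_state(SAMPLE, name))
```

If the function reports "not set" for CONFIG_DEBUG_BLK_CGROUP while BFQ is still enabled, the kernel matches the test setup discussed above.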
Paolo

[1] https://www.spinics.net/lists/linux-block/msg21422.html
-- Jens Axboe
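
The mitigation Jens suggested earlier in the thread, switching a scsi-mq device from BFQ to mq-deadline, goes through the per-device sysfs file queue/scheduler, where the active scheduler is shown in square brackets. A minimal sketch under stated assumptions: the helper names, the sysfs_root parameter, and the sample string are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split the contents of queue/scheduler into (active, available).

    The kernel marks the active scheduler with square brackets,
    e.g. "mq-deadline kyber [bfq] none".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def set_scheduler(device, scheduler, sysfs_root="/sys/block"):
    """Select a scheduler for one block device (hypothetical helper).

    sysfs_root is a parameter only so the function can be pointed at a
    scratch directory for testing; on a live system it needs root.
    """
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    _, available = parse_scheduler(path.read_text())
    if scheduler not in available:
        raise ValueError("%s not offered for %s: %s" % (scheduler, device, available))
    path.write_text(scheduler)


if __name__ == "__main__":
    # Illustrative string, not captured from the reporter's machine.
    print(parse_scheduler("mq-deadline kyber [bfq] none\n"))
```

A call such as set_scheduler("sda", "mq-deadline") would then apply the switch to one device; the device name here is only an example.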
Attachments
- signature.asc [application/pgp-signature] 819 bytes