Re: Host managed SMR drive issue
From: Johannes Thumshirn <hidden>
Date: 2021-09-30 09:55:59
Subsystem: scsi subsystem, the rest
Maintainers: "James E.J. Bottomley", "Martin K. Petersen", Linus Torvalds
On 28/09/2021 13:49, Sven Oehme wrote:
> the host should have plenty of memory, it still hangs right now and
> here is what free reports :
>
> root@01:~$ free -m
>               total        used        free      shared  buff/cache   available
> Mem:         257790       12557       30211       76367      215021      166105
> Swap:         40959         452       40507
OK, Naohiro has managed to reproduce your problem, and while we were
digging we found that a) the scheduler tags are exhausted, b) the SCSI
Zone Append emulation has (two) invalid entries in its write pointer
offset cache and c) we have seen blocked instances of ata_id.
Maybe (just maybe) ata_id is doing an ioctl on the drive which goes
down the route:
sd_open()
`-> sd_revalidate_disk()
    `-> sd_zbc_revalidate_disk()
        `-> sd_zbc_revalidate_zones()
            `-> blk_revalidate_disk_zones()
                `-> sd_zbc_revalidate_zones_cb()
while I/O is ongoing or completing. Both paths access
struct scsi_disk::zones_wp_offset, but sd_zbc_revalidate_zones_cb()
does so without holding struct scsi_disk::zones_wp_offset_lock.
This will then corrupt the zones_wp_offset array.
Can you try if the following patch makes any difference for you?
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ed06798983f8..e04f55dde70b 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -694,8 +694,11 @@ void sd_zbc_release_disk(struct scsi_disk *sdkp)
 static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
 {
 	struct scsi_disk *sdkp = scsi_disk(disk);
+	unsigned long flags;
 
+	spin_lock_irqsave(&sdkp->zones_wp_offset_lock, flags);
 	swap(sdkp->zones_wp_offset, sdkp->rev_wp_offset);
+	spin_unlock_irqrestore(&sdkp->zones_wp_offset_lock, flags);
 }
 
 int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
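
To make the suspected race a bit more concrete, here is a small user-space
sketch that replays the interleavings by hand in a single thread. It is not
kernel code: struct fake_disk, swap_caches(), replay() and the offsets 100,
500 and 8 are all made up for illustration. It also assumes the I/O path does
a read-modify-write on its cache entry, which is my reading of the theory
above and not something the traces confirm.

```c
#include <assert.h>
#include <stdint.h>

#define NR_ZONES 4
#define ZONE     1   /* hypothetical zone index being written */
#define LEN      8   /* hypothetical write length */

/* Hypothetical stand-in for the two cache pointers in struct scsi_disk. */
struct fake_disk {
	uint32_t *wp_offset;     /* active cache, cf. zones_wp_offset */
	uint32_t *rev_wp_offset; /* cache filled by revalidation, cf. rev_wp_offset */
};

/* What the revalidation callback does: exchange the two pointers. */
static void swap_caches(struct fake_disk *d)
{
	uint32_t *tmp = d->wp_offset;

	d->wp_offset = d->rev_wp_offset;
	d->rev_wp_offset = tmp;
}

enum order { RMW_THEN_SWAP, SWAP_THEN_RMW, SWAP_INSIDE_RMW };

/*
 * Replays one ordering of "I/O path advances the offset" against
 * "revalidation swaps the caches" and returns the active cache entry.
 */
static uint32_t replay(enum order order)
{
	uint32_t stale[NR_ZONES] = { [ZONE] = 100 }; /* old cached offset */
	uint32_t fresh[NR_ZONES] = { [ZONE] = 500 }; /* revalidated offset */
	struct fake_disk d = { .wp_offset = stale, .rev_wp_offset = fresh };
	uint32_t off;

	assert(ZONE < NR_ZONES);

	if (order == SWAP_THEN_RMW)
		swap_caches(&d);

	off = d.wp_offset[ZONE];       /* I/O path: read the cached offset */
	if (order == SWAP_INSIDE_RMW)
		swap_caches(&d);       /* unlocked swap lands mid-update */
	d.wp_offset[ZONE] = off + LEN; /* I/O path: store the advanced offset */

	if (order == RMW_THEN_SWAP)
		swap_caches(&d);

	return d.wp_offset[ZONE];
}
```

With the two serialized orderings, which is what holding the lock on both
sides would enforce, the active entry ends up as 500 or 508. With the swap
landing inside the update it ends up as 108, a value derived from the stale
array but stored into the revalidated one.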