Thread: 19 messages, 4 authors, 2021-09-30

Re: Host managed SMR drive issue

From: Johannes Thumshirn <hidden>
Date: 2021-09-30 09:55:59
Subsystem: scsi subsystem, the rest · Maintainers: "James E.J. Bottomley", "Martin K. Petersen", Linus Torvalds

On 28/09/2021 13:49, Sven Oehme wrote:
> the host should have plenty of memory, it still hangs right now and
> here is what free reports :
>
> root@01:~$ free -m
>               total        used        free      shared  buff/cache   available
> Mem:         257790       12557       30211       76367      215021      166105
> Swap:         40959         452       40507

OK, Naohiro has managed to reproduce your problem, and while we were
dining we found that a) the scheduler tags are exhausted, b) the SCSI
Zone Append emulation has (two) invalid entries in its write pointer
offset cache and c) we have seen blocked instances of ata_id.

Maybe (just maybe) ata_id is doing an ioctl on the drive which goes 
down the route:
sd_open()
`-> sd_revalidate_disk()
    `-> sd_zbc_revalidate_disk()
        `-> sd_zbc_revalidate_zones()
            `-> blk_revalidate_disk_zones()
                `-> sd_zbc_revalidate_zones_cb()

while I/O is in flight or completing. Both paths access
struct scsi_disk::zones_wp_offset, but sd_zbc_revalidate_zones_cb()
does so without holding struct scsi_disk::zones_wp_offset_lock.

If that is what happens, it would corrupt the zones_wp_offset array.
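
To make the suspected race a bit more concrete, here is a rough user-space
sketch of the locking pattern (not the driver code). struct wp_cache and the
wp_cache_*() helpers are hypothetical names made up for illustration, and a
pthread mutex stands in for the spinlock taken with spin_lock_irqsave() in
the kernel. The point is only that the path updating an entry and the path
swapping the array pointers have to take the same lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ZONES 4u

/* Hypothetical stand-in for the relevant fields of struct scsi_disk. */
struct wp_cache {
	pthread_mutex_t lock;     /* plays the role of zones_wp_offset_lock */
	uint32_t *wp_offset;      /* live cache, cf. zones_wp_offset */
	uint32_t *rev_wp_offset;  /* freshly built cache, cf. rev_wp_offset */
};

static void wp_cache_init(struct wp_cache *c, uint32_t *live, uint32_t *fresh)
{
	pthread_mutex_init(&c->lock, NULL);
	c->wp_offset = live;
	c->rev_wp_offset = fresh;
}

/* I/O completion side: advance the cached write pointer of one zone. */
static void wp_cache_advance(struct wp_cache *c, size_t zone, uint32_t nr_blocks)
{
	assert(zone < NR_ZONES);

	pthread_mutex_lock(&c->lock);
	c->wp_offset[zone] += nr_blocks;
	pthread_mutex_unlock(&c->lock);
}

/* Revalidation side: publish the new array under the same lock. */
static void wp_cache_swap(struct wp_cache *c)
{
	uint32_t *tmp;

	pthread_mutex_lock(&c->lock);
	tmp = c->wp_offset;
	c->wp_offset = c->rev_wp_offset;
	c->rev_wp_offset = tmp;
	pthread_mutex_unlock(&c->lock);
}
```

Without the lock in wp_cache_swap(), a concurrent wp_cache_advance() could
read the array pointer just before the swap and then write into the array
that has been retired, which is the kind of stale entry we seem to be seeing.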

Can you check whether the following patch makes any difference for you?

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ed06798983f8..e04f55dde70b 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -694,8 +694,11 @@ void sd_zbc_release_disk(struct scsi_disk *sdkp)
 static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
 {
        struct scsi_disk *sdkp = scsi_disk(disk);
+       unsigned long flags;
 
+       spin_lock_irqsave(&sdkp->zones_wp_offset_lock, flags);
        swap(sdkp->zones_wp_offset, sdkp->rev_wp_offset);
+       spin_unlock_irqrestore(&sdkp->zones_wp_offset_lock, flags);
 }
 
 int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)