Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large
From: Tom Yan <hidden>
Date: 2016-08-11 21:17:25
Also in:
linux-ide, linux-scsi
The patch isn't about how the request from the block layer will be processed (to form the SCSI commands). What it addresses is the blk_queue_max_write_same_sectors() and blk_queue_max_discard_sectors() calls in the SCSI disk driver. They are called with the Maximum Write Same Length multiplied by (logical_block_size >> 9), which converts the number of logical sectors into the 512-byte-sector units the block layer works with.

On 4Kn drives the multiplier is 4096 >> 9, which is 8. So if the reported Maximum Write Same Length is 4194240, the value passed to the block layer to set the limits for a 4Kn drive will be 4194240 * 8. (And when that value is expressed in bytes, it is further multiplied by 512 and no longer fits in 32 bits.)

`logical_block_size >> 9` is essentially the same thing as `logical_block_size / 512`. I probably should have used the bit shift in the patch instead.

On 11 August 2016 at 17:04, Shaun Tancheff [off-list ref] wrote:
On Thu, Aug 11, 2016 at 3:26 AM, [off-list ref] wrote:
From: Tom Yan <redacted>

Currently we advertise Maximum Write Same Length based on the maximum number of sectors that a one-block TRIM payload can cover. This field is used to derive the discard_max_bytes and write_same_max_bytes limits in the block layer, which can currently be at most 0xffffffff (32-bit).

However, with an AF 4Kn drive, the derived limits would be 65535 * 64 * 4096 = 0x3fffc0000 (34-bit). Therefore, we now divide ATA_MAX_TRIM_RNUM by (logical sector size / 512), so that the derived limits will not overflow.

The limits are now also consistent among drives with different logical sector sizes. (Although that may or may not be what we ultimately want when the SCSI / block layer allows a larger representation in the future.)

Although 4Kn ATA SSDs may not be on the market yet, this patch is necessary for the forthcoming SCT Write Same translation support, which could be available on traditional HDDs, where 4Kn is already a thing. It also should not change the current behavior on drives with 512-byte logical sectors.

Note: this patch is not about AF 512e drives.

Signed-off-by: Tom Yan <redacted>

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index be9c76c..dcadcaf 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2295,6 +2295,7 @@ static unsigned int ata_scsiop_inq_89(struct ata_scsi_args *args, u8 *rbuf)
 static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 {
 	u16 min_io_sectors;
+	u32 sector_size;
 
 	rbuf[1] = 0xb0;
 	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
@@ -2309,17 +2310,27 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id);
 	put_unaligned_be16(min_io_sectors, &rbuf[6]);
 
-	/*
-	 * Optimal unmap granularity.
-	 *
-	 * The ATA spec doesn't even know about a granularity or alignment
-	 * for the TRIM command.  We can leave away most of the unmap related
-	 * VPD page entries, but we have specifify a granularity to signal
-	 * that we support some form of unmap - in thise case via WRITE SAME
-	 * with the unmap bit set.
-	 */
+	sector_size = ata_id_logical_sector_size(args->id);
 	if (ata_id_has_trim(args->id)) {
-		put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM, &rbuf[36]);
+		/*
+		 * Maximum write same length.
+		 *
+		 * Avoid overflow in discard_max_bytes and write_same_max_bytes
+		 * with AF 4Kn drives. Also make them consistent among drives
+		 * with different logical sector sizes.
+		 */
+		put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM /
+				   (sector_size / 512), &rbuf[36]);

I think the existing fixups in sd_setup_discard_cmnd() and sd_setup_write_same_cmnd() are 'doing the right thing'.

If I understand the stack correctly, libata-scsi.c (and sd.c) both report a maximum in terms of 512-byte sectors. The upper layers work (mostly) on a mix of bytes and 512-byte sectors, agnostic of the underlying hardware ... mostly. There are some bits in the file systems and block layer that honor a logical block size larger than 512 bytes, since all I/O generated is a multiple of the logical block size as per the block device's request_queue / queue_limits.

So the fact that a 4Kn device could handle an 8x larger I/O, because its logical sector is bigger, is basically ignored for convenience. In the SCSI upper layer, as the commands are being set up, the shift from 512-byte sectors to 'sector_size' is applied so that the number of device sectors matches the request:

	sector >>= ilog2(sdp->sector_size) - 9;
	nr_sectors >>= ilog2(sdp->sector_size) - 9;

So if you correctly report the number of logical sectors here, you break the 'fix' in sd.c. At least that is my understanding.
+
+		/*
+		 * Optimal unmap granularity.
+		 *
+		 * The ATA spec doesn't even know about a granularity or alignment
+		 * for the TRIM command.  We can leave away most of the unmap related
+		 * VPD page entries, but we have specifify a granularity to signal
+		 * that we support some form of unmap - in thise case via WRITE SAME
+		 * with the unmap bit set.
+		 */
 		put_unaligned_be32(1, &rbuf[28]);
 	}
-- 
2.9.2

Regards,
Shaun