Thread (104 messages) 104 messages, 12 authors, 2008-03-20

Re: [PATCH] block: fix residual byte count handling

From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: 2008-03-01 15:20:15
Also in: linux-ide, lkml

On Sat, 2008-03-01 at 15:17 +0900, Tejun Heo wrote:
Hello, Jens, James.

Jens Axboe wrote:
quoted
quoted
With the original patch, I have to run through the whole of libsas and
scsi_transport_sas doing

s/data_len/raw_data_len/

With your update it looks like I have to run through them all doing

s/data_len/data_len - extra_len/
blk_rq_raw_data_len() should do.
I know we *could* sweep through all the block drivers altering them; my
point is that I don't think we *should*.  Fundamentally, every driver
that cares is assuming req->data_len is the length of the request that
came down.  The fact that it got padded is irrelevant (and actually
detrimental) to most of them as the SMP driver illustrates.

We use a high dma_alignment not because we care about padding, but
because we want to avoid scatter gather.  So we care about alignment of
the start of the buffer (to avoid sg), but fundamentally, we need to
know what its true length (not its padded length) is.  The true length
feeds into the smp frame size and is checked by the interfaces, which is
why the changes caused an SMP failure.

Just for the principle of least surprise, can we not keep req->data_len
what it has always been, namely the true data length of the request and
express the fact that we've padded it by req->extra_len or something, so
we don't have to do all of these driver changes.
quoted
quoted
which is even worse.  Can't we put things back to a point where data_len
means exactly that and extra_len means how much we have spare on the
end, so you know you can DMA up to data_len + extra_len if need be?

That way we don't have to sweep through every block driver altering the
way it uses data_len.
If SMP is broken because it needs start address alignment but not
padding to align the size, what should be done is to make that exact
requirement visible to the block layer.  Say,
blk_queue_dma_start_alignment() or maybe change
blk_queue_dma_alignment() such that it only indicates start address
alignment and add blk_queue_dma_size_alignment() for drivers which
require size to be aligned too.  I think those are few.
But this is true of *every* current user of the block layer apart from
IDE ... we all care about alignment not padding.  Any current user that
actually cares about padding will be doing their own adjustments, so
they need changing anyway.

We can frame that with a different API, but blk_queue_dma_alignment()
better be the common case (start but not pad alignment).
I think the decision which value rq->data_len represents comes down to
which size is used more in low level drivers because no matter which way
we choose we'll have to update some of the drivers which expects the
other thing from rq->data_len.
Right, and currently, apart from IDE, they all want it to mean the true
data length.
blk_rq_raw_data_len() is needed iff a driver needs dummy buffers
attached at the end and still needs to know the original request size
which isn't the common case.
I think it *is* the common case.
quoted
Fully agree. The reason why I think it's so ugly is that you have to
keep these two seperate variables in sync. The burning was just one bug,
there will be others...
The posted modification isn't too bad as the maintenance of the two
variables is at places where the nasty things happen.  I think what
rq->data_len should represent when seen from LLDs is more important and
please note that if SMP is broken because it simply doesn't require
512byte size alignment, it's a different issue.
But we still have to find all the bugs this causes in all the block
drivers ... that's my biggest concern right now.
As long as both raw_data_len and data_len are accessible, I'm okay
either way.  My biggest reluctance is against breaking sum(sg) ==
rq->data_len.  I think this can lead to much more subtle problems such
as programming the controller w/ wrong bytes count and wrapped-around
resid calculation.
OK, so can we go back to data_len being the true value and add an
extra_len for drivers who want to know where the padding lies?

James

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help