Thread: 62 messages, 9 authors, 2011-03-08

MMC quirks relating to performance/lifetime.

From: axboe@kernel.dk (Jens Axboe)
Date: 2011-03-01 19:15:42
Also in: linux-fsdevel, linux-mmc

On 2011-03-01 14:11, Arnd Bergmann wrote:
On Tuesday 01 March 2011 19:48:17 Jens Axboe wrote:
On 2011-02-25 07:21, Arnd Bergmann wrote:
On Friday 25 February 2011, Andrei Warkentin wrote:
Yup. I understand :-).  That's the strategy I'm going to follow. For
page_size-alignment/splitting I'm looking at the block layer now. Is
that the right approach, or should I still submit a (cleaned up) patch
to mmc/card/block.c for that performance improvement?

I guess it should live in block/cfq-iosched in the long run, but I don't
know how easy it is to implement it there for test purposes.

I don't think I saw the original patch(es) for this?

Nobody has posted one yet, only discussions. Andrei made a patch for the
MMC block driver to split requests in some cases, but I think the
concept has changed enough that it's probably not useful to look at
that patch.

I think what needs to be done here is to split requests in these cases:

* Small requests should be split on flash page boundaries, where a page
is typically 8 to 32 KB. Sending one hardware request that spans two
partial pages can be slower than sending the same data as two requests
split at the page boundary.

* If hardware transfers are limited to a small number of sectors, the
transfers should be aligned to page boundaries. E.g. assuming a 16-sector
page and a 32-sector maximum transfer size, a request that spans sectors
7 to 62 should be split into three transfers, 7-15, 16-47 and 48-62,
rather than two, 7-38 and 39-62. This reduces the number of page
read-modify-write cycles that the drive performs.

* No request should ever span multiple erase blocks. Most flash drives today
have 4 MB erase blocks (sometimes 1, 2 or 8 MB), and the I/O scheduler should
treat an erase block boundary like a seek on a hard drive. The I/O
scheduler should try to send all sector writes to an erase block in sequence,
but after that it can choose any other erase block to write to next.
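
The three splitting rules above can be sketched as a single loop. This is a minimal userspace C sketch of the logic, not block-layer code: `struct flash_geom`, `struct xfer`, `split_request()` and the sector-based geometry fields are hypothetical names, and it assumes all sizes are nonzero and that the erase block size is a whole multiple of the page size.

```c
#include <stddef.h>

/* Hypothetical device geometry, all sizes in 512-byte sectors. */
struct flash_geom {
	unsigned long page;      /* flash page, e.g. 16 sectors = 8 KB */
	unsigned long max_xfer;  /* maximum sectors per hardware transfer */
	unsigned long erase_blk; /* erase block, e.g. 8192 sectors = 4 MB */
};

/* One hardware transfer covering sectors first..last (inclusive). */
struct xfer {
	unsigned long first, last;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Split the request first..last into transfers and store up to cap of
 * them in out. Returns the number of transfers stored.
 */
static size_t split_request(const struct flash_geom *g, unsigned long first,
			    unsigned long last, struct xfer *out, size_t cap)
{
	size_t n = 0;
	unsigned long cur = first;

	while (cur <= last && n < cap) {
		unsigned long end = last;

		/* Never cross an erase block boundary. */
		end = min_ul(end, (cur / g->erase_blk + 1) * g->erase_blk - 1);

		if (cur % g->page) {
			/* Unaligned start: stop at the end of the current page. */
			end = min_ul(end, (cur / g->page + 1) * g->page - 1);
		} else {
			/* Aligned start: take as many whole pages as max_xfer allows. */
			unsigned long len = g->max_xfer >= g->page ?
				g->max_xfer / g->page * g->page : g->max_xfer;
			end = min_ul(end, cur + len - 1);
		}

		/* Respect the hardware transfer limit in every case. */
		end = min_ul(end, cur + g->max_xfer - 1);

		out[n].first = cur;
		out[n].last = end;
		n++;
		cur = end + 1;
	}
	return n;
}
```

With a 16-sector page, a 32-sector transfer limit and an 8192-sector (4 MB) erase block, splitting sectors 7 to 62 gives 7-15, 16-47 and 48-62, matching the example in the second rule, and a request for sectors 8180 to 8200 is cut at the erase block boundary into 8180-8191 and 8192-8200.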

I think if we get this logic right, we can deal well with all cheap flash drives.
The two parameters we need are the page size and the erase block size.
The kernel can sometimes guess them, but they should also be tunable in
sysfs for devices that don't report them or that lie to the kernel about them.
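
One way to combine a guessed value with a sysfs override is to let any user-supplied value take precedence. The sketch below assumes that policy; `struct flash_tunables`, `effective_geometry()` and the guessed/user fields are hypothetical and not an existing kernel interface, and the only sanity check (the erase block being a whole multiple of the page) is likewise an assumption.

```c
#include <stdbool.h>

/* Geometry in 512-byte sectors; 0 means "unknown / not set". */
struct flash_tunables {
	unsigned long guessed_page, guessed_erase_blk; /* inferred by the kernel (hypothetical) */
	unsigned long user_page, user_erase_blk;       /* written via sysfs (hypothetical) */
};

/* A nonzero user value overrides the guess. */
static unsigned long pick(unsigned long user, unsigned long guessed)
{
	return user ? user : guessed;
}

/*
 * Compute the geometry to use. Returns false, leaving *page and
 * *erase_blk untouched, if the combination is unknown or inconsistent.
 */
static bool effective_geometry(const struct flash_tunables *t,
			       unsigned long *page, unsigned long *erase_blk)
{
	unsigned long p = pick(t->user_page, t->guessed_page);
	unsigned long e = pick(t->user_erase_blk, t->guessed_erase_blk);

	/* Assumption: an erase block holds a whole number of pages. */
	if (p == 0 || e == 0 || e % p != 0)
		return false;

	*page = p;
	*erase_blk = e;
	return true;
}
```

Rejecting an inconsistent combination instead of silently rounding it lets the caller fall back to unaligned behaviour rather than align to a geometry nobody asked for.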

I'm not sure whether we want to do this for all nonrotational media or
add another flag to enable these optimizations. On proper SSDs that have
an intelligent controller and enough RAM, the optimizations probably would
not help much, and might even make things slightly slower due to the higher
number of separate write requests.

Thanks for the recap. One way to handle this would be to have a dm
target that ensures requests are never built up in a way that violates
any of the above rules. Splitting requests after the fact is a little
silly when you can prevent the problem from happening in the first place.

Alternatively, a queue ->merge_bvec_fn() with a settings table could
provide the same.
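
A merge-time check along these lines would stop a request from growing across a boundary instead of splitting it later. This is a hedged userspace sketch of the logic only: it does not use the real `merge_bvec_fn` signature, and the settings table, its card names and values, `lookup_settings()` and `merge_limit_bytes()` are all hypothetical. It enforces only the erase block and maximum transfer limits; the page rules could be added the same way.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical per-card settings table; names and values are illustrative. */
struct flash_settings {
	const char *name;        /* hypothetical card identifier */
	unsigned long page;      /* sectors */
	unsigned long erase_blk; /* sectors */
	unsigned long max_xfer;  /* sectors */
};

static const struct flash_settings settings_table[] = {
	{ "example-card-a", 16, 8192, 256 }, /* 8 KB pages, 4 MB erase blocks */
	{ "example-card-b", 32, 4096, 128 }, /* 16 KB pages, 2 MB erase blocks */
};

static const struct flash_settings *lookup_settings(const char *name)
{
	for (size_t i = 0; i < sizeof(settings_table) / sizeof(settings_table[0]); i++)
		if (strcmp(settings_table[i].name, name) == 0)
			return &settings_table[i];
	return NULL;
}

/*
 * How many more bytes may be appended to a request that starts at
 * `sector` and already holds `size` bytes, without crossing the next
 * erase block boundary or exceeding max_xfer?
 */
static unsigned long merge_limit_bytes(const struct flash_settings *s,
				       unsigned long sector, unsigned long size)
{
	unsigned long used = size / SECTOR_SIZE;
	/* Sectors from the start sector to the end of its erase block. */
	unsigned long to_eb = s->erase_blk - sector % s->erase_blk;
	unsigned long room = to_eb < s->max_xfer ? to_eb : s->max_xfer;

	return used >= room ? 0 : (room - used) * SECTOR_SIZE;
}
```

For "example-card-a", a request starting at sector 8180 can grow by at most 12 sectors (6144 bytes) before it would reach the next erase block, so the request is capped at the boundary instead of being split afterwards.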

As this is of limited scope, I would prefer having this done via a
plugin of some sort (like a dm target).

-- 
Jens Axboe