MMC quirks relating to performance/lifetime.
From: Andrei Warkentin <hidden>
Date: 2011-02-20 04:39:06
Also in:
linux-mmc
On Sat, Feb 19, 2011 at 3:54 AM, Arnd Bergmann [off-list ref] wrote:
On Friday 18 February 2011 23:40:16 Andrei Warkentin wrote:
On Fri, Feb 18, 2011 at 1:47 PM, Andrei Warkentin [off-list ref] wrote:
Flashbench timings for both Sandisk and Toshiba cards. Attaching due to size.

Very nice, thanks for the measurement! I don't think having the results inline in the mail is a problem, it would even make it easier to quote.
Some interesting things that I don't understand. For the align test, I extended it to do a write align test (-A). I tried two partitions that I could write over, and both reads and writes behaved differently for the two partitions on the same device. Odd. They are both 4MB aligned.

I never did a write align test because the results will be highly unreliable as soon as you get into thrashing. Your results seem to be meaningful still, so maybe we should have it after all, but I'll put a big warning on it.
Actually it would be a good idea to also bail/warn if you do the au test with more open au's than the size of the passed device allows, since it'll just wrap around and skew the results.
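To make the bail/warn idea concrete, here is a rough sketch of the check, not flashbench code. It assumes each open AU needs its own erase block and that the erase block size is 4 MiB (a guess based on the 4MB alignment mentioned above). The helper name is made up for illustration.

```python
MiB = 1024 * 1024


def check_open_au_fits(device_bytes, erase_size_bytes, open_au_nr):
    """Return (fits, needed_bytes) for an open-AU test.

    Assumption: every open AU occupies one distinct erase block, so the
    test needs erase_size_bytes * open_au_nr bytes before it wraps around.
    """
    needed = erase_size_bytes * open_au_nr
    return needed <= device_bytes, needed


# Partition sizes as quoted in this thread (p6 ~40 MB, p9 ~314 MB).
for name, size in (("mmcblk0p6", 40 * MiB), ("mmcblk0p9", 314 * MiB)):
    for nr in (4, 20, 50):
        fits, needed = check_open_au_fits(size, 4 * MiB, nr)
        status = "ok" if fits else "would wrap around, results skewed"
        print(f"{name}: {nr} open AUs need {needed // MiB} MiB: {status}")
```

Under these assumptions the 40 MB partition cannot hold more than 10 open AUs, which is why the larger runs belong on p9.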
On the sandisk it was the write align that made the page size stand out. The read align had pretty constant results.

I've noticed on other Sandisk media that the read align test is sometimes useless. It may help to do a full erase of the partition, or to fill it with data before running the test.
On the toshiba the results varied wildly for the two partitions. For partition 6, there was a clear pattern in the diff values for read align. For 9, it was all over the place. For 9 with the write align, 8K and 16K the crossing writes took ~115ms!! Look in attached files for all the data.

Partition 6 is a lot smaller, so you have the accesses less than a segment apart, so it shows other effects.
The AU tests were interesting too, especially how with several open AUs the throughput is higher for certain smaller sizes on sandisk, but if I interpret it correctly both cards have at least 4 AUs, as I didn't see yet a significant drop for small sizes. The larger ones I am running now on mmcblk0p9, which is sufficiently larger for these tests... (mmcblk0p6 is only 40 MB, p9 is 314 MB)

Right, you should try larger values for --open-au-nr here. It's at least a good sign that the drive can do random access inside a segment and that it can have at least 4 segments open. This is much better than I expected from your descriptions at first.
Actually the Toshiba one seems to have 7 AUs if I interpret this correctly.

^C
# ./flashbench -O -0 6 -b 512 /dev/block/mmcblk0p9
4MiB    5.91M/s
2MiB    8.84M/s
1MiB    10.8M/s
512KiB  13M/s
256KiB  13.6M/s
^C
# ./flashbench -O -0 7 -b 512 /dev/block/mmcblk0p9
4MiB    6.32M/s
2MiB    8.63M/s
1MiB    10.5M/s
512KiB  13.2M/s
256KiB  13M/s
128KiB  12.3M/s
^C
# ./flashbench -O -0 8 -b 512 /dev/block/mmcblk0p9
4MiB    6.65M/s
2MiB    7.02M/s
1MiB    6.36M/s
512KiB  3.17M/s
256KiB  1.53M/s

The Sandisk one has 20 AUs.

# ./flashbench -O -0 20 -b 512 /dev/block/mmcblk0p9
4MiB    11.3M/s
2MiB    12.8M/s
1MiB    9.87M/s
512KiB  9.97M/s
256KiB  9.13M/s
128KiB  8.05M/s
^C
# ./flashbench -O -0 50 -b 512 /dev/block/mmcblk0p9
4MiB    7.19M/s
^C
# ./flashbench -O -0 2 -b 512 /dev/block/mmcblk0p9
^C
# ./flashbench -O -0 22 -b 512 /dev/block/mmcblk0p9
4MiB    11.6M/s
2MiB    12.3M/s
1MiB    5.13M/s
512KiB  2.57M/s
256KiB  1.59M/s
128KiB  1.16M/s
64KiB   776K/s
^C
# ./flashbench -O -0 21 -b 512 /dev/block/mmcblk0p9
4MiB    11.2M/s
2MiB    12.4M/s
1MiB    4.65M/s
512KiB  1.95M/s
256KiB  955K/s
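For what it's worth, the way I am reading the AU count off these runs can be written down as a small sketch. It takes the 256KiB throughput from each run above and reports the last open-AU count before throughput collapses. The helper name and the drop factor of 4 are my own arbitrary choices, not anything flashbench does.

```python
def estimate_open_au_limit(throughput_by_nr, drop_factor=4.0):
    """Return the largest open-AU count seen before throughput collapses.

    throughput_by_nr maps the open-AU count of a run to its measured
    throughput at one fixed block size. A "collapse" is assumed to be a
    drop by at least drop_factor between two consecutive tested counts.
    Returns None if no such drop is present in the data.
    """
    counts = sorted(throughput_by_nr)
    for prev, cur in zip(counts, counts[1:]):
        if throughput_by_nr[prev] / throughput_by_nr[cur] >= drop_factor:
            return prev
    return None


# 256KiB results in M/s, copied from the runs above (955K/s = 0.955M/s).
toshiba = {6: 13.6, 7: 13.0, 8: 1.53}
sandisk = {20: 9.13, 21: 0.955, 22: 1.59}

print("Toshiba open AUs:", estimate_open_au_limit(toshiba))
print("Sandisk open AUs:", estimate_open_au_limit(sandisk))
```

This only works when the tested counts bracket the limit, so a run on each side of the cliff is needed.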
However, the drop from 32 KB to 16 KB in performance is horrifying for the Toshiba drive, it's clear that this one does not like to be accessed smaller than 32 KB at a time, an obvious optimization for FAT32 with 32 KB clusters. How does this change with your kernel patches?
Since the only performance-increasing patch here would be just the one that splits unaligned accesses, I wouldn't expect any improvements for page-aligned accesses < 32KB. As you can see here...

# cat /sys/block/mmcblk0/device/page_size
8192
# ./flashbench -O -0 1 -b 512 /dev/block/mmcblk0p9
4MiB    6.81M/s
2MiB    7.73M/s
1MiB    9.21M/s
512KiB  9.98M/s
256KiB  10.3M/s
128KiB  10.2M/s
64KiB   9.76M/s
32KiB   8.52M/s
16KiB   3.68M/s
8KiB    1.72M/s
4KiB    837K/s
^C
# echo 0 > /sys/block/mmcblk0/device/page_size
# ./flashbench -O -0 1 -b 512 /dev/block/mmcblk0p9
4MiB    6.42M/s
2MiB    7.79M/s
1MiB    9.22M/s
512KiB  10M/s
256KiB  9.94M/s
128KiB  10.1M/s
64KiB   9.68M/s
32KiB   8.5M/s
16KiB   3.65M/s
8KiB    1.73M/s
4KiB    838K/s
2KiB    417K/s
^C
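To illustrate why aligned accesses see no change, here is a simplified sketch of the splitting idea. This is not the kernel patch itself: the function name is hypothetical, and it assumes that a page_size of 0 disables splitting, which is how I use the sysfs attribute above.

```python
def split_unaligned(start, length, page_size):
    """Split a byte range so that everything after the first piece is
    page-aligned.

    Returns a list of (start, length) pieces. An already aligned request,
    or any request when page_size is 0 (assumed to mean "disabled"), is
    returned unchanged as a single piece.
    """
    if length <= 0:
        return []
    if page_size <= 0 or start % page_size == 0:
        return [(start, length)]
    # Unaligned head: runs up to the next page boundary (or the end).
    head = min(page_size - start % page_size, length)
    pieces = [(start, head)]
    if length > head:
        pieces.append((start + head, length - head))
    return pieces


# 16 KiB write starting 4 KiB into an 8 KiB page: split into head + rest.
print(split_unaligned(4096, 16384, 8192))
# Same write, page-aligned: passes through untouched.
print(split_unaligned(8192, 16384, 8192))
```

Since flashbench issues aligned writes in this test, every request takes the pass-through path, which matches the two runs being essentially identical.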
For the sandisk drive, it's funny how it is consistently faster doing random access than linear access. I don't think I've seen that before. It does seem to have some cache for linear access using smaller than 16 KB, and can probably combine them when it's only writing to a single segment.
Yes, that is pretty interesting. Smaller than 16K? Not smaller than 32K? I wonder what it is doing...