Thread: 13 messages, 6 authors, 2013-09-19

Re: Best configuration for bcache/md cache or other cache using ssd

From: Roberto Spadim <hidden>
Date: 2013-09-19 22:50:33

Hi Stan!!

2013/9/19 Stan Hoeppner [off-list ref]:
On 9/19/2013 10:30 AM, Roberto Spadim wrote:
quoted
1) SMART or another tool to access drive diagnostics
See the '-d' option in smartctl(8).
Nice, I tried this on a running machine; some cards don't work, others
work with SMART
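
A drive behind a RAID controller usually needs an explicit device type passed to '-d'. A minimal sketch, assuming an LSI MegaRAID controller and a hypothetical device node /dev/sda; the helper name smartctl_argv is made up for illustration, and other controllers need a different '-d' type (see smartctl(8)):

```python
def smartctl_argv(device, megaraid_disk=None):
    """Build an argv list for smartctl; run it with subprocess.run(argv)."""
    argv = ["smartctl", "-a"]  # -a: print all SMART information
    if megaraid_disk is not None:
        # -d megaraid,N addresses physical disk N behind a MegaRAID controller
        argv += ["-d", "megaraid,%d" % megaraid_disk]
    argv.append(device)  # device node is an assumption; adjust for your system
    return argv


print(" ".join(smartctl_argv("/dev/sda", megaraid_disk=0)))
```

If a card does not answer with one type, that alone does not prove it lacks SMART passthrough; the controller may simply need another type string.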

quoted
2) Cache memory (if I have 512MB here, could I replace it with 512MB or
more on the Linux side? Instead of cache on the RAID board, why not add cache
to the Linux kernel?)
Because if the power goes out, or the kernel crashes, the contents of
system RAM are lost, ergo you lose data and possibly corrupt your files
and filesystem.
OK, cache at raid is better

quoted
3) Battery backup: how does this really work? What kind of RAID board
works well with this?
A BBU, or battery backup unit, provides power to the DRAM and DRAM
controller on the RAID card.  If the power goes out or kernel crashes
the data is intact.  When power and operating state are restored, the
controller flushes the cache contents to the drives.  Most BBUs will
hold for about 72 hours before the batteries run out.

A newer option is a flash backed cache.  Here there is no battery unit,
In this case, is it something similar to an SSD on the RAID card? Doesn't this
have a write cycle limit, and problems with flash being corrupted?
and the data is truly non-volatile.  In the event of power loss or some
crash situations, the controller copies the contents of the write cache
to onboard flash memory.  When normal system state is restored, the
flash is dumped to DRAM, then flushed to the drives.  This option is a
little more expensive, but is preferred for obvious reasons.  There is
no 72 hour limit.  The data resides in flash indefinitely.  This can be
valuable in the case of natural disasters that take out utility power
and network links for days or weeks, but leave the facility and systems
unharmed.  With flash backed write cache, you can wait it out, or
relocate, and the data will hit disk after you power up.  With BBU, you
have only ~3 days to get back online.
quoted
4) support for new drivers (firmware updates)
All of the quality RAID cards have pretty seamless firmware update
mechanisms.  In the case of Linux the drivers are in mainline, so
updating your kernel updates the driver.
Nice =)
quoted
5) support for hot swap
RAID cards supported hot swap long before Linus wrote the first lines of
code that became Linux, and more than a decade before the md driver was
written.  RAID cards typically handle hot swap better than md does.
Yes, but some Dell servers (here) have a RAID card and don't allow
hot swap; I don't know whether the drive bays are the problem or not

quoted
6) If I use SSDs, what should I consider? I have one RAID card with an SSD
and I don't know if it runs well or just does the job
I'm not sure what you're asking here.
Well just to know if the raid card could be used with ssd...
i was thinking something like:

RAID CARD -> ssd
motherboard -> hdd

and a bcache or dm-cache of md-raid1 (HDDs) with raidcard-raid1 (SSDs)

quoted
7) anything else? costs =) ?
I can't speak accurately to costs.  The last time we spoke of pricing,
off list, you stated a 500GB SATA drive costs ~$500 USD in your locale.
 That's radically out of line with pricing here in the US.
Yes, that is my problem; this time I'm considering importing some parts

I can only say for comparison that I can obtain an LSI 9260-4i 512MB
w/BBU for ~$470 USD.  I can obtain an Intel DC S3700 200GB enterprise
SSD for $499.  But this isn't an apt comparison as neither device is a
:'( I will cry, hahah, it's very cheap compared to my country's market
direct replacement for the other.  It's the complete storage
architecture and its overall capabilities that matters.  Using an SSD
with one of the late kernel caching hacks doesn't give you the
protection of BBU/flash cache on hardware RAID.  Nor does it give you
the near zero latency fsync ACK of RAID cache.
Nice

quoted
I will look up the boards you mentioned and their features (I don't
know what BBU means yet, but I will check... any good literature on RAID
boards to read? Maybe Wikipedia?)
So you've never used a hardware RAID controller?  Completely new to you?
More or less; I don't know the details. I have only superficial experience,
not a technical view of RAID cards yet
 Wow...  Start with these.  Beware.  This is a few hundred pages of
material.

http://www.lsi.com/downloads/Public/MegaRAID%20SAS/MegaRAID%20SAS%209260-4i/MR_SAS9260-4i_PB_FIN_071212.pdf
http://www.lsi.com/downloads/Public/MegaRAID%20SAS/41450-04_RevC_6Gbs_MegaRAID_SAS_UG.pdf
http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/51530-00_RevK_MegaRAID_SAS_SW_UG.pdf
!!! wow! very nice, i will read :)
Thanks a lot!


quoted
thanks a lot!! :)

2013/9/19 Stan Hoeppner [off-list ref]:
quoted
On 9/18/2013 10:42 PM, Roberto Spadim wrote:
quoted
Nice; in other words, it's better to spend money on hardware RAID cards,
right?
If it's my money, yes, absolutely.  RAID BBWC will run circles around an
SSD with a random write workload.  The cycle time on DDR2 SDRAM is 10s
of nanoseconds.  Write latency on flash cells is 50-100 microseconds.
Do the math.
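
Taking the figures above at face value, the math can be sketched as follows. The 20 ns DRAM value is an assumed stand-in for "10s of nanoseconds"; the flash range is the 50-100 microseconds quoted. This compares raw device timings only and ignores controller and bus overhead.

```python
DRAM_CYCLE_NS = 20          # assumption: one value within "10s of nanoseconds"
FLASH_WRITE_US = (50, 100)  # flash cell write latency range from the message


def slowdown(flash_us, dram_ns=DRAM_CYCLE_NS):
    """Return how many DRAM cycles fit into one flash cell write."""
    return flash_us * 1000 / dram_ns


for us in FLASH_WRITE_US:
    print("%d us flash write = %.0f DRAM cycles of %d ns"
          % (us, slowdown(us), DRAM_CYCLE_NS))
```

With these assumed numbers the gap is three to four orders of magnitude (2,500x to 5,000x).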

Random write apps such as transactional databases rarely, if ever,
saturate the BBWC faster than it can flush and free pages, so the
additional capacity of an SSD yields no benefit.  Additionally, good
RAID firmware will take some of the randomness out of the write pattern
by flushing nearby LBA sectors in a single IO to the drives, increasing
the effectiveness of TCQ/NCQ, thereby reducing seeks.  This in essence
increases the random IO throughput of the drives.
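
The effect of flushing nearby LBAs together can be illustrated with a toy model. This is only a sketch of the general idea, not LSI's firmware logic; the function name and the sample LBAs are hypothetical.

```python
def coalesce(dirty_lbas):
    """Merge cached dirty LBAs into contiguous (start, length) extents."""
    extents = []
    for lba in sorted(set(dirty_lbas)):
        if extents and lba == extents[-1][0] + extents[-1][1]:
            # This LBA directly follows the last extent: grow it by one sector.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Not adjacent: start a new extent, i.e. a separate IO.
            extents.append((lba, 1))
    return extents


# Seven writes that arrived in random order while sitting in cache.
writes = [907, 12, 905, 13, 906, 14, 5000]
print(coalesce(writes))  # [(12, 3), (905, 3), (5000, 1)]
```

Here seven random writes reach the drives as three IOs in ascending LBA order, which is the kind of pattern that lets TCQ/NCQ cut down on seeks.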

In summary, yes, a good caching RAID controller w/BBU will yield vastly
superior performance compared to SSD for most random write workloads,
simply due to instantaneous ACK to fsync and friends.
quoted
Any particular card that I should look at?
If this R420 is the 4x3.5" model then the LSI 9260-4i is suitable.  If
it's the 8x2.5" drive model then the LSI 9260-8i is suitable.  Both have
512MB of cache DRAM.  In both cases you'd use the LSI00161/ LSIiBBU07
BBU for lower cost instead of the flash option.  These two models have
the lowest MSRP of the LSI RAID cards having both large cache and BBU
support.

In the 8x2.5" case you could also use the Dell PERC H710, which has built
in FBWC.  Probably more expensive than the LSI branded cards.  All of
Dell's RAID cards are rebranded LSI cards, or OEM produced by LSI for
Dell with Dell branded firmware.  I.e. it's the same product, same
performance, just a different name on it.

Adaptec also has decent RAID cards.  The bottom end doesn't support BBU
so steer clear of those, i.e. 6405e/6805e, etc.

Don't use Areca, HighPoint, Promise, etc.  They're simply not in the
same league as the enterprise vendors above.  If you have problems with
optimizing their cards, drivers, firmware, etc for a specific workload,
their support is simply non existent.  You're on your own.
quoted
2013/9/18 Stan Hoeppner [off-list ref]:
quoted
On 9/18/2013 12:33 PM, Roberto Spadim wrote:
quoted
Well, the internet link here is 100Mbps. I think the workload will be a
bit more than only 100 users; it's a second webserver+database server.
He is trying to use a cheaper server with more disk performance. Brazilian
costs are too high to allow a full SSD system or 15k rpm SAS hard disks.
For the MariaDB server I'm studying whether the thread-pool scheduler will be
used instead of one thread per connection, but "it's not my problem":
the final user will select which database scheduler is better.
In other words, I think the workload will not be a simple web server
CMS/blog. I don't know yet how it will work; it's a black/gray box to
me. Today he has enterprise SATA 7200rpm HDDs in the servers (Dell
R420 if I'm not wrong) and is studying whether an SSD could help. That's my
'job' (hobby) in this task.
Based on the information provided it sounds like the machine is seek
bound.  The simplest, and best, solution to this problem is simply
installing a [B|F]BWC RAID card w/512MB cache.  Synchronous writes are
acked when committed to RAID cache instead of the platter.  This will
yield ~130,000 burst write TPS before hitting the spindles, or ~130,000
writes in flight.  This is far more performance than you can achieve
with a low end enterprise SSD, for about the same cost.  It's fully
transparent and performance is known and guaranteed, unlike the recent
kernel based block IO caching hacks targeting SSDs as fast read/write
buffers.
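
The ~130,000 figure matches 512MB of cache (the size of the 9260 cards named elsewhere in this thread) divided into 4KB writes. The 4KB write size is an assumption, as the message does not state it, and a real controller reserves part of its DRAM for other uses, so treat this as an upper bound.

```python
def writes_in_flight(cache_bytes, write_bytes):
    """Upper bound on how many writes of a given size fit in the cache."""
    return cache_bytes // write_bytes


CACHE_BYTES = 512 * 1024 ** 2  # 512MB controller cache
WRITE_BYTES = 4 * 1024         # assumption: 4KB per synchronous write

print(writes_in_flight(CACHE_BYTES, WRITE_BYTES))  # 131072
```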

You can use the onboard RAID firmware to create RAID1s or a RAID10, or
you can expose each disk individually and use md/RAID while still
benefiting from the write caching, though for only a handful of disks
you're better off using the firmware RAID.  Another advantage is that
you can use parity RAID (controller firmware only) and avoid some of the
RMW penalty, as the read blocks will be in controller cache.  I.e. you
can use three 7.2K disks, get the same capacity as a four disk RAID10,
with equal read performance and nearly the same write performance.
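
The capacity claim can be checked with a short calculation, assuming "parity RAID" on three disks means RAID5 and that all disks are the same size. The 1TB disk size and the function names are placeholders.

```python
def raid5_usable(n_disks, disk_tb):
    """RAID5 keeps one disk's worth of parity across the array."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (n_disks - 1) * disk_tb


def raid10_usable(n_disks, disk_tb):
    """RAID10 mirrors every disk, so half the raw capacity is usable."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("RAID10 needs an even number of disks, at least 4")
    return (n_disks // 2) * disk_tb


DISK_TB = 1  # hypothetical disk size
print(raid5_usable(3, DISK_TB), raid10_usable(4, DISK_TB))  # 2 2
```

Both layouts give two disks of usable space, so the parity array saves one drive for the same capacity.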

Write heavy DB workloads are a poster child for hardware caching RAID devices.

--
Stan



quoted
2013/9/18 Drew [off-list ref]:
quoted
On Wed, Sep 18, 2013 at 8:51 AM, Roberto Spadim [off-list ref] wrote:
quoted
Sorry guys, this time I don't have full knowledge of the
workload, but from what he told me, he wants fast writes with HDDs, and I
could check whether small SSD devices could help.
After installing Linux with RAID1 I will install Apache, MariaDB and PHP
on this machine; in other words it's a database and web server load,
but I don't know yet what size of app and database will run.

Btw, an SSD with bcache or dm-cache could help HDD (this must be
enterprise level) writes, right?
Any idea of the best method to test which kernel driver gives
superior performance? I'm thinking of installing bcache, then
making a backup, installing dm-cache, and checking which is better. Any other
ideas?
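
One rough way to compare the two setups is to time small synchronous random writes on a filesystem sitting on each candidate device. The sketch below is an illustrative probe, not a substitute for a proper benchmark tool such as fio; the function name, file name, and sizes are all assumptions.

```python
import os
import random
import tempfile
import time


def fsync_latencies(path, writes=200, block=4096, span_blocks=1024):
    """Time block-sized writes at random offsets, each followed by fsync."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, block * span_blocks)  # size the test file up front
        samples = []
        for _ in range(writes):
            offset = random.randrange(span_blocks) * block
            start = time.perf_counter()
            os.pwrite(fd, buf, offset)
            os.fsync(fd)  # returns once the storage stack acknowledges the write
            samples.append(time.perf_counter() - start)
        return samples
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Run from a directory on the device under test.
    with tempfile.TemporaryDirectory(dir=".") as d:
        lat = sorted(fsync_latencies(os.path.join(d, "probe.bin")))
        print("median %.3f ms, worst %.3f ms"
              % (lat[len(lat) // 2] * 1e3, lat[-1] * 1e3))
```

Running the same probe on the bcache setup and then on the dm-cache setup, with identical parameters, gives a first comparison of write acknowledgement latency; a real database workload should still be the final judge.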
We still need to know what size datasets are going to be used. And
also given it's a webserver, how big of a pipe does he have?

Given a typical webserver in a colo w/ 10Mbps pipe, I think the
suggested config is overkill. For a webserver the 7200 SATA's should
be able to deliver enough data to keep apache happy.

In the database side, depends on how intensive the workload is. I see
a lot of webservers where the 7200's are just fine because the I/O
demands from the database are low. Blog/CMS systems like wordpress
will be harder on the database but again it depends on how heavy the
access is to the server. How many visitors/hour does he expect to
serve?


--
Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Roberto Spadim