Thread: 130 messages, 15 authors, 2013-04-17

Re: RAID performance

From: Adam Goryachev <hidden>
Date: 2013-02-12 05:33:04

On 12/02/13 13:46, Stan Hoeppner wrote:
If it's OK I'm going to snip a bunch of this and get to the meat of it,
so hopefully it's less confusing.
Thanks, was getting way over the top :)
That is correct.  Long story short, the last time I messed with a
configuration such as this I was using a Cisco that fanned over 802.3ad
groups based on L3/4 info.  Stock 802.3ad won't do this.  
Yes, Cisco have their own proprietary extensions... EtherChannel I think
it is called.
I apologize
for the confusion, and for the delay in responding (twas a weekend after
all).
No problem, I expected as much... Just because I'm silly enough to work
on a weekend, I realise most others don't. Besides, any help I get here
is a bonus :)

However, I did end up already making the solution proposal to the
client, and have already ordered some equipment, but see below...
 I just finished reading the relevant section of your GS716T-200
(GST716-v2) manual, and it does not appear to have this capability.
Nope.
All is not lost.  I've done a considerable amount of analysis of all the
information you've provided.  In fact I've spent way too much time on
this.  But it's an intriguing problem involving interesting systems
assembled from channel parts, i.e. "DIY", and I couldn't put it down.  I
was hoping to come up with a long term solution that didn't require any
more hardware than a NIC and HBA, but that's just not really feasible.
That's OK, I was fully prepared to get additional equipment, and the
customer was happy to throw money at it to get it fixed...
So, my conclusions and recommendations, based on all the information I
have to date:

2.  To scale iSCSI throughput using a single switch will require
    multiple host ports and MPIO, but no LAG for these ports.
I'm assuming MPIO is Multi Path IO (ie, MultiPath iSCSI)?
3.  Given the facts above, an extra port could be added to each TS Xen
    box.  A separate subnet would be created for the iSCSI SAN traffic,
    and each port given an IP in the subnet.  Both ports would carry
    MPIO iSCSI packets, but only one port would carry user traffic.
This would allow iSCSI up to 2Gbit bi-directional traffic per xen box,
though some of it would also be consumed for the VM's. Also, the iSCSI
server would only be capable of a total 2Gbps on each network, so it
could handle two xen boxes demanding 100% throughput, which is a total
of 4Gbps which is pretty impressive (assuming SAN server uses
balance-alb). However, ignore this, I'll concentrate on what you suggest
below.
4.  Given the fact that there will almost certainly be TS users on the
    target box when the DC VM gets migrated due to some kind of failure
    or maintenance, adding the load of file sharing may not prove
    desirable.  And you'd need another switch.  Thus, I'd recommend:

A.  Dedicate the DC Xen box as a file server and dedicate a non-TS
    Xen box as its failover partner.  Each machine will receive a quad
    port NIC.  Two ports on each host will be connected to the current
    16 port switch.  The two ports will be configured to balance-alb
    using the current user network IP address.  All switch ports will
    be reconfigured to standard mode, no LAGs, as they are not needed
    for Linux balance-alb.  Disconnect the 8111 mobo ports on these two
    boxes from the switch as they're no longer needed.  Prioritize RDP
    in the switch, leave all other protocols alone.
BTW, the switch has a maximum of 4 LAG's, so one option I was going to
try would not have worked anyway. Though that was probably just bad
design on my part... I think I'm past that now :)
B.  We remove 4 links each from the iSCSI servers, the primary and the
    DRBD backup server, from the switch.  This frees up 8 ports for
    connecting the file servers' 4 ports, and connecting a motherboard
    ethernet port from each iSCSI server to the switch for management.
    If my math is correct this should leave two ports free.
I already have one motherboard port from SAN1/2 connected to another
switch, and also one motherboard port is a direct crossover cable
between san1 and san2 which is configured for DRBD traffic sync (so this
traffic is kept away from the iSCSI traffic).

However, after this, the only connection between the xen boxes running
the terminal servers to the iSCSI server is the single "management"
ethernet port. The Terminal Servers C: is also on the iSCSI server... so
this doesn't quite work.
C.  MPIO is designed specifically for IO scaling, and works well.
    So it's a better fit, and you save the cost of the additional
    switch(es) that would be required to do perfect balance-rr bonding
    between iSCSI hosts (which can be done easily with each host
    ethernet port connected to a different dedicated SAN switch).  In
    this case it would require 4 additional switches.
I assume this means that if you have a quad port card in each machine,
with a single ethernet connected to each of 4 switches, then you can do
balance-rr because bandwidth on both endpoints is equal? That doesn't
quite work for me because I don't want the expense of a quad port card
in each machine, and also I don't want equal bandwidth.... I want the
server to have more bandwidth than the clients. In any case, let's
ignore this since it doesn't get us closer to the solution.
    Instead what
    we'll do here is connect the remaining 2 ports from each Xen file
    server box, the primary and the backup, and all 4 ports on each
    iSCSI server, the primary and the backup, to a new 12-16 port
    switch.  It can be any cheap unmanaged GbE switch of 12 or more
    ports.  We'll assign an IP address in the new SAN subnet to each
    physical port on these 4 boxes and configure MPIO accordingly.
As mentioned, this cuts off the iSCSI from the rest of the 6 xen boxes.
    So what we end up with is decent session based scaling of user CIFS
    traffic between the TS hosts and the DC Xen servers, with no single
    TS host bogging everyone down, and no desktop lag if both links are
    full due to two greedy users.  We end up with nearly perfect
    ~200MB/s iSCSI scaling in both directions between the DC Xen box
    (and/or backup) and the iSCSI servers, and we end up with nearly
    perfect ~400MB/s each way between the two iSCSI servers via DRBD,
    allowing you to easily do mirroring in real-time.
I'm assuming MPIO requires the following:
SAN must have multiple physical links over 'disconnected' networks (ie,
different networks) on different subnets.
iSCSI client must meet the same requirements.
All for the cost of two quad port NICs and an inexpensive switch, and
possibly a new high performance SAS HBA.  I analyzed many possible paths
to a solution, and I think this one is probably close to ideal.
OK, what about this option:

Install dual port ethernet card into each of the 8 xen boxes
Install 2 x quad port ethernet card into each of the san boxes

Connect one port from each of the xen boxes plus 4 ports from each san
box to a single switch (16ports)

Connect the second port from each of the xen boxes plus 4 ports from
each san box to a second switch (16 ports)

Connect the motherboard port (existing) from each of the xen boxes plus
one port from each of the SAN boxes (management port) to a single switch
(10 ports).

Total of 42 ports.
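
The port tally can be checked with a few lines of Python. This is only a
sketch of the arithmetic: the box counts and ports per box come from the
layout described above, and the switch labels are made up for readability.

```python
# Port tally for the proposed three-switch layout.
# Box counts come from the description above; the labels are illustrative.
XEN_BOXES = 8
SAN_BOXES = 2


def switch_ports(xen_ports_each, san_ports_each):
    """Ports used on one switch, given ports per xen box and per SAN box."""
    return XEN_BOXES * xen_ports_each + SAN_BOXES * san_ports_each


layout = {
    "sw1 (first iSCSI network)": switch_ports(1, 4),
    "sw2 (second iSCSI network)": switch_ports(1, 4),
    "user/management": switch_ports(1, 1),
}
total_ports = sum(layout.values())

if __name__ == "__main__":
    for name, ports in layout.items():
        print(f"{name}: {ports} ports")
    print(f"total: {total_ports} ports")
```

This gives 16 + 16 + 10, which matches the 42 above.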

Leave the existing motherboard port configured with existing IP's/etc,
and dedicate this as the management/user network (RDP/SMB/etc).

We then configure the SAN boxes with two bond devices, each consisting
of a set of 4 x 1Gbps as balance-alb, with one IP address each (from 2
new subnets).

Add a "floating" IP to the current primary SAN on each of the bond
interfaces from the new subnets.

We configure each of the xen boxes with two new ethernets with one IP
address each (from the 2 new subnets).

Configure multipath to talk to the two floating IP's
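
To make the addressing concrete, here is a rough Python sketch of an IP plan
for the two new subnets. The subnet ranges, the host names and the choice of
which address goes where are all assumptions for illustration; none of them
are decided yet.

```python
import ipaddress

# Hypothetical subnets, one per iSCSI switch/VLAN. The real ranges are not
# decided in this thread.
SUBNETS = {
    "sw1": ipaddress.ip_network("192.168.101.0/24"),
    "sw2": ipaddress.ip_network("192.168.102.0/24"),
}


def build_plan(xen_count=8):
    """Return {switch: {name: address}} for the SAN bonds and xen ports."""
    plan = {}
    for switch, net in SUBNETS.items():
        hosts = list(net.hosts())  # usable addresses, .1 upwards
        entries = {
            "san-floating": hosts[0],  # follows whichever SAN is primary
            "san1-bond": hosts[1],     # fixed address on san1's bond
            "san2-bond": hosts[2],     # fixed address on san2's bond
        }
        for i in range(1, xen_count + 1):
            entries[f"xen{i}"] = hosts[9 + i]  # .11 upwards
        plan[switch] = entries
    return plan


if __name__ == "__main__":
    for switch, entries in build_plan().items():
        print(switch)
        for name, addr in entries.items():
            print(f"  {name:13} {addr}")
```

Each xen box ends up with one address per subnet, and multipath on the xen
side would point at the two "san-floating" addresses.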

See a rough sketch at:
http://suspended.wesolveit.com.au/graphs/diagram.JPG
I couldn't fit any detail like IP addresses without making it a complete
mess. BTW, sw1 and sw2 I'm thinking can be the same physical switch,
using VLAN to make them separate (although different physical switches
adds to the reliability factor, so that is also something to think about).

Now, this provides up to 2Gbps traffic for any one host, and up to 8Gbps
traffic in total for the SAN server, which is equivalent to 4 clients at
full speed.
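
The same numbers as a small Python sketch. It is an idealised model that
assumes every link runs at the full 1Gbps, with no protocol overhead and a
perfectly even split between hosts; the function names are just for
illustration.

```python
LINK_GBPS = 1  # every port in this design is 1Gbps


def host_iscsi_gbps(ports_per_host=2):
    """iSCSI bandwidth available to one xen box (one port per subnet)."""
    return ports_per_host * LINK_GBPS


def san_iscsi_gbps(bonds=2, links_per_bond=4):
    """iSCSI bandwidth available at the SAN (two bonds of four links)."""
    return bonds * links_per_bond * LINK_GBPS


def hosts_at_full_speed(san_gbps, host_gbps):
    """How many hosts can run flat out before the SAN links are full."""
    return san_gbps // host_gbps


def fair_share_gbps(san_gbps, host_gbps, active_hosts):
    """Per-host bandwidth if the SAN links were shared perfectly evenly."""
    return min(host_gbps, san_gbps / active_hosts)


if __name__ == "__main__":
    host = host_iscsi_gbps()
    san = san_iscsi_gbps()
    print(f"per host: {host}Gbps, SAN total: {san}Gbps")
    print(f"hosts at full speed: {hosts_at_full_speed(san, host)}")
    print(f"share with all 8 active: {fair_share_gbps(san, host, 8)}Gbps")
```

Under those assumptions, 4 hosts can run at the full 2Gbps, and with all 8
xen boxes busy at once each would still get about 1Gbps.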

It also allows for the user network to operate at a full 1Gbps for
SMB/RDP/etc, and I could still prioritise RDP at the switch....

I'm thinking 200MB/s should be enough performance for any one machine
disk access, and 1Gbps for any single user side network access should be
ample given this is the same as what they had previously.

The only question left is what will happen when there is only one xen
box asking to read data from the SAN? Will the SAN attempt to send the
data at 8Gbps, flooding the 2Gbps that the client can handle, and
generate all the pause messages, or is this not relevant and it will
"just work"? Actually, I think from reading the docs, it will only use
one link out of each group of 4 to send the data, hence it won't attempt
to send at more than 2Gbps to each client....

I don't think this system will scale any further than this, I can only
add additional single Gbps ports to the xen hosts, and I can only add
one extra 4 x 1Gbps ports to each SAN server.... Best case is add 4 x
10Gbps to the SAN, 2 single 1Gbps ports to each xen, providing a full
32Gbps to the clients, each client gets max 4Gbps. In any case, I think
that would be one kick-ass network, besides being a pain to try and
debug, keep cabling neat and tidy, etc... Oh, and the current SSD's
wouldn't be that fast... At 400MB/s read, times 7 data disks is
2800MB/s, actually, damn, that's fast.
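
For comparison, a quick Python sketch that puts the disk and network figures
in the same units. It assumes decimal units (1Gbps = 1000Mbit/s), ignores
iSCSI/TCP overhead, and assumes reads scale linearly across the 7 data
disks, so treat the results as upper bounds.

```python
SSD_READ_MB_S = 400  # per-disk read figure quoted above
DATA_DISKS = 7


def gbps_to_mb_per_s(gbps):
    """Convert a link rate in Gbps to MB/s (decimal units, no overhead)."""
    return gbps * 1000 / 8


def array_read_mb_per_s():
    """Aggregate read rate if all data disks stream in parallel."""
    return SSD_READ_MB_S * DATA_DISKS


def bottleneck(network_gbps):
    """Name the slower side for a given total SAN network bandwidth."""
    if gbps_to_mb_per_s(network_gbps) < array_read_mb_per_s():
        return "network"
    return "disks"


if __name__ == "__main__":
    print(f"array: {array_read_mb_per_s()}MB/s")
    for gbps in (8, 32):
        print(f"{gbps}Gbps = {gbps_to_mb_per_s(gbps):.0f}MB/s, "
              f"limit: {bottleneck(gbps)}")
```

So at 8Gbps (about 1000MB/s) the network is the limit, and only at
something like 32Gbps (about 4000MB/s) would the SSDs become the slower
side.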

The only additional future upgrade I would plan is to upgrade the
secondary san to use SSD's matching the primary. Or add additional SSD's
to expand storage capacity and I guess speed. I may also need to add
additional ethernet ports to both SAN1 and SAN2 to increase the DRBD
cross connects, but these would I assume be configured using linux
bonding in balance-rr since there is no switch in between.
You can pull off the same basic concept buying just the quad port HBA
for the current DC Xen box, removing 2 links between each iSCSI server
and the switch and direct connecting these 4 NIC ports via 2 cross over
cables, and using yet another IP subnet for these, with MPIO.  You'd
have no failover for the DC, and the bandwidth between the iSCSI servers
for DRBD would be cut in half.  But it only costs one quad port NIC.  A
dedicated 200MB/s is probably more than plenty for live DRBD, but again
you have no DC failover.

However, given that you've designed this system with "redundancy
everywhere" in mind, I'm guessing the additional redundancy justifies
the capital outlay for an unmanaged switch and a 2nd quad port NIC.
Let's ignore this... we both agree it isn't a good solution.
If one of those test boxes could be permanently deployed as the failover
host for the DC VM, I think the dedicated iSCSI switch architecture
makes the most sense long term.  If the cost of the switch and another 4
port NIC isn't in the cards right now, you can go the other route with
just one new NIC.  And given that you'll be doing no ethernet channel
bonding on the iSCSI network, but IP based MPIO instead, it's a snap to
convert to the redundant architecture with new switch later.  All you'll
be doing is swapping cables to the new switch and changing IP address
bindings on the NICs as needed.
I'd rather keep all boxes with identical hardware, so that any VM can be
run on any xen host.

So, the current purchase list, which the customer approved yesterday,
and most of it should be delivered tomorrow (insufficient stock, already
ordering from 4 different wholesalers):
4 x Quad port 1Gbps cards
4 x Dual port 1Gbps cards
2 x LSI HBA's (the suggested model)
1 x 48port 1Gbps switch (same as the current 16port, but more ports).

The idea being to pull out 4 x dual port cards from san1/2 and install
the 4 x quad port cards. Then install a single dual port card on each
xen box. Install one LSI HBA in each san box. Use the 48 port switch to
connect it all together.
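
A quick Python sanity check of the order against the design. The required
quantities are my reading of the plan above (2 quad port cards and 1 HBA per
san box, 1 dual port card per xen box, 42 switch ports), and it assumes all
42 ports end up on the new 48 port switch.

```python
XEN_BOXES = 8
SAN_BOXES = 2

# What the design needs, as read from the plan above.
required = {
    "quad_port_nic": SAN_BOXES * 2,
    "dual_port_nic": XEN_BOXES * 1,
    "lsi_hba": SAN_BOXES * 1,
    "switch_ports": 42,
}

# What has been ordered, plus the dual port cards pulled from san1/2.
ordered = {"quad_port_nic": 4, "dual_port_nic": 4, "lsi_hba": 2,
           "switch_ports": 48}
reused = {"dual_port_nic": 4}


def available(item):
    """Units on hand once ordered and reused parts are combined."""
    return ordered.get(item, 0) + reused.get(item, 0)


def shortfall():
    """Units still missing per item (0 means covered)."""
    return {item: max(0, need - available(item))
            for item, need in required.items()}


if __name__ == "__main__":
    for item, missing in shortfall().items():
        print(f"{item}: need {required[item]}, have {available(item)}, "
              f"missing {missing}")
```

On paper nothing is missing, and the 48 port switch leaves 6 ports spare.
This only checks the quantities, not when the parts actually arrive.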

However, I'm going to be short 1 x quad ethernet, and 1 x sata
controller, so the secondary san is going to be even more lacking for up
to 2 weeks when these parts arrive, but IMHO, that is not important at
this stage, if san1 falls over, I'm going to be screwed anyway running
on spinning disks :) though not as screwed as being plain
down/offline/nothing/just go home folks...
Again, apologies for the false start with the 802.3ad confusion on my
part.  I think you'll find all (or at least most) of the ducks in a row
in the recommendations above.
No problem, this has been a definite learning experience for me and I
appreciate all the time and effort you've put into assisting.

BTW, I went last night (Monday night) and removed one dual port card
from the san2, installed into the xen host running the DC VM. Configured
the two new ports on the xen box as active-backup (couldn't get LACP to
work since the switch only supports max of 4 LAG's anyway). Removed one
port from the LAG on san1, and setup the three ports (1 x san + 2 x
xen1) as a VLAN with private IP address on a new subnet. Today,
complaints have been almost non-existent, mostly relating to issues they had
yesterday but didn't bother to call until today. It's now 4:30pm, so I'm
thinking that the problem is solved just with that done. I was going to
do this across all 8 boxes, using 2 x ethernet on each xen box plus one
x ethernet on each san, producing a max of 1Gbps ethernet for each xen
box. However, I think your suggestion of MPIO is much better, and
grouping the SAN ports into two bundles makes a lot more sense, and
produces 2Gbps per xen box.

Thanks again, I appreciate all the help.

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au