RE: [LSF/MM TOPIC] linux servers as a storage server - what's missing?
From: Loke, Chetan <hidden>
Date: 2012-01-19 17:32:59
Also in:
linux-fsdevel
> > True, a single front-end won't see all of those LUNs/devices. So not a
> > big concern about the front-end hosts. I am thinking of a use-case where
> > folks can use a linux-box to manage their different storage arrays. So
> > this linux box with 'libstoragemgmt + app' needs to manage
> > (scan/create/delete/so on) all those LUNs.
>
> People do have boxes with thousands of LUNs though & file systems in
> active use. Both for SAN and NAS volumes. One of the challenges is what to
> do when just one LUN (or NFS server) crashes and burns.
The FS needs to go read-only (plain & simple) because you don't know what's going on. You can't risk writing data anymore. Let the apps fail. You can make it happen even today. It's a simple exercise.

Like others, I have seen/debugged enough weirdness when it comes to resets/aborts (FYI - 200+ hosts in a cluster). For NDA reasons I can't disclose a whole lot, but folks have fixed/enhanced the SCSI stack to make resets/aborts fully robust. And you need folks who can debug 'apps/FS/block/initiator/wire-protocol/target-side' in one shot. Simple. So when you say 'crash & burn', then any or all of the above (minus the protocol handling) might need fixing.
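A minimal sketch of the "go read-only and limit the outage" idea, seen from the monitoring side. It assumes the filesystem is mounted so that errors flip it read-only (ext4 has the errors=remount-ro mount option for this), and it only parses /proc/mounts-style text to report which mount points are currently read-only. The function name read_only_mounts and the devices/paths in SAMPLE are hypothetical, made up for illustration.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # Split on commas so 'errors=remount-ro' is not mistaken for 'ro'.
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample: one LUN has gone read-only, the others still serve.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/mapper/lun0042 /export/vol42 ext4 ro,relatime 0 0
filer1:/vol/home /mnt/home nfs rw,hard,proto=tcp 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
```

On a live box the same function could be fed the contents of /proc/mounts. Only the affected mount point shows up, so the other mounts keep serving their users.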
> You simply cannot "reboot" the server to clean up after one bad mount
> when you have thousands of other happy users running on thousands/hundreds
> of other mount points :)
Again, can't the front-end go read-only and limit the outage w/o disturbing thousands of users?

Chetan Loke