Thread (25 messages), 10 authors, 2007-11-09

Re: 2.6.23.1: mdadm/raid5 hung/d-state

From: Chuck Ebbert <hidden>
Date: 2007-11-07 16:40:00
Also in: lkml

On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
Neil Brown wrote:
On Sunday November 4, jpiszcz@lucidpixels.com wrote:
# ps auxww | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
root       274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]

After several days/weeks of uptime, this has now happened for the second
time: while doing regular file I/O (decompressing a file), everything on
the device went into D-state.
At a guess (I haven't looked closely) I'd say it is the bug that was
meant to be fixed by

commit 4ae3f847e49e3787eca91bced31f8fd328d50496

except that patch applied badly and needed to be fixed with
the following patch (not in git yet).
These have been sent to stable@ and should be in the queue for 2.6.23.2.
    My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long
time:

...
        spin_lock(&sh->lock);
        clear_bit(STRIPE_HANDLE, &sh->state);
        clear_bit(STRIPE_DELAYED, &sh->state);

        s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
        s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
        s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
        /* Now to look around and see what can be done */

        /* clean-up completed biofill operations */
        if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
                clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
                clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
                clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
        }

        rcu_read_lock();
        for (i=disks; i--; ) {
                mdk_rdev_t *rdev;
                struct r5dev *dev = &sh->dev[i];
...

but it doesn't fix this bug.
Did that chunk starting with "clean-up completed biofill operations" end
up where it belongs? The patch with the big context moves it to a different
place from where the original one puts it when applied to 2.6.23...
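
One rough way to answer that question is to see which function the marker
comment landed in. The following is only a sketch: the regular expression is
a heuristic for kernel-style function definitions (name at column 0, no
semicolon on the line), and locate_marker, SAMPLE and the two handler names
are made up for illustration, not taken from raid5.c.

```python
import re

# Heuristic (an assumption, not a C parser): a function definition starts at
# column 0 with an identifier character, contains "name(", and has no ";".
FUNC_RE = re.compile(r"^(?=[A-Za-z_]).*?\b([A-Za-z_]\w*)\s*\([^;]*$")


def locate_marker(lines, marker):
    """Return (line_number, enclosing_function) for each line containing marker."""
    current = None
    found = []
    for number, line in enumerate(lines, start=1):
        match = FUNC_RE.match(line)
        if match:
            current = match.group(1)  # remember the most recent definition
        if marker in line:
            found.append((number, current))
    return found


# Hypothetical miniature source file; the function names are invented.
SAMPLE = [
    "static void first_handler(struct stripe_head *sh)",
    "{",
    "        spin_lock(&sh->lock);",
    "}",
    "",
    "static void second_handler(struct stripe_head *sh)",
    "{",
    "        /* clean-up completed biofill operations */",
    "}",
]

hits = locate_marker(SAMPLE, "clean-up completed biofill operations")
```

Against a real tree, the same call could be fed
open("drivers/md/raid5.c").read().splitlines() and the result compared
with the function the original patch was meant to touch.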

Lately I've seen several problems where the context isn't enough to make
a patch apply properly when some offsets have changed. In some cases a
patch won't apply at all because two nearly-identical areas are being
changed and the first chunk gets applied where the second one should,
leaving nowhere for the second chunk to apply.
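
To make that failure mode concrete, here is a toy model of context
matching. It is not GNU patch's actual algorithm: it assumes a hunk is
placed at the exact context match nearest to its (possibly stale) line
hint, and that later hunks may only land after earlier ones. The names
find_context, apply_hunk and the sample tree are invented for the sketch.

```python
def find_context(lines, context, hint, start=0):
    """Index of the exact context match nearest to hint, at or after start."""
    n = len(context)
    hits = [i for i in range(start, len(lines) - n + 1)
            if lines[i:i + n] == context]
    return min(hits, key=lambda i: abs(i - hint)) if hits else None


def apply_hunk(lines, context, added, hint, start=0):
    """Insert added after the matched context; None if nothing matches."""
    at = find_context(lines, context, hint, start)
    if at is None:
        return None
    end = at + len(context)
    return lines[:end] + added + lines[end:], end + len(added)


# Two nearly identical functions, as in the situation described above.
TREE = [
    "void a(void) {",
    "    lock();",
    "    clear_flags();",
    "    unlock();",
    "}",
    "void b(void) {",
    "    lock();",
    "    clear_flags();",
    "    unlock();",
    "}",
]
SHORT = ["    lock();", "    clear_flags();"]   # ambiguous context
LONG_A = ["void a(void) {"] + SHORT             # unique context for a()
LONG_B = ["void b(void) {"] + SHORT             # unique context for b()

# The hints (5 and 10) are stale: they come from an older tree in which
# both functions sat further down the file.
wrong, end = apply_hunk(TREE, SHORT, ["    cleanup_a();"], hint=5)
stranded = apply_hunk(wrong, SHORT, ["    cleanup_b();"], hint=10, start=end)

# With enough context each hunk has exactly one place to go.
right, end = apply_hunk(TREE, LONG_A, ["    cleanup_a();"], hint=5)
right, _ = apply_hunk(right, LONG_B, ["    cleanup_b();"], hint=10, start=end)
```

In this model the short-context hunk meant for a() lands inside b(), and
the second hunk then finds no match at all (stranded is None). With the
longer context both hunks land in the functions they were written for.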
