Re: raid1: freeze_array/wait_all_barriers deadlock
From: Nate Dailey <hidden>
Date: 2017-10-16 15:34:37
Hi Coly, I'm not 100% sure that's going to work.
What I'm imagining is that close_sync gets the lock, so now
raid1d/handle_read_error/freeze_array is blocked waiting for it.
Is it possible that raid1d might need to complete some sync IO from the
retry_list, that close_sync/wait_all_barriers is waiting for? In that case,
there would still be a deadlock.
Actually, what if close_sync just did this:
static void close_sync(struct r1conf *conf)
{
int idx;
for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
_wait_barrier(conf, idx);
_allow_barrier(conf, idx);
}
...
Is there any reason all the _wait_barriers have to complete before starting on
_allow_barrier? If not, then I think this change would work (but will think
about it more).
Nate
On 10/16/2017 10:43 AM, Coly Li wrote:On 2017/10/16 下午8:58, Nate Dailey wrote:quoted
Hi Coly, I'm not sure I understand the change you're proposing. Would it be something like the following? spin_lock_irq(&conf->resync_lock); conf->array_frozen = 1; raid1_log(conf->mddev, "wait freeze"); while (get_unqueued_pending(conf) != extra) { wait_event_lock_irq_cmd_timeout( conf->wait_barrier, get_unqueued_pending(conf) == extra, conf->resync_lock, flush_pending_writes(conf), timeout); } spin_unlock_irq(&conf->resync_lock); On its own, I don't see how this would make any difference. Until array_frozen == 0, wait_all_barriers will continue to be blocked, which in turn will prevent the condition freeze_array is waiting on from ever becoming true.Hi Nate, You are right, this idea does not help too much, we need to find another way.quoted
Or should something else be done inside the new freeze_array loop that would allow wait_all_barriers to make progress?It seems wait_all_barriers() is only used in close_sync(), which is to make sure all sync requests hit platters before raid1_sync_request() returns. How about setting a critical section in close_sync() and protected by another lock. It might be something like this, static void close_sync(struct r1conf *conf) { + mutex_lock close_sync_lock; wait_all_barriers(conf); + mutex_unlock close_sync_lock; allow_all_barriers(conf); [snip] } static void freeze_array(struct r1conf *conf, int extra) { + mutex_lock close_sync_lock spin_lock_irq(&conf->resync_lock); [snip] spin_unlock_irq(&conf->resync_lock); + mutex_unlock close_sync_lock } Then conf->array_frozen won't be set when wait_all_barriers() partially iterated on barrier buckets. Then a deadlock can be avoided. How do you think of this one ? Thanks. Coly Li [snip]