Re: [PATCH 1/5] mm/vmscan: Throttle reclaim until some writeback completes if congested
From: NeilBrown <hidden>
Date: 2021-09-21 21:41:32
Also in:
linux-fsdevel, lkml
On Tue, 21 Sep 2021, Mel Gorman wrote:
On Tue, Sep 21, 2021 at 10:13:17AM +1000, NeilBrown wrote:quoted
On Mon, 20 Sep 2021, Mel Gorman wrote:quoted
-long wait_iff_congested(int sync, long timeout) -{ - long ret; - unsigned long start = jiffies; - DEFINE_WAIT(wait); - wait_queue_head_t *wqh = &congestion_wqh[sync]; - - /* - * If there is no congestion, yield if necessary instead - * of sleeping on the congestion queue - */ - if (atomic_read(&nr_wb_congested[sync]) == 0) { - cond_resched(); - - /* In case we scheduled, work out time remaining */ - ret = timeout - (jiffies - start); - if (ret < 0) - ret = 0; - - goto out; - } - - /* Sleep until uncongested or a write happens */ - prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);Uninterruptible wait. ....quoted
+static void +reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason, + long timeout) +{ + wait_queue_head_t *wqh = &pgdat->reclaim_wait; + unsigned long start = jiffies; + long ret; + DEFINE_WAIT(wait); + + atomic_inc(&pgdat->nr_reclaim_throttled); + WRITE_ONCE(pgdat->nr_reclaim_start, + node_page_state(pgdat, NR_THROTTLED_WRITTEN)); + + prepare_to_wait(wqh, &wait, TASK_INTERRUPTIBLE);Interruptible wait. Why the change? I think these waits really need to be TASK_UNINTERRUPTIBLE.Because from mm/ context, I saw no reason why the task *should* be uninterruptible. It's waiting on other tasks to complete IO and it is not protecting device state, filesystem state or anything else. If it gets a signal, it's safe to wake up, particularly if that signal is KILL and the context is a direct reclaimer.
I disagree. An Interruptible sleep only makes sense if the "was interrupted" status can propagate up to user-space (or to some in-kernel handler that will clear the signal). In particular, if reclaim_throttle() is called in a loop (which it is), and if that loop doesn't check for signal_pending (which it doesn't), then the next time around the loop after receiving a signal, it won't sleep at all. That would be bad. In general, if you don't return an error, then you probably shouldn't sleep Interruptible. I notice that tasks sleep on kswapd_wait as TASK_INTERRUPTIBLE, but they don't have any signal handling. I suspect this isn't actually a defect because I suspect that is it not even possible to SIGKILL kswapd. But the code seems misleading. I guess I should write a patch. Unless reclaim knows to abort completely on a signal (__GFP_KILLABLE ???) this must be an UNINTERRUPTIBLE wait. Thanks, NeilBrown
The original TASK_UNINTERRUPTIBLE is almost certainly a copy&paste from congestion_wait which may be called because a filesystem operation must complete before it can return to userspace so a signal waking it up is pointless. -- Mel Gorman SUSE Labs