Thread: 54 messages, 8 authors, 2016-11-28

Re: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks

From: Mikulas Patocka <mpatocka@redhat.com>
Date: 2016-11-23 21:12:06
Also in: lkml


On Sun, 14 Aug 2016, Michal Hocko wrote:

> On Sat 13-08-16 13:34:29, Mikulas Patocka wrote:
> >
> > On Fri, 12 Aug 2016, Michal Hocko wrote:
> >
> > > On Thu 04-08-16 14:49:41, Mikulas Patocka wrote:
> > > > On Wed, 3 Aug 2016, Michal Hocko wrote:
> > > >
> > > > > But the device congestion is not the only condition required for the
> > > > > throttling. The pgdat has also be marked congested which means that the
> > > > > LRU page scanner bumped into dirty/writeback/pg_reclaim pages at the
> > > > > tail of the LRU. That should only happen if we are rotating LRUs too
> > > > > quickly. AFAIU the reclaim shouldn't allow free ticket scanning in that
> > > > > situation.
> > > >
> > > > The obvious problem here is that mempool allocations should sleep in
> > > > mempool_alloc() on &pool->wait (until someone returns some entries into
> > > > the mempool), they should not sleep inside the page allocator.
> > >
> > > I agree that mempool_alloc should _primarily_ sleep on their own
> > > throttling mechanism. I am not questioning that. I am just saying that
> > > the page allocator has its own throttling which it relies on and that
> > > cannot be just ignored because that might have other undesirable side
> > > effects. So if the right approach is really to never throttle certain
> > > requests then we have to bail out from a congested nodes/zones as soon
> > > as the congestion is detected.
> > >
> > > Now, I would like to see that something like that is _really_ necessary.
> >
> > Currently, it is not a problem - device mapper reports the device as
> > congested only if the underlying physical disks are congested.
> >
> > But once we change it so that device mapper reports congested state on its
> > own (when it has too many bios in progress), this starts being a problem.
>
> OK, can we wait until it starts becoming a real problem and solve it
> appropriately then?
>
> I will repost the patch which removes thottle_vm_pageout in the meantime
> as it doesn't seem to be needed anymore.
>
> --
> Michal Hocko
> SUSE Labs

Hi Michal

So, Google developers have now hit a stack trace in which a block device
driver is throttled inside memory management:

https://www.redhat.com/archives/dm-devel/2016-November/msg00158.html

The dm-bufio layer is something like a buffer cache, used by block device
drivers. Unlike the real buffer cache, dm-bufio guarantees forward
progress even if there is no free memory.

dm-bufio does something similar to a mempool allocation: it tries an
allocation with GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN
(just like a mempool), and if that fails, it reuses an existing buffer.
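
To make the pattern above concrete, here is a minimal userspace sketch of
"try a fail-fast allocation, otherwise reuse an existing buffer". It is only
a model under stated assumptions: the names cache_get, cache_put,
try_alloc_nowait, POOL_SIZE and BUF_BYTES are hypothetical and are not
dm-bufio's real API, and malloc() stands in for the page allocator call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these names are NOT dm-bufio's API. */
#define POOL_SIZE 4
#define BUF_BYTES 4096

struct buf {
    void *data;
    int in_use;        /* nonzero while a caller holds the buffer */
};

struct cache {
    struct buf slots[POOL_SIZE];
    int allocated;     /* number of slots that own memory */
};

/*
 * Stand-in for a fail-fast allocation attempt (the role played by the
 * GFP_NOIO | __GFP_NORETRY | ... request in the text). The
 * 'simulate_failure' argument is an assumption of this sketch: it lets
 * the caller force the failure path deterministically.
 */
static void *try_alloc_nowait(int simulate_failure)
{
    return simulate_failure ? NULL : malloc(BUF_BYTES);
}

/*
 * Get a buffer. First try a fresh allocation that is allowed to fail;
 * if it fails, fall back to an idle buffer the cache already owns.
 * Returns NULL only when the allocation failed and every owned buffer
 * is busy (real code would wait for a buffer to be released instead).
 */
static struct buf *cache_get(struct cache *c, int simulate_failure)
{
    if (c->allocated < POOL_SIZE) {
        void *p = try_alloc_nowait(simulate_failure);
        if (p) {
            struct buf *b = &c->slots[c->allocated++];
            b->data = p;
            b->in_use = 1;
            return b;
        }
    }
    /* Allocation failed or the pool is full: reuse an idle buffer. */
    for (int i = 0; i < c->allocated; i++) {
        if (!c->slots[i].in_use) {
            c->slots[i].in_use = 1;
            return &c->slots[i];
        }
    }
    return NULL;
}

/* Release a buffer so that a later cache_get() may reuse it. */
static void cache_put(struct buf *b)
{
    b->in_use = 0;
}
```

The point of the sketch is that the allocation attempt is optional: forward
progress comes from the reuse path, so any sleep inside the allocation
attempt only adds latency.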

Here is where they caught it being throttled in memory management:

   Workqueue: kverityd verity_prefetch_io
   __switch_to+0x9c/0xa8
   __schedule+0x440/0x6d8
   schedule+0x94/0xb4
   schedule_timeout+0x204/0x27c
   schedule_timeout_uninterruptible+0x44/0x50
   wait_iff_congested+0x9c/0x1f0
   shrink_inactive_list+0x3a0/0x4cc
   shrink_lruvec+0x418/0x5cc
   shrink_zone+0x88/0x198
   try_to_free_pages+0x51c/0x588
   __alloc_pages_nodemask+0x648/0xa88
   __get_free_pages+0x34/0x7c
   alloc_buffer+0xa4/0x144
   __bufio_new+0x84/0x278
   dm_bufio_prefetch+0x9c/0x154
   verity_prefetch_io+0xe8/0x10c
   process_one_work+0x240/0x424
   worker_thread+0x2fc/0x424
   kthread+0x10c/0x114

Will you consider removing vm throttling for __GFP_NORETRY allocations?

Mikulas
