Re: mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2017-09-29 04:00:14
Also in:
linux-mm
On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
Hello. I noticed that virtio_balloon is using register_oom_notifier(), and that leak_balloon() called from virtballoon_oom_notify() might depend on a __GFP_DIRECT_RECLAIM memory allocation. In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to serialize against fill_balloon(). But in fill_balloon(), alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory allocation. Such a __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach __alloc_pages_may_oom(), take the oom_lock mutex and call out_of_memory(). leak_balloon() is then called by virtballoon_oom_notify() via the blocking_notifier_call_chain() callback while the vb->balloon_lock mutex is already held by fill_balloon(). As a result, even though __GFP_NORETRY is specified, fill_balloon() can indirectly get stuck, with leak_balloon() waiting for the vb->balloon_lock mutex.
That would be tricky to fix. I guess we'll need to drop the lock while allocating memory - not an easy fix.
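To make "drop the lock while allocating" concrete, here is a rough userspace sketch, not driver code. It uses a pthread mutex in place of vb->balloon_lock and malloc() in place of alloc_page(); struct balloon, balloon_init(), balloon_fill() and balloon_leak() are hypothetical names made up for the illustration. The point is only that the fill path never holds the lock across an allocation, so a leak path called from a notifier can always take it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the balloon state. */
struct balloon {
	pthread_mutex_t lock;	/* plays the role of vb->balloon_lock */
	void **pages;		/* pages currently held by the balloon */
	size_t num_pages;
	size_t capacity;
};

static int balloon_init(struct balloon *b, size_t capacity)
{
	b->pages = calloc(capacity, sizeof(*b->pages));
	if (!b->pages)
		return -1;
	b->num_pages = 0;
	b->capacity = capacity;
	return pthread_mutex_init(&b->lock, NULL);
}

/* Inflate by up to `want` pages; returns how many were added. */
static size_t balloon_fill(struct balloon *b, size_t want, size_t page_size)
{
	size_t added = 0;

	while (added < want) {
		/* Allocate with the lock dropped, so nothing waits on us. */
		void *page = malloc(page_size);

		if (!page)
			break;

		/* Take the lock only to publish the page. */
		pthread_mutex_lock(&b->lock);
		assert(b->num_pages <= b->capacity);
		if (b->num_pages == b->capacity) {
			pthread_mutex_unlock(&b->lock);
			free(page);	/* balloon is full, give the page back */
			break;
		}
		b->pages[b->num_pages++] = page;
		pthread_mutex_unlock(&b->lock);
		added++;
	}
	return added;
}

/* Deflate by up to `want` pages; returns how many were freed. */
static size_t balloon_leak(struct balloon *b, size_t want)
{
	size_t freed = 0;

	while (freed < want) {
		void *page = NULL;

		pthread_mutex_lock(&b->lock);
		if (b->num_pages > 0)
			page = b->pages[--b->num_pages];
		pthread_mutex_unlock(&b->lock);

		if (!page)
			break;
		free(page);
		freed++;
	}
	return freed;
}
```

The cost is that the balloon state can change between the allocation and the re-lock, which is why the sketch re-checks the capacity under the lock. The real driver has more state to revalidate at that point, which is part of why this is not an easy fix.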
Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via tell_host(). If this virtqueue_add_outbuf() request, which is issued from leak_balloon() via virtballoon_oom_notify() via blocking_notifier_call_chain() from out_of_memory(), reaches __alloc_pages_may_oom(), the result is an OOM lockup, because the oom_lock mutex is already held before out_of_memory() is called.
I guess we should just do GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
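For reference, a small userspace sketch of what that mask leaves behind. The bit values here are illustrative only (the real ones are in include/linux/gfp.h), and gfp_no_direct_reclaim() and may_direct_reclaim() are hypothetical helpers, not kernel APIs. The composition of GFP_KERNEL as __GFP_RECLAIM | __GFP_IO | __GFP_FS matches the kernel's definition.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/*
 * Illustrative bit values only, NOT the kernel's. The names mirror the
 * kernel's so the expression reads the same as in the driver.
 */
#define __GFP_IO		0x01u
#define __GFP_FS		0x02u
#define __GFP_DIRECT_RECLAIM	0x04u
#define __GFP_KSWAPD_RECLAIM	0x08u

#define __GFP_RECLAIM	(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* Hypothetical helper: same flags, but the caller may not enter direct reclaim. */
static gfp_t gfp_no_direct_reclaim(gfp_t gfp)
{
	return gfp & ~__GFP_DIRECT_RECLAIM;
}

/* Hypothetical helper: nonzero if an allocation with these flags may block in direct reclaim. */
static int may_direct_reclaim(gfp_t gfp)
{
	return (gfp & __GFP_DIRECT_RECLAIM) != 0;
}
```

With these definitions, gfp_no_direct_reclaim(GFP_KERNEL) keeps __GFP_KSWAPD_RECLAIM | __GFP_IO | __GFP_FS: the allocation can still wake kswapd, but it fails instead of entering direct reclaim, so it should not reach the OOM path from inside the notifier.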
An OOM notifier callback should not (directly or indirectly) depend on a __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?