Re: [PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG

From: Wei Wang <hidden>
Date: 2017-10-11 03:16:55

On 10/11/2017 10:26 AM, Tetsuo Handa wrote:

Wei Wang wrote:

On 10/10/2017 09:09 PM, Tetsuo Handa wrote:

Wei Wang wrote:

And even if we could remove balloon_lock, you still cannot use
__GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
"whether it is safe to wait" flag from
"[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()" .

Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM at
xb_set_page()?

Because of dependency shown below.

leak_balloon()
   xb_set_page()
     xb_preload(GFP_KERNEL)
       kmalloc(GFP_KERNEL)
         __alloc_pages_may_oom()
           Takes oom_lock
           out_of_memory()
             blocking_notifier_call_chain()
               leak_balloon()
                 xb_set_page()
                   xb_preload(GFP_KERNEL)
                     kmalloc(GFP_KERNEL)
                       __alloc_pages_may_oom()
                         Fails to take oom_lock and loop forever

__alloc_pages_may_oom() uses mutex_trylock(&oom_lock).

Yes. But this mutex_trylock(&oom_lock) is semantically mutex_lock(&oom_lock)
because __alloc_pages_slowpath() will continue looping until
mutex_trylock(&oom_lock) succeeds (or somebody releases memory).

I think the second __alloc_pages_may_oom() will not continue since the
first one is in progress.

The second __alloc_pages_may_oom() will be called repeatedly because
__alloc_pages_slowpath() will continue looping (unless somebody releases
memory).
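
To make the argument above concrete, here is a small userspace sketch (not
kernel code) of why a trylock inside a retry loop behaves like a blocking
lock. All names ending in _model are hypothetical stand-ins: oom_lock_model
plays the role of oom_lock, may_oom_model() the role of
__alloc_pages_may_oom(), and slowpath_model() the role of the retry loop in
__alloc_pages_slowpath(). The retry count is bounded only so that the sketch
terminates; it assumes nobody else releases memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for oom_lock: set means "held". */
static atomic_flag oom_lock_model = ATOMIC_FLAG_INIT;

/* Models mutex_trylock(): returns true only if the lock was free. */
bool trylock_model(void)
{
	return !atomic_flag_test_and_set(&oom_lock_model);
}

void unlock_model(void)
{
	atomic_flag_clear(&oom_lock_model);
}

/* Models __alloc_pages_may_oom(): one attempt, never blocks. */
bool may_oom_model(void)
{
	if (!trylock_model())
		return false;
	unlock_model();
	return true;
}

/*
 * Models the slowpath retry loop. Returns the attempt number that
 * succeeded, or -1 if every attempt failed. The bound exists only so
 * that this sketch terminates.
 */
int slowpath_model(int max_retries)
{
	assert(max_retries > 0);
	for (int i = 1; i <= max_retries; i++)
		if (may_oom_model())
			return i;
	return -1;
}

/*
 * Models the nested case: the outer allocation holds the lock while the
 * notifier chain triggers another allocation in the same context.
 */
int nested_alloc_model(int max_retries)
{
	int ret;

	while (!trylock_model())
		;	/* outer caller takes the lock */
	ret = slowpath_model(max_retries);
	unlock_model();
	return ret;
}
```

In this model the nested call never succeeds no matter how large the bound
is, because the only context that could release the lock is the one that is
retrying.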

OK, I see, thanks. So the point is that the OOM code path should not
allocate memory, and the old leak_balloon() (without the F_SG feature)
doesn't need xb_preload(). I think one solution would be to let the OOM
path use the old leak_balloon() code path, and we can add one more
parameter to leak_balloon() to control that:

leak_balloon(struct virtio_balloon *vb, size_t num, bool oom)
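
As a rough illustration of how the extra parameter could select the code
path, here is a userspace sketch. It is only a model under stated
assumptions: struct balloon_model, choose_leak_path() and
leak_balloon_model() are hypothetical names, and the preload_calls counter
merely stands in for the xb_preload(GFP_KERNEL) call that the F_SG path
would make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two deflate paths discussed above. */
enum leak_path {
	LEAK_PATH_OLD,	/* old path, no xb_preload() needed */
	LEAK_PATH_SG,	/* F_SG path, needs xb_preload() */
};

/* Hypothetical, heavily reduced stand-in for struct virtio_balloon. */
struct balloon_model {
	bool has_f_sg;		/* F_SG feature negotiated */
	size_t num_pages;	/* pages currently in the balloon */
	size_t preload_calls;	/* counts simulated xb_preload() calls */
};

/* The OOM caller always gets the allocation-free old path. */
enum leak_path choose_leak_path(const struct balloon_model *vb, bool oom)
{
	assert(vb != NULL);
	if (vb->has_f_sg && !oom)
		return LEAK_PATH_SG;
	return LEAK_PATH_OLD;
}

/* Returns the number of pages released from the model balloon. */
size_t leak_balloon_model(struct balloon_model *vb, size_t num, bool oom)
{
	size_t n = num < vb->num_pages ? num : vb->num_pages;

	if (choose_leak_path(vb, oom) == LEAK_PATH_SG)
		vb->preload_calls++;	/* stands in for xb_preload(GFP_KERNEL) */

	vb->num_pages -= n;
	return n;
}
```

With oom set to true the model never reaches the simulated allocation, which
is the property the OOM notifier needs.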

By the way, is xb_set_page() safe?
Sleeping in the kernel with preemption disabled is a bug, isn't it?
__radix_tree_preload() returns 0 with preemption disabled upon success.
xb_preload() disables preemption if __radix_tree_preload() fails.
Then, kmalloc() is called with preemption disabled, isn't it?
But xb_set_page() calls xb_preload(GFP_KERNEL) which might sleep with
preemption disabled.

Yes, I agree that should not happen, thanks.

I plan to change it like this:

bool xb_preload(gfp_t gfp)
{
        if (!this_cpu_read(ida_bitmap)) {
                struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);

                if (!bitmap)
                        return false;
                bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
                kfree(bitmap);
        }

Excuse me, but you are allocating per-CPU memory while the running CPU might
change at this line? What happens if the running CPU has changed at this line?
Will it work even if the new CPU's ida_bitmap == NULL?


Yes, it will be detected in xb_set_bit(): when ida_bitmap == NULL on the
new CPU, xb_set_bit() will return -EAGAIN to the caller, and the caller
should restart from xb_preload().
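
A minimal userspace sketch of that retry pattern follows. It is a model, not
the patch itself: bitmap_slot[] stands in for the per-CPU ida_bitmap,
current_cpu for the CPU the task runs on, and the migrate_to argument is a
hypothetical hook that simulates the task moving to another CPU between the
preload and the bit set. Unlike the real code, the model never consumes or
frees the preloaded buffer.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 2

/* Hypothetical stand-ins for the per-CPU ida_bitmap and the running CPU. */
static void *bitmap_slot[NR_CPUS_MODEL];
static int current_cpu;

/* Models xb_preload(): make sure the current CPU has a spare buffer. */
bool xb_preload_model(void)
{
	if (!bitmap_slot[current_cpu]) {
		bitmap_slot[current_cpu] = malloc(128);
		if (!bitmap_slot[current_cpu])
			return false;
	}
	return true;
}

/* Models xb_set_bit(): -EAGAIN if this CPU has no preloaded buffer. */
int xb_set_bit_model(unsigned long *word, unsigned int bit)
{
	assert(bit < sizeof(*word) * CHAR_BIT);
	if (!bitmap_slot[current_cpu])
		return -EAGAIN;
	*word |= 1UL << bit;
	return 0;
}

/*
 * Caller-side loop: restart from the preload whenever the set step
 * reports -EAGAIN. Returns the number of attempts used, or -ENOMEM.
 * If migrate_to >= 0, the task "moves" to that CPU after the first
 * preload, which is the case asked about above.
 */
int set_bit_with_retry(unsigned long *word, unsigned int bit, int migrate_to)
{
	int attempts = 0;
	int ret;

	do {
		attempts++;
		if (!xb_preload_model())
			return -ENOMEM;
		if (attempts == 1 && migrate_to >= 0)
			current_cpu = migrate_to;	/* simulated migration */
		ret = xb_set_bit_model(word, bit);
	} while (ret == -EAGAIN);

	return attempts;
}
```

Without a migration the loop finishes on the first attempt; with one, the
first set step fails with -EAGAIN and the second attempt preloads on the new
CPU before setting the bit.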

Best,
Wei
