Re: [PATCH v16 3/5] virtio-balloon: VIRTIO_BALLOON_F_SG
From: Wei Wang <hidden>
Date: 2017-10-11 03:16:55
On 10/11/2017 10:26 AM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> On 10/10/2017 09:09 PM, Tetsuo Handa wrote:
>>> Wei Wang wrote:
>>>>>>> And even if we could remove balloon_lock, you still cannot use
>>>>>>> __GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
>>>>>>> the "whether it is safe to wait" flag from
>>>>>>> "[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()".
>>>>>>
>>>>>> Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM
>>>>>> at xb_set_page()?
>>>>>
>>>>> Because of the dependency shown below.
>>>>>
>>>>>   leak_balloon()
>>>>>     xb_set_page()
>>>>>       xb_preload(GFP_KERNEL)
>>>>>         kmalloc(GFP_KERNEL)
>>>>>           __alloc_pages_may_oom()
>>>>>             Takes oom_lock
>>>>>             out_of_memory()
>>>>>               blocking_notifier_call_chain()
>>>>>                 leak_balloon()
>>>>>                   xb_set_page()
>>>>>                     xb_preload(GFP_KERNEL)
>>>>>                       kmalloc(GFP_KERNEL)
>>>>>                         __alloc_pages_may_oom()
>>>>>                           Fails to take oom_lock and loops forever
>>>>
>>>> __alloc_pages_may_oom() uses mutex_trylock(&oom_lock).
>>>
>>> Yes. But this mutex_trylock(&oom_lock) is semantically mutex_lock(&oom_lock)
>>> because __alloc_pages_slowpath() will continue looping until
>>> mutex_trylock(&oom_lock) succeeds (or somebody releases memory).
>>
>> I think the second __alloc_pages_may_oom() will not continue since the
>> first one is in progress.
>
> The second __alloc_pages_may_oom() will be called repeatedly because
> __alloc_pages_slowpath() will continue looping (unless somebody releases
> memory).

OK, I see, thanks. So the point is that the OOM code path should not
allocate memory, and the old leak_balloon() (without the F_SG feature)
doesn't need xb_preload(). I think one solution would be to let the OOM
path use the old leak_balloon() code path, and we can add one more
parameter to leak_balloon() to control that:

    leak_balloon(struct virtio_balloon *vb, size_t num, bool oom)

>>> By the way, is xb_set_page() safe?
>>> Sleeping in the kernel with preemption disabled is a bug, isn't it?
>>> __radix_tree_preload() returns 0 with preemption disabled upon success.
>>> xb_preload() disables preemption if __radix_tree_preload() fails.
>>> Then, kmalloc() is called with preemption disabled, isn't it?
>>> But xb_set_page() calls xb_preload(GFP_KERNEL) which might sleep with
>>> preemption disabled.
>>
>> Yes, I think that should not be expected, thanks.
>>
>> I plan to change it like this:
>>
>> bool xb_preload(gfp_t gfp)
>> {
>>         if (!this_cpu_read(ida_bitmap)) {
>>                 struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);
>>
>>                 if (!bitmap)
>>                         return false;
>>                 bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
>>                 kfree(bitmap);
>>         }
>
> Excuse me, but you are allocating per-CPU memory when the running CPU
> might change at this line? What happens if the running CPU has changed
> at this line? Will it work even with the new CPU's ida_bitmap == NULL?

Yes, it will be detected in xb_set_bit(): when ida_bitmap == NULL on the
new CPU, xb_set_bit() will return -EAGAIN to the caller, and the caller
should restart from xb_preload().

Best,
Wei