Re: WARNING in __mmdrop
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2019-07-23 10:28:01
Also in:
linux-mm, lkml
On Tue, Jul 23, 2019 at 04:42:19PM +0800, Jason Wang wrote:
On 2019/7/23 3:56 PM, Michael S. Tsirkin wrote:
On Tue, Jul 23, 2019 at 01:48:52PM +0800, Jason Wang wrote:
On 2019/7/23 1:02 PM, Michael S. Tsirkin wrote:
On Tue, Jul 23, 2019 at 11:55:28AM +0800, Jason Wang wrote:
On 2019/7/22 4:02 PM, Michael S. Tsirkin wrote:
On Mon, Jul 22, 2019 at 01:21:59PM +0800, Jason Wang wrote:
On 2019/7/21 6:02 PM, Michael S. Tsirkin wrote:
On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:
syzbot has bisected this bug to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang [off-list ref]
Date: Fri May 24 08:12:18 2019 +0000

    vhost: access vq metadata through kernel virtual address

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=149a8a20600000
start commit: 6d21a41b Add linux-next specific files for 20190718
git tree: linux-next
final crash: https://syzkaller.appspot.com/x/report.txt?x=169a8a20600000
console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a20600000
kernel config: https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10139e68600000

Reported-by: syzbot+e58112d71f77113ddb7b@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

OK I poked at this for a bit, I see several things that we need to fix, though I'm not yet sure it's the reason for the failures:

1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
   That's just a bad hack,

This is used to avoid holding the lock when checking whether the addresses overlap. Otherwise we need to take the spinlock for each invalidation request even if it is for a VA range that is of no interest to us. This will be very slow, e.g. during guest boot.

KVM seems to do exactly that. I tried and guest does not seem to boot any slower. Do you observe any slowdown?

Yes I do.
Now I took a hard look at the uaddr hackery and it really makes me nervous. So I think for this release we want something safe, and optimizations on top. As an alternative revert the optimization and try again for next merge window.

Will post a series of fixes, let me know if you're ok with that. Thanks

I'd prefer you to take a hard look at the patch I posted which makes code cleaner,

I did. But it looks to me that a series of only about 60 lines of code can fix all the issues we found without reverting the uaddr optimization.

Another thing I like about the patch I posted is that it removes 60 lines of code, instead of adding more :) Mostly because of unifying everything into a single cleanup function and using kfree_rcu.

Yes.

So how about this: do exactly what you propose but as a 2 patch series: start with the slow safe patch, and then add the uaddr optimizations back on top. We can then more easily reason about whether they are safe.

If you insist, I can do this.
Given I realized my patch is buggy in that it does not wait for outstanding maps, I don't insist.
Basically you are saying this:
- notifiers are only needed to invalidate maps
- we make sure any uaddr change invalidates maps anyway
- thus it's ok not to have notifiers since we do not have maps

All this looks ok but the question is why do we bother unregistering them. And the answer seems to be that this is so we can start with a balanced counter: otherwise we can be between _start and _end calls.

Yes, since there could be multiple concurrent invalidation requests. We need to count them to make sure we don't pin the wrong pages.
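The "balanced counter" point can be made concrete with a small single-threaded sketch. This is not the vhost code: the struct and function names below are hypothetical, and the locking that the real code needs is left out on purpose. It only shows why a map must not be set up while any invalidation is still between its _start and _end calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-virtqueue state; not the vhost structs. */
struct vq_sketch {
	int invalidate_count;	/* invalidations between _start and _end */
	bool mapped;		/* true while a kernel mapping is in use */
};

/* An invalidation begins: drop the map and mark one more in flight. */
static void sketch_invalidate_start(struct vq_sketch *vq)
{
	vq->invalidate_count++;
	vq->mapped = false;
}

/* An invalidation ends: the counter must never go negative. */
static void sketch_invalidate_end(struct vq_sketch *vq)
{
	assert(vq->invalidate_count > 0);
	vq->invalidate_count--;
}

/* Mapping is only allowed when no invalidation is in flight. */
static bool sketch_try_map(struct vq_sketch *vq)
{
	if (vq->invalidate_count)
		return false;
	vq->mapped = true;
	return true;
}
```

If the notifier were registered after a _start had already been delivered elsewhere, the matching _end would hit the assertion above, which is the imbalance being discussed.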
I also wonder about ordering. kvm has this:

	/*
	 * Used to check for invalidations in progress, of the pfn that is
	 * returned by pfn_to_pfn_prot below.
	 */
	mmu_seq = kvm->mmu_notifier_seq;
	/*
	 * Ensure the read of mmu_notifier_seq isn't reordered with PTE reads in
	 * gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
	 * risk the page we get a reference to getting unmapped before we have a
	 * chance to grab the mmu_lock without mmu_notifier_retry() noticing.
	 *
	 * This smp_rmb() pairs with the effective smp_wmb() of the combination
	 * of the pte_unmap_unlock() after the PTE is zapped, and the
	 * spin_lock() in kvm_mmu_notifier_invalidate_<page|range_end>() before
	 * mmu_notifier_seq is incremented.
	 */
	smp_rmb();

does this apply to us? Can't we use a seqlock instead so we do not need to worry?

I'm not familiar with kvm MMU internals, but we do everything under mmu_lock. Thanks
I don't think this helps at all.

There's no lock between checking the invalidate counter and get user pages fast within vhost_map_prefetch. So it's possible that get user pages fast reads PTEs speculatively before invalidate is read.

-- 
MST
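For illustration, the ordering concern can be sketched in userspace C11, loosely modeled on the kvm comment quoted above. Everything here is an assumption made for the example: the names are hypothetical, the acquire fence stands in for smp_rmb(), and no claim is made that vhost should be fixed this way. The reader samples a sequence number, issues a read barrier before the page lookup, and afterwards retries if an invalidation is in flight or the sequence moved.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical notifier bookkeeping; not the kvm or vhost structs. */
struct notifier_state {
	atomic_ulong seq;	/* bumped when an invalidation completes */
	atomic_int count;	/* invalidations between _start and _end */
};

static void state_invalidate_start(struct notifier_state *s)
{
	atomic_fetch_add(&s->count, 1);
}

static void state_invalidate_end(struct notifier_state *s)
{
	/* Publish the new sequence before dropping the in-flight count. */
	atomic_fetch_add(&s->seq, 1);
	atomic_fetch_sub(&s->count, 1);
}

/* Sample seq, then fence so later reads cannot be hoisted above it. */
static unsigned long state_read_begin(struct notifier_state *s)
{
	unsigned long seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return seq;
}

/* True if the lookup done since state_read_begin() must be thrown away. */
static bool state_retry(struct notifier_state *s, unsigned long seq)
{
	if (atomic_load(&s->count))
		return true;
	atomic_thread_fence(memory_order_acquire);
	return atomic_load(&s->seq) != seq;
}
```

A caller would wrap the page lookup between state_read_begin() and state_retry() and loop until state_retry() returns false. Without the fence after sampling seq, the lookup's reads could be performed first, which is the speculative-read problem described above.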