Thread: 34 messages, 5 authors, 2021-05-19

Re: [PATCH v2] s390/vfio-ap: fix memory leak in mdev remove callback

From: Halil Pasic <pasic@linux.ibm.com>
Date: 2021-05-19 12:59:53
Also in: linux-s390, lkml

On Wed, 19 May 2021 13:22:56 +0200
Christian Borntraeger [off-list ref] wrote:
On 19.05.21 10:17, Christian Borntraeger wrote:

On 19.05.21 01:27, Halil Pasic wrote:  
On Tue, 18 May 2021 19:01:42 +0200
Christian Borntraeger [off-list ref] wrote:
 
On 18.05.21 17:33, Halil Pasic wrote:  
On Tue, 18 May 2021 15:59:36 +0200
Christian Borntraeger [off-list ref] wrote:  
[..]  
Would it help, if the code in priv.c would read the hook once
and then only work on the copy? We could protect that with rcu
and do a synchronize rcu in vfio_ap_mdev_unset_kvm after
unsetting the pointer?  
Unfortunately just "the hook" is ambiguous in this context. We
have kvm->arch.crypto.pqap_hook that is supposed to point to
a struct kvm_s390_module_hook member of struct ap_matrix_mdev
which is also called pqap_hook. And struct kvm_s390_module_hook
has function pointer member named "hook".  
I was referring to the full struct.  
I'll look into this.  
I think it could work. In priv.c, take rcu_read_lock, save the
pointer, do the check and the call, then call rcu_read_unlock.
In vfio_ap, use rcu_assign_pointer to set the pointer and,
after setting it to NULL, call synchronize_rcu.
In my opinion, we should make the accesses to the
kvm->arch.crypto.pqap_hook pointer properly synchronized. I'm
not sure if that is what you are proposing. How do we usually
do synchronisation on the stuff that lives in kvm->arch?  
RCU is a method of synchronization. We make sure that the structure
pqap_hook is still valid as long as we are inside the rcu read
lock. So the idea is: clear the pointer, wait until all old readers
have finished, and then proceed with getting rid of the structure.
Yes I know that RCU is a method of synchronization, but I'm not
very familiar with it. I'm a little confused by "read the hook
once and then work on a copy". I guess, I would have to read up
on the RCU again to get clarity. I intend to brush up my RCU knowledge
once the patch comes along. I would be glad to have your help when
reviewing an RCU based solution for this.  
Just had a quick look. It's not trivial, as the hook function itself
takes a mutex and an rcu section must not sleep. Will have a deeper
look.  

As a quick hack something like this could work. The whole locking is pretty
complicated and this makes it even more complex so we might want to do
a cleanup/locking rework later on.
Hm, seems our emails crossed mid air...
index 9928f785c677..fde6e02aab54 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -609,6 +609,7 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
   */
  static int handle_pqap(struct kvm_vcpu *vcpu)
  {
+       struct kvm_s390_module_hook *pqap_hook;
         struct ap_queue_status status = {};
         unsigned long reg0;
         int ret;
@@ -657,14 +658,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
          * Verify that the hook callback is registered, lock the owner
          * and call the hook.
          */
-       if (vcpu->kvm->arch.crypto.pqap_hook) {
-               if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
+       rcu_read_lock();
+       pqap_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
+       if (pqap_hook) {
+               if (!try_module_get(pqap_hook->owner)) {
+                       rcu_read_unlock();
                         return -EOPNOTSUPP;
-               ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
-               module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
+               }
Up to this point the local pqap_hook is guaranteed to point to a valid
object if not NULL, ...
+               rcu_read_unlock();
... and after this point IMHO it is not.
+               ret = pqap_hook->hook(vcpu);
So IMHO the pointer dereference here is still problematic, but that can
be fixed easily, as I described in the email I sent 3 minutes after
yours. IMHO we need a local copy of vcpu->kvm->arch.crypto.pqap_hook->hook
taken within the rcu read critical section. Do you agree?
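
A minimal userspace sketch of the ordering described above, not the
actual patch. Everything here is an assumption made for illustration:
the rcu_* macros are no-op stand-ins for the kernel primitives, the
*_sketch types and call_pqap_hook() are hypothetical names, and the
module owner handling (try_module_get/module_put) is left out. The
only point it shows is that the function pointer is copied while the
read-side critical section is still open, so the struct is not
dereferenced after rcu_read_unlock().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* No-op stand-ins for the kernel RCU primitives (assumption: sketch only). */
#define rcu_read_lock()     do { } while (0)
#define rcu_read_unlock()   do { } while (0)
#define rcu_dereference(p)  (p)

struct vcpu_sketch;

/* Hypothetical, reduced stand-in for struct kvm_s390_module_hook. */
struct hook_sketch {
	int (*hook)(struct vcpu_sketch *vcpu);
};

/* Hypothetical stand-in; pqap_hook plays the role of
 * vcpu->kvm->arch.crypto.pqap_hook. */
struct vcpu_sketch {
	struct hook_sketch *pqap_hook;
	int calls;
};

static int call_pqap_hook(struct vcpu_sketch *vcpu)
{
	int (*fn)(struct vcpu_sketch *vcpu) = NULL;
	struct hook_sketch *pqap_hook;

	rcu_read_lock();
	pqap_hook = rcu_dereference(vcpu->pqap_hook);
	if (pqap_hook)
		fn = pqap_hook->hook;	/* last dereference of the struct */
	rcu_read_unlock();

	/*
	 * From here on only the local copy is used. Returning -EOPNOTSUPP
	 * for the no-hook case is a sketch-only choice; the patch falls
	 * through to the code after the if block instead.
	 */
	if (!fn)
		return -EOPNOTSUPP;
	return fn(vcpu);	/* the callee may sleep, we are outside the section */
}

/* Trivial hook used to exercise the sketch. */
static int demo_hook(struct vcpu_sketch *vcpu)
{
	vcpu->calls++;
	return 0;
}
```

In the kernel, the code behind the copied function pointer would
still have to be kept alive across the call, which is what the
try_module_get() taken inside the read section is for in the quoted
patch.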

Regards,
Halil
+               module_put(pqap_hook->owner);
                 if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
                         kvm_s390_set_psw_cc(vcpu, 3);
                 return ret;
+       } else {
+               rcu_read_unlock();
         }
         /*
          * A vfio_driver must register a hook.
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index f90c9103dac2..a7124abd6aed 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1194,6 +1194,7 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
                 mutex_lock(&matrix_dev->lock);
                 vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
                 matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+               synchronize_rcu();
                 kvm_put_kvm(matrix_mdev->kvm);
                 matrix_mdev->kvm = NULL;
                 matrix_mdev->kvm_busy = false;
  