
[PATCH v6 21/24] KVM: arm64: vgic-its: Device table save/restore

From: eric.auger@redhat.com (Auger Eric)
Date: 2017-05-06 10:21:08
Also in: kvm, kvmarm

Hi Christoffer,

On 05/05/2017 20:12, Christoffer Dall wrote:
On Fri, May 05, 2017 at 06:23:22PM +0200, Auger Eric wrote:
Hi Christoffer,

On 05/05/2017 14:44, Christoffer Dall wrote:
On Thu, May 04, 2017 at 01:44:41PM +0200, Eric Auger wrote:
This patch saves the device table entries into guest RAM.
Both flat table and 2 stage tables are supported. DeviceId
indexing is used.

For each device listed in the device table, we also save
the translation table using the vgic_its_save/restore_itt
routines. Those functions will be implemented in a subsequent
patch.

On restore, devices are re-allocated and their itt are
re-built.
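
The 8-byte device table entry that the save and restore paths exchange can be sketched in userspace C as below. This is only an illustration: the bit positions (valid in bit 63, next offset in bits 62:49, ITT address bits 51:8 stored in bits 48:5, size in bits 4:0) are assumptions standing in for the KVM_ITS_DTE_* defines, which are not shown in this excerpt. The names dte_pack, dte_unpack and struct dte_fields are hypothetical, and the little-endian conversion done by the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout; the real values come from the KVM_ITS_DTE_* defines. */
#define DTE_VALID_SHIFT   63
#define DTE_NEXT_SHIFT    49
#define DTE_NEXT_MASK     (0x3fffULL << DTE_NEXT_SHIFT)
#define DTE_ITTADDR_SHIFT 5
#define DTE_ITTADDR_MASK  (((1ULL << 44) - 1) << DTE_ITTADDR_SHIFT)
#define DTE_SIZE_MASK     0x1fULL

/* Hypothetical decoded view of one entry. */
struct dte_fields {
	int valid;
	uint32_t next_offset;      /* device id distance to the next valid entry */
	uint64_t itt_addr;         /* 256-byte aligned guest physical address */
	uint8_t num_eventid_bits;
};

/* Build a valid entry, mirroring the shape of vgic_its_save_dte(). */
static uint64_t dte_pack(uint64_t itt_addr, uint32_t next_offset,
			 uint8_t num_eventid_bits)
{
	assert((itt_addr & 0xff) == 0);          /* low 8 bits are not stored */
	assert(itt_addr < (1ULL << 52));         /* must fit the 44-bit field */
	assert(next_offset <= 0x3fff);
	assert(num_eventid_bits >= 1 && num_eventid_bits <= 32);

	return (1ULL << DTE_VALID_SHIFT) |
	       ((uint64_t)next_offset << DTE_NEXT_SHIFT) |
	       ((itt_addr >> 8) << DTE_ITTADDR_SHIFT) |
	       (uint64_t)(num_eventid_bits - 1);
}

/* Decode an entry, mirroring the shape of vgic_its_restore_dte(). */
static struct dte_fields dte_unpack(uint64_t entry)
{
	struct dte_fields f;

	f.valid = (int)(entry >> DTE_VALID_SHIFT);
	f.next_offset = (uint32_t)((entry & DTE_NEXT_MASK) >> DTE_NEXT_SHIFT);
	f.itt_addr = ((entry & DTE_ITTADDR_MASK) >> DTE_ITTADDR_SHIFT) << 8;
	f.num_eventid_bits = (uint8_t)((entry & DTE_SIZE_MASK) + 1);
	return f;
}
```

Packing an entry and unpacking it again should give back the same ITT address, next offset and event id width, and an all-zero entry decodes as invalid.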

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v5 -> v6:
- accommodate vgic_its_alloc_device change of proto
- define bit fields for L1 entries
- s/handle_l1_entry/handle_l1_dte
- s/ite_esz/dte_esz in handle_l1_dte
- check BASER valid bit
- s/nb_eventid_bits/num_eventid_bits
- new convention for returned values
- itt functions implemented in subsequent patch

v4 -> v5:
- sort the device list by deviceid on device table save
- use defines for shifts and masks
- use abi->dte_esz
- clarify entry sizes for L1 and L2 tables

v3 -> v4:
- use the new proto for its_alloc_device
- compute_next_devid_offset, vgic_its_flush/restore_itt
  become static in this patch
- change in the DTE entry format with the introduction of the
  valid bit and next field width decrease; ittaddr encoded
  on its full range
- fix handle_l1_entry entry handling
- correct vgic_its_table_restore error handling

v2 -> v3:
- fix itt_addr bitmask in vgic_its_restore_dte
- addition of return 0 in vgic_its_restore_ite moved to
  the ITE related patch

v1 -> v2:
- use 8 byte format for DTE and ITE
- support 2 stage format
- remove kvm parameter
- ITT flush/restore moved in a separate patch
- use deviceid indexing
---
 virt/kvm/arm/vgic/vgic-its.c | 194 +++++++++++++++++++++++++++++++++++++++++--
 virt/kvm/arm/vgic/vgic.h     |  10 +++
 2 files changed, 199 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index a3ed52a..c5b388d 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -23,6 +23,7 @@
 #include <linux/interrupt.h>
 #include <linux/list.h>
 #include <linux/uaccess.h>
+#include <linux/list_sort.h>
 
 #include <linux/irqchip/arm-gic-v3.h>
 
@@ -1701,7 +1702,8 @@ int vgic_its_attr_regs_access(struct kvm_device *dev,
 	return ret;
 }
 
-u32 compute_next_devid_offset(struct list_head *h, struct its_device *dev)
+static u32 compute_next_devid_offset(struct list_head *h,
+				     struct its_device *dev)
 {
 	struct its_device *next;
 	u32 next_offset;
@@ -1755,8 +1757,8 @@ typedef int (*entry_fn_t)(struct vgic_its *its, u32 id, void *entry,
  * Return: < 0 on error, 0 if last element was identified, 1 otherwise
  * (the last element may not be found on second level tables)
  */
-int scan_its_table(struct vgic_its *its, gpa_t base, int size, int esz,
-		   int start_id, entry_fn_t fn, void *opaque)
+static int scan_its_table(struct vgic_its *its, gpa_t base, int size, int esz,
+			  int start_id, entry_fn_t fn, void *opaque)
 {
 	void *entry = kzalloc(esz, GFP_KERNEL);
 	struct kvm *kvm = its->dev->kvm;
@@ -1791,13 +1793,171 @@ int scan_its_table(struct vgic_its *its, gpa_t base, int size, int esz,
 	return ret;
 }
 
+static int vgic_its_save_itt(struct vgic_its *its, struct its_device *device)
+{
+	return -ENXIO;
+}
+
+static int vgic_its_restore_itt(struct vgic_its *its, struct its_device *dev)
+{
+	return -ENXIO;
+}
+
+/**
+ * vgic_its_save_dte - Save a device table entry at a given GPA
+ *
+ * @its: ITS handle
+ * @dev: ITS device
+ * @ptr: GPA
+ */
+static int vgic_its_save_dte(struct vgic_its *its, struct its_device *dev,
+			     gpa_t ptr, int dte_esz)
+{
+	struct kvm *kvm = its->dev->kvm;
+	u64 val, itt_addr_field;
+	u32 next_offset;
+
+	itt_addr_field = dev->itt_addr >> 8;
+	next_offset = compute_next_devid_offset(&its->device_list, dev);
+	val = (1ULL << KVM_ITS_DTE_VALID_SHIFT |
+	       ((u64)next_offset << KVM_ITS_DTE_NEXT_SHIFT) |
+	       (itt_addr_field << KVM_ITS_DTE_ITTADDR_SHIFT) |
+		(dev->num_eventid_bits - 1));
+	val = cpu_to_le64(val);
+	return kvm_write_guest(kvm, ptr, &val, dte_esz);
+}
+
+/**
+ * vgic_its_restore_dte - restore a device table entry
+ *
+ * @its: its handle
+ * @id: device id the DTE corresponds to
+ * @ptr: kernel VA where the 8 byte DTE is located
+ * @opaque: unused
+ *
+ * Return: < 0 on error, 0 if the dte is the last one, id offset to the
+ * next dte otherwise
+ */
+static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
+				void *ptr, void *opaque)
+{
+	struct its_device *dev;
+	gpa_t itt_addr;
+	u8 num_eventid_bits;
+	u64 entry = *(u64 *)ptr;
+	bool valid;
+	u32 offset;
+	int ret;
+
+	entry = le64_to_cpu(entry);
+
+	valid = entry >> KVM_ITS_DTE_VALID_SHIFT;
+	num_eventid_bits = (entry & KVM_ITS_DTE_SIZE_MASK) + 1;
+	itt_addr = ((entry & KVM_ITS_DTE_ITTADDR_MASK)
+			>> KVM_ITS_DTE_ITTADDR_SHIFT) << 8;
+
+	if (!valid)
+		return 1;
+
+	/* dte entry is valid */
+	offset = (entry & KVM_ITS_DTE_NEXT_MASK) >> KVM_ITS_DTE_NEXT_SHIFT;
+
+	dev = vgic_its_alloc_device(its, id, itt_addr, num_eventid_bits);
+	if (IS_ERR(dev))
+		return PTR_ERR(dev);
+
+	ret = vgic_its_restore_itt(its, dev);
+	if (ret)
+		return ret;
+
+	return offset;
+}
+
+static int vgic_its_device_cmp(void *priv, struct list_head *a,
+			       struct list_head *b)
+{
+	struct its_device *deva = container_of(a, struct its_device, dev_list);
+	struct its_device *devb = container_of(b, struct its_device, dev_list);
+
+	if (deva->device_id < devb->device_id)
+		return -1;
+	else
+		return 1;
+}
+
 /**
  * vgic_its_save_device_tables - Save the device table and all ITT
  * into guest RAM
+ *
+ * L1/L2 handling is hidden by vgic_its_check_id() helper which directly
+ * returns the GPA of the device entry
  */
 static int vgic_its_save_device_tables(struct vgic_its *its)
 {
-	return -ENXIO;
+	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+	struct its_device *dev;
+	int dte_esz = abi->dte_esz;
+	u64 baser;
+
+	baser = its->baser_device_table;
+
+	list_sort(NULL, &its->device_list, vgic_its_device_cmp);
this list is protected by the ITS mutex but you seem to be only holding
the KVM mutex here, so don't we have a potential exploit here?

Updates to the device and ITE lists are done when running commands. As we
hold the KVM mutex, commands cannot run. Then there is
vgic_its_destroy(), which happens on kvm_put_kvm() when all users have
released their reference. So to me holding the kvm lock looks sufficient.

But we don't hold the KVM mutex when running commands, we take the its
mutex?  What am I missing?

Yes we do. The kvm lock is taken in vgic_its_attr_regs_access. Commands
are processed on vgic_mmio_write_its_cwriter and vgic_mmio_write_its_ctlr.

Even worse, the vgic_its_trigger_msi also only takes the its->its_lock
mutex (or rather its caller does) and that surely can run while we are
saving the tables, can it not?

Hmm, yes, this can theoretically happen in a non-QEMU use case. Otherwise,
the VM being stopped at that time, injecting a new MSI at that point
looks invalid. It looks safer for me to take the its lock too, then.
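
To illustrate the conclusion, here is a minimal userspace model of the locking order, assuming the save path takes the KVM lock first and the ITS lock second. It is a sketch, not the patch: pthread mutexes stand in for the kernel mutexes, a small array stands in for its->device_list, and struct its_model, model_save_device_tables and model_trigger_msi are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ITS state guarded by the two locks. */
struct its_model {
	pthread_mutex_t kvm_lock;   /* models kvm->lock */
	pthread_mutex_t its_lock;   /* models its->its_lock */
	uint32_t device_ids[8];     /* models its->device_list */
	size_t num_devices;
};

static int cmp_device_id(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Save path: sort the device list while holding both locks. */
static void model_save_device_tables(struct its_model *its)
{
	assert(its->num_devices <= 8);

	pthread_mutex_lock(&its->kvm_lock);
	pthread_mutex_lock(&its->its_lock);   /* the extra lock discussed above */

	qsort(its->device_ids, its->num_devices,
	      sizeof(its->device_ids[0]), cmp_device_id);

	pthread_mutex_unlock(&its->its_lock);
	pthread_mutex_unlock(&its->kvm_lock);
}

/* MSI path: only the ITS lock is taken, so it serializes against the save. */
static int model_trigger_msi(struct its_model *its, uint32_t device_id)
{
	int found = 0;
	size_t i;

	pthread_mutex_lock(&its->its_lock);
	for (i = 0; i < its->num_devices; i++) {
		if (its->device_ids[i] == device_id)
			found = 1;
	}
	pthread_mutex_unlock(&its->its_lock);
	return found;
}
```

Because the MSI path only ever takes the ITS lock, a save path that holds the KVM lock alone would not exclude it. Taking the ITS lock as well, always after the KVM lock, closes that window without introducing a lock-order inversion in this model.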

Thanks for spotting this!

Eric
Thanks,
-Christoffer

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel