Re: [PATCH 07/14] KVM: arm64: Restrict host access to the ITS tables
From: Fuad Tabba <hidden>
Date: 2026-03-16 16:14:38
Also in:
kvmarm, lkml
Hi Sebastian, On Tue, 10 Mar 2026 at 12:49, Sebastian Ene [off-list ref] wrote:
Setup shadow structures for ITS indirect tables held in the GITS_BASER<n> registers. Make the last level of the Device Table and vPE Table inacessible to the host.
inacessible -> inaccessible
In a direct layout configuration, donate the table to the hypervisor since the software is not expected to program them directly.
This commit message is too brief and doesn't fully explain the problem, the impact, and the mechanism of the solution. It also appears to contradict the actual code changes. For example, could you elaborate why must the last level of indirect tables be inaccessible? Can you also please explain the mechanism? You are parsing GITS_BASER_INDIRECT to determine if a shadow Level 1 table must be shared with the host, while unconditionally donating the original physical tables. You also explicitly exclude Collection tables. The msg should briefly justify why Collection tables are safe to leave accessible to the host. There is also a contradiction in the message. You state "In a direct layout configuration, donate the table...". However, your code donates the original hardware table unconditionally on every iteration of the loop, regardless of whether GITS_BASER_INDIRECT is set. Please ensure the commit log accurately reflects the code implementation. Maybe you could say that the problem is Host DMA attacks via ITS table manipulation. Whereas the mechanism is to unconditionally donate hardware tables to EL2. For indirect Device/vPE tables, share a L1 shadow table with the host and strictly donate the L2 pages to prevent the host from writing malicious L2 pointers.
quoted hunk ↗ jump to hunk
Signed-off-by: Sebastian Ene <redacted> --- arch/arm64/kvm/hyp/nvhe/its_emulate.c | 143 ++++++++++++++++++++++++++ 1 file changed, 143 insertions(+)diff --git a/arch/arm64/kvm/hyp/nvhe/its_emulate.c b/arch/arm64/kvm/hyp/nvhe/its_emulate.c index 4a3ccc90a1a9..865a5d6353ed 100644 --- a/arch/arm64/kvm/hyp/nvhe/its_emulate.c +++ b/arch/arm64/kvm/hyp/nvhe/its_emulate.c@@ -141,6 +141,145 @@ static struct pkvm_protected_reg *get_region(phys_addr_t dev_addr) return NULL; } +static int pkvm_host_unmap_last_level(void *shadow, size_t num_pages, u32 psz) +{ + u64 *table = shadow; + int ret, i, end = (num_pages << PAGE_SHIFT) / sizeof(table); + phys_addr_t table_addr;
RCT, mixing initialized variables and uninitialized variables, plus variables of conceptually different "types" in the same declaration. Please use sizeof(*table): sizeof(table) evaluates to the size of the pointer (8 bytes), NOT the size of the array element. In this case, this happens to be the same, but it's still wrong. Maybe the following is clearer: + int end = num_pages * (PAGE_SIZE / sizeof(*table));
+
+ for (i = 0; i < end; i++) {
+ if (!(table[i] & GITS_BASER_VALID))
+ continue;
+
+ table_addr = table[i] & PHYS_MASK;
+ ret = __pkvm_host_donate_hyp(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);The ITS-configured page size and the host page size could be different, but the number of pages to donate for Level 2 tables is calculated based on psz (the ITS). If the ITS hardware is configured for 4KB pages, but the host kernel is using (e.g.,) 64KB pages, psz >> PAGE_SHIFT evaluates to 0. You need to account for mismatched page sizes, perhaps by using DIV_ROUND_UP(psz, PAGE_SIZE) (or something similar) to ensure the containing host page is donated.
+ if (ret)
+ goto err_donate;
+ }
+
+ return 0;
+err_donate:
+ for (i = i - 1; i >= 0; i--) {Please use the while (i--) idiom for rollback loops.
+ if (!(table[i] & GITS_BASER_VALID)) + continue; + + table_addr = table[i] & PHYS_MASK; + __pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);
Please wrap this in WARN_ON(...). If donating back to the host fails during a rollback, we have a fatal page leak that needs to be loudly flagged, similar to how you handle it in pkvm_unshare_shadow_table.
+ }
+ return ret;
+}
+
+static int pkvm_share_shadow_table(void *shadow, u64 nr_pages)
+{
+ u64 i, ret, start_pfn = hyp_virt_to_pfn(shadow);Same comment as before with RCT and the mixing of declarations.
+
+ for (i = 0; i < nr_pages; i++) {
+ ret = __pkvm_host_share_hyp(start_pfn + i);
+ if (ret)
+ goto unshare;
+ }
+
+ ret = hyp_pin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
+ if (ret)
+ goto unshare;
+
+ return ret;
+unshare:Please use the while (i--) idiom for rollback loops. Also, please use consistent naming conventions for the labels. Here you call it unshare, and earlier it was err_donate.
+ for (i = i - 1; i >= 0; i--)
+ __pkvm_host_unshare_hyp(start_pfn + i);
+ return ret;
+}
+
+static void pkvm_unshare_shadow_table(void *shadow, u64 nr_pages)
+{
+ u64 i, start_pfn = hyp_virt_to_pfn(shadow);
+
+ hyp_unpin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
+
+ for (i = 0; i < nr_pages; i++)
+ WARN_ON(__pkvm_host_unshare_hyp(start_pfn + i));
+}
+
+static void pkvm_host_map_last_level(void *shadow, size_t num_pages, u32 psz)
+{
+ u64 *table;RCT, and you forgot to initialize table: + u64 *table = shadow;
+ int i, end = (num_pages << PAGE_SHIFT) / sizeof(table);
Same sizeof(table) pointer-size bug as above.
+ phys_addr_t table_addr;
+
+ for (i = 0; i < end; i++) {
+ if (!(table[i] & GITS_BASER_VALID))
+ continue;
+
+ table_addr = table[i] & ~GITS_BASER_VALID;Inconsistent masking logic, since in pkvm_host_unmap_last_level you correctly used PHYS_MASK to extract the address, but here in the rollback path you use ~GITS_BASER_VALID. While both currently work because the upper bits and lower bits (below the page size) are defined as RES0 in the GIC spec, ~GITS_BASER_VALID is architecturally fragile. If a future hardware revision repurposes the upper RES0 bits [62:52] for new attributes (e.g., memory encryption flags), ~GITS_BASER_VALID will leak those attribute bits into the physical address calculation. Since PHYS_MASK correctly handles the address extraction across all page sizes (relying on the lower bits being RES0) and safely masks off future upper attribute bits, please standardize on using table_addr = table[i] & PHYS_MASK; for both functions.
+ WARN_ON(__pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT));
+ }
+}
+
+static int pkvm_setup_its_shadow_baser(struct its_shadow_tables *shadow)
+{
+ int i, ret;
+ u64 baser_val, num_pages, type;
+ void *base, *host_base;
+
+ for (i = 0; i < GITS_BASER_NR_REGS; i++) {
+ baser_val = shadow->tables[i].val;
+ if (!(baser_val & GITS_BASER_VALID))
+ continue;
+
+ base = kern_hyp_va(shadow->tables[i].base);
+ num_pages = (1 << shadow->tables[i].order);
+
+ ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn(base), num_pages);
+ if (ret)
+ goto err_donate;
+
+ if (baser_val & GITS_BASER_INDIRECT) {
+ host_base = kern_hyp_va(shadow->tables[i].shadow);
+ ret = pkvm_share_shadow_table(host_base, num_pages);
+ if (ret)
+ goto err_with_donation;
+
+ type = GITS_BASER_TYPE(baser_val);
+ if (type == GITS_BASER_TYPE_COLLECTION)
+ continue;
+
+ ret = pkvm_host_unmap_last_level(base, num_pages,
+ shadow->tables[i].psz);
+ if (ret)
+ goto err_with_share;
+ }
+ }
+
+ return 0;
+err_with_share:
+ pkvm_unshare_shadow_table(host_base, num_pages);
+err_with_donation:
+ __pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages);
+err_donate:
+ for (i = i - 1; i >= 0; i--) {Please use the while (i--) idiom for rollback loops.
+ baser_val = shadow->tables[i].val; + if (!(baser_val & GITS_BASER_VALID)) + continue; + + base = kern_hyp_va(shadow->tables[i].base); + num_pages = (1 << shadow->tables[i].order); + + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));
The sequence of rollback operations here creates a TOCTOU vulnerability.
- First, you donate base (the Level 1 indirect table) back to the host.
- Then, you pass base into pkvm_host_map_last_level().
- Finally, pkvm_host_map_last_level() reads table[i] out of base to
determine which Level 2 pages to donate back to the host.
Because the host regains ownership of base _first_, it can be running
concurrently on another CPU. A malicious host can overwrite the Level
1 table with pointers to arbitrary hypervisor-owned memory. The
hypervisor will then read those malicious pointers and dutifully grant
the host access to its own secure memory.
The order of operations needs to be reversed: you must read base to
roll back the L2 pages, unshare the shadow table, and *only then*
donate base back to the host.
Also, num_pages = (1 << shadow->tables[i].order); calculates a 32-bit
signed integer because the literal 1 is a signed 32-bit int. If order
is 31, this evaluates to a negative number. If order is 32 or higher,
this is undefined behavior. Because num_pages is declared as a u64,
you should use the standard kernel macro BIT_ULL().
Here's my suggested fix (not tested). Reorder the operations to safely
rollback L2 before donating L1, use the standard `while (i--)` loop,
and fix the page calculation:
+ while (i--) {
+ baser_val = shadow->tables[i].val;
+ if (!(baser_val & GITS_BASER_VALID))
+ continue;
+
+ base = kern_hyp_va(shadow->tables[i].base);
+ num_pages = BIT_ULL(shadow->tables[i].order);
+
+ if (baser_val & GITS_BASER_INDIRECT) {
+ host_base = kern_hyp_va(shadow->tables[i].shadow);
+
+ type = GITS_BASER_TYPE(baser_val);
+ if (type != GITS_BASER_TYPE_COLLECTION)
+ pkvm_host_map_last_level(base, num_pages,
+ shadow->tables[i].psz);
+
+ pkvm_unshare_shadow_table(host_base, num_pages);
+ }
+
+ WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));
+ }
+ if (baser_val & GITS_BASER_INDIRECT) {
+ host_base = kern_hyp_va(shadow->tables[i].shadow);
+ pkvm_unshare_shadow_table(host_base, num_pages);
+
+ type = GITS_BASER_TYPE(baser_val);
+ if (type == GITS_BASER_TYPE_COLLECTION)
+ continue;
+
+ pkvm_host_map_last_level(base, num_pages, shadow->tables[i].psz);
+ }
+ }You have duplicated the entire table decoding logic (calculating base, num_pages, checking INDIRECT...) down here in the rollback path. Consider abstracting "setup one table" and "teardown one table" into helper functions to make pkvm_setup_its_shadow_baser more readable and less prone to copy-pasta errors. Cheers, /fuad
quoted hunk ↗ jump to hunk
+ + return ret; +} + static int pkvm_setup_its_shadow_cmdq(struct its_shadow_tables *shadow) { int ret, i, num_pages;@@ -205,6 +344,10 @@ int pkvm_init_gic_its_emulation(phys_addr_t dev_addr, void *host_priv_state, if (ret) goto err_with_shadow; + ret = pkvm_setup_its_shadow_baser(shadow); + if (ret) + goto err_with_shadow; + its_reg->priv = priv_state; hyp_spin_lock_init(&priv_state->its_lock); --2.53.0.473.g4a7958ca14-goog