Thread (4 messages) 4 messages, 4 authors, 2023-10-31

Re: [PATCH v2] mlx5: fix init stage error handling to avoid double free of same QP and UAF

From: Leon Romanovsky <leon@kernel.org>
Date: 2023-10-25 08:35:16
Also in: linux-rdma, lkml

On Wed, Oct 25, 2023 at 12:15:36AM +0000, Qing Huang wrote:
quoted
-----Original Message-----
From: George Kennedy <redacted>
Sent: Tuesday, October 24, 2023 11:02 AM
To: leon@kernel.org; jgg@ziepe.ca; sd@queasysnail.net; linux-
rdma@vger.kernel.org; linux-kernel@vger.kernel.org; netdev@vger.kernel.org
Cc: George Kennedy <redacted>; Tom Hromatka
[off-list ref]; Harshit Mogalapalli
[off-list ref]
Subject: [PATCH v2] mlx5: fix init stage error handling to avoid double free of
same QP and UAF

In the unlikely event that workqueue allocation fails and returns NULL in
mlx5_mkey_cache_init(), delete the call to
mlx5r_umr_resource_cleanup() (which frees the QP) in
mlx5_ib_stage_post_ib_reg_umr_init().  This will avoid attempted double free of
the same QP when __mlx5_ib_add() does its cleanup.

Hi George,

There seems no cleanup function defined for this stage:

        STAGE_CREATE(MLX5_IB_STAGE_POST_IB_REG_UMR,
                     mlx5_ib_stage_post_ib_reg_umr_init,
                     NULL),

Do you know where __mlx5_ib_add() does the double free call after the allocation failure?
It is done in MLX5_IB_STAGE_PRE_IB_REG_UMR. Unfortunately, we have
asymmetric init/release flow for UMRs.

Thanks
Regards,
Qing
quoted
Syzkaller reported a UAF in ib_destroy_qp_user

workqueue: Failed to create a rescuer kthread for wq "mkey_cache": -EINTR
infiniband mlx5_0: mlx5_mkey_cache_init:981:(pid 1642):
failed to create work queue
infiniband mlx5_0: mlx5_ib_stage_post_ib_reg_umr_init:4075:(pid 1642):
mr cache init failed -12
==================================================================
BUG: KASAN: slab-use-after-free in ib_destroy_qp_user
(drivers/infiniband/core/verbs.c:2073)
Read of size 8 at addr ffff88810da310a8 by task repro_upstream/1642

Call Trace:
<TASK>
kasan_report (mm/kasan/report.c:590)
ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2073)
mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198)
__mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4178)
mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
...
</TASK>

Allocated by task 1642:
__kmalloc (./include/linux/kasan.h:198 mm/slab_common.c:1026
mm/slab_common.c:1039)
create_qp (./include/linux/slab.h:603 ./include/linux/slab.h:720
./include/rdma/ib_verbs.h:2795 drivers/infiniband/core/verbs.c:1209)
ib_create_qp_kernel (drivers/infiniband/core/verbs.c:1347)
mlx5r_umr_resource_init (drivers/infiniband/hw/mlx5/umr.c:164)
mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4070)
__mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168)
mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
...

Freed by task 1642:
__kmem_cache_free (mm/slub.c:1826 mm/slub.c:3809 mm/slub.c:3822)
ib_destroy_qp_user (drivers/infiniband/core/verbs.c:2112)
mlx5r_umr_resource_cleanup (drivers/infiniband/hw/mlx5/umr.c:198)
mlx5_ib_stage_post_ib_reg_umr_init (drivers/infiniband/hw/mlx5/main.c:4076
drivers/infiniband/hw/mlx5/main.c:4065)
__mlx5_ib_add (drivers/infiniband/hw/mlx5/main.c:4168)
mlx5r_probe (drivers/infiniband/hw/mlx5/main.c:4402)
...

The buggy address belongs to the object at ffff88810da31000 which belongs to
the cache kmalloc-2k of size 2048 The buggy address is located 168 bytes inside
of freed 2048-byte region [ffff88810da31000, ffff88810da31800)

The buggy address belongs to the physical page:
page:000000003b5e469d refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x10da30
head:000000003b5e469d order:3 entire_mapcount:0 nr_pages_mapped:0
pincount:0
flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000840 ffff888100042f00 ffffea0004180800
dead000000000002
raw: 0000000000000000 0000000000080008 00000001ffffffff
0000000000000000 page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88810da30f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88810da31000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
quoted
ffff88810da31080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
			      ^
ffff88810da31100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88810da31180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint

Fixes: 04876c12c19e ("RDMA/mlx5: Move init and cleanup of UMR to umr.c")
Reported-by: syzkaller <redacted>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: George Kennedy <redacted>
---
v2: went with fix suggested by: Leon Romanovsky [off-list ref]

 drivers/infiniband/hw/mlx5/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c
b/drivers/infiniband/hw/mlx5/main.c
index 555629b7..5d963ab 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4071,10 +4071,8 @@ static int
mlx5_ib_stage_post_ib_reg_umr_init(struct mlx5_ib_dev *dev)
 		return ret;

 	ret = mlx5_mkey_cache_init(dev);
-	if (ret) {
+	if (ret)
 		mlx5_ib_warn(dev, "mr cache init failed %d\n", ret);
-		mlx5r_umr_resource_cleanup(dev);
-	}
 	return ret;
 }

--
1.8.3.1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help