Thread (4 messages) 4 messages, 2 authors, 2020-02-29

Re: [PATCH V2] block: rename 'q->debugfs_dir' and 'q->blk_trace->dir' in blk_unregister_queue()

From: Ming Lei <hidden>
Date: 2020-02-28 23:16:25
Also in: lkml

On Thu, Feb 13, 2020 at 04:12:52PM +0800, yu kuai wrote:
quoted hunk ↗ jump to hunk
syzbot is reporting use after free bug in debugfs_remove[1].

This is because in request_queue, 'q->debugfs_dir' and
'q->blk_trace->dir' could be the same dir. And in __blk_release_queue(),
blk_mq_debugfs_unregister() will remove everything inside the dir.

With futher investigation of the reporduce repro, the problem can be
reporduced by following procedure:

1. LOOP_CTL_ADD, create a request_queue q1, blk_mq_debugfs_register() will
create the dir.
2. LOOP_CTL_REMOVE, blk_release_queue() will add q1 to release queue.
3. LOOP_CTL_ADD, create another request_queue q2,blk_mq_debugfs_register()
will fail because the dir aready exist.
4. BLKTRACESETUP, create two files(msg and dropped) inside the dir.
5. call __blk_release_queue() for q1, debugfs_remove_recursive() will
delete the files created in step 4.
6. LOOP_CTL_REMOVE, blk_release_queue() will add q2 to release queue.
And when __blk_release_queue() is called for q2, blk_trace_shutdown() will
try to release the two files created in step 4, wich are aready released
in step 5.

thread1		         |kworker                   |thread2
 ----------------------- | ------------------------ | --------------------
loop_control_ioctl       |                          |
 loop_add                |                          |
  blk_mq_debugfs_register|                          |
   debugfs_create_dir    |                          |
loop_control_ioctl       |                          |
 loop_remove		 |                          |
  blk_release_queue      |                          |
   schedule_work         |                          |
                         |                          |loop_control_ioctl
                         |                          | loop_add
                         |                          |  ...
                         |                          |blk_trace_ioctl
                         |                          | __blk_trace_setup
                         |                          |   debugfs_create_file
                         |__blk_release_queue       |
                         | blk_mq_debugfs_unregister|
                         |  debugfs_remove_recursive|
                         |                          |loop_control_ioctl
                         |                          | loop_remove
                         |                          |  ...
                         |__blk_release_queue       |
                         | blk_trace_shutdown       |
                         |  debugfs_remove          |

commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression") pushed the
final release of request_queue to a workqueue, so, when loop_add() is
called again(step 3), __blk_release_queue() might not been called yet,
which causes the problem.

Fix the problem by renaming 'q->debugfs_dir' or 'q->blk_trace->dir'
in blk_unregister_queue() if they exist.

[1] https://syzkaller.appspot.com/bug?extid=903b72a010ad6b7a40f2
References: CVE-2019-19770
Fixes: commit dc9edc44de6c ("block: Fix a blk_exit_rl() regression")
Reported-by: syzbot <redacted>
Signed-off-by: yu kuai <redacted>
---
Changes in V2:
 add device name to the new dir name
 fix compile err when 'CONFIG_BLK_DEBUG_FS' is not enabled
 add treatment of 'q->blk_trace->dir'

 block/blk-sysfs.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index fca9b158f4a0..1da8d4a4915a 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -11,6 +11,7 @@
 #include <linux/blktrace_api.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-cgroup.h>
+#include <linux/debugfs.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -1011,6 +1012,46 @@ int blk_register_queue(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(blk_register_queue);
 
+#ifdef CONFIG_DEBUG_FS
+/*
+ * blk_prepare_release_queue - rename q->debugfs_dir and q->blk_trace->dir
+ * @q: request_queue of which the dir to be renamed belong to.
+ *
+ * Because the final release of request_queue is in a workqueue, the
+ * cleanup might not been finished yet while the same device start to
+ * create. It's not correct if q->debugfs_dir still exist while trying
+ * to create a new one.
+ */
+static void blk_prepare_release_queue(struct request_queue *q)
+{
+	struct dentry *new = NULL;
+	struct dentry **old = NULL;
+	char name[DNAME_INLINE_LEN];
+	int i = 0;
+
+#ifdef CONFIG_BLK_DEBUG_FS
+	if (!IS_ERR_OR_NULL(q->debugfs_dir))
+		old = &q->debugfs_dir;
+#endif
+#ifdef CONFIG_BLK_DEV_IO_TRACE
+	/* q->debugfs_dir and q->blk_trace->dir can't both exist */
+	if (q->blk_trace && !IS_ERR_OR_NULL(q->blk_trace->dir))
+		old = &q->blk_trace->dir;
+#endif
If blk_trace->dir isn't same with .debugfs_dir, you will just rename
blk_trace->dir, this way can't avoid the failure in step3, can it?

I understand that we just need to rename .debugfs_dir, meantime making
sure blktrace code removes correct debugfs dir, is that enough for fixing
this issue?
+	if (old == NULL)
+		return;
+	while (new == NULL) {
+		sprintf(name, "ready_to_remove_%s_%d",
+				kobject_name(q->kobj.parent), i++);
+		new = debugfs_rename(blk_debugfs_root, *old,
+				     blk_debugfs_root, name);
+	}
The above code can be run concurrently with blktrace shutdown, so race might
exit between the two code paths, then bt->dir may has been renamed or being
renamed in debugfs_remove(bt->dir), can this function work as expected or
correct?

And there is dead loop risk, so suggest to not rename this way.


Thanks, 
Ming
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help