Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

From: "zhangyi (F)" <yi.zhang@huawei.com>
Date: 2017-01-16 03:25:10
Also in: linux-fsdevel, lkml

on 2017/1/11 23:34, Theodore Ts'o wrote:
On Wed, Jan 11, 2017 at 05:07:29PM +0800, zhangyi (F) wrote:
(1) The file we want to unlink has many hard links, but only one dcache entry in memory.
(2) Open this file, but its inode->i_nlink read from disk is 1 (too low).
(3) Someone calls rename and drops its i_nlink to zero.
(4) Its inode is still in use and is not destroyed (not closed); at the same time,
    someone else opens one of its hard links and creates a dcache entry.
(5) Call rename again, and its i_nlink will still underflow and cause memory corruption.
Do you have reproducers that make it easy to reproduce situations like
this?  (It shouldn't be hard to write, but if you have them already
will save me some effort.  :-)

I made a reproducer. The following steps reproduce this problem easily:
1) Mount an ext4 file system, and create 3 files and 1 hard link:

    #mount /dev/sdax /mnt
    #cd /mnt
    #touch old_file1 old_file2 new_file
    #ln new_file new_link1

2) Unmount the file system and use debugfs to set new_file's links_count
   to 1 (the real count is 2), which simulates the fs inconsistency:

   #umount /mnt
   #debugfs /dev/sdax -w
	set_inode_field new_file links_count 1

3) Mount the fs again, then run the following program from the mount point
   (Note: do not run ls first; it will create the second dcache entry):

   #include <errno.h>
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   #define RENAME_OLD_FILE_1  "old_file1"
   #define RENAME_OLD_FILE_2  "old_file2"
   #define RENAME_NEW_FILE    "new_file"
   #define NEW_FILE_LINK_1    "new_link1"

   int main(int argc, char *argv[])
   {
        int fd = 0;
        int err = 0;

        fd = open(RENAME_NEW_FILE, O_RDONLY);
        if (fd < 0) {
                printf("open error:%d\n", errno);
                return -1;
        }

        err = rename(RENAME_OLD_FILE_1, RENAME_NEW_FILE);
        if (err < 0) {
                printf("rename error:%d\n", errno);
                close(fd);
                return -1;
        }

        err = rename(RENAME_OLD_FILE_2, NEW_FILE_LINK_1);
        if (err < 0) {
                printf("rename error:%d\n", errno);
                close(fd);
                return -1;
        }

        close(fd);
        return 0;
   }
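
Before running the program above, it may be worth confirming that the kernel really sees the lowered link count. The helper below is only a sketch of how that could be checked from userspace; link_count() is a hypothetical name, not something from the patch, and it simply reports what stat(2) returns.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of the patch or the reproducer):
 * return the link count that stat(2) reports for path, or -1 on error.
 */
long link_count(const char *path)
{
        struct stat st;

        if (stat(path, &st) < 0) {
                perror(path);
                return -1;
        }
        return (long)st.st_nlink;
}
```

Assuming the debugfs change in step 2 took effect, link_count("new_file") should report 1 after the remount. It is probably best not to call it on new_link1, since that lookup would presumably create the second dcache entry that the note in step 3 warns about.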

4) After this, new_file's inode->i_nlink has underflowed and the inode has
   been added to the orphan list. The kernel dumps a warning like this:

   ------------[ cut here ]------------
   WARNING: CPU: 0 PID: 1814 at fs/inode.c:282 drop_nlink+0x3e/0x50
   ...
   Call Trace:
    dump_stack+0x63/0x86
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    drop_nlink+0x3e/0x50
    ext4_rename+0x532/0x8c0
    ext4_rename2+0x1d/0x30
    vfs_rename+0x728/0x940
    ? __lookup_hash+0x20/0xa0
    SyS_rename+0x3ba/0x3e0
    entry_SYSCALL_64_fastpath+0x1a/0xa9
   ...
   ---[ end trace b157dacbc891e6e8 ]---
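
The arithmetic behind that warning can be illustrated with a simplified userspace model. This is not kernel code: struct model_inode and model_drop_nlink() are hypothetical names, and the model only assumes that the link count is an unsigned int which is decremented even after the warning fires, which is how drop_nlink() appears to behave in this trace.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for the in-memory inode; only the link count. */
struct model_inode {
        unsigned int nlink;
};

/*
 * Simplified model of dropping a link: warn if the count is already
 * zero, then decrement anyway. Returns 1 if the warning was hit.
 */
int model_drop_nlink(struct model_inode *inode)
{
        int warned = (inode->nlink == 0);

        if (warned)
                fprintf(stderr, "WARNING: nlink already 0\n");
        inode->nlink--;         /* unsigned: 0 - 1 wraps to UINT_MAX */
        return warned;
}
```

Starting from nlink = 1 (the value written with debugfs), the first call models the first rename and leaves the count at 0 without a warning. The second call models the rename over new_link1: it hits the warning and wraps the count to UINT_MAX.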

5) Then we trigger memory shrinking; this inode is destroyed while it is
   still on the orphan list:

   #echo 3 > /proc/sys/vm/drop_caches

   kernel dump:

   EXT4-fs (sdb1): Inode 16 (ffff98f4b3285c20): orphan list check failed!
   ...
   ffff98f4b3285d30: fa87e800 ffff98f4 b3285e80 ffff98f4  .........^(.....
   ffff98f4b3285d40: b20829d8 ffff98f4 00000010 00000000  .)..............
   ffff98f4b3285d50: ffffffff 00000000 00000000 00000000  ................
   ...
   Call Trace:
    dump_stack+0x63/0x86
    ext4_destroy_inode+0xa0/0xb0
    destroy_inode+0x3b/0x60
    evict+0x130/0x1c0
    dispose_list+0x4d/0x70
    prune_icache_sb+0x5a/0x80
    super_cache_scan+0x14b/0x1a0
    shrink_slab.part.40+0x1f5/0x420
    shrink_slab+0x29/0x30
    drop_slab_node+0x31/0x60
    drop_slab+0x3f/0x70
    drop_caches_sysctl_handler+0x71/0xc0
    proc_sys_call_handler+0xea/0x110
    proc_sys_write+0x14/0x20
    __vfs_write+0x37/0x160
    ? selinux_file_permission+0xd7/0x110
    ? security_file_permission+0x3b/0xc0
    vfs_write+0xb5/0x1a0
    SyS_write+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xa9
   ...
   bash (1594): drop_caches: 3

6) Some time later, if we change the orphan list, it will cause memory
   corruption, because the list still references the destroyed inode.

Thanks.

zhangyi