Re: [RFC][PATH 4/4] ovl: relax lock_rename when moving files between work and upper dir
From: Amir Goldstein <amir73il@gmail.com>
Date: 2017-01-12 05:42:23
Also in:
linux-fsdevel
On Fri, Nov 11, 2016 at 7:44 PM, Miklos Szeredi [off-list ref] wrote:
On Fri, Nov 11, 2016 at 6:27 PM, Al Viro [off-list ref] wrote:quoted
On Fri, Nov 11, 2016 at 06:17:07PM +0200, Amir Goldstein wrote:quoted
I'm afraid so. It seems to me that most of the time, lock_rename() is being used in overlayfs for no better alternative to lock 2 directories (work+upper). My suggestion is a small modification to the overlayfs locking scheme. ---Instead of this: assert(lock_rename(workdir, upperdir) != NULL)); copy_up(src, tmp); vfs_rename(tmp, dst); unlock_rename(workdir, upperdir); +++Use this: assert(lock_rename(workdir, upperdir) != NULL)); mutex_unlock(s_vfs_rename_mutex); copy_up(src, tmp); inode_unlock(upperdir); inode_unlock(workdir); assert(lock_rename(workdir, upperdir) != NULL)); vfs_rename(tmp, dst); unlock_rename(workdir, upperdir); Miklos, Do you see any problem with the proposed scheme? Anything that can go wrong while releasing the workdir lock before vfs_rename()?Huh??? ->rename() definitely counts upon parents being locked; please, read the damn Documentation/filesystems/locking, it's there for a reason. The real question is why the fsck do you need to lock the workdir for the duration of copying at all. O_TMPFILE open + writes there doesn't need lock_rename() *or* parents being locked. You need the parent locked when you link the sucker in place, but that's it. IDGI...There's that. The other thing that lock_rename does is prevent multiple copy ups on the same file. Arguably it's an overkill, but a replacement needs to be added.
Miklos, FYI, I started to work on copy up using O_TMPFILE with i_mutex held only for upper dir.
There's that. The other thing that lock_rename does is prevent multiple copy ups on the same file. Arguably it's an overkill, but a replacement needs to be added.
I think you suggested here to use a waitqueue for pending copyups? Did you mean a waitqueue per overlay parent directory inode or did you mean something else?
Fact is, nobody ever reported this blatant performance bottleneck. Probably because copying up gigabyte files is not the usual use case for union filesystems...
Still, it's a loophole for some serious DoS. An unprivileged user inside container can touch a file and block all copy up on all other containers on that host and all renames on that host fs as well! Amir.