Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()

[PATCH 1/5] cgroup: fix broken css_has_online_children() · Li Zefan <hidden> · 2014-06-12
[PATCH 2/5] percpu-ref: introduce percpu_ref_alive() · Li Zefan <hidden> · 2014-06-12
[PATCH 3/5] cgroup: fix mount failure in a corner case · Li Zefan <hidden> · 2014-06-12
Re: [PATCH 3/5] cgroup: fix mount failure in a corner case · Tejun Heo <tj@kernel.org> · 2014-06-20
Re: [PATCH 3/5] cgroup: fix mount failure in a corner case · Li Zefan <hidden> · 2014-06-24
[PATCH 4/5] kernfs: introduce kernfs_pin_sb() and kernfs_drop_sb() · Li Zefan <hidden> · 2014-06-12
[PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Li Zefan <hidden> · 2014-06-12
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Tejun Heo <tj@kernel.org> · 2014-06-20
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Li Zefan <hidden> · 2014-06-24
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Tejun Heo <tj@kernel.org> · 2014-06-24
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Li Zefan <hidden> · 2014-06-25
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Tejun Heo <tj@kernel.org> · 2014-06-25
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Li Zefan <hidden> · 2014-06-27
Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() · Tejun Heo <tj@kernel.org> · 2014-06-27
Re: [PATCH 1/5] cgroup: fix broken css_has_online_children() · Tejun Heo <tj@kernel.org> · 2014-06-17

From: Li Zefan <hidden>
Date: 2014-06-24 01:22:09
Also in: lkml

On 2014/6/21 3:35, Tejun Heo wrote:

Hello, Li.

Sorry about the long delay.

On Thu, Jun 12, 2014 at 02:33:05PM +0800, Li Zefan wrote:


We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are dark areas.

Run two instances of this script concurrently:

    for ((; ;))
    {
    	mount -t cgroup -o cpuacct xxx /cgroup
    	umount /cgroup
    }

After a while, I saw two mount processes stuck retrying: they were
waiting for a subsystem to become free, but the root associated with
that subsystem never got freed.

This can happen if thread A is in the process of killing the superblock
but hasn't yet called percpu_ref_kill(), and at that moment thread B,
mounting the same cgroup root, finds the root in the root list and
performs percpu_ref_try_get().
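The window can be replayed deterministically in a small single-threaded model. This is a hypothetical userspace sketch, not kernel code: `struct root_model`, `sb_dying` and the helper names are invented, and the root's percpu refcnt is reduced to a plain counter plus a "killed" flag standing in for percpu_ref_kill(). It only shows that a getter which consults the root's ref alone can still succeed after teardown of the superblock has begun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: a counter plus a "killed" flag
 * stands in for the cgroup root's percpu refcnt. */
struct root_model {
	int count;      /* outstanding references on the root */
	bool killed;    /* set by the stand-in for percpu_ref_kill() */
	bool sb_dying;  /* superblock teardown has started */
};

/* Thread B's view: it only looks at the root's ref, never at the sb. */
static bool root_tryget(struct root_model *r)
{
	if (r->killed || r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Thread A, split into two steps so the interleaving can be replayed. */
static void kill_sb_start(struct root_model *r)
{
	r->sb_dying = true;
}

static void kill_sb_kill_ref(struct root_model *r)
{
	assert(r->count > 0);
	r->killed = true;
	r->count--;     /* drop the base reference */
}

/* Replay: A starts teardown, B gets in before the ref is killed, then A
 * finishes. Returns true if B ended up holding a ref on a root whose
 * superblock was already going away. */
static bool replay_race(struct root_model *r)
{
	bool got;

	kill_sb_start(r);
	got = root_tryget(r);
	kill_sb_kill_ref(r);
	return got && r->sb_dying;
}
```

Starting from `{ 1, false, false }`, `replay_race()` returns true and leaves the count at 1, held by B; the toy makes no attempt to show how the real root then stays pinned.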

To fix this, we increase the refcnt of the superblock instead of
increasing the percpu refcnt of the cgroup root.
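Pinning the superblock helps because the pin can fail once teardown has started. A minimal sketch of the idiom, assuming (as in the VFS, where deactivate_locked_super() calls ->kill_sb() only after s_active has dropped to zero) that a dying superblock's active count is already zero: an "increment only if nonzero" operation in the style of the kernel's atomic_inc_not_zero() then refuses the new reference. The helpers below are C11 userspace stand-ins; kernfs_pin_sb() from patch 4/5 is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for atomic_inc_not_zero(): take a reference only
 * if the count (think sb->s_active) has not already dropped to zero. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, the CAS reloads the current value into old. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with inc_not_zero(). */
static void put_ref(atomic_int *v)
{
	int old = atomic_fetch_sub(v, 1);

	assert(old > 0);
	(void)old;
}
```

A live superblock (count 1) is pinned and goes to 2; a dying one (count 0) is left untouched and the caller has to back off, instead of attaching to a root that is on its way out.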

Ah, right.  Geez, I'm really hating the fact that we have ->mount but
not ->umount.  However, can't we make it a bit simpler by just
introducing a mutex protecting both looking up and taking a ref on an
existing root and an sb going away?  The only problem is that the
refcnt being killed isn't atomic w.r.t. a new live ref coming up,
right?  Why not just add a mutex around them so that they can't race?
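Tejun's suggestion, reduced to a hypothetical userspace model: with one mutex (`root_mutex`, an invented name) around both sides, "sb going away" and "ref killed" become a single step as far as a lookup is concerned, so the window described in the changelog cannot open. The model deliberately leaves out sb->s_umount, which is exactly what Li's answer is about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical: one mutex covers both the mount-side lookup/get and
 * the umount-side teardown of the root's ref. */
static pthread_mutex_t root_mutex = PTHREAD_MUTEX_INITIALIZER;

struct root_model {
	int count;      /* outstanding references on the root */
	bool killed;    /* stand-in for percpu_ref_kill() having run */
	bool sb_dying;  /* superblock teardown has started */
};

/* Mount side: look up the existing root and take a ref, under the mutex. */
static bool mount_get_root(struct root_model *r)
{
	bool ok = false;

	pthread_mutex_lock(&root_mutex);
	if (!r->sb_dying && !r->killed && r->count > 0) {
		r->count++;
		ok = true;
	}
	pthread_mutex_unlock(&root_mutex);
	return ok;
}

/* Umount side: marking the sb dying and killing the ref happen together. */
static void kill_sb_model(struct root_model *r)
{
	pthread_mutex_lock(&root_mutex);
	assert(r->count > 0);
	r->sb_dying = true;
	r->killed = true;
	r->count--;     /* drop the base reference */
	pthread_mutex_unlock(&root_mutex);
}
```

In this model a lookup either completes before teardown (and gets its ref) or sees the root already dead; there is no state in between.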

Well, kill_sb() is called with sb->s_umount held, while kernfs_mount()
returns with sb->s_umount held, so adding such a mutex would lead to an
ABBA deadlock.
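Spelled out as lock orders (a reading of the reply, not code from the thread): ->kill_sb() runs under sb->s_umount, so taking the new mutex there orders s_umount before the mutex; on the mount side the mutex would have to stay held until the superblock is secured, i.e. across kernfs_mount(), which acquires s_umount, ordering the mutex before s_umount. A toy order checker, loosely in the spirit of the kernel's lockdep but not its implementation, flags the inversion; `LOCK_S_UMOUNT` and `LOCK_NEW_MUTEX` are hypothetical ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock ids for the two locks in question. */
enum { LOCK_S_UMOUNT, LOCK_NEW_MUTEX, NR_LOCKS };

/* order_seen[a][b]: lock b has been taken while lock a was held. */
static bool order_seen[NR_LOCKS][NR_LOCKS];

/* Record that @second was acquired while @first was held. Returns
 * false if the opposite order was seen before (ABBA deadlock risk). */
static bool record_order(int first, int second)
{
	assert(first != second);
	if (order_seen[second][first])
		return false;
	order_seen[first][second] = true;
	return true;
}

static void reset_orders(void)
{
	memset(order_seen, 0, sizeof(order_seen));
}
```

Recording the kill_sb path (s_umount, then the mutex) succeeds; recording the mount path afterwards (the mutex, then s_umount) returns false, which is the ABBA pattern Li describes.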
