Re: [PATCH 2/2] xfs: make sure link path does not go away at access
From: Ian Kent <raven@themaw.net>
Date: 2021-11-13 05:17:41
Also in:
linux-fsdevel, lkml
On Fri, 2021-11-12 at 06:47 -0500, Brian Foster wrote:
On Fri, Nov 12, 2021 at 07:10:19AM +0800, Ian Kent wrote:quoted
On Thu, 2021-11-11 at 11:08 -0500, Brian Foster wrote: Hi Brian,quoted
On Thu, Nov 11, 2021 at 11:39:30AM +0800, Ian Kent wrote:quoted
When following a trailing symlink in rcu-walk mode it's possible to succeed in getting the ->get_link() method pointer but the link path string be deallocated while it's being used. Utilize the rcu mechanism to mitigate this risk. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Ian Kent <raven@themaw.net> --- fs/xfs/kmem.h | 4 ++++ fs/xfs/xfs_inode.c | 4 ++-- fs/xfs/xfs_iops.c | 10 ++++++++-- 3 files changed, 14 insertions(+), 4 deletions(-)...quoted
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index a607d6aca5c4..2977e19da7b7 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c@@ -524,11 +524,17 @@ xfs_vn_get_link_inline(/* * The VFS crashes on a NULL pointer, so return - EFSCORRUPTED if - * if_data is junk. + * if_data is junk. Also, if the path walk is in rcu- walk mode + * and the inode link path has gone away due inode re- use we have + * no choice but to tell the VFS to redo the lookup. */ - link = ip->i_df.if_u1.if_data; + link = rcu_dereference(ip->i_df.if_u1.if_data); + if (!dentry && !link) + return ERR_PTR(-ECHILD); +One thing that concerns me slightly about this approach is that inode reuse does not necessarily guarantee that if_data is NULL. It seems technically just as possible (even if exceedingly unlikely) for link to point at newly allocated memory since the previous sequence count validation check. The inode could be reused as another inline symlink for example, though it's not clear to me if that is really a problem for the vfs (assuming a restart would just land on the new link anyways?). But the inode could also be reallocated as something like a shortform directory, which means passing directory header data or whatever that it stores in if_data back to pick_link(), which is then further processed as a string.This is the sort of feedback I was hoping for. This sounds related to the life-cycle of xfs inodes and re-use. Hopefully someone here on the list can enlighten me on this. The thing that comes to mind is that the inode re-use would need to occur between the VFS check that validates the inode is still ok and the use of link string. I think that can still go away even with the above check.Yeah... The original NULL ->get_link() problem was replicated with a small delay in the lookup path (specifically in the symlink processing path). This essentially widens the race window and allows a separate task to invalidate the dentry between the time the last dentry sequence validation occurred (and passed) and the attempt to call ->get_link() becomes imminent. I think patch 1 largely addresses this issue because we'll have revalidated the previous read of the function pointer before we attempt to call it. That leads to this patch, which suggests that even after the validation fix a small race window still technically exists with the -quoted
get_link()code and inode teardown. In fact, it's not that hard to show that this is true by modifying the original reproducer to push the delay out beyond the check added by patch 1 (or into the ->get_link() callback). Playing around with that a bit, it's possible to induce a -quoted
get_link()call to an inode that was reallocated as a shortform directory and returns a non-NULL if_data fork of that dir back to the vfs (to be interpreted as a symlink string). Nothing seems to explode on my quick test, fortunately, but I don't think that's an interaction we want to maintain. Of course one caveat to all of that is that after patch 1, the race window for that one might be so small as to make this impossible to reproduce in practice (whereas the problem fixed by patch 1 has been reproduced by users)...quoted
Hopefully someone can clarify what happens here.quoted
With that, I wonder why we wouldn't just return -ECHILD here like we do for the non-inline case to address the immediate problem, and then perhaps separately consider if we can rework bits of the reuse/reclaim code to allow rcu lookup of inline symlinks under certain conditions.Always switching to ref-walk mode would certainly resolve the problem too, yes, perhaps we have no choice ...Oh I don't think it's the only choice. I think Miklos' suggestion to use ->free_inode() is probably the right general approach. I just think a switch to ref-walk mode might be a good incremental step to fix this problem in a backportable way (s_op->free_inode() is newer relative to the introduction of _get_link_inline()). We can always re-enable rcu symlink processing once we get our inode teardown/reuse bits fixed up accordingly.. Just my .02.
Yes, I've had a change of heart on this too. I think returning -ECHILD from xfs_vn_get_link_inline() is the best solution. There are a couple of reasons for that, the main one being the link string can still go away while the VFS is using it, but also Al has said more than once that switching to ref-walk mode is not a big deal and that makes the problems vanish completely. In any case references are taken at successful walk completion anyway. If it's found staying rcu-walk mode whenever possible is worth while in cases like this then there's probably a lot more to do to do this properly. The lockless stuff is tricky and error prone (certainly it is for me) and side effects are almost always hiding in unexpected places. So as you say, that's something for another day. I'll update the patch and post an update. Ian
Brianquoted
Ianquoted
FWIW, I'm also a little curious why we don't set i_link for inline symlinks. I don't think that addresses this validation problem, but perhaps might allow rcu lookups in the inline symlink common case where things don't change during the lookup (and maybe even eliminate the need for this custom inline callback)..? Brianquoted
if (XFS_IS_CORRUPT(ip->i_mount, !link)) return ERR_PTR(-EFSCORRUPTED); + return link; }