Thread (32 messages) 32 messages, 6 authors, 2021-12-16

Re: [PATCH 2/2] xfs: make sure link path does not go away at access

From: Ian Kent <raven@themaw.net>
Date: 2021-11-13 05:17:41
Also in: linux-fsdevel, lkml

On Fri, 2021-11-12 at 06:47 -0500, Brian Foster wrote:
On Fri, Nov 12, 2021 at 07:10:19AM +0800, Ian Kent wrote:
quoted
On Thu, 2021-11-11 at 11:08 -0500, Brian Foster wrote:

Hi Brian,
quoted
On Thu, Nov 11, 2021 at 11:39:30AM +0800, Ian Kent wrote:
quoted
When following a trailing symlink in rcu-walk mode it's
possible to
succeed in getting the ->get_link() method pointer but the link
path
string be deallocated while it's being used.

Utilize the rcu mechanism to mitigate this risk.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Ian Kent <raven@themaw.net>
---
 fs/xfs/kmem.h      |    4 ++++
 fs/xfs/xfs_inode.c |    4 ++--
 fs/xfs/xfs_iops.c  |   10 ++++++++--
 3 files changed, 14 insertions(+), 4 deletions(-)
...
quoted
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4..2977e19da7b7 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -524,11 +524,17 @@ xfs_vn_get_link_inline(
 
        /*
         * The VFS crashes on a NULL pointer, so return -
EFSCORRUPTED if
-        * if_data is junk.
+        * if_data is junk. Also, if the path walk is in rcu-
walk
mode
+        * and the inode link path has gone away due inode re-
use
we have
+        * no choice but to tell the VFS to redo the lookup.
         */
-       link = ip->i_df.if_u1.if_data;
+       link = rcu_dereference(ip->i_df.if_u1.if_data);
+       if (!dentry && !link)
+               return ERR_PTR(-ECHILD);
+
One thing that concerns me slightly about this approach is that
inode
reuse does not necessarily guarantee that if_data is NULL. It
seems
technically just as possible (even if exceedingly unlikely) for
link
to
point at newly allocated memory since the previous sequence count
validation check. The inode could be reused as another inline
symlink
for example, though it's not clear to me if that is really a
problem
for
the vfs (assuming a restart would just land on the new link
anyways?).
But the inode could also be reallocated as something like a
shortform
directory, which means passing directory header data or whatever
that
it
stores in if_data back to pick_link(), which is then further
processed
as a string.
This is the sort of feedback I was hoping for.

This sounds related to the life-cycle of xfs inodes and re-use.
Hopefully someone here on the list can enlighten me on this.

The thing that comes to mind is that the inode re-use would
need to occur between the VFS check that validates the inode
is still ok and the use of link string. I think that can still
go away even with the above check.
Yeah... The original NULL ->get_link() problem was replicated with a
small delay in the lookup path (specifically in the symlink
processing
path). This essentially widens the race window and allows a separate
task to invalidate the dentry between the time the last dentry
sequence
validation occurred (and passed) and the attempt to call ->get_link()
becomes imminent. I think patch 1 largely addresses this issue
because
we'll have revalidated the previous read of the function pointer
before
we attempt to call it.

That leads to this patch, which suggests that even after the
validation
fix a small race window still technically exists with the -
quoted
get_link()
code and inode teardown. In fact, it's not that hard to show that
this
is true by modifying the original reproducer to push the delay out
beyond the check added by patch 1 (or into the ->get_link()
callback).
Playing around with that a bit, it's possible to induce a -
quoted
get_link()
call to an inode that was reallocated as a shortform directory and
returns a non-NULL if_data fork of that dir back to the vfs (to be
interpreted as a symlink string). Nothing seems to explode on my
quick
test, fortunately, but I don't think that's an interaction we want to
maintain.

Of course one caveat to all of that is that after patch 1, the race
window for that one might be so small as to make this impossible to
reproduce in practice (whereas the problem fixed by patch 1 has been
reproduced by users)...
quoted
Hopefully someone can clarify what happens here.
quoted
With that, I wonder why we wouldn't just return -ECHILD here like
we
do
for the non-inline case to address the immediate problem, and
then
perhaps separately consider if we can rework bits of the
reuse/reclaim
code to allow rcu lookup of inline symlinks under certain
conditions.
Always switching to ref-walk mode would certainly resolve the
problem too, yes, perhaps we have no choice ...
Oh I don't think it's the only choice. I think Miklos' suggestion to
use
->free_inode() is probably the right general approach. I just think a
switch to ref-walk mode might be a good incremental step to fix this
problem in a backportable way (s_op->free_inode() is newer relative
to
the introduction of _get_link_inline()). We can always re-enable rcu
symlink processing once we get our inode teardown/reuse bits fixed up
accordingly.. Just my .02.
Yes, I've had a change of heart on this too.

I think returning -ECHILD from xfs_vn_get_link_inline() is the
best solution.

There are a couple of reasons for that, the main one being the
link string can still go away while the VFS is using it, but
also Al has said more than once that switching to ref-walk mode
is not a big deal and that makes the problems vanish completely.
In any case references are taken at successful walk completion
anyway.

If it's found staying rcu-walk mode whenever possible is worth
while in cases like this then there's probably a lot more to do
to do this properly. The lockless stuff is tricky and error prone
(certainly it is for me) and side effects are almost always hiding
in unexpected places.

So as you say, that's something for another day.
I'll update the patch and post an update.

Ian 
Brian
quoted
Ian
quoted
FWIW, I'm also a little curious why we don't set i_link for
inline
symlinks. I don't think that addresses this validation problem,
but
perhaps might allow rcu lookups in the inline symlink common case
where
things don't change during the lookup (and maybe even eliminate
the
need
for this custom inline callback)..?

Brian
quoted
        if (XFS_IS_CORRUPT(ip->i_mount, !link))
                return ERR_PTR(-EFSCORRUPTED);
+
        return link;
 }
 
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help