Re: [PATCH] nfsd: fix race to check ls_layouts
From: Benjamin Coddington <hidden>
Date: 2023-01-28 13:33:00
On 27 Jan 2023, at 13:03, Jeff Layton wrote:
> On Fri, 2023-01-27 at 11:42 -0500, Benjamin Coddington wrote:
>> On 27 Jan 2023, at 11:34, Chuck Lever III wrote:
>>
>>> On Jan 27, 2023, at 11:18 AM, Benjamin Coddington [off-list ref] wrote:
>>>>
>>>> It's possible for __break_lease to find the layout's lease before we've
>>>> added the layout to the owner's ls_layouts list. In that case, setting
>>>> ls_recalled = true without actually recalling the layout will cause the
>>>> server to never send a recall callback.
>>>>
>>>> Move the check for ls_layouts before setting ls_recalled.
>>>>
>>>> Signed-off-by: Benjamin Coddington <redacted>
>>>
>>> Did this start misbehaving recently, or has it always been broken? That
>>> is, does it need:
>>>
>>> Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls")
>>>
>>> ?
>>
>> I'm doing some new testing of racing LAYOUTGET and CB_LAYOUTRETURN after
>> running into a livelock, so I think it has always been broken and the
>> Fixes tag is probably appropriate.
>>
>> However, now I'm wondering if we'd run into trouble if ls_layouts could
>> be empty but the lease still exist... but that seems like it would be a
>> different bug.
>
> Yeah, is that even possible? Surely once the last layout is gone, we drop
> the stateid?
>
> In any case, this patch looks fine. You can add:
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
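
To make the ordering in the patch description concrete, here is a small user-space C model. It is a hypothetical sketch, not the nfsd code: `struct layout_stateid`, `recalled`, `nlayouts`, `callbacks`, `recall_before_patch()`, `recall_after_patch()` and `run_race()` are invented names standing in for ls_recalled, the ls_layouts list and the recall path. It also assumes the recall path runs a second time after LAYOUTGET has added the layout, and it ignores locking.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a layout stateid. The fields loosely
 * mirror ls_recalled and ls_layouts, but this is NOT the nfsd structure. */
struct layout_stateid {
	bool recalled;   /* models ls_recalled */
	int  nlayouts;   /* models the number of entries on ls_layouts */
	int  callbacks;  /* number of layout recall callbacks "sent" */
};

/* Ordering before the patch: mark recalled first, then check the list. */
static void recall_before_patch(struct layout_stateid *ls)
{
	if (ls->recalled)
		return;
	ls->recalled = true;
	if (ls->nlayouts == 0)
		return;  /* marked recalled, but no callback was sent */
	ls->callbacks++;
}

/* Ordering after the patch: check the list before marking recalled. */
static void recall_after_patch(struct layout_stateid *ls)
{
	if (ls->recalled)
		return;
	if (ls->nlayouts == 0)
		return;  /* leave recalled clear so a later attempt can recall */
	ls->recalled = true;
	ls->callbacks++;
}

/* The race: the lease break arrives before the layout is on the list,
 * and (by assumption) the recall path runs again afterwards.
 * Returns how many callbacks were sent. */
static int run_race(void (*recall)(struct layout_stateid *))
{
	struct layout_stateid ls = { false, 0, 0 };

	recall(&ls);      /* __break_lease finds the lease early */
	ls.nlayouts = 1;  /* LAYOUTGET finishes adding the layout */
	recall(&ls);      /* a later recall attempt */
	return ls.callbacks;
}
```

In this model, run_race(recall_before_patch) returns 0 because the stateid is marked recalled without any callback ever going out, while run_race(recall_after_patch) returns 1. Whether the real break path ever gets a second chance is exactly the assumption this sketch makes.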
Jeff pointed out that there's another problem here. We can't just skip
sending the callback if ls_layouts is empty, because then the process
trying to break the lease will end up spinning in __break_lease.

I think we can drop the list_empty() check altogether. It must be there so
that we don't race in and send a callback for a layout that's already been
returned, but I don't see any harm in that: clients should just return
NFS4ERR_NOMATCHING_LAYOUT.

Ben
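
A similarly hypothetical C model of the two options discussed above. It assumes the break hook runs only once per lease break and that the lease breaker can make progress only after a recall callback has actually been sent. `recall_skip_if_empty()`, `recall_always()`, `client_answer()` and `breaker_makes_progress()` are invented for illustration, and `client_answer()` simplifies the client to "returns the layout if it holds one, otherwise answers NFS4ERR_NOMATCHING_LAYOUT", the NFSv4.1 error a client returns to CB_LAYOUTRECALL when it holds no matching layout.

```c
#include <stdbool.h>

/* Hypothetical outcome of a single recall attempt; not nfsd code. */
enum cb_status {
	CB_NOT_SENT,           /* no callback was sent */
	CB_LAYOUT_RETURNED,    /* client held the layout and returned it */
	CB_NOMATCHING_LAYOUT,  /* models NFS4ERR_NOMATCHING_LAYOUT */
};

struct layout_stateid {
	bool recalled;  /* models ls_recalled */
	int  nlayouts;  /* models the number of entries on ls_layouts */
};

/* Simplified client: return the layout if one is held, else no match. */
static enum cb_status client_answer(const struct layout_stateid *ls)
{
	return ls->nlayouts > 0 ? CB_LAYOUT_RETURNED : CB_NOMATCHING_LAYOUT;
}

/* Variant that skips the callback when the layout list is empty. */
static enum cb_status recall_skip_if_empty(struct layout_stateid *ls)
{
	if (ls->recalled)
		return CB_NOT_SENT;
	if (ls->nlayouts == 0)
		return CB_NOT_SENT;  /* nothing will ever release the lease */
	ls->recalled = true;
	return client_answer(ls);
}

/* Variant with the list_empty() check dropped: always recall once. */
static enum cb_status recall_always(struct layout_stateid *ls)
{
	if (ls->recalled)
		return CB_NOT_SENT;
	ls->recalled = true;
	return client_answer(ls);
}

/* Assumed: the breaker waits on the recall, so with the hook running only
 * once it can make progress only if a callback was actually sent. */
static bool breaker_makes_progress(enum cb_status st)
{
	return st != CB_NOT_SENT;
}
```

Starting from an empty layout list, recall_skip_if_empty() sends nothing, so the modeled breaker never makes progress; recall_always() sends the recall and gets CB_NOMATCHING_LAYOUT back, which is the harmless outcome described above.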