Fixes: 260f32adb88 ("pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect")
When an application gets killed (SIGTERM/SIGINT) while the pNFS client is
connecting to a DS, the client ends up in an infinite connect-disconnect
loop. The source of the issue is that
flexfilelayoutdev#nfs4_ff_layout_prepare_ds gets an error from
nfs4_pnfs_ds_connect with status ERESTARTSYS, which is set by
rpc_signal_task, but the error is treated as transient and is therefore
retried.
The issue is reproducible with the following script (there should be ~1000
files in a directory, and the client must not have any connections to DSes):
echo 3 > /proc/sys/vm/drop_caches
for i in *
do
head -1 $i &
PP=$!
sleep 10e-03
kill -TERM $PP
done
Signed-off-by: Tigran Mkrtchyan <redacted>
---
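Note (not part of the commit message): the retry behaviour described above
can be modelled in userspace with the rough sketch below. It is an
illustration under stated assumptions, not the kernel code path. All
sketch_* names are hypothetical, sketch_error_is_fatal() mirrors only a
subset of what nfs_error_is_fatal() is assumed to check, and the attempt cap
exists only so that the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal and not exported by userspace <errno.h>. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/*
 * Hypothetical stand-in for nfs_error_is_fatal(). Only a few codes are
 * listed here; the exact kernel list is an assumption of this sketch.
 */
static bool sketch_error_is_fatal(int err)
{
	switch (err) {
	case -ERESTARTSYS:
	case -EINTR:
	case -EIO:
		return true;
	default:
		return false;
	}
}

/* Hypothetical model of a data server's device ID state. */
struct sketch_ds {
	bool unavailable;
};

/*
 * Model the connect/retry cycle for a connect call that always returns
 * `status`. Returns how many connect attempts were made before the loop
 * stopped, capped at max_attempts.
 */
static int sketch_connect_attempts(struct sketch_ds *ds, int status,
				   bool mark_on_fatal, int max_attempts)
{
	int attempts = 0;

	assert(max_attempts > 0);
	while (attempts < max_attempts && !ds->unavailable) {
		attempts++;
		if (status == 0)
			break;	/* connected, nothing to retry */
		/* The behaviour added by the patch: stop retrying on fatal errors. */
		if (mark_on_fatal && sketch_error_is_fatal(status))
			ds->unavailable = true;
	}
	return attempts;
}
```

With mark_on_fatal set to false, a persistent -ERESTARTSYS uses up every
allowed attempt, which corresponds to the loop seen before the patch. With it
set to true, the device is marked unavailable after the first failed attempt.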
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index 4a304cf17c4b..0008a8180c9b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -410,6 +410,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
mirror->mirror_ds->ds_versions[0].wsize = max_payload;
goto out;
}
+ /* There is a fatal error to connect to DS. Mark it unavailable to avoid infinite retry loop. */
+ if (nfs_error_is_fatal(status))
+ nfs4_mark_deviceid_unavailable(&mirror->mirror_ds->id_node);
+
noconnect:
ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
mirror, lseg->pls_range.offset,
--
2.49.0