Re: blktests failures with v6.11-rc1 kernel
From: Yi Zhang <hidden>
Date: 2024-08-13 07:06:48
Also in:
linux-nvme, linux-rdma, linux-scsi
On Sat, Aug 3, 2024 at 12:49 AM Nilay Shroff [off-list ref] wrote:
> On 8/2/24 18:04, Shinichiro Kawasaki wrote:
>> CC+: Yi Zhang,
>>
>> On Aug 02, 2024 / 17:46, Nilay Shroff wrote:
>>> On 8/2/24 14:39, Shinichiro Kawasaki wrote:
>>>> #3: nvme/052 (CKI failure)
>>>>
>>>> The CKI project reported that nvme/052 fails occasionally [4].
>>>> This needs further debug effort.
>>>>
>>>> nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
>>>>     runtime ... 22.209s
>>>>     --- tests/nvme/052.out 2024-07-30 18:38:29.041716566 -0400
>>>>     +++ /mnt/tests/gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/archive/production/kernel-tests-production.zip/storage/blktests/nvme/nvme-loop/blktests/results/nodev_tr_loop/nvme/052.out.bad 2024-07-30 18:45:35.438067452 -0400
>>>>     @@ -1,2 +1,4 @@
>>>>      Running nvme/052
>>>>     +cat: /sys/block/nvme1n2/uuid: No such file or directory
>>>>     +cat: /sys/block/nvme1n2/uuid: No such file or directory
>>>>      Test complete
>>>>
>>>> [4] https://datawarehouse.cki-project.org/kcidb/tests/13669275
>>>
>>> I just checked the console logs of the nvme/052 and from the logs
>>> it's apparent that all namespaces were created successfully and so
>>> it's strange to see that the test couldn't access
>>> "/sys/block/nvme1n2/uuid".
>>
>> I agree that it's strange. I think the "No such file or directory"
>> error happened in _find_nvme_ns(), and it checks existence of the
>> uuid file before the cat command. I have no idea why the error
>> happens.
>
> Yes exactly, and these two operations (checking the existence of uuid
> and cat command) are not atomic. So the only plausible theory I have
> at this time is "if namespace is deleted after checking the existence
> of uuid but before cat command is executed" then this issue may
> potentially manifests. Furthermore, as you mentioned, this issue is
> seen on the test machine occasionally, so I asked if there's a
> possibility of simultaneous blktest or some other tests running on
> this system.

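For reference, the check-then-read race described in the quoted text can be sketched in plain shell. This is only an illustration under assumptions, not the actual _find_nvme_ns() code from blktests: read_uuid_racy and read_uuid_once are hypothetical helper names, and the sysfs path is whatever the caller passes in.

```shell
#!/bin/sh
# Illustration only: hypothetical helpers, not code taken from blktests.

# Racy pattern: the existence check and the read are two separate steps.
# If the file disappears between them, cat prints
# "No such file or directory" on stderr.
read_uuid_racy() {
    f=$1
    if [ -e "$f" ]; then
        cat "$f"
    fi
}

# Single-step pattern: just attempt the read and treat a failure as
# "attribute not present", so a vanished file produces no stderr noise.
read_uuid_once() {
    f=$1
    uuid=$(cat "$f" 2>/dev/null) || return 1
    printf '%s\n' "$uuid"
}
```

The second form does not close the race window, it only makes a disappearing file a quiet, checkable failure. It would not explain why the uuid file goes away in the first place.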
No other tests run concurrently during the CKI test runs.
I reproduced the failure on that server; it always reproduces within
5 runs:
# sh a.sh
==============================0
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [passed]
runtime 21.496s ... 21.398s
==============================1
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [failed]
runtime 21.398s ... 21.974s
--- tests/nvme/052.out 2024-08-10 00:30:06.989814226 -0400
+++ /root/blktests/results/nodev_tr_loop/nvme/052.out.bad 2024-08-13 02:53:51.635047928 -0400
@@ -1,2 +1,5 @@
Running nvme/052
+cat: /sys/block/nvme1n2/uuid: No such file or directory
+cat: /sys/block/nvme1n2/uuid: No such file or directory
+cat: /sys/block/nvme1n2/uuid: No such file or directory
Test complete
# uname -r
6.11.0-rc3
[root@hpe-rl300gen11-04 blktests]# lsblk
NAME                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
zram0                                 252:0    0     8G  0 disk [SWAP]
nvme0n1                               259:0    0 447.1G  0 disk
├─nvme0n1p1                           259:1    0   600M  0 part /boot/efi
├─nvme0n1p2                           259:2    0     1G  0 part /boot
└─nvme0n1p3                           259:3    0 445.5G  0 part
  └─fedora_hpe--rl300gen11--04-root   253:0    0 445.5G  0 lvm  /
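
The contents of a.sh are not included in this mail. A loop of roughly the following shape would produce the numbered separator lines seen in the reproduction run above. This is a sketch under assumptions: repeat_until_fail is a hypothetical helper name, and it assumes the command it runs exits non-zero when a test fails.

```shell
#!/bin/sh
# Hypothetical stand-in for a.sh; the real script is not shown in this thread.

# Run a command up to <max> times, printing a numbered separator before
# each iteration, and stop at the first iteration that fails.
repeat_until_fail() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        echo "==============================$i"
        "$@" || return 1
        i=$((i + 1))
    done
    return 0
}

# Assumed usage, from the blktests directory with the loop transport
# selected in the local blktests configuration:
#   repeat_until_fail 5 ./check nvme/052
```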

> Thanks,
> --Nilay

--
Best Regards,
Yi Zhang