Re: [PATCH v5 00/14] dm-raid/md/raid: fix v6.7 regressions
From: Benjamin Marzinski <bmarzins@redhat.com>
Date: 2024-02-08 23:17:30
Also in: dm-devel, lkml
On Thu, Feb 08, 2024 at 12:04:45AM -0800, Song Liu wrote:
> Hi Benjamin,
>
> On Mon, Feb 5, 2024 at 7:58 PM Benjamin Marzinski wrote:
> >
> > On Tue, Feb 06, 2024 at 09:36:18AM +0800, Yu Kuai wrote:
> > > Hi!
> > >
> > > On 2024/02/06 3:35, Benjamin Marzinski wrote:
> > > > Could you run the test with something like
> > > >
> > > > # make check_local T=lvconvert-repair-raid.sh VERBOSE=1 > out 2>&1
> > > >
> > > > and post the output.
> > >
> > > Attached is the output from my VM.
> >
> > Instead of running the tests from the lvm2 git repo, if you run
> >
> > # make -C test install
> >
> > to install the tests, and then create a results directory and run the test from there, do you still see the error in the 6.6 kernel?
> >
> > # mkdir ~/results
> > # cd ~/results
> > # lvm2-testsuite --only lvconvert-repair-raid.sh
> >
> > Running the tests this way will test the installed lvm2 binaries on your system, instead of the ones in the lvm2 git repo. They may be compiled differently.
>
> I am not able to get reliable results from shell/lvconvert-repair-raid.sh either. On the 6.6.0 kernel, the test fails. On the 6.8-rc1 kernel, the test fails sometimes.
>
> Could you please share more information about your test setup? Specifically:
> 1. Which tree/branch/tag are you testing?
> 2. What's the .config used in the tests?
> 3. How do you run the test suite? One test at a time, or all of them together?
> 4. How do you handle "test passes sometimes" cases?
So, I have been able to recreate the case where lvconvert-repair-raid.sh keeps failing. It happens when I run the reproducer on a virtual machine made from a cloud image, instead of one that I installed manually. I'm not sure why there is a difference, but I can show you how I reliably recreate the errors I'm seeing.

Create a new Fedora 39 virtual machine with the following command (I'm not sure whether this can be reproduced on a machine with less memory and fewer CPUs, but I can try that if you need me to. You probably also want to pick a faster Fedora mirror for the image location):

# virt-install --name repair-test --memory 8192 --vcpus 8 --disk size=40 --graphics none --extra-args "console=ttyS0" --osinfo detect=on,name=fedora-unknown --location https://download.fedoraproject.org/pub/fedora/linux/releases/39/Server/x86_64/os/

Install to the whole virtual drive, using the default LVM partitioning. Then ssh into the VM and run the following commands to set up the lvm2-testsuite and the 6.6.0 kernel:

# dnf upgrade grub2
# dnf install -y git gcc bc flex make bison openssl openssl-devel dwarves zstd elfutils-libelf-devel libaio-devel lvm2 g++
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
# git clone git://sourceware.org/git/lvm2.git
# cd ~/lvm2
# ./configure
# make
# cd ~/linux
# git checkout -b ver6.6 v6.6
# cp /boot/config-`uname -r` .config
# make olddefconfig
# modprobe -a dm_raid dm_delay ext4 raid1 raid10 brd
# yes "" | make localmodconfig
# make -j8
# make modules_install
# make install
# reboot

ssh back into the VM and run the following commands to run the lvm2-testsuite:

# mount -o remount,dev /tmp
# cd ~/lvm2
# make check T=lvconvert-repair-raid.sh

This should always pass. I ran it 100 times without failure.
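Repeating the run many times and counting failures can be scripted with a small loop. This is a minimal sketch: `repeat_count` is a hypothetical helper, not part of lvm2 or lvm2-testsuite. It assumes a POSIX sh and treats any non-zero exit status of the command as a failed run.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvm2): run a command N times and
# report how many of those runs exited with a non-zero status.
# Usage: repeat_count N command [args...]
repeat_count() {
    n=$1
    shift
    fails=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Discard the command's output; only its exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails/$n runs failed"
}

# Example, assuming the lvm2 checkout from the steps above is in ~/lvm2:
# cd ~/lvm2 && repeat_count 100 make check T=lvconvert-repair-raid.sh
```

The per-test log under ~/lvm2/test/results/ is likely rewritten on each run, so finding the failing line of a particular iteration means inspecting the log right after that iteration fails, not after the loop finishes.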
To test the patched kernel, run:

# cd ~/linux
# git checkout -b dmraid-fix-v5 v6.8-rc3
# git am ~/dmraid-fix-v5.mbox    ***Apply the v5 patches***
# make olddefconfig
# make -j8
# make modules_install
# make install
# reboot

Rerun the lvm2-testsuite with the same commands as before:

# mount -o remount,dev /tmp
# cd ~/lvm2
# make check T=lvconvert-repair-raid.sh

This fails about 20% of the time, usually at either line 146 or line 164. You can check which line it failed at by running the following command when the test fails:

# grep "STACKTRACE()" ~/lvm2/test/results/ndev-vanilla\:shell_lvconvert-repair-raid.sh.txt
[ 0:13.152] ## 1 STACKTRACE() called from /root/lvm2/test/shell/lvconvert-repair-raid.sh:146

Let me know if you have any questions, or if this doesn't work for you.

-Ben
> Thanks,
> Song