Thread (11 messages), 5 authors, 2024-08-17

Re: Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c

From: Niklas Cassel <cassel@kernel.org>
Date: 2024-08-13 14:59:44
Also in: linux-ide, lkml

Hello Michael,

On Tue, Aug 13, 2024 at 10:32:36PM +1000, Michael Ellerman wrote:
Niklas Cassel [off-list ref] writes:
quoted
Hello Jonáš, Kolbjørn,

thank you for the report.

On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
quoted
On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
quoted
Ever since 6.10, my macmini G4 behaved unstable when dealing with lots of
I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
source tarball, building large software packages (or kernel) etc.

After a bit of testing, and patient kernel rebuilding (while crashing) I
found the culprit to be this commit/change

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
with the 6.10.3 kernel version. Reverting the linked change
("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
the errors go away.
Michael, as the author of this commit, could you please look into
this issue?
I can. My commit was really just working around the warning in the SCSI
core which appeared after afd53a3d8528, it was supposed to just fix the
warning without changing behaviour. Though obviously it did for 4KB
PAGE_SIZE kernels.

I don't have easy access to my mac-mini, so it would be helpful if you
can test changes, Jonáš and/or Kolbjørn.
quoted
We could revert your patch, which appears to work for some users,
but that would again break setups with PAGE_SIZE == 64K.
(I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)
Yes they are using 4K, it says so in the oops.
quoted
quoted
------------[ cut here ]------------
kernel BUG at drivers/ata/pata_macio.c:544!
https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544

It seems that the while (sg_len) loop does not play nice with the new
.max_segment_size.
Right, but only for 4KB kernels for some reason. Is there some limit
elsewhere that prevents the bug tripping on 64KB kernels, or is it just
luck that no one has hit it?
Have you tried running fio (flexible I/O tester) with reads using a very
large block size?

I would be surprised if it isn't possible to trigger the same bug with
64K page size.

max segment size = 64K
MAX_DCMDS = 256
256 * 64K = 16 MiB
What happens if you run fio with a 16 MiB blocksize?

Something like:
$ sudo fio --name=test --filename=/dev/sdX --direct=1 --runtime=60 --ioengine=io_uring --rw=read --iodepth=4 --bs=16M
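
For illustration, the arithmetic behind the 16 MiB figure can be sketched as below. This is a rough model, not the driver's code: MAX_DCMDS and the 64K max segment size are the values quoted above, while MAX_DESC_BYTES (a per-descriptor limit just under 64K) and the helper descriptors_needed() are hypothetical, introduced only to show how the descriptor count could grow if each segment has to be split again.

```python
# Sketch only: models how a request might be split into DMA descriptors.
# MAX_DCMDS and MAX_SEGMENT are the values quoted in the message above.
# MAX_DESC_BYTES is an ASSUMPTION made for illustration, not from this thread.
KIB = 1024
MIB = 1024 * KIB

MAX_DCMDS = 256            # descriptor table size quoted above
MAX_SEGMENT = 64 * KIB     # max segment size quoted above
MAX_DESC_BYTES = 0xFF00    # hypothetical per-descriptor limit, just under 64K


def descriptors_needed(request_bytes, segment_bytes, desc_bytes):
    """Count descriptors when a request is cut into segments of at most
    segment_bytes, and each segment into chunks of at most desc_bytes."""
    count = 0
    remaining = request_bytes
    while remaining:
        seg = min(remaining, segment_bytes)
        remaining -= seg
        count += -(-seg // desc_bytes)  # ceiling division
    return count


upper_bound = MAX_DCMDS * MAX_SEGMENT  # 256 * 64K
needed = descriptors_needed(16 * MIB, MAX_SEGMENT, MAX_DESC_BYTES)
print(upper_bound // MIB, "MiB upper bound;", needed, "descriptors needed")
```

If the per-descriptor assumption holds, every full 64K segment needs two descriptors, so a 16 MiB request would need twice as many descriptors as the table holds, and the table would fill well before that size is reached.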


Kind regards,
Niklas