Thread (64 messages) 64 messages, 10 authors, 2020-06-06

Re: [PATCHv9 00/12] PCI: Recode Mobiveil driver and add PCIe Gen4 driver for NXP Layerscape SoCs

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: 2020-02-29 13:32:54
Also in: linux-pci, lkml

On Sat, Feb 29, 2020 at 12:08:28PM +0000, Russell King - ARM Linux admin wrote:
On Sat, Feb 29, 2020 at 11:04:56AM +0000, Russell King - ARM Linux admin wrote:
quoted
On Sat, Feb 29, 2020 at 09:55:50AM +0000, Russell King - ARM Linux admin wrote:
quoted
On Mon, Feb 10, 2020 at 03:22:57PM +0000, Russell King - ARM Linux admin wrote:
quoted
On Mon, Feb 10, 2020 at 04:12:30PM +0100, Olof Johansson wrote:
quoted
On Thu, Feb 6, 2020 at 11:57 AM Z.q. Hou [off-list ref] wrote:
quoted
Hi Olof,

Thanks a lot for your comments!
And sorry for my delay respond!
Actually, they apply with only minor conflicts on top of current -next.

Bjorn, any chance we can get you to pick these up pretty soon? They
enable full use of a promising ARM developer system, the SolidRun
HoneyComb, and would be quite valuable for me and others to be able to
use with mainline or -next without any additional patches applied --
which this patchset achieves.

I know there are pending revisions based on feedback. I'll leave it up
to you and others to determine if that can be done with incremental
patches on top, or if it should be fixed before the initial patchset
is applied. But all in all, it's holding up adaption by me and surely
others of a very interesting platform -- I'm looking to replace my
aging MacchiatoBin with one of these and would need PCIe/NVMe to work
before I do.
If you're going to be using NVMe, make sure you use a power-fail safe
version; I've already had one instance where ext4 failed to mount
because of a corrupted journal using an XPG SX8200 after the Honeycomb
Serror'd, and then I powered it down after a few hours before later
booting it back up.

EXT4-fs (nvme0n1p2): INFO: recovery required on readonly filesystem
EXT4-fs (nvme0n1p2): write access will be enabled during recovery
JBD2: journal transaction 80849 on nvme0n1p2-8 is corrupt.
EXT4-fs (nvme0n1p2): error loading journal
... and last night, I just got more ext4fs errors on the NVMe, without
any unclean power cycles:

[73729.556544] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
[73729.565354] Aborting journal on device nvme0n1p2-8.
[73729.568995] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[73729.569077] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
[73729.573741] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm rm: iget: checksum invalid
[73729.593330] EXT4-fs error (device nvme0n1p2): ext4_lookup:1700: inode #917524: comm mv: iget: checksum invalid

The affected file is /var/backups/dpkg.status.6.gz

It was cleanly shut down and powered off on the 22nd February, booted
yesterday morning followed by another reboot a few minutes later.

What worries me is the fact that corruption has happened - and if that
happens to a file rather than an inode, it will likely go unnoticed
for a considerably longer time.

I think I'm getting to the point of deciding NVMe or the LX2160A to be
just too unreliable for serious use.  I hadn't noticed any issues when
using the rootfs on the eMMC, so it suggests either the NVMe is
unreliable, or there's a problem with PCIe on this platform (which we
kind of know about with Jon's GPU rendering issues.)
Adding Ted and Andreas...

Here's the debugfs -n "id" output for dpkg.status.5.gz (which is fine,
and probably a similar size):

debugfs:  id <917527>
0000  a481 0000 30ff 0300 bd8e 475e bd77 4f5e  ....0.....G^.wO^
0020  29ca 345e 0000 0000 0000 0100 0002 0000  ).4^............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 4000 0000 8087 3800  ........@.....8.
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 c40b 4c0a 0000 0000 0000 0000  ......L.........
0160  0000 0000 0000 0000 0000 0000 3884 0000  ............8...
0200  2000 95f2 44b8 bdc9 a4d2 9883 c861 dc92   ...D........a..
0220  bd31 4a5e ecc5 260c 0000 0000 0000 0000  .1J^..&.........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*

and for the affected inode:
debugfs:  id <917524>
0000  a481 0000 30ff 0300 3d3d 465e bd77 4f5e  ....0...==F^.wO^
0020  29ca 345e 0000 0000 0000 0100 0002 0000  ).4^............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 4000 0000 c088 3800  ........@.....8.
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 5fc4 cfb4 0000 0000 0000 0000  ...._...........
0160  0000 0000 0000 0000 0000 0000 af23 0000  .............#..
0200  2000 1cc3 ac95 c9c8 a4d2 9883 583e addf   ...........X>..
0220  3de0 485e b04d 7151 0000 0000 0000 0000  =.H^.MqQ........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*

and "stat" output:
debugfs:  stat <917527>
Inode: 917527   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 172755908    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 261936
File ACL: 0
Links: 1   Blockcount: 512
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5e4f77bd:c9bdb844 -- Fri Feb 21 06:25:01 2020
 atime: 0x5e478ebd:92dc61c8 -- Sat Feb 15 06:25:01 2020
 mtime: 0x5e34ca29:8398d2a4 -- Sat Feb  1 00:45:29 2020
crtime: 0x5e4a31bd:0c26c5ec -- Mon Feb 17 06:25:01 2020
Size of extra inode fields: 32
Inode checksum: 0xf2958438
EXTENTS:
(0-63):3704704-3704767
debugfs:  stat <917524>
Inode: 917524   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 3033515103    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 261936
File ACL: 0
Links: 1   Blockcount: 512
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5e4f77bd:c8c995ac -- Fri Feb 21 06:25:01 2020
 atime: 0x5e463d3d:dfad3e58 -- Fri Feb 14 06:25:01 2020
 mtime: 0x5e34ca29:8398d2a4 -- Sat Feb  1 00:45:29 2020
crtime: 0x5e48e03d:51714db0 -- Sun Feb 16 06:25:01 2020
Size of extra inode fields: 32
Inode checksum: 0xc31c23af
EXTENTS:
(0-63):3705024-3705087

When using sif (set_inode_info) to re-set the UID to 0 on this (so
provoke the checksum to be updated):

debugfs:  id <917524>
0000  a481 0000 30ff 0300 3d3d 465e bd77 4f5e  ....0...==F^.wO^
0020  29ca 345e 0000 0000 0000 0100 0002 0000  ).4^............
0040  0000 0800 0100 0000 0af3 0100 0400 0000  ................
0060  0000 0000 0000 0000 4000 0000 c088 3800  ........@.....8.
0100  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 5fc4 cfb4 0000 0000 0000 0000  ...._...........
0160  0000 0000 0000 0000 0000 0000 b61f 0000  ................
                                    ^^^^
0200  2000 aa15 ac95 c9c8 a4d2 9883 583e addf   ...........X>..
           ^^^^
0220  3de0 485e b04d 7151 0000 0000 0000 0000  =.H^.MqQ........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*

The values with "^^^^" are the checksum, which are the only values
that have changed here - the checksum is now 0x15aa1fb6 rather than
0xc31c23af.

With that changed, running e2fsck -n on the filesystem results in a
pass:

root@cex7:~# e2fsck -n /dev/nvme0n1p2
e2fsck 1.44.5 (15-Dec-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/nvme0n1p2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p2: 121163/2097152 files (0.1% non-contiguous), 1349227/8388608 blocks

and the file now appears to be intact (being a gzip file, gzip verifies
that the contents are now as it expects.)

So, it looks like the _only_ issue is that the checksum on the inode
became invalid, which seems to suggest that it *isn't* a NVMe nor PCIe
issue.

I wonder whether the journal would contain anything useful, but I don't
know how to use debugfs to find that out - while I can dump the journal,
I'd need to know which block contains the inode, and then work out where
in the journal that block was going to be written.  If that would help,
let me know ASAP as I'll hold off rebooting the platform for a while
(which means the filesystem will remain as-is - and yes, I have the
debugfs file for e2undo to put stuff back.)  Maybe it's possible to pull
the block number out of the e2undo file?
Okay, the inode was stored in block 3670049, and the journal appears
to contains no entries for that block.
quoted
tune2fs says:

Checksum type:            crc32c
Checksum:                 0x682f91b9

I guess this is what is used to checksum the inodes?  If so, it's using
the kernel's crc32c-generic driver (according to /proc/crypto).

Could it be a race condition, or some problem that's specific to the
ARM64 kernel that's provoking this corruption?
Something else occurs to me:

root@cex7:~# ls -li --time=ctime --full-time /var/backups/dpkg.status*
917622 -rw-r--r-- 1 root root 999052 2020-02-29 06:25:01.852231277 +0000 /var/backups/dpkg.status
917583 -rw-r--r-- 1 root root 999052 2020-02-21 06:25:01.958160960 +0000 /var/backups/dpkg.status.0
917520 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.954161050 +0000 /var/backups/dpkg.status.1.gz
917531 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.854163293 +0000 /var/backups/dpkg.status.2.gz
917532 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.850163383 +0000 /var/backups/dpkg.status.3.gz
917509 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.850163383 +0000 /var/backups/dpkg.status.4.gz
917527 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.846163473 +0000 /var/backups/dpkg.status.5.gz
917524 -rw-r--r-- 1 root root 261936 2020-02-21 06:25:01.842163563 +0000 /var/backups/dpkg.status.6.gz

So the last time that the kernel changed inode 917524 was on the 21th
of February, probably when it was last renamed by logrotate, and like
several other files stored in the same inode block.  Yet, _only_ the
checksum for 917524 was corrupted, the rest were fine.

I would guess that logrotate behaves as follows:
- remove /var/backups/dpkg.status.6.gz
- rename /var/backups/dpkg.status.5.gz to /var/backups/dpkg.status.6.gz
- repeat for other dpkg.status.*.gz files
- gzip /var/backups/dpkg.status.0 to /var/backups/dpkg.status.1.gz
- rename /var/backups/dpkg.status to /var/backups/dpkg.status.0
- create new /var/backups/dpkg.status

Looking at the inode block in the e2undo file, inode 917524 is at
offset 0x300 into the block, which means the first inode in the
block is 917521 and the last is 917536, which means we have several
of the dpkg.status.* files that are stored in this inode block.

That would've meant that the inode for /var/backups/dpkg.status.6.gz
would have been updated just before the inode for
/var/backups/dpkg.status.5.gz.  I wonder if the inode block was
written out somehow out of order, with the ctime for
/var/backups/dpkg.status.6.gz having been updated but not the checksum
as a result of the later changes - maybe as a result of having
executed on a different CPU?  That would suggest a weakness in the
ARM64 locking implementation, coherency issues, or interconnect issues.
Looking at the errata configuration, I have:

# ARM errata workarounds via the alternatives framework
#
CONFIG_ARM64_WORKAROUND_CLEAN_CACHE=y
CONFIG_ARM64_ERRATUM_826319=y
CONFIG_ARM64_ERRATUM_827319=y
CONFIG_ARM64_ERRATUM_824069=y
CONFIG_ARM64_ERRATUM_819472=y
CONFIG_ARM64_ERRATUM_832075=y
CONFIG_ARM64_ERRATUM_834220=y
CONFIG_ARM64_ERRATUM_845719=y
CONFIG_ARM64_ERRATUM_843419=y
CONFIG_ARM64_ERRATUM_1024718=y
CONFIG_ARM64_ERRATUM_1418040=y
CONFIG_ARM64_ERRATUM_1165522=y
CONFIG_ARM64_ERRATUM_1286807=y
CONFIG_ARM64_ERRATUM_1319367=y
CONFIG_ARM64_ERRATUM_1463225=y
# CONFIG_ARM64_ERRATUM_1542419 is not set
# CONFIG_CAVIUM_ERRATUM_22375 is not set
# CONFIG_CAVIUM_ERRATUM_23154 is not set
# CONFIG_CAVIUM_ERRATUM_27456 is not set
# CONFIG_CAVIUM_ERRATUM_30115 is not set
# CONFIG_CAVIUM_TX2_ERRATUM_219 is not set
CONFIG_QCOM_FALKOR_ERRATUM_1003=y
CONFIG_ARM64_WORKAROUND_REPEAT_TLBI=y
CONFIG_QCOM_FALKOR_ERRATUM_1009=y
CONFIG_QCOM_QDF2400_ERRATUM_0065=y
# CONFIG_SOCIONEXT_SYNQUACER_PREITS is not set
# CONFIG_HISILICON_ERRATUM_161600802 is not set
CONFIG_QCOM_FALKOR_ERRATUM_E1041=y
# CONFIG_FUJITSU_ERRATUM_010001 is not set
# end of ARM errata workarounds via the alternatives framework
...
CONFIG_FSL_ERRATUM_A008585=y
CONFIG_HISILICON_ERRATUM_161010101=y
CONFIG_ARM64_ERRATUM_858921=y

so I don't think it's a missing errata kconfig setting, unless there's
an erratum that isn't in v5.5 that's necessary.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help