Thread: 6 messages, 4 authors, 2017-06-20

Re: [Qemu-devel] [RFH] qemu-2.6 memory corruption with OVMF and linux-4.9

From: Philipp Hahn <hidden>
Date: 2017-06-18 18:22:38
Also in: lkml, qemu-devel

Hello,

On 17.06.2017 at 18:51, Laszlo Ersek wrote:
(I also recommend using the "vbindiff" tool for such problems, it is
great for picking out patterns.)

          ** ** ** ** ** ** ** **   8  9 ** ** ** 13 14 15
          -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- --
00000000  01 e8 00 00 00 00 00 00  8c 5e 00 00 00 10 ff f1
00000010  5b 78 8a 3e 00 00 00 00  00 00 00 00 00 00 00 00
00000020  8c 77 00 00 00 12 00 02  18 f0 00 00 00 00 00 00
00000030  00 1e 00 00 00 00 00 00  8c 8c 00 00 00 12 00 02
00000040  07 70 00 00 00 00 00 00  00 14 00 00 00 00 00 00
00000050  8c 9c 00 00 00 12 00 02  22 00 00 00 00 00 00 00
00000060  00 40 00 00 00 00 00 00  8c ac 00 00 00 10 ff f1

00000000  01 e8 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
00000010  5b 78 8a 3e 00 00 00 00  00 3c 00 00 00 07 00 00
00000020  8c 77 00 00 00 12 00 02  00 3c 00 00 00 07 00 00
00000030  00 1e 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
00000040  07 70 00 00 00 00 00 00  00 3c 00 00 00 07 00 00
00000050  8c 9c 00 00 00 12 00 02  00 3c 00 00 00 07 00 00
00000060  00 40 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
          -- -- -- -- -- -- -- --  -- -- -- -- -- -- -- --
          ** ** ** ** ** ** ** **   8  9 ** ** ** 13 14 15

The columns that I marked with "**" are identical between "good" and
"bad". (These are columns 0-7, 10-12.)

Column 8 is overwritten by zeros (every 16th byte).

Column 9 is overwritten by 0x3c (every 16th byte).

Column 13 is super interesting. The most significant nibble in that
column is not disturbed. And, in the least significant nibble, the least
significant three bits are turned on. Basically, the corruption could be
described, for this column (i.e., every 16th byte), as

  bad = good | 0x7

Column 14 is overwritten by zeros (every 16th byte).

Column 15 is overwritten by zeros (every 16th byte).
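
The column-by-column comparison above can also be done mechanically. The following is only a minimal sketch in Python, not something from the original analysis: it hard-codes the seven rows of the two dumps shown above, and the helper names `parse` and `classify_columns` are made up for illustration. For each of the 16 byte columns it reports whether "good" and "bad" agree, whether "bad" holds one constant value, or whether "bad" can be explained as "good" with a fixed set of bits turned on.

```python
# Rows copied from the "good" and "bad" dumps above (offsets 0x00-0x60).
GOOD = """\
01 e8 00 00 00 00 00 00  8c 5e 00 00 00 10 ff f1
5b 78 8a 3e 00 00 00 00  00 00 00 00 00 00 00 00
8c 77 00 00 00 12 00 02  18 f0 00 00 00 00 00 00
00 1e 00 00 00 00 00 00  8c 8c 00 00 00 12 00 02
07 70 00 00 00 00 00 00  00 14 00 00 00 00 00 00
8c 9c 00 00 00 12 00 02  22 00 00 00 00 00 00 00
00 40 00 00 00 00 00 00  8c ac 00 00 00 10 ff f1
"""

BAD = """\
01 e8 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
5b 78 8a 3e 00 00 00 00  00 3c 00 00 00 07 00 00
8c 77 00 00 00 12 00 02  00 3c 00 00 00 07 00 00
00 1e 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
07 70 00 00 00 00 00 00  00 3c 00 00 00 07 00 00
8c 9c 00 00 00 12 00 02  00 3c 00 00 00 07 00 00
00 40 00 00 00 00 00 00  00 3c 00 00 00 17 00 00
"""


def parse(dump):
    """Turn a hex dump (16 bytes per line, no offsets) into rows of ints."""
    return [[int(b, 16) for b in line.split()]
            for line in dump.splitlines() if line.strip()]


def classify_columns(good, bad):
    """Describe, per byte column, how 'bad' differs from 'good'."""
    report = {}
    for col in range(16):
        g = [row[col] for row in good]
        b = [row[col] for row in bad]
        if g == b:
            report[col] = "identical"
        elif len(set(b)) == 1:
            # Every row holds the same byte: a constant overwrite.
            report[col] = f"overwritten with 0x{b[0]:02x}"
        else:
            # Collect the bits that are set in bad but not in good,
            # then check whether OR-ing them in explains every row.
            mask = 0
            for x, y in zip(g, b):
                mask |= y & ~x
            if all((x | mask) == y for x, y in zip(g, b)):
                report[col] = f"bad = good | 0x{mask:02x}"
            else:
                report[col] = "no simple pattern"
    return report


report = classify_columns(parse(GOOD), parse(BAD))
for col, verdict in report.items():
    print(f"column {col:2d}: {verdict}")
```

With only seven rows, a "constant overwrite" verdict is a hypothesis rather than proof; a longer dump would be needed to rule out coincidence.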

My take is that your host machine has faulty RAM. Please run memtest86+
or something similar.

I will do so, but I consider faulty RAM very unlikely:
- it never happens with BIOS, only with OVMF
- for each test I start a new QEMU process, which should use a different
  memory region
- it repeatedly hits e1000 or libata.ko

After updating OVMF from 0~20160813.de74668f-2 to 0~20161202.7bbe0b3e-1,
it has not happened again so far.

Anyway, thank you for your help.

Philipp