Re: [BUG] Slab corruption during XFS writeback under memory pressure

From: Dave Chinner <david@fromorbit.com>
Date: 2016-07-17 00:00:12
Also in: linux-scsi, linux-xfs, lkml

On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:

Hello all,

I've found a nasty source of slab corruption. Based on seeing similar symptoms
on boxes at Facebook, I suspect it's been around since at least 3.10.

It only reproduces under memory pressure so far as I can tell: the issue seems
to be that XFS reclaims pages from buffers that are still in use by
scsi/block. I'm not sure which side the bug lies on, but I've only observed it
with XFS.

[67203.776421] ==================================================================
[67203.792521] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3bf/0x4c0 at addr ffff8804cf466288
[67203.812036] Read of size 8 by task python2.7/22913
[67203.822713] =============================================================================
[67203.840917] BUG buffer_head (Not tainted): kasan: bad access detected
[67203.855253] -----------------------------------------------------------------------------
[67203.855253]
[67203.876727] Disabling lock debugging due to kernel taint
[67203.888575] INFO: Allocated in 0xffff8804cf465d40 age=18437180719206552994 cpu=2191548261 pid=-1
[67203.908139] 	alloc_buffer_head+0x22/0xd0
[67203.916903] 	___slab_alloc+0x4e0/0x520
[67203.925286] 	__slab_alloc+0x43/0x70
[67203.933087] 	kmem_cache_alloc+0x228/0x2c0
[67203.942042] 	alloc_buffer_head+0x22/0xd0
[67203.950782] 	alloc_page_buffers+0xa9/0x1f0
[67203.959936] 	create_empty_buffers+0x30/0x420
[67203.969495] 	create_page_buffers+0x120/0x1b0
[67203.979029] 	__block_write_begin+0x16b/0x1010
[67203.988756] 	xfs_vm_write_begin+0x55/0x1b0
[67203.997884] 	generic_perform_write+0x288/0x510
[67204.007771] 	xfs_file_buffered_aio_write+0x316/0x780
[67204.018811] 	xfs_file_write_iter+0x26f/0x6c0
[67204.028313] 	__vfs_write+0x2a0/0x620
[67204.036276] 	vfs_write+0x159/0x4c0
[67204.043855] 	SyS_write+0xd2/0x1b0
[67204.051245] INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1
[67204.068634] 	free_buffer_head+0x41/0x90
[67204.077175] 	__slab_free+0x1ed/0x340
[67204.085138] 	kmem_cache_free+0x270/0x300
[67204.093867] 	free_buffer_head+0x41/0x90
[67204.102422] 	try_to_free_buffers+0x171/0x240
[67204.111925] 	xfs_vm_releasepage+0xcb/0x3b0
[67204.121101] 	try_to_release_page+0x106/0x190
[67204.130602] 	shrink_page_list+0x118e/0x1a10
[67204.139910] 	shrink_inactive_list+0x42c/0xdf0
[67204.149600] 	shrink_zone_memcg+0xa09/0xfa0
[67204.158715] 	shrink_zone+0x2c3/0xbc0
[67204.166679] 	do_try_to_free_pages+0x42a/0x12f0
[67204.176562] 	try_to_free_pages+0x1a3/0x5d0
[67204.185709] 	__alloc_pages_nodemask+0xbeb/0x20d0
[67204.195979] 	alloc_pages_vma+0x11b/0x5e0
[67204.204709] 	handle_mm_fault+0x2c27/0x47d0
[67204.213823] INFO: Slab 0xffffea00133d1900 objects=37 used=14 fp=0xffff8804cf464530 flags=0x2000000000004080
[67204.235439] INFO: Object 0xffff8804cf466260 @offset=8800 fp=0xbbbbbbbbbbbbbbbb

Hmmm - can only get there with a clean page, so this should be fine.

[67204.455817] CPU: 1 PID: 22913 Comm: python2.7 Tainted: G    B           4.7.0-rc7-calvinowens-1468357363-00001-gcaa3dc6 #1
[67204.480313] Hardware name: Wiwynn   HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016
[67204.497509]  ffff88075e99f480 ffff88075ec87a30 ffffffff81e8b8e4 ffff8804cf464000
[67204.514224]  ffff8804cf466260 ffff88075ec87a60 ffffffff8153a995 ffff88075e99f480
[67204.530924]  ffffea00133d1900 ffff8804cf466260 dffffc0000000000 ffff88075ec87a88
[67204.547624] Call Trace:
[67204.553086]  <IRQ>  [<ffffffff81e8b8e4>] dump_stack+0x68/0x94
[67204.565946]  [<ffffffff8153a995>] print_trailer+0x115/0x1a0
[67204.578334]  [<ffffffff81541174>] object_err+0x34/0x40
[67204.589762]  [<ffffffff815436e7>] kasan_report_error+0x217/0x530
[67204.616847]  [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50
[67204.645085]  [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0
[67204.658243]  [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220
[67204.685362]  [<ffffffff81de0c58>] bio_endio+0x158/0x1b0
[67204.696983]  [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80
[67204.710334]  [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0
[67204.723108]  [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690
[67204.807293]  [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0
[67204.820838]  [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340
[67204.848884]  [<ffffffff81e1e13f>] blk_done_softirq+0x1ff/0x360
[67204.875074]  [<ffffffff82a08cad>] __do_softirq+0x22d/0x8d7
[67204.887270]  [<ffffffff8113d42c>] irq_exit+0x15c/0x190
[67204.898697]  [<ffffffff82a085e3>] smp_apic_timer_interrupt+0x83/0xa0
[67204.912815]  [<ffffffff82a06b29>] apic_timer_interrupt+0x89/0x90
[67205.029113] ==================================================================

But this indicates that the page is under writeback at this point,
so that tends to indicate that the above freeing was incorrect.

Hmmm - it's clear we've got direct reclaim involved here, and the
suspicion of a dirty page that has had it's bufferheads cleared.
Are there any other warnings in the log from XFS prior to kasan
throwing the error?

Is there anything else I can send that might be helpful?

full console/dmesg output from a crashed machine, plus:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

--
/*
 * Run as "./repro outfile 1000", where "outfile" sits on an XFS filesystem.
 */

#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>

#define CHUNK (32768)

static const char crap[CHUNK];

int main(int argc, char **argv)
{
	int r, fd, i;
	size_t allocsize, count;
	void *p;

	if (argc != 3) {
		printf("Usage: %s filename count\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR|O_CREAT, 0644);
	if (fd == -1) {
		perror("Can't open");
		return 1;
	}

	if (!fork()) {
		count = atol(argv[2]);

		while (1) {
			for (i = 0; i < count; i++)
				if (write(fd, crap, CHUNK) != CHUNK)
					perror("Eh?");

			fsync(fd);
			ftruncate(fd, 0);
		}

Hmmmm. Truncate is used, but only after fsync. If the truncate
is removed, does the problem go away?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help