Re: ext4 deep stack with mark_page_dirty reclaim
From: Andreas Dilger <hidden>
Date: 2011-03-15 05:17:30
Also in:
linux-mm, lkml
On 2011-03-14, at 1:46 PM, Ted Ts'o wrote:
On Mon, Mar 14, 2011 at 12:20:52PM -0700, Hugh Dickins wrote:quoted
When testing something else on 2.6.38-rc8 last night, I hit this x86_64 stack overflow. I've never had one before, it seems worth reporting. kdb was in, I jotted it down by hand (the notifier part of it will be notifying kdb of the fault). CONFIG_DEBUG_STACK_OVERFLOW and DEBUG_STACK_USAGE were not set. I should disclose that I have a hack in which may make my stack frames slightly larger than they should be: check against yours. So it may not be an overflow for anyone else, but still a trace to worry about.Here's the trace translated to the stack space used by each function. There are a few piggy ext4 functions that we can try to shrink, but the real problem is just how deep the whole stack is getting. From the syscall to the lowest-level ext4 function is 3712 bytes, and everything from there to the schedule() which then triggered the GPF was another 3728 of stack space....
Is there a script which you used to generate this stack trace to function size mapping, or did you do it by hand? I've always wanted such a script, but the tricky part is that there is so much garbage on the stack that any automated stack parsing is almost useless. Alternately, it would seem trivial to have the stack dumper print the relative address of each symbol, and the delta from the previous symbol...
To be honest, I think the stack size limitation is becoming a serious problem in itself. While some stack-size reduction effort is actually useful in removing inefficiency, I think there is a lot of crazy and inefficient things to try and minimize the stack usage (e.g. lots of kmalloc/kfree of temporary arrays instead of just putting them on the stack), which ends up consuming _more_ total memory.
This can be seen with deep storage stacks that are using the network on both ends, like NFS+{XFS, ext4}+LVM+DM+{fcoib,iSCSI}+driver+kmalloc or similar... The below stack isn't even using something so convoluted.
240 schedule+0x25a 368 io_schedule+0x35 32 get_request_wait+0xc6 160 __make_request+0x36d 112 generic_make_request+0x2f2 208 submit_bio+0xe1 144 swap_writepage+0xa3 80 pageout+0x151 128 shrink_page_list+0x2db 176 shrink_inactive_list+0x2d3 256 shrink_zone+0x17d 224 shrink_zones+0x0xa3 128 do_try_to_free_pages+0x87 144 try_to_free_mem_cgroup_pages+0x8e 112 mem_cgroup_hierarchical_reclaim+0x220 176 mem_cgroup_do_charge+0xdc 128 __mem_cgroup_try_charge+0x19c 128 mem_cgroup_charge_common+0xa8 128 mem_cgroup_cache_charge+0x19a 128 add_to_page_cache_locked+0x57 96 add_to_page_cache_lru+0x3e 80 find_or_create_page+0x69 112 grow_dev_page+0x4a 96 grow_buffers+0x41 64 __getblk_slow+0xd7 80 __getblk+0x44 80 __ext4_get_inode_loc+0x12c 176 ext4_get_inode_loc+0x30 48 ext4_reserve_inode_write+0x21 80 ext4_mark_inode_dirty+0x3b 160 ext4_dirty_inode+0x3e 64 __mark_inode_dirty+0x32 80 linux/fs.h mark_inode_dirty 0 linux/quotaops.h dquot_alloc_space 0 linux/quotaops.h dquot_alloc_block 0 ext4_mb_new_blocks+0xc2 144 ext4_alloc_blocks+0x189 208 ext4_alloc_branch+0x73 208 ext4_ind_map_blocks+0x148 272 ext4_map_blocks+0x148 112 ext4_getblk+0x5f 144 ext4_bread+0x36 96 ext4_append+0x52 96 do_split+0x5b 224 ext4_dx_add_entry+0x4b4 304 ext4_add_entry+0x7c 176 ext4_add_nondir+0x2e 80 ext4_create+0xf5 144 vfs_create+0x83 96 __open_namei_create+0x59 96 do_last+0x13b 112 do_filp_open+0x2ae 384 do_sys_open+0x72 128 sys_open+0x27 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>