Thread (46 messages) 46 messages, 9 authors, 2017-03-06

Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"

From: Michal Hocko <mhocko@kernel.org>
Date: 2017-01-27 09:37:35
Also in: ceph-devel, linux-ext4, linux-f2fs-devel, linux-fsdevel, linux-nfs, linux-xfs, lkml

On Fri 27-01-17 01:13:18, Theodore Ts'o wrote:
On Thu, Jan 26, 2017 at 08:44:55AM +0100, Michal Hocko wrote:
quoted
quoted
quoted
I'm convinced the current series is OK, only real life will tell us whether
we missed something or not ;)
I would like to extend the changelog of "jbd2: mark the transaction
context with the scope GFP_NOFS context".

"
Please note that setups without journal do not suffer from potential
recursion problems and so they do not need the scope protection because
neither ->releasepage nor ->evict_inode (which are the only fs entry
points from the direct reclaim) can reenter a locked context which is
doing the allocation currently.
"
Could you comment on this Ted, please?
I guess....   so there still is one way this could screw us, and it's this reason for GFP_NOFS:

        - to prevent from stack overflows during the reclaim because
	          the allocation is performed from a deep context already

The writepages call stack can be pretty deep.  (Especially if we're
using ext4 in no journal mode over, say, iSCSI.)

How much stack space can get consumed by a reclaim?
./scripts/stackusage with allyesconfig says:

./mm/page_alloc.c:3745  __alloc_pages_nodemask  264     static
./mm/page_alloc.c:3531  __alloc_pages_slowpath  520     static
./mm/vmscan.c:2946      try_to_free_pages       216     static
./mm/vmscan.c:2753      do_try_to_free_pages    304     static
./mm/vmscan.c:2517      shrink_node     	352     static
./mm/vmscan.c:2317      shrink_node_memcg       560     static
./mm/vmscan.c:1692      shrink_inactive_list    688     static
./mm/vmscan.c:908       shrink_page_list        608     static

So this would be 3512 for the standard LRUs reclaim whether we have
GFP_FS or not. shrink_page_list can recurse to releasepage but there is
no NOFS protection there so it doesn't make much sense to check this
path. So we are left with the slab shrinkers path

./mm/page_alloc.c:3745  __alloc_pages_nodemask  264     static
./mm/page_alloc.c:3531  __alloc_pages_slowpath  520     static
./mm/vmscan.c:2946      try_to_free_pages       216     static
./mm/vmscan.c:2753      do_try_to_free_pages    304     static
./mm/vmscan.c:2517      shrink_node     	352     static
./mm/vmscan.c:427       shrink_slab     	336     static
./fs/super.c:56 	super_cache_scan        104     static << here we have the NOFS protection
./fs/dcache.c:1089      prune_dcache_sb 	152     static
./fs/dcache.c:939       shrink_dentry_list      96      static
./fs/dcache.c:509       __dentry_kill   	72      static
./fs/dcache.c:323       dentry_unlink_inode     64      static
./fs/inode.c:1527       iput    		80      static
./fs/inode.c:532        evict   		72      static

This is where the fs specific callbacks play role and I am not sure
which paths can pass through for ext4 in the nojournal mode and how much
of the stack this can eat. But currently we are at +536 wrt. NOFS
context. This is quite a lot but still much less (2632 vs. 3512) than
the regular reclaim. So there is quite some stack space to eat... I am
wondering whether we have to really treat nojournal mode any special
just because of the stack usage?

If this ever turn out to be a problem and with the vmapped stacks we
have good chances to get a proper stack traces on a potential overflow
we can add the scope API around the problematic code path with the
explanation why it is needed.

Does that make sense to you?

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help