Re: [PATCH] btrfs: setup the page before calling any subpage helper
From: David Sterba <hidden>
Date: 2021-07-30 11:10:55
On Fri, Jul 30, 2021 at 01:58:57PM +0800, Qu Wenruo wrote:
Function set_page_extent_mapped() will setup the data page cache so that for subpage cases those pages will have page->private to store subpage specific info. Normally this happens when we create a new page for the page cache. But there is a special call site, __extent_writepage(), as we have special cases where upper layer can mark some page dirty without going through set_page_dirty() interface. I haven't yet seen any real world case for this, but if that's possible then in __extent_writepage() we will call btrfs_page_clear_error() before setting up the page->private, which can lead to NULL pointer dereference.
Yeah it's hard to believe, but it's been there since almost the beginning. Back then there was a hard BUG() in the fixup worker, I've hit it randomly on x86_64, https://lore.kernel.org/linux-btrfs/20111031154139.GF19328@twin.jikos.cz/ (local) you could find a lot of other reports where it crashed inside btrfs_writepage_fixup_worker.
Fix it by moving set_page_extent_mapped() call before btrfs_page_clear_error(). And make sure in the error path we won't call anything subpage helper.
I'm not sure about the fix, because the whole fixup thing is not entirely clear.
Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
Signed-off-by: Qu Wenruo <redacted>
---
I really hope we can have a more explicit comment about in exactly which
cases we can have such page, and maybe some test cases for it.The only reliable test case was on s390 with a particular seed for fsx, I still have it stored somewhere. On x86_64 it's very hard to hit.
In fact, I haven't really seen any case like this, and it doesn't really make sense for me to make some MM layer code to mark a page dirty without going through set_page_dirty() interface.
On s390 it's quick because the page state bits are stored in 2 places and need to be synced. On x86_64 it's very unclear and low level arch specific MM stuff but it is still a problem.