Thread (4 messages) 4 messages, 2 authors, 2021-08-12

Re: [PATCH] btrfs: setup the page before calling any subpage helper

From: David Sterba <hidden>
Date: 2021-07-30 11:10:55

On Fri, Jul 30, 2021 at 01:58:57PM +0800, Qu Wenruo wrote:
Function set_page_extent_mapped() will setup the data page cache so that
for subpage cases those pages will have page->private to store subpage
specific info.

Normally this happens when we create a new page for the page cache.
But there is a special call site, __extent_writepage(), as we have
special cases where upper layer can mark some page dirty without going
through set_page_dirty() interface.

I haven't yet seen any real world case for this, but if that's possible
then in __extent_writepage() we will call btrfs_page_clear_error()
before setting up the page->private, which can lead to NULL pointer
dereference.
Yeah it's hard to believe, but it's been there since almost the
beginning. Back then there was a hard BUG() in the fixup worker, I've
hit it randomly on x86_64,

https://lore.kernel.org/linux-btrfs/20111031154139.GF19328@twin.jikos.cz/ (local)

you could find a lot of other reports where it crashed inside
btrfs_writepage_fixup_worker.
Fix it by moving set_page_extent_mapped() call before
btrfs_page_clear_error().
And make sure in the error path we won't call anything subpage helper.
I'm not sure about the fix, because the whole fixup thing is not
entirely clear.
Fixes: 32443de3382b ("btrfs: introduce btrfs_subpage for data inodes")
Signed-off-by: Qu Wenruo <redacted>
---
I really hope we can have a more explicit comment about in exactly which
cases we can have such page, and maybe some test cases for it.
The only reliable test case was on s390 with a particular seed for fsx,
I still have it stored somewhere. On x86_64 it's very hard to hit.
In fact, I haven't really seen any case like this, and it doesn't really
make sense for me to make some MM layer code to mark a page dirty
without going through set_page_dirty() interface.
On s390 it's quick because the page state bits are stored in 2 places
and need to be synced. On x86_64 it's very unclear and low level arch
specific MM stuff but it is still a problem.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help