Thread (13 messages), 5 authors, 2007-11-06

Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA

From: Mingming Cao <hidden>
Date: 2007-11-05 23:21:38

On Mon, 2007-11-05 at 10:07 -0800, Badari Pulavarty wrote:
> On Tue, 2007-11-06 at 00:15 +0800, Andreas Dilger wrote:
> > On Nov 05, 2007 08:04 -0800, Badari Pulavarty wrote:
> > > On Sat, 2007-11-03 at 09:36 +0800, Andreas Dilger wrote:
> > > > But...  this implies that every user of bh->b_data needs to kmap, and I
> > > > don't see that in the code anywhere else.  That makes me think something
> > > > else is going wrong here.
> > > In most cases, this is handled in ll_rw_block() code - when we submit the
> > > buffer head for IO. If the page is in highmem, we will end up creating
> > > a bounce buffer for it.
> > >
> > > In our case, JBD code is trying to look at the data to do checksum
> > > on it. That's why we have to kmap() the page before looking.
> > My point is that there is a LOT of code in ext[234] that dereferences
> > bh->b_data without kmap() (e.g. group descriptors, bitmaps, superblock,
> > inode tables, etc).  Does that imply that something is forcing those
> > bh pages into lowmem, or is the journal bh page in question being
> > allocated in some different way that allows it to be in highmem?
> Yes. You are right. It's been a while since I had to deal with HIGHMEM.
> All the meta-data should be in LOWMEM. I asked Mingming to verify
> what the buffer-head is pointing to when it has HIGHMEM page.
The buffer_heads with NULL bh->b_data (under the "start_journal_io"
branch in jbd2_journal_commit_transaction()) are created by
jbd2_journal_write_metadata_buffer().

I noticed that jbd2_journal_write_metadata_buffer() calls kmap_atomic()
in multiple places to access the journal bh page (new_page).  In the
normal case new_page points to the bh's page, which was initially
allocated by __page_cache_alloc()
(sb_bread->__bread()->_...>find_or_create_page()->__page_cache_alloc()).

In the case where it needs a data copy (the buffer starts with
JBD2_MAGIC_NUMBER?), a new page is allocated by __get_free_pages()
(via jbd2_alloc()), which is possibly allocated in highmem.
__get_free_pages() calls alloc_pages() directly and doesn't seem to
have highmem handling like __page_cache_alloc().
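
To make the data-copy case concrete, here is a rough userspace sketch of
the idea, not the kernel code itself. It assumes a block needs a private
copy when its first four bytes match the big-endian journal magic, and that
the copy gets those bytes zeroed so it cannot be mistaken for a journal
header. The helper names needs_escape() and copy_out_and_escape() are made
up for illustration, and malloc() only stands in for jbd2_alloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Journal magic number; value as defined in the kernel's jbd2 header. */
#define JBD2_MAGIC_NUMBER 0xC03B3998U

/* Read the first four bytes of a block as a big-endian 32-bit value. */
static uint32_t first_be32(const unsigned char *data)
{
	return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
	       ((uint32_t)data[2] << 8) | (uint32_t)data[3];
}

/* Hypothetical helper: does this block start with the journal magic? */
static int needs_escape(const unsigned char *data, size_t size)
{
	assert(size >= 4);
	return first_be32(data) == JBD2_MAGIC_NUMBER;
}

/*
 * Hypothetical helper: return a private copy of the block with the magic
 * zeroed, or NULL if no copy is needed (or the allocation fails).
 * The original block is left untouched; the caller frees the copy.
 */
static unsigned char *copy_out_and_escape(const unsigned char *data,
					  size_t size)
{
	unsigned char *copy;

	if (!needs_escape(data, size))
		return NULL;

	copy = malloc(size);	/* stand-in for jbd2_alloc() */
	if (!copy)
		return NULL;

	memcpy(copy, data, size);
	memset(copy, 0, 4);	/* escape: clear the magic in the copy only */
	return copy;
}
```

The point of the sketch is only that the copied buffer lives in freshly
allocated memory rather than in the page-cache page of the original bh,
which is why the allocation path matters here.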

I am not sure why we saw this issue on the 2.6.23 kernel, where
jbd2_slab_alloc()->kmem_cache_alloc() is used. Aren't all slab pages
in lowmem?


Regards,

Mingming