Thread (2 messages) 2 messages, 2 authors, 2008-03-20

Re: delayed allocation result in BUG at fs/buffer.c:2880!

From: Aneesh Kumar K.V <hidden>
Date: 2008-03-20 18:02:43

Possibly related (same subject, not in this thread)

Adding linux-ext4 back.

On Thu, Mar 20, 2008 at 10:21:49AM -0700, Mingming Cao wrote:
On Thu, 2008-03-20 at 20:46 +0530, Aneesh Kumar K.V wrote:
quoted
On Thu, Mar 20, 2008 at 05:04:47PM +0300, Dmitri Monakhov wrote:
quoted
On 17:39 Thu 20 Mar     , Aneesh Kumar K.V wrote:
quoted
On Thu, Mar 20, 2008 at 11:16:19AM +0300, Dmitri Monakhov wrote:
quoted
On 21:39 Wed 19 Mar     , Eric Sandeen wrote:
quoted
Solofo.Ramangalahy@bull.net wrote:
quoted
Hello,

During stress testing (workload: racer from ltp + fio/iometer), here
is an error I am encountering:
8<------------------------------------------------------------------------------
kernel: WARNING: at fs/buffer.c:1680 __block_write_full_page+0xd4/0x2af()
So this is WARN_ON(bh->b_size != blocksize);

What is b_size in this case?
FS block size, because this page pinned bh (it comes from page_buffers(page)), but
not dummy bh which may comes from {write,read}pages or direct_IO. 
Page's bh i_size must always be equal to fs blocksize.
This bh always constructed via following construction
if (!page_has_buffers(page))
	create_empty_buffers(page, 1<<inode->i_blkbits, flags)
So page's bh->b_size was inited with right value from very beginning, but
apparently somewhere this size was changed 
I guess i've localized buggy place, at least it's looks strange.
ext4_da_get_block_prep ()
{
...
	BUG_ON(create == 0);
        BUG_ON(bh_result->b_size != inode->i_sb->s_blocksize);
	ret =  ext4_get_blocks_wrap(NULL,  inode, iblock, 1,  bh_result, 0, 0);
#Here ext4_get_block_write called with max_blocks == 1  ^^^^^
	...
	if (ret > 0) {
                        bh_result->b_size = (ret << inode->i_blkbits);
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
## I don't understand this place. I hoped what (ret <= max_blocks) must always
##be true true. But after I've add debug info printing I've got following result.
                ret = 0;
        }
...
}
Some times I've seen following ,message 
 bh= {state=0,size=114688, blknr=18446744073709551615 dev=0000000000000000,count=0}, ret=28
And because it was page-cache's bh later this result in WARNING.
Is that a fallocate space ?. For falloc space we can return values
greater than max_blocks. ext4_ext_get_blocks was made to return  >0
for a read on prealloc space to ensure delalloc doesn't reserve space
for the same. I guess we need to make sure we don't return more than
max_blocks. Can you try the patch below
Ok Warning has gone, but resulted bh still incorrectly filled.
I've found what function ext4_da_get_block_prep() return bh witch 
is !mapped and !delayed, which is prohibited because it is always called with
create != 0. BH debug info at the end of this function result in following msg

BH={state=0, size=4096, blknr=18446744073709551615,dev=0000000000000000,
  count=0} block =288 ret=1

Later this incorrectly filled bh result in BUG_ON triggering
------------[ cut here ]------------
 kernel BUG at fs/buffer.c:2880!
 invalid opcode: 0000 [1] SMP
 CPU 1
.....
quoted
quoted
quoted
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d6ae40a..4985fd5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2600,8 +2600,18 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 			}
 			if (create == EXT4_CREATE_UNINITIALIZED_EXT)
 				goto out;
-			if (!create)
+			if (!create) {
+				/*
+				 * We have blocks reserved already. We
+				 * return allocated blocks so that delalloc
+				 * won't do block reservation for us. But
+				 * the buffer head will be unmapped so that
+				 * a read from the block return 0
+				 */
+				if (allocated > max_blocks)
+					allocated = max_blocks;
 				goto out2;
+			}
 
 			ret = ext4_ext_convert_to_initialized(handle, inode,
 								path, iblock,
With prealloc space we still need to make sure buffer heads are marked
new and delayed. 
I doubt this. prealloc space should not mark as delayed. The allocation
already done. delayed flag triggeres block reservation for delayed
allocation, with is not needed for preallocation, that will cause double
accounting for free space.

With delayed allocation, where hit preallocated space, get_block() right
now returns bh as new but return value > 0 (it's possible that returns >
maxblocks, as we just return a single large extent).

As Dmitri mentioned in the previous mail if the buffer head is not
marked as delayed or new, in __block_prepare_write after get_block
we would do a ll_rw_block(READ, 1, &bh); and that will result in BUG_ON.

quoted
Only difference between prealloc and get_block failure
case should be in failure case we need to do block reservation.
Correct, in the failure case, the returned number of blocks from
get_block() is 0, but with preallocation, the return value is positive.
Both case the resulting bh is remains new, unmapped.
quoted
  With
prealloc we still like to get get_block called again with create = 1
so that the uninit extent get split. 
I could not see why we still need doing create =1 at write_begin time
with delayed allocation, if the space has already preallocated.

The preallocation extent split could be defered at write out time,
get_block() is always called with create = 1 at writepage() time. 

I meant at writepage time.


quoted
I would also like to test it locally. How are you reproducing it. Just
fsstress won't reproduce it right ?
Not sure which ext4 tree Dmitri is testing, I have a patch to handle
preallocation case in delayed allocation, I wonder if that makes the
problem goes away? 

http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=delalloc-ext4-preallocation-handling.patch;h=ba3b70ecba99137d452b6692c92caabe8831392e;hb=80aeb2ef59cdb97bf527570cb273f6e5d5d27e3f
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help