Thread (3 messages) 3 messages, 2 authors, 2009-03-03

Re: [RFC][PATCH 2/3] ext4: Exchange the blocks between two inodes

From: Akira Fujita <hidden>
Date: 2009-03-03 23:27:07

Hi Ted,
Thank you for comment. :-)

Theodore Tso wrote:
quoted
+
+	up_write(&EXT4_I(org_inode)->i_data_sem);
+	ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags,
+								&page, &fsdata);
+	down_write(&EXT4_I(org_inode)->i_data_sem);
    
This is going to be a problem.  Once we release i_data_sem, there is
the possibility that other processes which might be running and
accessing the file at the same time that the defragger is running
could be blocked waiting for i_data_sem to be released.  The moment it
gets released, they will grab the lock then start to modify extent
tree --- either allocating new blocks to it, or worse, truncating or
unlinking the target inode.

This is going to be a mess to fix, since Linux doesn't have recursive
locking primitives.  We do take i_mutex, which will protect us against
truncates, but it won't protect against a write() system call.  Also,
if there inode has delayed allocation blocks pending, those could get
written out by the page cleaner, and i_mutex won't protect us against
changes to i_data_sem, either.

  
As you said, we take i_mutex at the start of ext4_defrag()
and hold it until the end of this function,
so orig file is protected against truncates and
it never be shrunk during defrag.

On the other hand, semaphore is released/taken around write_begin()
in ext4_defrag_partial() every page, so it does not protect orig file
against a write() system call from other process.  
So that defrag result (fragmentation) might not be best,
but data corruption does not occur at least.
So I think it is not a serious problem.

As above, it is not necessary to lock the whole of ext4_defrag()
with semaphore, we should just lock only a necessary point.
Therefore defrag V1's lock seems have unneeded lock points.
I will change lock point and semaphore type in the next version.

Do I overlook something?

Regards,
Akira Fujita

We could add special-case kludgery to wrap around all of the places
that takes or release the i_data_sem so that we get recursive locking
support --- but that would be very ugly indeed.

I'm not sure what's the best way to deal with this; maybe if we sleep
on it someone will come up with a better suggestion --- but it's
something that we have to figure out.

	  					- Ted

  
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help