Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal

From: Surbhi Palande <hidden>
Date: 2012-01-11 06:06:59

Possibly related (same subject, not in this thread)

2012-01-11 · Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal · Eric Sandeen <hidden>
2012-01-10 · Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal · Surbhi Palande <hidden>
2012-01-10 · Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal · Surbhi Palande <hidden>
2012-01-10 · Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal · Jan Kara <jack@suse.cz>
2012-01-10 · Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal · Eric Sandeen <hidden>

(Resending one of the last cases, as the two task scenario got mixed! )

Now following is the case where the writing
process/jbd2_journal_start() calls read_lock() *before*
jbd2_journal_lock_updates() are called by the freezing process.

        CPU1						                              CPU2
    Task1 (write operation)				                               Task2
tx	   read_lock(&journal->j_state_lock)		
tx+1							               jbd2_journal_lock_updates() (will be stuck
at the next instance...)
tx+3 	if(journal->j_barrier_count)			
write_lock(journal->j_state_lock) /* stuck here till Task1
relinquishes the lock*/
tx+4  handle started and associated with
        running transaction.
tx+5   read_unlock(&journal->j_state_lock)
tx+6							               jbd2_journal_lock_updates() succeeds
tx+7							               jbd2_journal_flush() /* This shall flush the
running transactions */
							                 /* There are no outstanding transcations. No
new ext4_journal_start() will succeed by this point */

--------------------------------
Thus:

Though a racy transaction can really be started after SB_FREEZE_WRITE,
either that transaction will be flushed by jbd2_journal_flush or it
cannot be started at all because of the barrier count.

So, I do not understand why the journaled writes should really have a
race window after this patch is applied?

Regards,
Surbhi


On Tue, Jan 10, 2012 at 9:38 PM, Surbhi Palande [off-list ref] wrote:

On second thoughts, I fail to see why there is still a race window
after this patch.

Here are the reasons why i fail to see how the data can be dirtied
when all the operations involve a journal:

----------
So here is the problem that we see
       CPU1                                                     CPU2
      Task1 (write operation)                                    Task2
---------------------------------------------------------------------------------------
t1      ext4_journal_start()
t2        ext4_journal_start_sb()
t3          vfs_check_frozen                            sb->frozen=SB_FREEZE_WRITE
t4              jbd2_journal_start()                    /* hence forth all processes calling
vfs_check_frozen will wait */

Now, our aim is to stop Task1 from dirtying the page cache ie in
starting this transaction. However if it is successful in starting
this transaction, then we want to make sure that this transaction is
flushed out.
Correct?

(with the journal freeze patch applied)
* Task1 is now executing the code of jbd2_journal_start(). Task2 is
the freezing process.

       CPU1                                            CPU2
      Task1 (write operation)                          Task2
t4      jbd2_journal_start()
...
tx         read_lock(&journal->j_state_lock)

Now two things can happen at this point with respect to Task2:
a) either journal->j_flags is set to JBD2_FROZEN already
b) or it is not set.


Lets look at both the cases:
A) When journal->j_flags is set to JBD2_FROZEN already then
jbd2_journal_start() will get stuck on the waitqueue as long as  the
journal->j_flags is JBD2_FROZEN. So we cannot create dirty data in
this case.

B) When journal->j_flags is not set to JBD2_ FROZEN then two more
things could happen:
Task2 has already finished the call to jbd2_journal_flush() by the
time Task1 calls read_lock(). So now no new task (T1) should create
any new dirty data as that will not get flushed out. So we really want
to stop the jbd2_journal_start() from succeeding.


       CPU1                                                          CPU2
      Task1 (write operation)                                         Task2
----------------------------------------------------------------------------------------------
tx         read_lock(&journal->j_state_lock)
jbd2_journal_lock_updates() /*journal->j_barrier_count incremented -
so non zero ! */
tx+1                                                                         jbd2_journal_flush()
tx+2                                                                         write_lock(&journal->j_state_lock);
                                                                            /* till the read_lock is relinquished by
task1 , we are stuck */
tx+3    if(journal->j_barrier_count)
               read_unlock(&journal->j_state_lock);

      /* so we can set the journal->j_flags now as write_lock()
succeeds here */
tx+4     goto repeat

The above case is where the writing process/jbd2_journal_start() calls
read_lock() *after* jbd2_journal_lock_updates() are called by the
freezing process and hence is protected by the j_barrier_count check.
This transaction can definitely not start!

Now following is the case where the writing
process/jbd2_journal_start() calls read_lock() *before*
jbd2_journal_lock_updates() are called by the freezing process.

However, if Task1 already finished starting the transaction as follows:
       CPU1                                                                                         CPU2
      Task1 (write operation)
        Task2
-------------------------------------------------------------------------------------------------------------------------------------------------------------
tx         read_lock(&journal->j_state_lock)
tx+1
jbd2_journal_lock_updates() (will be stuck at the next instance...)
tx+3        if(journal->j_barrier_count)
write_lock(journal->j_state_lock)
           /* journal->j_barrier_count = 0, so we proceed here*/
              /* stuck here till Task1 relinquishes the lock*/
tx+4     read_unlock(&journal->j_state_lock);
tx+5 now start_this_handle() returns successfully
     and associates this handle with the running
       transaction.
tx+6
jbd2_journal_lock_updates() succeeds
tx+7
jbd2_journal_flush() /* This shall flush the running transactions */
                                                                                              /* There are no
outstanding transcations. No new ext4_journal_start() will succeed by
this                                                                    point*/

--------------------------------
Thus:

Though a racy transaction can really be started after SB_FREEZE_WRITE,
either that transaction will be flushed by jbd2_journal_flush or it
cannot be started at all because of the barrier count.

So, I do not understand why the journaled writes should really have a
race window after this patch is applied?

Thanks!

Regards,
Surbhi







On Tue, Jan 10, 2012 at 4:13 PM, Surbhi Palande [off-list ref] wrote:

quoted

Hi Jan,

quoted


If all the write operations were journaled, then this patch would not allow
ext4 filesystem to have any dirty data after its frozen.
(as journal_start() would block).

 I think the only one candidate that creates dirty data without calling
ext4_journal_start() is mmapped?

 No, the problem is in any write path. The problem is with operations
that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
operations dirty the filesystem but running sync may easily miss them.
During this phase journal is not frozen so that does not help you in any
way.

                                                               Honza

Ok! No new transaction can really start after the journal is frozen.
But we can have dirty data after SB_FREEZE_WRITE and before
SB_FREEZE_TRANS.
I agree with you. However, can this be fixed by adding a
sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
over?


So then essentially, when freeze_super() returns, the page cache is clean?

I do definitely agree that the fix is to add a lock for mutual
exclusion between freeze filesystem and writes to a frozen filesystem.

Thanks!

Regards,
Surbhi.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help