Thread: 13 messages, 4 authors, 2017-08-14

Re: quota: dqio_mutex design

From: Jan Kara <jack@suse.cz>
Date: 2017-08-03 14:23:23
Also in: linux-fsdevel

On Thu 03-08-17 16:55:40, Andrew Perepechko wrote:
Let me put it this way:

Under file creation from different threads, ext4 will generate a series of
dquot updates (incore and then ondisk, through journal):

dquot update1
dquot update2
dquot update3
...
dquot updateN

Either with my patch or without it, ondisk dquot update through journal
may miss dquot update1, dquot update2, ... dquot update{N-1}.

You can easily see that from the code of dquot_commit():

int dquot_commit(struct dquot *dquot)
{
        int ret = 0;
        struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);

        mutex_lock(&dqopt->dqio_mutex);
        spin_lock(&dq_list_lock);
        if (!clear_dquot_dirty(dquot)) {
                spin_unlock(&dq_list_lock);
                goto out_sem;
        }
...
}


If the actual dquot_commit() wrote dquot update N, the threads committing
updates 1 through N-1 will exit immediately once they get dqio_mutex
since the dquot will NOT be dirty.
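
The coalescing described above can be illustrated with a minimal userspace sketch. This is not the kernel code: struct toy_dquot, toy_update(), toy_commit() and toy_demo() are hypothetical names, pthread mutexes stand in for dqio_mutex and the spinlocks, and a plain bool stands in for the DQ_MOD_B bit. It only shows that after N in-core updates, the first commit writes the newest value and the remaining commits find the dquot clean and return at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct dquot; not the kernel type. */
struct toy_dquot {
    pthread_mutex_t io_mutex;   /* plays the role of dqio_mutex */
    pthread_mutex_t state_lock; /* plays the role of the spinlocks */
    bool dirty;                 /* plays the role of the DQ_MOD_B bit */
    long incore;                /* in-memory usage counter */
    long ondisk;                /* last value written "to disk" */
    int writes;                 /* number of writes actually performed */
};

/* In-core update: change the counter and mark the dquot dirty. */
static void toy_update(struct toy_dquot *dq, long delta)
{
    pthread_mutex_lock(&dq->state_lock);
    dq->incore += delta;
    dq->dirty = true;
    pthread_mutex_unlock(&dq->state_lock);
}

/* Commit: returns true if this call wrote, false if nothing was dirty. */
static bool toy_commit(struct toy_dquot *dq)
{
    long snapshot;

    pthread_mutex_lock(&dq->io_mutex);
    pthread_mutex_lock(&dq->state_lock);
    if (!dq->dirty) {
        /* Someone else already wrote this state or a fresher one. */
        pthread_mutex_unlock(&dq->state_lock);
        pthread_mutex_unlock(&dq->io_mutex);
        return false;
    }
    dq->dirty = false;
    snapshot = dq->incore;      /* always the newest in-core value */
    pthread_mutex_unlock(&dq->state_lock);

    dq->ondisk = snapshot;      /* the simulated on-disk write */
    dq->writes++;
    pthread_mutex_unlock(&dq->io_mutex);
    return true;
}

/* Three updates followed by three commits; returns the write count. */
static int toy_demo(void)
{
    struct toy_dquot dq = {
        .io_mutex = PTHREAD_MUTEX_INITIALIZER,
        .state_lock = PTHREAD_MUTEX_INITIALIZER,
    };
    bool first, second, third;

    toy_update(&dq, 1);
    toy_update(&dq, 1);
    toy_update(&dq, 1);

    first = toy_commit(&dq);
    second = toy_commit(&dq);
    third = toy_commit(&dq);

    assert(first && !second && !third);
    assert(dq.ondisk == dq.incore);
    return dq.writes;
}
```

In this sketch the intermediate values are never written on their own; only the state that includes all three updates reaches the simulated disk.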

My patch only avoids blocking on dqio_mutex when we know for sure
that another thread will NECESSARILY write the needed or a FRESHER dquot ondisk.

Yeah, I agree with Andrew. What they did is *almost* safe for ext4. The
only moment when it is not safe is when someone calls mark_dquot_dirty()
outside the scope of a transaction, which happens when doing a Q_SETQUOTA
quotactl.

Another thing which is subtle with Andrew's approach is that a process
modifying quota information can return and stop its handle before the quota
data gets copied to the transaction buffer. This does not currently create any
real problem since nobody is relying on that; however, it relies on intimate
details of the JBD2 transaction machinery, and that could bite us in the future.

								Honza
quoted
quoted
This change means that if this dquot is dirty we skip it. This
won't work, because this way the quota update is only kept in VFS dquot
memory, and the newer update is not written to the journal file and not
wrapped into the transaction either.

That's not true.

As I explained earlier, having DQ_MOD_B set at this point means another
thread is going to write dquot but hasn't yet started doing so. This thread
does not care whether it updates the ondisk dquot with its own data or with
fresher data which came from another thread. In-core dquot has no indication
of whose data it contains.
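
The argument above can be sketched in userspace C. This is not the patch under discussion, only an illustration of its reasoning under stated assumptions: struct sk_dquot, sk_mark_dirty(), sk_write() and sk_demo() are hypothetical names, a C11 atomic_bool stands in for DQ_MOD_B, and the two "threads" are interleaved by hand so the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dquot; not the kernel structure. */
struct sk_dquot {
    atomic_bool mod;    /* plays the role of DQ_MOD_B */
    atomic_long incore; /* in-core usage, shared by all updaters */
    long ondisk;        /* last value written "to disk" */
    int writes;         /* number of writes actually performed */
};

/*
 * Mark the dquot dirty. Returns true if the flag was clear, meaning the
 * caller must perform the write; false if it was already set, meaning
 * another thread has marked it and has not yet started writing.
 */
static bool sk_mark_dirty(struct sk_dquot *dq)
{
    return !atomic_exchange(&dq->mod, true);
}

/* Write: clear the flag first, then copy whatever is in core right now. */
static void sk_write(struct sk_dquot *dq)
{
    atomic_store(&dq->mod, false);
    dq->ondisk = atomic_load(&dq->incore);
    dq->writes++;
}

/* Hand-interleaved scenario; returns the final on-disk value. */
static long sk_demo(void)
{
    struct sk_dquot dq;
    bool b_must_write, a_must_write;

    atomic_init(&dq.mod, false);
    atomic_init(&dq.incore, 0);
    dq.ondisk = 0;
    dq.writes = 0;

    /* Thread B updates in core and marks the dquot dirty. */
    atomic_fetch_add(&dq.incore, 1);
    b_must_write = sk_mark_dirty(&dq);

    /* Thread A updates before B has started writing; flag is already set. */
    atomic_fetch_add(&dq.incore, 2);
    a_must_write = sk_mark_dirty(&dq);

    /* A skips; B writes and picks up A's fresher data as well. */
    if (a_must_write)
        sk_write(&dq);
    if (b_must_write)
        sk_write(&dq);

    assert(b_must_write && !a_must_write);
    assert(dq.writes == 1);
    return dq.ondisk;
}
```

Because the flag is cleared before the in-core value is read, an update that lands after the clear would set the flag again and make its own caller responsible for a further write. The sketch does not model journal handles or transactions, which is where the remaining safety argument lies.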

As I also explained earlier, the update cannot happen in the context of
another transaction because thread A which sees DQ_MOD_B set and thread
B which is running dquot_commit() both have journal handles to the same
transaction. There's only one running transaction at a time and thread B
does not switch to another transaction.

Please read the code carefully.
quoted
This is not what journal quota means to do.


Thanks,
Shilong
quoted
Thank you,
Andrew
-- 
Jan Kara [off-list ref]
SUSE Labs, CR