Thread (21 messages, 3 authors, 2011-11-18)

Re: [PATCH v2 0/8] Filesystem io types statistic

From: Zheng Liu <hidden>
Date: 2011-11-16 08:43:11
Also in: linux-fsdevel

On Tue, Nov 15, 2011 at 10:34:20AM -0800, Aditya Kali wrote:
On Mon, Nov 14, 2011 at 5:35 AM, Zheng Liu [off-list ref] wrote:
quoted
On Mon, Nov 14, 2011 at 10:23:01AM +0000, Steven Whitehouse wrote:
quoted
Hi,

On Fri, 2011-11-11 at 23:32 +0800, Zheng Liu wrote:
quoted
On Fri, Nov 11, 2011 at 10:55:26AM +0000, Steven Whitehouse wrote:
quoted
Hi,

On Thu, 2011-11-10 at 18:34 +0800, Zheng Liu wrote:
quoted
Hi all,

v1->v2: totally redesign this mechanism

This patchset implements an io types statistic mechanism for filesystems.
It has been added to ext4 to show how applications use ext4, which helps
us analyze how to improve both the filesystem and the applications. For
now I have only added it to ext4, but other filesystems can also use it
to count their own io types.

An 'Issue' flag is added to buffer_head and is set in submit_bh(). When a
filesystem sees this flag set, it knows the request was actually issued
to the disk. Filesystems only need to check it for reads, because a
filesystem should already know whether a write request hits the cache,
at least in ext4. The buffer must be locked while this flag is checked
and cleared, but this adds little overhead.
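A minimal userspace model of the 'Issue' flag idea, for illustration only. `struct model_bh`, the `MBH_*` bits, `model_submit_bh()` and `model_fs_read()` are hypothetical stand-ins. They are not the kernel's buffer_head API and not the code in this patchset. The model shows the flag being set on submission and checked and cleared under the buffer lock on the read path.

```c
#include <assert.h>

/* Hypothetical state bits; the real buffer_head flags differ. */
enum {
    MBH_UPTODATE = 1u << 0,
    MBH_LOCKED   = 1u << 1,
    MBH_ISSUE    = 1u << 2,   /* models the proposed 'Issue' flag */
};

struct model_bh {
    unsigned int state;
};

static unsigned long disk_reads;   /* reads the fs counts as real disk I/O */

/* Stands in for submit_bh(): set the 'Issue' flag, then "complete" the read. */
static void model_submit_bh(struct model_bh *bh)
{
    bh->state |= MBH_ISSUE;
    bh->state |= MBH_UPTODATE;      /* pretend the read succeeded */
}

/* Stands in for a filesystem read path (something like sb_bread()). */
static void model_fs_read(struct model_bh *bh)
{
    assert(!(bh->state & MBH_LOCKED));
    bh->state |= MBH_LOCKED;            /* lock_buffer() */
    if (!(bh->state & MBH_UPTODATE))
        model_submit_bh(bh);            /* cache miss: go to disk */
    if (bh->state & MBH_ISSUE) {        /* check and clear under the lock */
        bh->state &= ~MBH_ISSUE;
        disk_reads++;
    }
    bh->state &= ~MBH_LOCKED;           /* unlock_buffer() */
}
```

Reading the same buffer twice counts only one disk read. The second read is a cache hit, so the flag is never set.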
Hi Steve,

Thank you for your attention.
quoted
There is already a REQ_META flag available which allows distinction
between data and metadata I/O (at least when they are not contained
within the same block). If that was to be extended to allow some
filesystem specific bits that would solve the problem that you appear to
be addressing with these patches in a fs independent way.
You are right. The REQ_META flag can indeed distinguish between metadata
and data. But it is difficult for a filesystem to check this flag, because
buffer_head doesn't carry it and most filesystems still use buffer_head
to submit IO requests. This is why I added a new flag to buffer_head.
I don't understand what you mean here.... submit_bh() takes a bh and a
set of REQ flags, so there is no reason to not use REQ_META. Using a bh
doesn't prevent setting those flags. The issue is only that few bits
remain unused in those flags and solving the problem in a "nice" way, by
adding a few more flags, may be tricky.
Hi,

Please let me explain why a new flag is needed in buffer_head.

The goal of this patchset is to provide a mechanism that lets filesystems
see how many IOs of each type are issued to the disk. The types are not
limited to metadata and data. More detailed types are needed, such as
super_block, inode, EA and so on. So a filesystem needs to define some
counters to store the results and increment them when it makes a request.
But a filesystem cannot know whether a request is actually issued to the
disk, because the data might already be in the page cache, at least for
reads. So we need a way to let filesystems know that. Meanwhile,
filesystems remain free to choose whether or not to provide the statistics.
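The paragraph above says each filesystem keeps per-type counters, counts a request only when it really reaches the disk, and may choose not to collect statistics. A minimal sketch of that bookkeeping, assuming hypothetical names (`enum io_type`, `struct io_stats`, `io_stats_account()`). None of these come from the patchset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical IO types; super_block, inode and EA are the examples given above. */
enum io_type { IO_SUPER, IO_INODE, IO_EA, IO_DATA, IO_NR_TYPES };

struct io_stats {
    bool enabled;                        /* each fs chooses whether to collect */
    unsigned long count[IO_NR_TYPES];
};

/*
 * Account one read request. 'issued' says whether it actually went to the
 * disk; a page cache hit must not be counted.
 */
static void io_stats_account(struct io_stats *st, enum io_type type, bool issued)
{
    assert(type < IO_NR_TYPES);
    if (st->enabled && issued)
        st->count[type]++;
}
```

This is where the 'Issue' flag matters: it is what would supply the `issued` argument.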

A new flag can be added to buffer_head and set when the request is really
issued to the disk, so that the filesystem knows about it. The REQ_META
flag does not fit our needs, because the REQ flags are used in bio, not
in buffer_head, so a filesystem cannot check whether the flag has been
set. Further, AFAIK, some filesystems (e.g. ext4) call sb_bread() and
sb_breadahead() to perform reads, in addition to submit_bh() and
ll_rw_block(). There seems to be no way to check REQ_META from
buffer_head in those paths either.
As part of some other work, I had added ext4's own submit_bh functions
and replaced all the calls to submit_bh() and ll_rw_block() with
these:

------ x ------
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
+void ext4_submit_bh_read_nowait(int rw, struct buffer_head *bh)
+{
+       BUG_ON(rw & WRITE);
+       BUG_ON(!buffer_locked(bh));
+       get_bh(bh);
+       bh->b_end_io = end_buffer_read_sync;
+       submit_bh(rw, bh);
+}
+
+int ext4_submit_bh_read(int rw, struct buffer_head *bh)
+{
+       BUG_ON(rw & WRITE);
+       BUG_ON(!buffer_locked(bh));
+
+       if (buffer_uptodate(bh)) {
+               unlock_buffer(bh);
+               return 0;
+       }
+
+       ext4_submit_bh_read_nowait(rw, bh);
+       wait_on_buffer(bh);
+       if (buffer_uptodate(bh))
+               return 0;
+       return -EIO;
+}
+
 struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
                               ext4_lblk_t block, int create, int *err)
 {
@@ -1572,11 +1598,9 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
        bh = ext4_getblk(handle, inode, block, create, err);
        if (!bh)
                return bh;
-       if (buffer_uptodate(bh))
+       if (bh_uptodate_or_lock(bh))
                return bh;
-       ll_rw_block(READ_META, 1, &bh);
-       wait_on_buffer(bh);
-       if (buffer_uptodate(bh))
+       if (!ext4_submit_bh_read(READ_META, bh))
                return bh;
        put_bh(bh);
        *err = -EIO;

------ x ------

I had made the change only for reads, but it should be easy to make it
handle writes too. Also, this function can take ext4-specific flags, so
you can do your accounting in a single place in ext4. That way, you won't
need any more flags in buffer_head.
Will this approach help with what you are trying to do?
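A userspace model of the "single accounting point" idea, loosely following the shape of ext4_submit_bh_read() in the patch above. It returns early when the buffer is already uptodate and otherwise submits, waits, and returns -EIO on failure. It also takes an extra, hypothetical io type argument. `struct model_bh`, `model_submit_and_wait()`, `fs_submit_bh_read()` and `io_count[]` are illustrative only and are not part of Aditya's patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical fs-specific IO types passed to the wrapper. */
enum io_type { IO_SUPER, IO_INODE, IO_EA, IO_NR_TYPES };

struct model_bh {
    bool uptodate;
    bool disk_ok;   /* whether the simulated disk read succeeds */
};

static unsigned long io_count[IO_NR_TYPES];

/* Stands in for submit_bh() followed by wait_on_buffer(). */
static void model_submit_and_wait(struct model_bh *bh)
{
    bh->uptodate = bh->disk_ok;
}

/* Single accounting point: reached only when a read is really submitted. */
static int fs_submit_bh_read(struct model_bh *bh, enum io_type type)
{
    assert(type < IO_NR_TYPES);
    if (bh->uptodate)
        return 0;               /* cache hit: nothing issued, nothing counted */
    io_count[type]++;           /* counts every submitted read, even failed ones */
    model_submit_and_wait(bh);
    return bh->uptodate ? 0 : -EIO;
}
```

All reads go through one wrapper, so buffer_head needs no new flag. The trade-off, as noted below, is that the wrapper lives in one filesystem.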

Thanks,
Hi Aditya,

Thank you for your patch. It can indeed help me solve my problem. We
can define some wrapper functions to do our accounting in ext4. But
this approach seems to be suitable only for ext4. I prefer to provide a
filesystem-independent solution. Steven and I are discussing how to
implement it so that other filesystems can use this mechanism directly
to do their accounting. If you have any suggestions, feel free to tell me.

Regards,
Zheng
quoted
Hopefully the explanation is clear enough, and any comments or
suggestions are welcome. Thanks again. :-)

Regards,
Zheng
quoted
quoted
quoted
That would probably have already been done, except that the REQ_ flags
field is already almost full - so it might need the addition of an extra
field or some other solution.
In v1[1], a structure called ios is defined. This structure stores some
information (e.g. the IO type) and a callback function. Some interfaces
in the buffer layer are modified to take a new argument that points to
this structure. When a request misses the cache and is issued to the
disk, the callback function in this structure is called, so a filesystem
can define its own function to perform whatever operations it needs. A
drawback of this solution is that it requires changing some interfaces,
such as bread, breadahead and so on. So in v2, I implemented a new mechanism.
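A rough userspace model of the v1 design described above. The IO type and callback live in an "ios" structure, and a bread()-style helper takes an extra pointer to it and invokes the callback only on a cache miss. The field layout of `struct ios`, `model_bread()` and `count_issue()` are assumptions for illustration. The real v1 definitions are in the patch linked at [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ios;
typedef void (*ios_issued_fn)(struct ios *ios);

/* Hypothetical shape of the v1 "ios" structure: an IO type plus a callback. */
struct ios {
    int type;
    ios_issued_fn issued;   /* called only when the request goes to disk */
    unsigned long calls;    /* used by the example callback below */
};

struct model_bh {
    bool uptodate;
};

/* Stands in for a bread()-style helper that gained an extra ios argument. */
static struct model_bh *model_bread(struct model_bh *bh, struct ios *ios)
{
    assert(bh != NULL);
    if (!bh->uptodate) {
        if (ios && ios->issued)
            ios->issued(ios);   /* cache miss: notify the filesystem */
        bh->uptodate = true;    /* pretend the read completed */
    }
    return bh;
}

/* Example filesystem callback: count issued requests. */
static void count_issue(struct ios *ios)
{
    ios->calls++;
}
```

The extra `ios` parameter is exactly the interface change described as a drawback: every caller of bread, breadahead and similar helpers has to be updated.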
quoted
Either way, an fs independent solution to this problem would be worth
considering,
Yes, I am willing to implement an fs independent solution. This is my
original intention too. So any suggestions are welcome. Thank you.

[1] http://www.spinics.net/lists/linux-ext4/msg28608.html

Regards,
Zheng
Ok. Sounds good. GFS2 already sets REQ_META in all places where metadata
is being written. Now that REQ_META has been demerged from the REQ_PRIO
flag, filesystems can set it without fear of side effects. Its only
purpose now is as a notification to blktrace.

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Aditya
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html