Thread (6 messages) 6 messages, 3 authors, 2012-07-05

Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem

From: Sage Weil <hidden>
Date: 2012-07-05 15:20:45
Also in: ceph-devel, linux-fsdevel, linux-mm, lkml

Possibly related (same subject, not in this thread)

On Wed, 4 Jul 2012, Sha Zhengju wrote:
On 07/02/2012 10:49 PM, Sage Weil wrote:
quoted
On Mon, 2 Jul 2012, Sha Zhengju wrote:
quoted
On 06/29/2012 01:21 PM, Sage Weil wrote:
quoted
On Thu, 28 Jun 2012, Sha Zhengju wrote:
quoted
From: Sha Zhengju<redacted>

Following we will treat SetPageDirty and dirty page accounting as an
integrated
operation. Filesystems had better use vfs interface directly to avoid
those details.

Signed-off-by: Sha Zhengju<redacted>
---
   fs/buffer.c                 |    2 +-
   fs/ceph/addr.c              |   20 ++------------------
   include/linux/buffer_head.h |    2 ++
   3 files changed, 5 insertions(+), 19 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index e8d96b8..55522dd 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -610,7 +610,7 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
    * If warn is true, then emit a warning if the page is not uptodate
and
has
    * not been truncated.
    */
-static int __set_page_dirty(struct page *page,
+int __set_page_dirty(struct page *page,
   		struct address_space *mapping, int warn)
   {
   	if (unlikely(!mapping))
This also needs an EXPORT_SYMBOL(__set_page_dirty) to allow ceph to
continue to build as a module.

With that fixed, the ceph bits are a welcome cleanup!

Acked-by: Sage Weil<redacted>
Further, I check the path again and may it be reworked as follows to avoid
undo?

__set_page_dirty();
__set_page_dirty();
ceph operations;                ==>                     if (page->mapping)
if (page->mapping)                                            ceph
operations;
     ;
else
     undo = 1;
if (undo)
     xxx;
Yep.  Taking another look at the original code, though, I'm worried that
one reason the __set_page_dirty() actions were spread out the way they are
is because we wanted to ensure that the ceph operations were always
performed when PagePrivate was set.
Sorry, I've lost something:

__set_page_dirty();                        __set_page_dirty();
ceph operations;
if(page->mapping)         ==>      if(page->mapping) {
       SetPagePrivate;                            SetPagePrivate;
else                                                      ceph operations;
    undo = 1;                                  }

if (undo)
    XXX;

I think this can ensure that ceph operations are performed together with
SetPagePrivate.
Yeah, that looks right, as long as the ceph accounting operations happen 
before SetPagePrivate.  I think it's no more or less racy than before, at 
least. 

The patch doesn't apply without the previous ones in the series, it looks 
like.  Do you want to prepare a new version or should I?

Thanks!
sage

 
quoted
It looks like invalidatepage won't get called if private isn't set, and
presumably it handles the truncate race with __set_page_dirty() properly
(right?).  What about writeback?  Do we need to worry about writepage[s]
getting called with a NULL page->private?
__set_page_dirty does handle racing conditions with truncate and
writeback writepage[s] also take page->private into consideration
which is done inside specific filesystems. I notice that ceph has handled
this in ceph_writepage().
Sorry, not vfs expert and maybe I've not caught your point...



Thanks,
Sha
quoted
Thanks!
sage


quoted

Thanks,
Sha
quoted
quoted
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8b67304..d028fbe 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -5,6 +5,7 @@
   #include<linux/mm.h>
   #include<linux/pagemap.h>
   #include<linux/writeback.h>	/* generic_writepages */
+#include<linux/buffer_head.h>
   #include<linux/slab.h>
   #include<linux/pagevec.h>
   #include<linux/task_io_accounting_ops.h>
@@ -73,14 +74,8 @@ static int ceph_set_page_dirty(struct page *page)
   	int undo = 0;
   	struct ceph_snap_context *snapc;

-	if (unlikely(!mapping))
-		return !TestSetPageDirty(page);
-
-	if (TestSetPageDirty(page)) {
-		dout("%p set_page_dirty %p idx %lu -- already
dirty\n",
-		     mapping->host, page, page->index);
+	if (!__set_page_dirty(page, mapping, 1))
   		return 0;
-	}

   	inode = mapping->host;
   	ci = ceph_inode(inode);
@@ -107,14 +102,7 @@ static int ceph_set_page_dirty(struct page *page)
   	     snapc, snapc->seq, snapc->num_snaps);
   	spin_unlock(&ci->i_ceph_lock);

-	/* now adjust page */
-	spin_lock_irq(&mapping->tree_lock);
   	if (page->mapping) {	/* Race with truncate? */
-		WARN_ON_ONCE(!PageUptodate(page));
-		account_page_dirtied(page, page->mapping);
-		radix_tree_tag_set(&mapping->page_tree,
-				page_index(page),
PAGECACHE_TAG_DIRTY);
-
   		/*
   		 * Reference snap context in page->private.  Also set
   		 * PagePrivate so that we get invalidatepage callback.
@@ -126,14 +114,10 @@ static int ceph_set_page_dirty(struct page
*page)
   		undo = 1;
   	}

-	spin_unlock_irq(&mapping->tree_lock);
-
   	if (undo)
   		/* whoops, we failed to dirty the page */
   		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);

-	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-
   	BUG_ON(!PageDirty(page));
   	return 1;
   }
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 458f497..0a331a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -336,6 +336,8 @@ static inline void lock_buffer(struct buffer_head
*bh)
   }

   extern int __set_page_dirty_buffers(struct page *page);
+extern int __set_page_dirty(struct page *page,
+		struct address_space *mapping, int warn);

   #else /* CONFIG_BLOCK */

-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe
linux-fsdevel"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help