Thread: 18 messages, 7 authors, 2011-03-01

Re: [PATCH] btrfs file write debugging patch

From: Chris Mason <hidden>
Date: 2011-02-28 14:00:17

Excerpts from Johannes Hirte's message of 2011-02-28 05:13:59 -0500:
On Monday 28 February 2011 02:46:05 Chris Mason wrote:
Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
Some clarification on my previous message...

After looking at my ftrace log more closely, I can see where Btrfs is
trying to release the allocated pages.  However, the calculation for
the number of dirty_pages is equal to 1 when "copied == 0".

So I'm seeing at least two problems:
(1)  It keeps looping when "copied == 0".
(2)  One dirty page is not being released on every loop even though
"copied == 0" (at least this problem keeps it from being an infinite
loop by eventually exhausting reserveable space on the disk).
Hi everyone,

There are actually two bugs here.  First the one that Mitch hit, and a
second one that still results in bad file_write results with my
debugging hunks (the first two hunks below) in place.

My patch fixes Mitch's bug by checking for copied == 0 after
btrfs_copy_from_user and doing the correct delalloc accounting.  This
one looks solved, but you'll notice the patch is bigger.
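
To illustrate why a zero-byte copy can still be counted as one dirty page, here is a minimal userspace sketch. It is not the code from the patch: the function names, the 4096-byte page size, and the round-up formula are assumptions chosen to reproduce the symptom Mitch describes, and the result of 1 only appears when the write starts at a nonzero offset into the page.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, for illustration only */

/*
 * Assumed round-up calculation: number of pages touched by `copied` bytes
 * that start `offset` bytes into the first page. With copied == 0 and a
 * nonzero offset this still rounds up to 1.
 */
static size_t dirty_pages_naive(size_t offset, size_t copied)
{
    return (copied + offset + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Guarded variant (hypothetical helper): if nothing was copied, report
 * zero dirty pages so the caller can release everything it reserved.
 */
static size_t dirty_pages_checked(size_t offset, size_t copied)
{
    if (copied == 0)
        return 0;
    return dirty_pages_naive(offset, copied);
}
```

With an offset of 100 and copied == 0, the naive version reports 1 page while the guarded version reports 0. A normal 5000-byte copy at the same offset reports 2 pages in both.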

First, I add some random failures to btrfs_copy_from_user() by failing
every once in a while.  This was much more reliable than trying to
use memory pressure to make copy_from_user fail.
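
The idea of forcing the copy to fail periodically can be sketched in userspace as follows. This is an assumption-laden illustration, not the debugging hunk itself: the hunk mentions jiffies, whereas this sketch swaps in a plain call counter so the failures are deterministic, and flaky_copy() and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around a copy routine. Returns the number of bytes
 * copied. Every `fail_every`-th call copies nothing and returns 0,
 * imitating a copy_from_user() that faults before making progress.
 * A fail_every of 0 disables the injected failures.
 */
static size_t flaky_copy(void *dst, const void *src, size_t len,
                         unsigned long *calls, unsigned long fail_every)
{
    (*calls)++;
    if (fail_every != 0 && *calls % fail_every == 0)
        return 0;               /* injected failure: nothing copied */
    memcpy(dst, src, len);
    return len;
}
```

A caller that loops until all bytes are written then has to handle a return of 0 without dirtying pages or leaking reservations, which is the path the patch is meant to exercise.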

If copy_from_user fails and we partially update a page, we end up with a
page that may go away due to memory pressure.  But, btrfs_file_write
assumes that only the first and last page may have good data that needs
to be read off the disk.

This patch ditches that code and puts it into prepare_pages instead.
But I'm still having some errors during long stress.sh runs.  Ideas are
more than welcome, hopefully some other timezones will kick in ideas
while I sleep.
At least it doesn't fix the emerge problem for me. The behavior is now the same
as with 2.6.38-rc3. It takes only an 'emerge --oneshot dev-libs/libgcrypt', with
no further interaction, to make the emerge process hang with an svn process
consuming 100% CPU. I can cancel the emerge process with Ctrl-C, but the
spawned svn process stays, and it takes a reboot to get rid of it.
I think your problem really is more enospc related.  Still working on
that as well.  But please don't try the patch without removing the
debugging hunk at the top (anything that mentions jiffies).

-chris