Thread (6 messages) 6 messages, 3 authors, 2009-05-22

Re: [PATCH] md: Protecting mddev with barriers to avoid races

From: SandeepKsinha <hidden>
Date: 2009-05-22 13:41:12

On Fri, May 22, 2009 at 5:03 PM, NeilBrown [off-list ref] wrote:
On Fri, May 22, 2009 7:46 pm, SandeepKsinha wrote:
quoted
Hi Neil,


On Fri, May 22, 2009 at 2:28 PM, NeilBrown [off-list ref] wrote:
quoted
On Fri, May 22, 2009 6:36 pm, Andre Noll wrote:
quoted
On 13:48, SandeepKsinha wrote:
quoted
Signed-off-by: Sandeep K Sinha <redacted>
Date:   Fri May 22 13:46:56 2009 +0530

    Protecting mddev with barriers to avoid races.
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 9ad6ec4..e2dbda8 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -28,10 +28,12 @@
 static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
 {
        int lo, mid, hi;
-       linear_conf_t *conf = mddev_to_conf(mddev);
+       linear_conf_t *conf;

        lo = 0;
        hi = mddev->raid_disks - 1;
+       barrier();
+       conf = mddev_to_conf(mddev);
Are you sure that a compiler barrier is sufficient here? Maybe we
need smp_mb(), but I'm not familar with all the different memory
barrier types that exist in the kernel.
I think we just want a read barrier here:
   smb_rmb();
because it is a barrier between reading mddev->raid_disk and
reading mddev->private.

In linear_add, we want a write barrier
   smb_wmb();
before setting mdev->raid_disks, and I think we also want one
before setting mddev->private to make sure that all the assignments
to newconf->whatever are stable.
Do we really need one before setting mddev->private? Will newconf be
used by anyone before its assigned to mddev->private?
No. Then what makes you put a write barrier before it?
I'm not 100% certain we do need it, but the logic goes like that:

One processor does
 newconf->disks[0].end_sector = X;
 mddev->private = newconf;

and another processor does
  conf = mddev->private;
  foo = conf->disk[0].end_sector;

Due to the lack of memory ordering guarantees, foo could end up
with the value of end_sector *before* X was assigned to it.
Not sure but can be a possibility. But I feel if thats the case, we
might have to revisit the whole md code. I see similar instances at
lot many places.

In fact, rcu_assign_pointer (in include/linux/rcupdate.h) has
exactly this purpose: place an smp_wmb before assigning the address
of a structure to a pointer, to ensure that the content of the
structure is globally visible before the address of the structure
become visible.
So I would probably do

  rcu_assign_pointer(mddev->private, newconf);
  smb_wmb();
  mddev->raid_disks ++;

An alternative (which actually might be an improvement, I'm not
sure) is to have a conf->raid_disks field and use that value in
which_dev.  Then it would be more like

  mddev->raid_disk++;
  newconf->raid_disk = mddev->raid_disks;
  rcu_assign_pointer(mddev->private, newconf);

and we only have one memory barrier.
The which_dev code then becomes:

  conf = rcu_dereference(mddev->private);
  hi = conf->raid_disks = 1;
  .....
I like this approach more. And this will surely keep consistency
throughout the code.
and again we have only one memory barrier.  The significant benefit
of this is that instead of using the low level smp_[rw]mb() operations,
we use the higher level rcu_* interfaces which make it a lot more
obvious what we are doing.

It would then be a very small step use use rcu_dereference for
all mddev->private accesses in linear.c, wrap them in rcu_read_{,un}lock,
and then use call_rcu to arrange for the old conf_t to be freed.
I am going to try this.

Will keep you posted.

NeilBrown


-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help