Re: Possible leak during reshaping layout

From: NeilBrown <hidden>
Date: 2014-07-21 07:26:51

On Sat, 19 Jul 2014 22:27:00 -0700 Kenny Root [off-list ref] wrote:

> I may have stumbled into a kernel memory leak during reshaping of a RAID 10
> from offset to near layout:
>
> I have a RAID 10 array which was previously in offset layout. I decided to
> reshape to a near layout. Eventually the machine had become very sluggish,
> the load average shot up, and the reshape slowed down to nearly nothing.
>
>     md127 : active raid10 sdh1[2] sdk1[3] sdf1[0] sdg1[1]
>           7813771264 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>           [=========>...........]  reshape = 49.5% (3872227840/7813771264) finish=63624.5min speed=1032K/sec
>
> A look at slabtop appears to show that there is an allocation that is
> larger than the physical RAM (16GB):
>
>      Active / Total Objects (% used)    : 61551490 / 61918456 (99.4%)
>      Active / Total Slabs (% used)      : 2209811 / 2209811 (100.0%)
>      Active / Total Caches (% used)     : 76 / 99 (76.8%)
>      Active / Total Size (% used)       : 15241504.92K / 15319798.41K (99.5%)
>      Minimum / Average / Maximum Object : 0.01K / 0.25K / 15.69K
>
>         OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
>     60511744 60511219  29%    0.25K 2183366       32  17466928K kmalloc-256
>       193408    82391  42%    0.06K    3022       64     12088K kmalloc-64
>       154880   129949  83%    0.03K    1210      128      4840K kmalloc-32
>       154624   152783  98%    0.01K     302      512      1208K kmalloc-8
>       144160   143412  99%    0.02K     848      170      3392K fsnotify_event_holder
>       125103    34053  27%    0.08K    2453       51      9812K selinux_inode_security

This is very suspicious.
As you might imagine, it is not possible for a slab to use more memory than
is physically available.
It claims there are 60511219 active objects out of a total of 60511744.
I calculate that as 99.9991%, but it suggests 29%.

If there were 32 OBJ/SLAB, then the slabs must be 8K.  This is possible, but
they are 4K on my machine, and all the other slabs you listed are too.
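
The arithmetic behind those two observations can be re-checked with a few
lines of Python. This is only a sketch: the numbers are copied from the
kmalloc-256 row of the slabtop output quoted above, the variable names are
illustrative, and "16GB" is assumed to mean 16 GiB.

```python
# Figures copied from the kmalloc-256 row of the quoted slabtop output.
objs = 60511744            # OBJS column
active = 60511219          # ACTIVE column
obj_size_kib = 0.25        # OBJ SIZE column
slabs = 2183366            # SLABS column
objs_per_slab = 32         # OBJ/SLAB column
cache_size_kib = 17466928  # CACHE SIZE column

# Assumption: the reporter's "16GB" of RAM is 16 GiB.
ram_kib = 16 * 1024 * 1024

# Share of objects in use, to compare with the USE column.
use_pct = 100.0 * active / objs

# Slab size implied by 32 objects of 0.25K each.
slab_size_kib = objs_per_slab * obj_size_kib

# Cache size implied by that slab size and the slab count.
implied_cache_kib = slabs * slab_size_kib

print(f"use: {use_pct:.5f}%")
print(f"slab size: {slab_size_kib}K")
print(f"implied cache size: {implied_cache_kib:.0f}K "
      f"(reported {cache_size_kib}K)")
print(f"larger than RAM: {cache_size_kib > ram_kib}")
```

The implied cache size matches the reported CACHE SIZE, so the 8K slab
size is at least consistent with the other columns of that row.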

I've tried a similar reshape on 3.16-rc3 and there is no similar leak.

The only patch since 3.13 that could possibly be relevant is

commit cc13b1d1500656a20e41960668f3392dda9fa6e2
Author: NeilBrown [off-list ref]
Date:   Mon May 5 13:34:37 2014 +1000

    md/raid10: call wait_barrier() for each request submitted.

That might fix a leak.  However the leak it might fix was introduced in
3.14-rc1:
    commit 20d0189b1012a37d2533a87fb451f7852f2418d1
        block: Introduce new bio_split()

So unless Fedora backported one of those but not the other I don't see how
this can be caused by RAID10.

What does /proc/slabinfo contain?  Maybe "slabtop" is presenting it poorly.
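
If it helps, something like the following could summarise /proc/slabinfo
directly instead of relying on slabtop. It is a rough sketch and assumes
the version 2.1 line layout (name, active_objs, num_objs, objsize,
objperslab, pagesperslab, then "tunables" and "slabdata" groups) and 4K
pages. parse_slabinfo and the SAMPLE text are made up for illustration;
the sample is not the reporter's data.

```python
def parse_slabinfo(text, page_size=4096):
    """Parse slabinfo 2.1 text into per-cache dicts, largest cache first.

    page_size is an assumption (4K pages); adjust for other systems.
    """
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the two header lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        stats, _, rest = line.partition(":")
        fields = stats.split()
        if len(fields) < 6 or "slabdata" not in rest:
            continue
        active_objs, num_objs, objsize, objperslab, pagesperslab = map(
            int, fields[1:6])
        # slabdata group: <active_slabs> <num_slabs> <sharedavail>
        num_slabs = int(rest.split("slabdata", 1)[1].split()[1])
        rows.append({
            "name": fields[0],
            "use_pct": 100.0 * active_objs / num_objs if num_objs else 0.0,
            "slab_kib": pagesperslab * page_size // 1024,
            "mem_kib": num_slabs * pagesperslab * page_size // 1024,
        })
    return sorted(rows, key=lambda r: r["mem_kib"], reverse=True)


# Hypothetical sample, only to show the expected line layout.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
other_cache 10 64 64 64 1 : tunables 0 0 0 : slabdata 1 1 0
example_cache 90 128 256 32 2 : tunables 0 0 0 : slabdata 4 4 0
"""

for row in parse_slabinfo(SAMPLE):
    print(f'{row["name"]:<16} use={row["use_pct"]:6.2f}% '
          f'slab={row["slab_kib"]}K mem={row["mem_kib"]}K')
```

On a live system the text would come from reading /proc/slabinfo, which
usually needs root.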

NeilBrown

> Output of mdadm -D:
>
> /dev/md127:
>         Version : 1.2
>   Creation Time : Wed Dec 20 19:41:25 2013
>      Raid Level : raid10
>      Array Size : 7813771264 (7451.79 GiB 8001.30 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Jul 19 22:20:55 2014
>           State : active, reshaping
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : offset=2
>      Chunk Size : 512K
>
>  Reshape Status : 49% complete
>      New Layout : near=2, far=1
>
>            Name : local:home  (local to host local)
>            UUID : 3102a888:f08888a8:da88e888:c6288888
>          Events : 70841
>
>     Number   Major   Minor   RaidDevice State
>        0       8       81        0      active sync   /dev/sdf1
>        1       8       97        1      active sync   /dev/sdg1
>        2       8      113        2      active sync   /dev/sdh1
>        3       8      161        3      active sync   /dev/sdk1
>
> uname -r output:
> 3.13.6-200.fc20.x86_64