Thread (13 messages) 13 messages, 4 authors, 2017-02-07

Re: [PATCH] powerpc/mm: Fix RECLAIM_DISTANCE

From: Gavin Shan <hidden>
Date: 2017-01-31 05:41:35

On Tue, Jan 31, 2017 at 04:01:05PM +1100, Gavin Shan wrote:
On Tue, Jan 31, 2017 at 08:11:39AM +1100, Michael Ellerman wrote:
quoted
Gavin Shan [off-list ref] writes:

I'd like to see some test results from multi-node systems.

I'd also like to understand what has changed since we changed
RECLAIM_DISTANCE in the first place, ie. why did it used to work and now
doesn't?
[Ccing Mel]

Michael, thanks for review. I would like to explain a bit more. The issue
addressed by the patch is irrelevant to the number of NUMA nodes. There
is one procfs entry ("/proc/sys/vm/zone_reclaim_mode") which corresponds
to variable @node_reclaim_mode (their names don't match!). it can have
belowing bits or any combination of them. Its default value is RECLAIM_OFF (0).
Note RECLAIM_ZONE was obsoleted and I will send one patch to remove it
later.

#define RECLAIM_OFF 0
#define RECLAIM_ZONE (1<<0)     /* Run shrink_inactive_list on the zone */
#define RECLAIM_WRITE (1<<1)    /* Writeout pages during reclaim */
#define RECLAIM_UNMAP (1<<2)    /* Unmap pages during reclaim */

When @node_reclaim_mode is set to (RECLAIM_WRITE | RECLAIM_UNMAP), node_reclaim()
isn't called on the preferred node as the condition is false: zone_allows_reclaim(
node-A, node-A). As I observed, the distance from node-A to node-A is 10, equal to
RECLAIM_DISTANCE.

static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
{
       return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <
                               RECLAIM_DISTANCE;
}


__alloc_pages_nodemask
  get_page_from_freelist     <- WATERMARK_LOW
     zone_watermark_fast     <- Assume the allocation is breaking WATERMARK_LOW
        node_reclaim         <- @node_reclaim_node isn't 0 and
                                zone_allows_reclaim(preferred_zone, current_zone) returns true
        __node_reclaim       <- SWAP, WRITEPAGE and UNMAP setting from @node_reclaim_node
        shrink_node
     buffered_rmqueue
  __alloc_pages_slowpath
     get_page_from_freelist        <- WATERMARK_MIN
     __alloc_pages_direct_compact  <- If it's costly allocation (order > 3)
     wake_all_kswapds
     get_page_from_freelist        <- NO_WATERMARK, CPU local node is set to preferred one
     __alloc_pages_direct_reclaim
        __perform_reclaim
        try_to_free_pages          <- WRITEPAGE + UNMAP + SWAP
        do_try_to_free_pages
        shrink_zones               <- Stop until priority (12) reaches to 0 or reclaimed enough
        shrink_node
     __alloc_pages_direct_compact


Also, RECLAIM_DISTANCE is set to 30 in include/linux/topology.h. It's used when arch
doesn't provide one. It's why I set this macro to 30 in this patch. This issue is 
introduced by commit 5f7a75acdb2 ("mm: page_alloc: do not cache reclaim distances").
In the patch, it had wrong replacement. So I would correct the wrong replacement
alternatively. Or both of them. Which way do you think is the best? Maybe Mel also
has thoughts.

    39  static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
    40  {
    41 -       return node_isset(local_zone->node, zone->zone_pgdat->reclaim_nodes);
    42 -}
    43 -
    44 -static void __paginginit init_zone_allows_reclaim(int nid)
    45 -{
    46 -       int i;
    47 -
    48 -       for_each_node_state(i, N_MEMORY)
    49 -               if (node_distance(nid, i) <= RECLAIM_DISTANCE)
    50 -                       node_set(i, NODE_DATA(nid)->reclaim_nodes);
    51 +       return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <
    52 +                               RECLAIM_DISTANCE;
    53  }
Sorry, to make it clear. The patch replaced "<=" with "<" wrongly :)

Thanks,
Gavin
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help