Thread (4 messages) 4 messages, 3 authors, 2020-03-30

Re: [RFC PATCH v1] pseries/drmem: don't cache node id in drmem_lmb struct

From: Scott Cheloha <hidden>
Date: 2020-03-12 16:39:05

Hi Michal,

On Thu, Mar 12, 2020 at 06:02:37AM +0100, Michal Suchánek wrote:
On Wed, Mar 11, 2020 at 06:08:15PM -0500, Scott Cheloha wrote:
quoted
At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Signed-off-by: Scott Cheloha <redacted>
---
The linear search in powerpc's memory_add_physaddr_to_nid() has become a
bottleneck at boot on systems with many LMBs.

As described in this patch here:

https://lore.kernel.org/linuxppc-dev/20200221172901.1596249-2-cheloha@linux.ibm.com/ (local)

the linear search seriously cripples drmem_init().

The obvious solution (shown in that patch) is to just make the search
in memory_add_physaddr_to_nid() faster.  An XArray seems well-suited
to the task of mapping an address range to an LMB object.

The less obvious approach is to just call memory_add_physaddr_to_nid()
in fewer places.

I'm not sure which approach is correct, hence the RFC.
You basically revert the below which will likely cause the very error
that was fixed there:

commit b2d3b5ee66f2a04a918cc043cec0c9ed3de58f40
Author: Nathan Fontenot [off-list ref]
Date:   Tue Oct 2 10:35:59 2018 -0500

    powerpc/pseries: Track LMB nid instead of using device tree
    
    When removing memory we need to remove the memory from the node
    it was added to instead of looking up the node it should be in
    in the device tree.
    
    During testing we have seen scenarios where the affinity for a
    LMB changes due to a partition migration or PRRN event. In these
    cases the node the LMB exists in may not match the node the device
    tree indicates it belongs in. This can lead to a system crash
    when trying to DLPAR remove the LMB after a migration or PRRN
    event. The current code looks up the node in the device tree to
    remove the LMB from, the crash occurs when we try to offline this
    node and it does not have any data, i.e. node_data[nid] == NULL.
I'm aware of this patch and that this is a near-total revert.

I'm not reintroducing the original behavior, though.  Instead of going
to the device tree to recompute the expected node id I'm retrieving it
from the LMB's corresponding memory_block.

That crucial difference is this chunk:
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -376,25 +376,29 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
+	struct memory_block *mem_block;
 	unsigned long block_sz;
 	int rc;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
 
+	mem_block = lmb_to_memblock(lmb);
+	if (mem_block == NULL)
+		return -EINVAL;
+
 	rc = dlpar_offline_lmb(lmb);
 	if (rc)
 		return rc;
 
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
 
 	invalidate_lmb_associativity_index(lmb);
-	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
 	return 0;
---
Unless it's possible that the call:

	__add_memory(my_nid, my_addr, my_size);

does not yield the following:

	memory_block.nid == my_nid

on success, then I think the solution in my patch is equivalent to and
simpler than the one introduced in the patch you quote.

Can you see a way the above would not hold?

Then again, maybe there's a good reason not to retrieve the node id in
this way.  I'm thinking David Hildenbrand and/or Nathan Fontenont may
have some insight on that.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help