Re: [PATCH] powerpc: Add POWER9 copy_page() loop
From: Nicholas Piggin <npiggin@gmail.com>
Date: 2017-03-21 04:21:53
On Tue, 21 Mar 2017 15:01:03 +1100 Anton Blanchard [off-list ref] wrote:
Hi Nick,
quoted
I've got a patch that makes alternate feature patching a bit more flexible and not hit relocation limits when using big "else" parts. I was thinking of doing something like
    _GLOBAL_TOC(copy_page)
    BEGIN_FTR_SECTION_NESTED(50)
    #include "copypage_power9.S"
    FTR_SECTION_ELSE_NESTED(50)
    #include "copypage_power7.S"
    ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_300, 50)

Good idea, I hadn't thought of embedding it all in a feature section.
It may not work currently: you get those ftr_alt_97 relocation errors with the "else" parts, because relative branches to other code need to be direct and, I think, reachable from both places.
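To make the "reachable from both places" constraint concrete, here is a minimal sketch in C. It assumes the relocation errors come from PC-relative branches falling out of range, which the thread does not confirm. The displacement widths are the architectural ones (a conditional `bc` reaches about +/-32KB, an unconditional `b` about +/-32MB); the helper names `branch_reaches` and `reachable_from_both` are hypothetical and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed displacement widths in bits, counting the implicit << 2. */
#define REL14_BITS 16   /* conditional branch (bc): 14-bit BD field */
#define REL24_BITS 26   /* unconditional branch (b): 24-bit LI field */

/* True if a word-aligned signed displacement of disp_bits spans from -> to. */
static bool branch_reaches(uint64_t from, uint64_t to, unsigned disp_bits)
{
    int64_t disp = (int64_t)(to - from);
    int64_t limit = (int64_t)1 << (disp_bits - 1);

    return (disp & 3) == 0 && disp >= -limit && disp < limit;
}

/*
 * Hypothetical check: the same relative branch must reach its target both
 * from where the alternate code is assembled and from where it is patched in.
 */
static bool reachable_from_both(uint64_t alt_site, uint64_t patched_site,
                                uint64_t target, unsigned disp_bits)
{
    return branch_reaches(alt_site, target, disp_bits) &&
           branch_reaches(patched_site, target, disp_bits);
}
```

With these numbers, a conditional branch assembled far from the main text can fail the check even though the same branch is in range once patched into place.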
quoted
I guess POWER asm doesn't need this but it's good practice to prevent copy paste errors? It would be nice to have some macros to hide all these constants, but that's for another patch. The commenting is good.

The .machine X macros? Unfortunately the format of dcbt is different for recent server chips. This wasn't a great idea in retrospect because if you do get the instruction layout wrong, you won't get a fault to warn you.
Is that embedded vs server, or pre-POWER4 vs POWER4 and up? Anyway no big deal.
quoted
I don't suppose the stream setup is costly enough to consider touching a cacheline or two ahead before starting it?

Starting up software streams is a bit of an art - if the demand loads get ahead then a hardware stream gets started before the software one. Note all the eieios to try and avoid this happening. I've struggled with software prefetch on previous chips and sometimes I wonder if it is worth the pain.
Oh I see. Makes sense.
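For readers unfamiliar with the idea of touching lines ahead of the copy, a rough C sketch follows. This is ordinary per-line prefetch using the GCC/Clang `__builtin_prefetch` builtin, not the dcbt stream setup the patch uses, and it does nothing about the hardware-versus-software stream race described above. The line size, page size, prefetch distance and the name `copy_page_sketch` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define CACHELINE      128   /* assumed cache line size in bytes */
#define PAGE_BYTES     4096  /* assumed page size for this sketch */
#define PREFETCH_LINES 2     /* hypothetical: how many lines to touch ahead */

/* Copy one page a cache line at a time, hinting the source a few lines ahead. */
static void copy_page_sketch(void *to, const void *from)
{
    char *d = to;
    const char *s = from;
    size_t off;

    for (off = 0; off < PAGE_BYTES; off += CACHELINE) {
        size_t ahead = off + PREFETCH_LINES * CACHELINE;

        /* Read prefetch, low temporal locality; skip past the end of the page. */
        if (ahead < PAGE_BYTES)
            __builtin_prefetch(s + ahead, 0, 0);

        memcpy(d + off, s + off, CACHELINE);
    }
}
```

Whether a fixed distance like this helps or simply triggers the hardware prefetcher first is exactly the tuning problem discussed in the quoted reply.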
quoted
(Also for another day) We might be able to avoid the stack and call for some common cases. Pretty small overall cost I guess, but it could be beneficial for memcpy if not copy_page.

Definitely. Also the breakpoint for using vector should be much lower if we have already saved the user state in a previous call.
Yes agreed. Another problem is multiple small mem/string/crypto operations may never trip the limit even if it would make sense. Difficult to improve that (kernel could provide a hint to the arch maybe).
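One way to picture both points (a lower breakpoint once user state is saved, and many small operations never tripping the limit) is the C sketch below. Everything in it is hypothetical: the thresholds, the `vec_hint` structure and `use_vector` are invented for illustration and do not correspond to existing kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

#define VEC_THRESHOLD_COLD 4096  /* hypothetical: user vector state not yet saved */
#define VEC_THRESHOLD_WARM 512   /* hypothetical: state already saved earlier */

/* Hypothetical per-context hint carried between operations. */
struct vec_hint {
    bool user_state_saved;  /* a previous call already paid the save cost */
    size_t recent_bytes;    /* bytes handled by recent scalar operations */
};

/* Decide whether an operation of len bytes should take the vector path. */
static bool use_vector(struct vec_hint *h, size_t len)
{
    size_t limit = h->user_state_saved ? VEC_THRESHOLD_WARM
                                       : VEC_THRESHOLD_COLD;

    h->recent_bytes += len;

    /* Either this call is big enough, or small calls have added up. */
    if (len >= limit || h->recent_bytes >= VEC_THRESHOLD_COLD) {
        h->user_state_saved = true;
        h->recent_bytes = 0;
        return true;
    }
    return false;
}
```

The running byte count stands in for the kind of hint the kernel could provide to the arch code; a real version would also need to clear the saved-state flag when returning to userspace.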