Thread: 7 messages, 5 authors, 2018-01-27

Re: [PATCH] powerpc: Add POWER9 copy_page() loop

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2017-03-21 04:21:53

On Tue, 21 Mar 2017 15:01:03 +1100
Anton Blanchard [off-list ref] wrote:
Hi Nick,
quoted
I've got a patch that makes alternate feature patching a bit
more flexible and not hit relocation limits when using big "else"
parts. I was thinking of doing something like

_GLOBAL_TOC(copy_page)
BEGIN_FTR_SECTION_NESTED(50)
#include "copypage_power9.S"
FTR_SECTION_ELSE_NESTED(50)
#include "copypage_power7.S"
ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_300, 50)  
Good idea, I hadn't thought of embedding it all in a feature section.
It may not work currently: the "else" parts trigger those ftr_alt_97
relocation errors, because relative branches to other code need to be
direct and, I think, reachable from both places.
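
For readers unfamiliar with feature sections, here is a rough userspace C
analogy of the "if CPU_FTR_ARCH_300 is set use the POWER9 body, else the
POWER7 body" shape proposed above. It is only a sketch: the kernel patches
the instructions in place at boot rather than calling through a function
pointer, and every name below (the *_sketch identifiers, the feature bit
value, the page size) is a hypothetical stand-in, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u            /* hypothetical page size */
#define FTR_ARCH_300_SKETCH (1ul << 0)    /* hypothetical stand-in for CPU_FTR_ARCH_300 */

typedef void (*copy_page_fn)(void *to, const void *from);

/* Stand-ins for the two assembly bodies; both simply memcpy here. */
static void copy_page_power7_sketch(void *to, const void *from)
{
    memcpy(to, from, PAGE_SIZE_SKETCH);
}

static void copy_page_power9_sketch(void *to, const void *from)
{
    memcpy(to, from, PAGE_SIZE_SKETCH);
}

/* Choose one body from the feature mask: the "if set ... else" shape. */
static copy_page_fn select_copy_page(unsigned long cpu_features)
{
    if (cpu_features & FTR_ARCH_300_SKETCH)
        return copy_page_power9_sketch;
    return copy_page_power7_sketch;
}

/* Single entry point, mirroring one copy_page symbol with two bodies. */
static void copy_page_sketch(unsigned long cpu_features,
                             void *to, const void *from)
{
    assert(to != NULL && from != NULL);
    select_copy_page(cpu_features)(to, from);
}
```

The pointer call costs an indirect branch on every copy, which is exactly
what patching the code in place avoids.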

quoted
I guess POWER asm doesn't need this but it's good practice to prevent
copy paste errors? It would be nice to have some macros to hide all
these constants, but that's for another patch. The commenting is good.  
The .machine X macros? Unfortunately the format of dcbt is different
for recent server chips. This wasn't a great idea in retrospect because
if you do get the instruction layout wrong, you won't get a fault to warn
you.
Is that embedded vs server, or pre-POWER4 vs POWER4 and up? Anyway no
big deal.
quoted
I don't suppose the stream setup is costly enough to consider
touching a cacheline or two ahead before starting it?  
Starting up software streams is a bit of an art - if the demand loads
get ahead then a hardware stream gets started before the software one.
Note all the eieios to try and avoid this happening.

I've struggled with software prefetch on previous chips and sometimes I
wonder if it is worth the pain.
Oh I see. Makes sense.
quoted
(Also for another day) We might be able to avoid the stack and call
for some common cases. Pretty small overall cost I guess, but it
could be beneficial for memcpy if not copy_page.  
Definitely. Also the breakpoint for using vector should be much
lower if we have already saved the user state in a previous call.
Yes agreed.

Another problem is that multiple small mem/string/crypto operations may
never trip the limit even when using vector would make sense. Difficult
to improve that (kernel could provide a hint to the arch maybe).
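
A minimal C sketch of that breakpoint idea, under stated assumptions: the
two thresholds, the struct, and the "batch hint" flag are all hypothetical
and chosen only to illustrate the decision, not taken from any existing
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds, for illustration only. */
#define VEC_THRESHOLD_COLD 4096u  /* user vector state still has to be saved */
#define VEC_THRESHOLD_WARM 512u   /* save cost already paid by an earlier call */

struct vec_ctx {
    bool user_state_saved;   /* an earlier operation already saved user state */
    bool caller_hint_batch;  /* hypothetical hint: caller expects many small ops */
};

/* Decide whether an operation of len bytes should take the vector path. */
static bool should_use_vector(const struct vec_ctx *ctx, size_t len)
{
    assert(ctx != NULL);

    /* The save cost is amortised if it is already paid, or if the caller
     * says more operations are coming (one possible reading of "hint"). */
    bool amortised = ctx->user_state_saved || ctx->caller_hint_batch;
    size_t threshold = amortised ? VEC_THRESHOLD_WARM : VEC_THRESHOLD_COLD;

    return len >= threshold;
}
```

With these made-up numbers, a run of 1024-byte operations never reaches
the cold threshold on its own, which is the case described above; either
saved state or a caller hint lets it take the vector path.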