Thread: 33 messages, 5 authors, 2025-03-02

Re: [PATCH 0/5] Microwatt updates

From: Gabriel Paubert <hidden>
Date: 2025-03-02 10:35:51

[Sorry, I wanted to reply earlier, but it stayed in my drafts folder for a month]

On Sat, Feb 01, 2025 at 12:22:51PM +1100, Paul Mackerras wrote:
[snipped]
> 603 was a looong time ago, I don't recall the details.
>
> Regarding broadcast TLBIEs, the protocols and mechanisms for doing
> that are known to be complex and slow in the IBM Power processors (ask
> Derek Williams about that :).  Anton found that in fact doing only
> local TLBIEs and using IPIs gave *better* performance on IBM Power
> systems than using hardware broadcast TLBIEs in many cases (the reason
> being that software knows which other CPUs might have a given TLB
> entry, often quite a small set, whereas hardware doesn't, and has to
> send the invalidation to every CPU and wait for a response from every
> CPU).  Add to that, that most other SMP-capable CPU architectures
> don't do broadcast TLB invalidations, Intel x86 for example.
Actually it's coming to x86, at least on the AMD side:

https://lore.kernel.org/all/20250206044346.3810242-1-riel@surriel.com/

with performance numbers which look rather good.

I don't know what it looks like at the level of the hardware protocol,
but implementing it on a single chip/socket is likely relatively simple.

Gabriel
> the kernel already has code to deal with this.  One of the patches in
> this series provides a config option to allow platforms to select
> unconditionally the behaviour where cross-CPU TLB invalidations are
> handled using inter-processor interrupts.
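
As an illustration of the argument quoted above (software knows which CPUs may hold a given TLB entry, while a hardware broadcast has to reach every CPU), here is a minimal Python sketch. It is a toy model, not kernel or Microwatt code: `Cpu`, `broadcast_invalidate`, `ipi_invalidate` and the `cpu_mask` argument are hypothetical names chosen for this example, and it only counts how many CPUs each approach has to disturb.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy CPU: a TLB modelled as a set of (address-space id, page) pairs."""
    cpu_id: int
    tlb: set = field(default_factory=set)
    invalidations_received: int = 0

    def local_invalidate(self, asid, page):
        # Stand-in for a local TLB invalidation on this CPU only.
        self.invalidations_received += 1
        self.tlb.discard((asid, page))


def broadcast_invalidate(cpus, asid, page):
    """Broadcast style: every CPU is asked and every CPU must respond."""
    for cpu in cpus:
        cpu.local_invalidate(asid, page)
    return len(cpus)  # number of CPUs involved


def ipi_invalidate(cpus, cpu_mask, asid, page):
    """IPI style: only CPUs in cpu_mask (assumed to be the ones that have
    run this address space) are interrupted to do a local invalidation."""
    targets = [cpu for cpu in cpus if cpu.cpu_id in cpu_mask]
    for cpu in targets:
        cpu.local_invalidate(asid, page)
    return len(targets)  # number of CPUs involved


if __name__ == "__main__":
    cpus = [Cpu(i) for i in range(64)]
    # Assumption for the demo: address space 7 has only run on CPUs 3 and 12.
    for i in (3, 12):
        cpus[i].tlb.add((7, 0x1000))
    print("IPI targets:", ipi_invalidate(cpus, {3, 12}, 7, 0x1000))
    print("broadcast targets:", broadcast_invalidate(cpus, 7, 0x1000))
```

In this model both approaches leave no stale entry behind; the difference is only in how many CPUs take part, which is the point made in the quoted text.
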
> > Are there plans to broadcast the (SMP cache invalidation) messages?
> Cache (i.e. instruction and data cache) - yes, they *are* coherent.
> More precisely, the D caches are write-through, and all I and D caches
> snoop writes to memory (including DMA writes) and invalidate any cache
> lines being written to.
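
The coherence scheme described above (write-through D caches, with every cache snooping memory writes and invalidating the affected line) can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not the Microwatt implementation: the class names are hypothetical, a "line" holds a single value per address, and the writing cache is assumed to update its own copy on a hit rather than invalidate it.

```python
class Memory:
    """Toy memory that notifies registered caches of every write."""

    def __init__(self):
        self.data = {}
        self.snoopers = []

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value, source=None):
        # Any write (CPU store or DMA) is visible to all snooping caches.
        self.data[addr] = value
        for cache in self.snoopers:
            if cache is not source:
                cache.snoop_write(addr)


class WriteThroughCache:
    """Toy write-through cache that invalidates lines on snooped writes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        memory.snoopers.append(self)

    def load(self, addr):
        if addr not in self.lines:          # miss: fill from memory
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        if addr in self.lines:              # assumption: update own copy on a hit
            self.lines[addr] = value
        self.memory.write(addr, value, source=self)  # write-through

    def snoop_write(self, addr):
        # Someone else wrote this address: drop our possibly stale copy.
        self.lines.pop(addr, None)


if __name__ == "__main__":
    mem = Memory()
    c0, c1 = WriteThroughCache(mem), WriteThroughCache(mem)
    c0.load(0x100)                  # c0 now caches the old value
    c1.store(0x100, 5)              # c0's line is invalidated by the snoop
    print(c0.load(0x100))           # refetched from memory
    mem.write(0x100, 9)             # a DMA-style write with no source cache
    print(c0.load(0x100), c1.load(0x100))
```

Because memory is always up to date in a write-through design, invalidating on a snooped write is enough: the next load simply misses and refetches the current value.
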
> > Will uwatt support some real bus protocol, for example?
> "Real" meaning using tri-state bus drivers, like we did in the 90s? :)
> > Again, congrats on this great milestone!  Does this floating point
> > support do square roots as well (aka "gpopt"; does it do "gfxopt" for
> > that matter, fsel?)  fsqrt is kinda tricky to get to work fully
> > correctly :-)
> Yes, fsqrt and fsel are implemented in hardware, and are accurate to
> the last bit.  Also, the FPU handles denormalized values in hardware
> (both input and output) and implements all exception handling as per
> the ISA, including the trap-enabled overflow cases.  Feel free to run
> whatever tests you like and report bugs.  But we're getting a bit
> off-topic from the kernel patches. :)
>
> Paul.
 
