Re: [PATCH 0/5] Microwatt updates
From: Gabriel Paubert <hidden>
Date: 2025-03-02 10:35:51
[Sorry, I wanted to reply earlier, but it stayed in my drafts folder for a month]

On Sat, Feb 01, 2025 at 12:22:51PM +1100, Paul Mackerras wrote:
[snipped]
603 was a looong time ago, I don't recall the details. Regarding broadcast TLBIEs, the protocols and mechanisms for doing that are known to be complex and slow in the IBM Power processors (ask Derek Williams about that :). Anton found that in fact doing only local TLBIEs and using IPIs gave *better* performance on IBM Power systems than using hardware broadcast TLBIEs in many cases (the reason being that software knows which other CPUs might have a given TLB entry, often quite a small set, whereas hardware doesn't, and has to send the invalidation to every CPU and wait for a response from every CPU). Add to that the fact that most other SMP-capable CPU architectures, Intel x86 for example, don't do broadcast TLB invalidations.
Actually it's coming to x86, at least on the AMD side: https://lore.kernel.org/all/20250206044346.3810242-1-riel@surriel.com/ with performance numbers which look rather good. I don't know what it looks like at the level of the hardware protocol, but implementing it on a single chip/socket is likely relatively simple.

Gabriel
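To make the argument above concrete, here is a minimal sketch that only counts messages for the two approaches. It is a toy model, not kernel code: the function names, the 64-CPU system and the set of CPUs are all made up for illustration, and it ignores the (very different) cost of an IPI versus a hardware bus transaction.

```python
def broadcast_messages(num_cpus: int) -> int:
    """Hardware broadcast: every other CPU gets the invalidation and must respond."""
    others = num_cpus - 1
    return 2 * others  # one request plus one response per remote CPU


def targeted_ipi_messages(initiator: int, cpus_with_mapping: set[int]) -> int:
    """Local TLBIE plus IPIs: only CPUs that may hold the entry are interrupted."""
    targets = cpus_with_mapping - {initiator}
    return 2 * len(targets)  # one IPI plus one acknowledgement per target


if __name__ == "__main__":
    # Hypothetical 64-CPU system; the address space has only run on CPUs 0, 3 and 7.
    print(broadcast_messages(64))               # 126
    print(targeted_ipi_messages(0, {0, 3, 7}))  # 4
```

The gap grows with the CPU count as long as the set of CPUs that may hold the entry stays small, which is the case described above.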
the kernel already has code to deal with this. One of the patches in this series provides a config option to allow platforms to select unconditionally the behaviour where cross-CPU TLB invalidations are handled using inter-processor interrupts.

Are there plans to broadcast the (SMP cache invalidation) messages?

Cache (i.e. instruction and data cache) - yes, they *are* coherent. More precisely, the D caches are write-through, and all I and D caches snoop writes to memory (including DMA writes) and invalidate any cache lines being written to.
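The coherence scheme described above (write-through caches that snoop every write to memory and invalidate the matching line) can be illustrated with a small software model. This is only a sketch under assumptions: the `Bus` and `Cache` classes, the line size, and the choice to update the writer's own copy on a write hit are all hypothetical and are not taken from the Microwatt sources.

```python
WORDS_PER_LINE = 4  # assumed line size in words, for illustration only


class Bus:
    """Shared memory plus the list of caches that snoop writes to it."""

    def __init__(self, size: int):
        self.memory = [0] * size
        self.caches = []

    def attach(self, cache) -> None:
        self.caches.append(cache)

    def write(self, addr: int, value: int, source=None) -> None:
        # Every write reaches memory; all caches except the writer snoop it.
        # A DMA write is modelled as a write with no source cache.
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)


class Cache:
    """Write-through cache that invalidates a line when a write to it is snooped."""

    def __init__(self, bus: Bus):
        self.bus = bus
        self.lines = {}  # line index -> private copy of the words in that line
        bus.attach(self)

    def read(self, addr: int) -> int:
        index, offset = divmod(addr, WORDS_PER_LINE)
        if index not in self.lines:  # miss: fill the whole line from memory
            base = index * WORDS_PER_LINE
            self.lines[index] = self.bus.memory[base:base + WORDS_PER_LINE]
        return self.lines[index][offset]

    def write(self, addr: int, value: int) -> None:
        index, offset = divmod(addr, WORDS_PER_LINE)
        if index in self.lines:  # assumed policy: update own copy on a hit
            self.lines[index][offset] = value
        self.bus.write(addr, value, source=self)  # write-through to memory

    def snoop_write(self, addr: int) -> None:
        # Drop the line so the next read refetches the new data from memory.
        self.lines.pop(addr // WORDS_PER_LINE, None)
```

Because memory always holds the latest data, a snooping cache never has to supply a line to anyone; it only has to forget its stale copy.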
Will uwatt support some real bus protocol, for example?

"Real" meaning using tri-state bus drivers, like we did in the 90s? :)
Again, congrats on this great milestone! Does this floating point support do square roots as well (aka "gpopt"; does it do "gfxopt" for that matter, fsel?) fsqrt is kinda tricky to get to work fully correctly :-)

Yes, fsqrt and fsel are implemented in hardware, and are accurate to the last bit. Also, the FPU handles denormalized values in hardware (both input and output) and implements all exception handling as per the ISA, including the trap-enabled overflow cases. Feel free to run whatever tests you like and report bugs.

But we're getting a bit off-topic from the kernel patches. :)

Paul.