Thread: 26 messages, 7 authors, 2019-02-28

Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

From: Len Brown <lenb@kernel.org>
Date: 2019-02-28 15:59:40
Also in: lkml

On Tue, Feb 26, 2019 at 8:54 AM Peter Zijlstra [off-list ref] wrote:
>>> It would've been nice to have the CPUID instruction 1F leaf reference
>>> 3B-3.9 in the SDM, and maybe mention this here too.
>>
>> I didn't mention SDM sections because they change -- leaving stale
>> pointers in our commit messages.  The SDM is re-published 4 times per
>> year.
>
> Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
> 8 of Jul'17, I can find it :-)

The SDM is like software -- usually (but not always) you are better
off with the latest version:-)

>> Cache enumeration in Leaf-4 is totally unchanged.
>> ACPI NUMA tables are totally unchanged.
>
> Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
> trying to get a feel for how these levels will interact with all that.
>
> Before that SNC stuff, caches had never spanned NODEs (and I still
> think that is 'creative' at best).

Yeah, SNC is sort of a curve ball.  I guess it made enough stuff run better that
it is available as an option.  But it doesn't help everything, so it is disabled
by default...

I think from a scheduler point of view, sticking with the output of
CPUID.4 for the cache topology, and the ACPI tables for the
node topology/distances, is the right strategy.
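
To make that concrete, here is a minimal user-space sketch of looking at
cache sharing the way the kernel exports it.  It assumes the usual sysfs
cacheinfo layout (cpuN/cache/indexM/level and shared_cpu_list); the
function names are made up for illustration and are not part of this
series.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list string such as "0-3,8-11" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cache_sharing(cpu, root="/sys/devices/system/cpu"):
    """Return {cache level: CPUs sharing that cache} for one CPU.

    Assumption: the standard sysfs cacheinfo files exist under `root`.
    """
    result = {}
    for index in sorted(Path(root, f"cpu{cpu}", "cache").glob("index*")):
        level = int((index / "level").read_text())
        shared = parse_cpu_list((index / "shared_cpu_list").read_text())
        # A level can appear twice (e.g. L1 data and L1 instruction);
        # keep the wider sharing set for that level.
        if len(shared) >= len(result.get(level, [])):
            result[level] = shared
    return result
```

On a Linux box, cache_sharing(0) should list, per cache level, which
CPUs share that cache with CPU 0, independent of how many dies or
packages CPUID.1F reports.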

>> From a scheduler point of view, imagine that a SKX system with 4 die
>> in 4 packages was mechanically re-designed so that those 4 die resided
>> in 2 double-sized packages.
>>
>> They may have tweaked the links between the die, but logically it is
>> identical and compatible, and the legacy kernel will function
>> properly.
>
> This example has LLC in die and yes that works.
>
> But I can imagine things like L2 in tile and L3 across tiles but within
> DIE and then it _might_ make sense to still consider the tile for
> scheduling.
>
> Another option is having the LLC off die; also not unheard of.
>
> And then there's many creative and slightly crazy ways this can all be
> combined :/

If any of those crazy things happen, CPUID.B/CPUID.1F are not
going to help software understand it -- CPUID.4 and the NUMA tables
are the tools of choice.

>> So the effect of Leaf B,1F is that it defines the scope of MSRs.  eg.
>> what processors does a die-scope MSR cover.  That is why the rest of
>> the patch is about sysfs topology, and about package MSR scope.
>>
>> Yes, there will be more exotic MSR situations in future products --
>> the first ones are pretty simple -- something called a
>> package-scope-MSR in the SDM today becomes a die-scope-MSR in this
>> generation on a multi-die/package system.
>
> Yes :-(

>> It also reflects how many packages appear in sysfs, and this can
>> affect licensing of some kinds of software.
>
> That's just plain insanity and we should not let that affect our sysfs
> interfaces.

This change isn't made for compatibility with per-package licensing.
Indeed, vendors who license based on package count need to
be made aware that on a system with multi-die/package, they'll
see their package count go _down_ as a result of this change.
Thankfully, I'm told that per-package licensing is quite rare --
most stuff that cares has moved to per-CPU.
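
A toy illustration of the counting, not taken from the patch: it assumes
each CPU can be described by a (physical_package_id, die_id) pair, as
the sysfs topology attributes are meant to expose; the helper name and
the 8-CPU layout are hypothetical.

```python
def count_packages_and_dies(topology):
    """topology maps cpu -> (physical_package_id, die_id).

    Returns (number of packages, number of dies across all packages).
    """
    packages = {pkg for pkg, _die in topology.values()}
    dies = set(topology.values())  # a die is identified by (package, die)
    return len(packages), len(dies)


# Hypothetical box: 8 CPUs, 2 packages, 2 dies per package, 2 CPUs per die.
example = {cpu: (cpu // 4, (cpu // 2) % 2) for cpu in range(8)}
```

Software that counted each die as a package would see 4 on this example
layout; counting physical packages gives 2.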

I think a good semantic side effect of this series is that it
maintains the invariant that a physical package and a socket are synonymous.
While we don't use the word "socket" in Linux anymore, the industry
broadly assumes that the two are synonyms.  And people expect that a
physical package really is a physical package -- you can see it,
buy it in a box, and hold it in your hand.

Functionally, the bottom line is that it allows software to discover topology
levels that previously had to be inferred from family/model lookups,
which was sort of annoying.  The things that care are
things that care about MSR scope.  Thankfully, the list of things that care
about MSR scope is quite finite.
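
For readers who have not stared at the leaf, here is a rough sketch of
how the levels fall out of CPUID.1F.  It is not the kernel code from
this patch: it takes already-captured (EAX, EBX, ECX) values per
sub-leaf and splits an x2APIC ID using the documented fields (level type
in ECX[15:8], shift width in EAX[4:0], logical processor count in
EBX[15:0]).  The function names and the register values in the usage
note are invented for illustration.

```python
# Level-type encodings for CPUID.1F, ECX[15:8].
LEVEL_TYPES = {0: "invalid", 1: "smt", 2: "core", 3: "module", 4: "tile", 5: "die"}


def decode_subleaf(eax, ebx, ecx):
    """Decode one CPUID.1F sub-leaf from its raw register values."""
    level_type = (ecx >> 8) & 0xFF
    return {
        "type": LEVEL_TYPES.get(level_type, f"type{level_type}"),
        "shift": eax & 0x1F,         # bits to shift the x2APIC ID right
        "nr_logical": ebx & 0xFFFF,  # logical processors at this level
    }


def split_apic_id(x2apic_id, subleaves):
    """Split an x2APIC ID into per-level IDs.

    `subleaves` is a list of (eax, ebx, ecx) tuples in sub-leaf order.
    Enumeration stops at the first sub-leaf whose level type is invalid.
    Whatever remains above the last shift is treated as the package ID.
    """
    ids = {}
    prev_shift = 0
    for eax, ebx, ecx in subleaves:
        level = decode_subleaf(eax, ebx, ecx)
        if level["type"] == "invalid":
            break
        width = level["shift"] - prev_shift
        ids[level["type"]] = (x2apic_id >> prev_shift) & ((1 << width) - 1)
        prev_shift = level["shift"]
    ids["package"] = x2apic_id >> prev_shift
    return ids


# Made-up register values: SMT shift 1, core shift 4, die shift 5.
EXAMPLE_SUBLEAVES = [(1, 2, 0x100), (4, 16, 0x201), (5, 32, 0x502), (0, 0, 0x003)]
```

With those made-up values, split_apic_id(53, EXAMPLE_SUBLEAVES) puts the
CPU on die 1 of package 1, so a die-scope MSR would be shared by the
CPUs that agree on both the package and die fields.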

thanks,
-Len




--
Len Brown, Intel Open Source Technology Center