Thread (22 messages), 7 authors, 2021-10-13

Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory

From: Felix Kuehling <felix.kuehling@amd.com>
Date: 2021-10-12 23:04:45
Also in: amd-gfx, dri-devel, linux-ext4, linux-xfs

On 2021-10-12 at 3:03 p.m., Andrew Morton wrote:
On Tue, 12 Oct 2021 15:56:29 -0300 Jason Gunthorpe [off-list ref] wrote:
To what other uses will this infrastructure be put?

Because I must ask: if this feature is for one single computer which
presumably has a custom kernel, why add it to mainline Linux?
Well, it certainly isn't just "one single computer". Overall I know of
about, hmm, ~10 *datacenters* worth of installations that are using
similar technology underpinnings.

"Frontier" is the code name for a specific installation but as the
technology is proven out there will be many copies made of that same
approach.

The previous program "Summit" was done with NVIDIA GPUs and PowerPC
CPUs and also included a very similar capability. I think this is a
good sign that this coherently attached accelerator will continue to
be a theme in computing going forward. IIRC this was done using out of
tree kernel patches and NUMA localities.

Specifically with CXL now being standardized and on a path to ubiquity
I think we will see an explosion in deployments of coherently attached
accelerator memory. This is the high end trickling down to wider
usage.

I strongly think many CXL accelerators are going to want to manage
their on-accelerator memory in this way as it makes universal sense to
want to carefully manage memory access locality to optimize for
performance.
Thanks.  Can we please get something like the above into the [0/n]
changelog?  Along with any other high-level info which is relevant?

It's rather important.  "why should I review this", "why should we
merge this", etc.
Using Jason's input, I suggest adding this text for the next revision of
the cover letter:

DEVICE_PRIVATE memory emulates coherence between CPU and the device by
migrating data back and forth. An application that accesses the same
page (or huge page) from CPU and device concurrently can cause many
migrations, each involving device cache flushes, page table updates and
page faults on the CPU or device.

In contrast, DEVICE_COHERENT enables truly concurrent CPU and device
access to ZONE_DEVICE pages by taking advantage of HW coherence
protocols.

As a historical reference point, the Summit supercomputer implemented
such a coherent memory architecture with NVIDIA GPUs and PowerPC CPUs.

The initial user for the DEVICE_COHERENT memory type will be the AMD GPU
driver on the Frontier supercomputer. CXL standardizes a coherent
peripheral interconnect, leading to more mainstream systems and devices
with that capability.

Best regards,
  Felix

