Thread: 67 messages, 8 authors, 2026-03-20

Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder

From: Alejandro Lucero Palau <hidden>
Date: 2026-03-16 14:34:07
Also in: netdev

On 3/13/26 13:10, Alejandro Lucero Palau wrote:
On 3/13/26 02:03, Dan Williams wrote:
quoted
PJ Waskiewicz wrote:
[..]
quoted
quoted
Yes, I think you are right. This works in my tests and it is safe
because I check the region does exist before using it. But the error
inside sfc should then not be fatal for cxl sfc initialization and
fallback to the other cxl initialization possibility.
So I'm running into this situation I think.

When you're testing, are you surviving a reload of the driver?  Right
now, I can load and successfully create the region0 device. However,
following the same teardown path in SFC, I cannot reload my driver
afterwards and map the region.  I get:

cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
<driver> 0000:c1:00.0: CXL found committed decoder without a region
<driver> 0000:c1:00.0: CXL init failed

I'd be surprised if SFC in its current patch state would survive this
same insmod/rmmod/insmod test.
So over here [1] I reviewed Smita's patch to stop resetting decoders by
default if they were part of region auto-assembly. While that stops
resetting the decoders it does not allow the device to get a hint of
where it should place its HPAs if the decoders get reset while the
driver is detached.

That is already what type2 support is about, and what was from the 
beginning: to get an hpa from the root decoder. The HPA will be found 
when the driver loads and the memdev is created and when the related 
region is going to need such HPA, and based on what is free there. 
Before v22 that was the only case contemplated, assuming the BIOS 
would not configure the device decoders. v22 added support for getting 
the region from autodiscovery if the decoders were committed, and v23 
was for not resetting those decoders if that was the case when the 
driver unloads.
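
As a reading aid, the behaviour described above can be modelled roughly as
follows. This is only a sketch under stated assumptions, not code from the
series: every type and function name below is hypothetical, and the
"committed decoder without a region" outcome is taken from the error log
quoted earlier in this thread.

```c
/*
 * Illustrative user-space model only. None of these names are kernel
 * or CXL core APIs; they are hypothetical and exist just for this sketch.
 */
#include <stdbool.h>

enum hpa_source {
	HPA_NONE,               /* nothing usable: initialization fails */
	HPA_FROM_ROOT_DECODER,  /* pre-v22 path: free HPA taken from the root decoder */
	HPA_FROM_AUTO_REGION,   /* v22 path: reuse the auto-discovered region */
};

struct model_endpoint {
	bool decoder_committed; /* decoder already programmed when the driver loads */
	bool region_found;      /* auto-discovery produced a region for that decoder */
	bool root_has_free_hpa; /* root decoder still has free HPA space */
};

/* Decide where the region's HPA comes from when the driver loads. */
static enum hpa_source model_pick_hpa_source(const struct model_endpoint *ep)
{
	if (ep->decoder_committed)
		return ep->region_found ? HPA_FROM_AUTO_REGION : HPA_NONE;
	return ep->root_has_free_hpa ? HPA_FROM_ROOT_DECODER : HPA_NONE;
}

/* v23 behaviour as described: decoders found committed are not reset. */
static bool model_reset_decoder_on_unload(const struct model_endpoint *ep)
{
	return !ep->decoder_committed;
}
```

In this model the reload failure reported above corresponds to
decoder_committed being true while region_found is false.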


I'm pretty sure what Type2 pre-v22, v22 or v23 do in this regard is 
not perfect  (v23 was a quick hack for PJ to test the new 
functionality you demanded), in fact I'm changing the way hpa is 
allocated for Type2 because after Gregory's concurrency tests and pmem 
patchset, I really think the approach needs to change. But as I said 
in Smita's review, you are precluding the basic stuff with your 
never-ending "improvements". You are not in a better position than me 
to have an opinion of what Type2 drivers need, and your comment in 
this thread is just a lack of respect to me. Yes, it is a blunt 
assertion, and I will repeat it as many times as necessary.
After looking at the series proposing DVSEC save/restore for supporting 
device resets, I think I misunderstood your comment here, and if so,  
you want to address such a reset and not the HDM reset triggered by 
software ...

quoted
I am going to draft some patches to allow an accelerator to mark an
address range as "designated" so that it can recall the memory it was
assigned by boot firmware.

If you do so, I will start thinking seriously about passing this work to 
another engineer, not necessarily from AMD.
so this rant is missing the point, and I have to apologize. Having said 
that, I understand neither why you are proposing something that the reset 
series will either avoid or pave the way to supporting for the case you 
have in mind, nor why you mention it in this thread. Is it because 
supporting that reset is a requirement for type2 support? I have been 
aware of having to deal with this, but not as a priority or part of the 
basic support. If that is what you want, why did you not say so some time ago?

quoted
This also dovetails with the conversation I had with Paul Blinzer at
Plumbers about an ability to designate Soft Reserve memory. So a generic
facility to designate memory allows accelerators to recall their address
range if the decoders ever lose their configuration. It also tells the
rest of the CXL subsystem "hands off, this range was accelerator
designated by platform firmware".

[1]: http://lore.kernel.org/69b1e0aacb9d0_2132100c5@dwillia2-mobl4.notmuch