Thread (22 messages) 22 messages, 2 authors, 2021-11-02

Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

From: Lyude Paul <lyude@redhat.com>
Date: 2021-08-23 21:18:23
Also in: dri-devel

[snip]

I think I might still be misunderstanding something, some comments below

On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
quoted
quoted
Hi Lyude,

Really thankful for willing to explain in such details. Really
appreciate.

I'm trying to fix some problems that observed after these 2 patches
* 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
callback
* 72dc0f51591 drm/dp_mst: Remove
drm_dp_mst_topology_cbs.destroy_connector

With above patches, we now change to remove dc_sink when connector is
about to be destroyed. However, we found out that connectors won't get
destroyed after hotplugs. Thus, after few times hotplugs, we won't
create any new dc_sink since number of sink is exceeding our
limitation. As the result of that, I'm trying to figure out why the
refcount of connectors won't get zero.

Based on my analysis, I found out that if we connect a sst monitor to
a mst hub then connect the hub to the system, and then unplug the sst
monitor from the hub. E.g.
src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor

Within this case, we won't try to put refcount of the sst monitor.
Which is what I tried to resolve by [PATCH 3/4].
But here comes a problem which is confusing me that if I can destroy
connector in this case. By comparing to another case, if now mst hub
is connected with a mst monitor like this:
src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor

We will put the topology refcount of mst monitor's branching unit in
and
drm_dp_port_set_pdt() and eventually call
drm_dp_delayed_destroy_port() to unregister the connector of the
logical port. So following the same rule, I think to dynamically
unregister a mst connector is what we want and should be reasonable to
also destroy sst connectors in my case. But this conflicts the idea
what we have here. We want to create connectors for all output ports.
So if dynamically creating/destroying connectors is what we want, when
is the appropriate time for us to create one is what I'm considering.

Take the StartTech hub DP 1to4 DP output ports for instance. This hub,
internally, is constructed by  3 1-to-2 mst branch chips. 2 output
ports of 1st chip are hardwired to another 2 chips. It's how it makes
it to support 1-to-4 mst branching. So within this case, the internal
2 output ports of 1st chip is not connecting to a stream sink and will
never get connected to one.  Thus, I'm thinking maybe the best timing
to attach a connector to a port is when the port is connected, and the
connected PDT is determined as a stream sink.

Sorry if I misunderstand anything here and really thanks for your time
to shed light on this : ) Thanks Lyude.
It's no problem, it is my job after all! Sorry for how long my responses
have been taking, but my plate seems to be finally clearing up
for the foreseeable future.

That being said - it sounds like with this we still aren't actually clear
on where the topology refcount leak is happening - only when it's
happening, which says to me that's the issue we really need to be figuring
out the cause of as opposed to trying to workaround it.

Actually - refcount leaks is an issue I've ran into a number of times
before in the past, so a while back I actually added some nice
debugging features to assist with debugging leaks. If you enable the
following options in your kernel config:

CONFIG_EXPERT=y # This must be set first before the next option
CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y

Unfortunately, I'm suddenly realizing after typing this that apparently I
never bothered adding a way for us to debug the refcounts of
ports/mstbs that haven't been released yet - only the ones for ones that
have. This shouldn't be difficult at all for me to add, so I'll
send you a patch either today or at the start of next week to try
debugging with using this, and then we can figure out where this leak
is really coming from.
Thanks Lyude!

Sorry to bother you, but I would like to clarify this again.  So it sounds
It's no problem! It's my job and I'm happy to help :).
like you also agree that we should destroy associated connector
Not quite. I think a better way of explaining this might be to point out
that the lifetime of an MST port and its connector isn't supposed to be
determined by whether or not it has something plugged into it - its
lifetime is supposed to depend on whether there's a valid path from us
down the MST topology to the port we're trying to reach. So an MSTB with
ports that is unplugged would destroy all of its ports - but an
unplugged port should just be the same as a disconnected DRM connector -
even if the port itself is just hosting a branching device.

Additionally - we don't want to try "delaying" connector creation
either. In the modern world hotplugging is almost always reliable in
normal situations, but even so there's still use cases for wanting force
probing for analog devices on DP converters and just in general as it's
a feature commonly used by developers or users working around monitors
with problematic HPD issues or EDID issues.
when we unplug sst monitor from a mst hub in the case that I described? In
the case I described (unplug sst monitor), we only receive
CSN from the hub that notifying us the connection status of one of its
downstream output ports is changed to disconnected. There is no
topology refcount needed to be decreased on this disconnected port but the
malloc refcount. Since the output port is still declared by
Apologies - I misunderstood your original mail as implying that topology
refcounts were being leaked - but it sounds like it's actually malloc
refcounts being leaked instead? In any case - that means we're still
tracing down a leak, just a malloc ref leak.

But, this still doesn't totally make sense to me. Malloc refs only keep
the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in memory.
Nothing else is kept around, meaning the DRM connector (and I assume by
proxy, the dc_sink) should both be getting dropped still and the only
thing that should be leaked is a memory allocation. These things should
instead be dropped once there's no longer any topology references
around. So, are we _sure_ that the problem here is a missing
drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?

If we are unfortunately we don't have equivalent tools for malloc()
tracing. I'm totally fine with trying to add some if we have trouble
figuring out this issue, but I'm a bit suspicious of the commits you
mentioned that introduced this problem. If the problem doesn't
happen until those two commits, then it's something in the code changes
there that are causing this problem.

The main thing I'm suspicious of just from looking at changes in
09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
amdgpu_dm_update_freesync_caps() that was previously in
dm_dp_destroy_mst_connector() appears to be dropped and not re-added in
(oh dear, this is a /very/ confusingly similar function name!!!)
dm_dp_mst_connector_destroy(). I don't remember if this was intentional
on my part, but does adding a call back to
amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector()
right before the dc_link_remove_remote_sink() call fix anything?

As well, I'm far less suspicious of this one but does re-adding this
hunk:

	aconnector->dc_sink = NULL;
	aconnector->dc_link->cur_link_settings.lane_count = 0;

After dc_sink_release() fix anything either?
the mst hub,  I think we shouldn't destroy the port. Actually, no ports nor
mst branch devices should get destroyed in this case I think.
The result of LINK_ADDRESS is still the same before/after removing the sst
monitor except the
DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.

Hence, if you agree that we should put refcount of the connector of the
disconnected port within the unplugging sst monitor case to
release the allocated resource, it means we don't want to create connectors
for those disconnected ports. Which conflicts current flow
to create connectors for all declared output ports.

Thanks again for your time Lyude!
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help