Thread (22 messages) 22 messages, 2 authors, 2021-11-02

Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device

From: Lyude Paul <lyude@redhat.com>
Date: 2021-08-31 22:48:20
Also in: dri-devel

(I am going to try responding to this tomorrow btw. I haven't been super busy
this week, but this has been a surprisingly difficult email to respond to
because I need to actually need to do a deep dive some of the MST helpers
tomorrow to figure out more of the specifics on why I realized we couldn't
just hot add/remove port->connector here).

On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
[Public]
quoted
-----Original Message-----
From: Lyude Paul <lyude@redhat.com>
Sent: Tuesday, August 24, 2021 5:18 AM
To: Lin, Wayne <redacted>; dri-devel@lists.freedesktop.org
Cc: Kazlauskas, Nicholas <redacted>; Wentland, Harry
[off-list ref]; Zuo, Jerry
[off-list ref]; Wu, Hersen [off-list ref]; Juston Li
[off-list ref]; Imre Deak [off-list ref];
Ville Syrjälä [off-list ref]; Daniel Vetter
[off-list ref]; Sean Paul [off-list ref]; Maarten Lankhorst
[off-list ref]; Maxime Ripard [off-list ref];
Thomas Zimmermann [off-list ref];
David Airlie [off-list ref]; Daniel Vetter [off-list ref]; Deucher,
Alexander [off-list ref]; Siqueira,
Rodrigo [off-list ref]; Pillai, Aurabindo
[off-list ref]; Bas Nieuwenhuizen
[off-list ref]; Cornij, Nikola [off-list ref]; Jani
Nikula [off-list ref]; Manasi Navare
[off-list ref]; Ankit Nautiyal [off-list ref];
José Roberto de Souza [off-list ref]; Sean
Paul [off-list ref]; Ben Skeggs [off-list ref];
stable@vger.kernel.org
Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
end device

[snip]

I think I might still be misunderstanding something, some comments below

On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
quoted
quoted
quoted
Hi Lyude,

Really thankful for willing to explain in such details. Really
appreciate.

I'm trying to fix some problems that observed after these 2
patches
* 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
callback
* 72dc0f51591 drm/dp_mst: Remove
drm_dp_mst_topology_cbs.destroy_connector

With above patches, we now change to remove dc_sink when connector
is about to be destroyed. However, we found out that connectors
won't get destroyed after hotplugs. Thus, after few times
hotplugs, we won't create any new dc_sink since number of sink is
exceeding our limitation. As the result of that, I'm trying to
figure out why the refcount of connectors won't get zero.

Based on my analysis, I found out that if we connect a sst monitor
to a mst hub then connect the hub to the system, and then unplug
the sst monitor from the hub. E.g.
src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor

Within this case, we won't try to put refcount of the sst monitor.
Which is what I tried to resolve by [PATCH 3/4].
But here comes a problem which is confusing me that if I can
destroy connector in this case. By comparing to another case, if
now mst hub is connected with a mst monitor like this:
src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor

We will put the topology refcount of mst monitor's branching unit
in and
drm_dp_port_set_pdt() and eventually call
drm_dp_delayed_destroy_port() to unregister the connector of the
logical port. So following the same rule, I think to dynamically
unregister a mst connector is what we want and should be
reasonable to also destroy sst connectors in my case. But this
conflicts the idea what we have here. We want to create connectors
for all output ports.
So if dynamically creating/destroying connectors is what we want,
when is the appropriate time for us to create one is what I'm
considering.

Take the StartTech hub DP 1to4 DP output ports for instance. This
hub, internally, is constructed by  3 1-to-2 mst branch chips. 2
output ports of 1st chip are hardwired to another 2 chips. It's
how it makes it to support 1-to-4 mst branching. So within this
case, the internal
2 output ports of 1st chip is not connecting to a stream sink and
will never get connected to one.  Thus, I'm thinking maybe the
best timing to attach a connector to a port is when the port is
connected, and the connected PDT is determined as a stream sink.

Sorry if I misunderstand anything here and really thanks for your
time to shed light on this : ) Thanks Lyude.
It's no problem, it is my job after all! Sorry for how long my
responses have been taking, but my plate seems to be finally
clearing up for the foreseeable future.

That being said - it sounds like with this we still aren't actually
clear on where the topology refcount leak is happening - only when
it's happening, which says to me that's the issue we really need to
be figuring out the cause of as opposed to trying to workaround it.

Actually - refcount leaks is an issue I've ran into a number of
times before in the past, so a while back I actually added some nice
debugging features to assist with debugging leaks. If you enable the
following options in your kernel config:

CONFIG_EXPERT=y # This must be set first before the next option
CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y

Unfortunately, I'm suddenly realizing after typing this that
apparently I never bothered adding a way for us to debug the
quoted
quoted
quoted
refcounts of ports/mstbs that haven't been released yet - only the
ones for ones that have. This shouldn't be difficult at all for me
to add, so I'll send you a patch either today or at the start of
next week to try debugging with using this, and then we can figure
out where this leak is really coming from.
Thanks Lyude!

Sorry to bother you, but I would like to clarify this again.  So it
sounds
It's no problem! It's my job and I'm happy to help :).
Thanks!
I would like to learn more from you as below : p
quoted
quoted
like you also agree that we should destroy associated connector
Not quite. I think a better way of explaining this might be to point out
that the lifetime of an MST port and its connector isn't supposed
to be determined by whether or not it has something plugged into it - its
lifetime is supposed to depend on whether there's a valid
path from us down the MST topology to the port we're trying to reach. So
an MSTB with ports that is unplugged would destroy all of
its ports - but an unplugged port should just be the same as a
disconnected DRM connector - even if the port itself is just hosting a
branching device.
This is the part a bit difficult to me. I treat DRM connector as the place
where we associate with a stream sink. So if the statement
is "All DP mst output ports are places we connect with stream sink", I would
say false to this since I can find the negative example when
output port is connected with mst branch device. Thus, looks like we could
only determine whether to create a connector for an output
port when the peer device type is known?
quoted
Additionally - we don't want to try "delaying" connector creation either.
In the modern world hotplugging is almost always reliable in
normal situations, but even so there's still use cases for wanting force
probing for analog devices on DP converters and just in general
as it's a feature commonly used by developers or users working around
monitors with problematic HPD issues or EDID issues.
I think I understand that why we want to create connectors for all output
ports here. But under these mentioned use cases, aren't we still
capable to force connector to enable stream? MST hub with muti-functon
capability, it will enumerate connected virtual DP peer device.
For problematic HPD issues or EDID issues, their connection status is also
connected.

My understanding of output port is it is an internal node to help construct
an end-to-end virtual channel between a stream source device
and a stream sink device. Creating connectors for internal nodes within a
virtual channel is a bit hard for me to get the idea. Please correct
me if I misunderstand anything here. Thanks Lyude!
quoted
quoted
when we unplug sst monitor from a mst hub in the case that I
described? In the case I described (unplug sst monitor), we only
receive CSN from the hub that notifying us the connection status of
one of its downstream output ports is changed to disconnected. There
is no topology refcount needed to be decreased on this disconnected
port but the malloc refcount. Since the output port is still declared
by
Apologies - I misunderstood your original mail as implying that topology
refcounts were being leaked - but it sounds like it's actually
malloc refcounts being leaked instead? In any case - that means we're
still tracing down a leak, just a malloc ref leak.

But, this still doesn't totally make sense to me. Malloc refs only keep
the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
memory.
Nothing else is kept around, meaning the DRM connector (and I assume by
proxy, the dc_sink) should both be getting dropped still
and the only thing that should be leaked is a memory allocation. These
things should instead be dropped once there's no longer any
topology references around. So, are we _sure_ that the problem here is a
missing
drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
Just my two cents, I don't think it's leak of malloc ref neither. As you
said, malloc ref is dealing with the last step to free port/mstb.
If there is still topology refcount on port/mstb in my case, we won't free
port/mstb.
quoted
If we are unfortunately we don't have equivalent tools for malloc()
tracing. I'm totally fine with trying to add some if we have trouble
figuring out this issue, but I'm a bit suspicious of the commits you
mentioned that introduced this problem. If the problem doesn't
happen until those two commits, then it's something in the code changes
there that are causing this problem.
I think we probably also have the problem before these commits, but we
didn't notice this before. Just when we change to clean up all
things in dm_dp_mst_connector_destroy(), I start to try to figure out all
these things out.
quoted
The main thing I'm suspicious of just from looking at changes in
09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
amdgpu_dm_update_freesync_caps() that was previously in
dm_dp_destroy_mst_connector() appears to be dropped and not re-added in
(oh dear, this is a /very/ confusingly similar function
Lol. I also have hard time on this..
quoted
name!!!) dm_dp_mst_connector_destroy(). I don't remember if this was
intentional on my part, but does adding a call back to
amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector() right
before the dc_link_remove_remote_sink() call fix
anything?

As well, I'm far less suspicious of this one but does re-adding this
hunk:

      aconnector->dc_sink = NULL;
      aconnector->dc_link->cur_link_settings.lane_count = 0;

After dc_sink_release() fix anything either?
So the main problem is we don't have chance to call
dc_link_remove_remote_sink() in the unplugging SST case. We only have chance
to
remove the remote sink of a link when unplug a mstb.
quoted
quoted
the mst hub,  I think we shouldn't destroy the port. Actually, no
ports nor mst branch devices should get destroyed in this case I think.
The result of LINK_ADDRESS is still the same before/after removing the
sst monitor except the DisplayPort_Device_Plug_Status/
Legacy_Device_Plug_Status.

Hence, if you agree that we should put refcount of the connector of
the disconnected port within the unplugging sst monitor case to
release the allocated resource, it means we don't want to create
connectors for those disconnected ports. Which conflicts current flow
to create connectors for all declared output ports.

Thanks again for your time Lyude!
--
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
--
Regards,
Wayne
-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help