Thread (24 messages) 24 messages, 3 authors, 2025-01-06

Re: [PATCH RESEND V2 net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type

From: Jijie Shao <shaojijie@huawei.com>
Date: 2025-01-06 14:41:53
Also in: lkml

on 2024/12/19 20:26, Jijie Shao wrote:
on 2024/12/19 18:43, Paolo Abeni wrote:
quoted
On 12/19/24 11:11, Michal Swiatkowski wrote:
quoted
On Thu, Dec 19, 2024 at 10:41:53AM +0100, Paolo Abeni wrote:
quoted
On 12/18/24 10:02, Michal Swiatkowski wrote:
quoted
On Tue, Dec 17, 2024 at 09:08:33AM +0800, Jijie Shao wrote:
quoted
From: Hao Lan <redacted>

When a reset type that is not supported by the driver is input, a 
reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this 
type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.

The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized

Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
    default_reset_request = 0x20,
    reset_level = HNAE3_NONE_RESET,
    reset_pending = 0x100,
    reset_type = HNAE3_NONE_RESET,
...
};

Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in 
the hnae3_ae_ops")
Signed-off-by: Hao Lan <redacted>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I haven't signed-off this patch.

Still no need to repost (yet) for this if the following points are
solved rapidly (as I may end-up merging the series and really 
adding my
SoB), but please avoid this kind of issue in the future.
quoted
quoted
@@ -4227,7 +4240,7 @@ static bool hclge_reset_err_handle(struct 
hclge_dev *hdev)
          return false;
      } else if (hdev->rst_stats.reset_fail_cnt < 
MAX_RESET_FAIL_CNT) {
          hdev->rst_stats.reset_fail_cnt++;
-        set_bit(hdev->reset_type, &hdev->reset_pending);
+        hclge_set_reset_pending(hdev, hdev->reset_type);
Sth is unclear for me here. Doesn't HNAE3_NONE_RESET mean that 
there is
no reset? If yes, why in this case reset_fail_cnt++ is increasing?

Maybe the check for NONE_RESET should be done in this else if 
check to
prevent reset_fail_cnt from increasing (and also solve the problem 
with
pending bit set)
@Michal: I don't understand your comment above. 
hclge_reset_err_handle()
handles attempted reset failures. I don't see it triggered when
reset_type == HNAE3_NONE_RESET.
Maybe I missed sth. The hclge_set_reset_pending() is added to check if
reset type isn't HNAE3_NONE_RESET. If it is the set_bit isn't 
called. It
is the only place where hclge_set_reset_pending() is called with a
variable, so I assumed the fix is for this place.

This means that code can be reach here with HNAE3_NONE_RESET which is
unclear for me why to increment resets if rest_type in NONE. If it is
true that hclge_reset_err_handle() is never called with reset_type
HNAE3_NONE_RESET it shouldn't be needed to have the
hclge_set_reset_pending() function.
You are right, I felt off-track.

@Jijie: how can 'reset_type' be set to an unsupported value?!? I don't
see that in the code, short of a memory corruption on uninit problem.
Are you sure you are not papering over a different issue here? At least
some more info (either in the commit description or in a code comment)
is IMHO needed. Otherwise you should probably catch that before
hclge_reset_err_handle() time.
Thanks for reviewing the code.

In fact, we used hclge_set_reset_pendin() to check in entire reset path,
not just hclge_reset_err_handle().

But seems like overkill, I'll try to simplify the modification.
Hi:

I decided not to change it.

We use hclge_set_def_reset_request() to ensure that unsupported reset types can be processed
and hclge_set_reset_pending() to ensure that HNAE3_NONE_RESET can be cleared.

Although it looks like hclge_set_reset_pending() cleans up HNAE3_NONE_RESET in hclge_reset_err_handle(),
In VF, HNAE3_NONE_RESET is checked and cleared in advance..

Thank you.
Thank you.

Jiejie Shao
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help