Re: [PATCH RESEND V2 net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type
From: Jijie Shao <shaojijie@huawei.com>
Date: 2025-01-06 14:41:53
Also in:
lkml
on 2024/12/19 20:26, Jijie Shao wrote:
on 2024/12/19 18:43, Paolo Abeni wrote:quoted
On 12/19/24 11:11, Michal Swiatkowski wrote:quoted
On Thu, Dec 19, 2024 at 10:41:53AM +0100, Paolo Abeni wrote:quoted
On 12/18/24 10:02, Michal Swiatkowski wrote:quoted
On Tue, Dec 17, 2024 at 09:08:33AM +0800, Jijie Shao wrote:quoted
From: Hao Lan <redacted> When a reset type that is not supported by the driver is input, a reset pending flag bit of the HNAE3_NONE_RESET type is generated in reset_pending. The driver does not have a mechanism to clear this type of error. As a result, the driver considers that the reset is not complete. This patch provides a mechanism to clear the HNAE3_NONE_RESET flag and the parameter of hnae3_ae_ops.set_default_reset_request is verified. The error message: hns3 0000:39:01.0: cmd failed -16 hns3 0000:39:01.0: hclge device re-init failed, VF is disabled! hns3 0000:39:01.0: failed to reset VF stack hns3 0000:39:01.0: failed to reset VF(4) hns3 0000:39:01.0: prepare reset(2) wait done hns3 0000:39:01.0 eth4: already uninitialized Use the crash tool to view struct hclgevf_dev: struct hclgevf_dev { ... default_reset_request = 0x20, reset_level = HNAE3_NONE_RESET, reset_pending = 0x100, reset_type = HNAE3_NONE_RESET, ... }; Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops") Signed-off-by: Hao Lan <redacted> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>I haven't signed-off this patch. Still no need to repost (yet) for this if the following points are solved rapidly (as I may end-up merging the series and really adding my SoB), but please avoid this kind of issue in the future.quoted
quoted
@@ -4227,7 +4240,7 @@ static bool hclge_reset_err_handle(structhclge_dev *hdev) return false; } else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) { hdev->rst_stats.reset_fail_cnt++; - set_bit(hdev->reset_type, &hdev->reset_pending); + hclge_set_reset_pending(hdev, hdev->reset_type);Sth is unclear for me here. Doesn't HNAE3_NONE_RESET mean that there is no reset? If yes, why in this case reset_fail_cnt++ is increasing? Maybe the check for NONE_RESET should be done in this else if check to prevent reset_fail_cnt from increasing (and also solve the problem with pending bit set)@Michal: I don't understand your comment above. hclge_reset_err_handle() handles attempted reset failures. I don't see it triggered when reset_type == HNAE3_NONE_RESET.Maybe I missed sth. The hclge_set_reset_pending() is added to check if reset type isn't HNAE3_NONE_RESET. If it is the set_bit isn't called. It is the only place where hclge_set_reset_pending() is called with a variable, so I assumed the fix is for this place. This means that code can be reach here with HNAE3_NONE_RESET which is unclear for me why to increment resets if rest_type in NONE. If it is true that hclge_reset_err_handle() is never called with reset_type HNAE3_NONE_RESET it shouldn't be needed to have the hclge_set_reset_pending() function.You are right, I felt off-track. @Jijie: how can 'reset_type' be set to an unsupported value?!? I don't see that in the code, short of a memory corruption on uninit problem. Are you sure you are not papering over a different issue here? At least some more info (either in the commit description or in a code comment) is IMHO needed. Otherwise you should probably catch that before hclge_reset_err_handle() time.Thanks for reviewing the code. In fact, we used hclge_set_reset_pendin() to check in entire reset path, not just hclge_reset_err_handle(). But seems like overkill, I'll try to simplify the modification.
Hi: I decided not to change it. We use hclge_set_def_reset_request() to ensure that unsupported reset types can be processed and hclge_set_reset_pending() to ensure that HNAE3_NONE_RESET can be cleared. Although it looks like hclge_set_reset_pending() cleans up HNAE3_NONE_RESET in hclge_reset_err_handle(), In VF, HNAE3_NONE_RESET is checked and cleared in advance.. Thank you.
Thank you. Jiejie Shao