Re: [PATCH v2 1/1] net/mlx5: Added cond_resched() to crdump collection
From: Mohamed Khalfella <hidden>
Date: 2024-09-05 03:36:59
Also in: linux-rdma, lkml
On 2024-09-03 14:14:58 +0200, Alexander Lobakin wrote:
> From: Mohamed Khalfella <redacted>
> Date: Fri, 30 Aug 2024 11:01:19 -0700
>
>> On 2024-08-30 15:07:45 +0200, Alexander Lobakin wrote:
>>> From: Mohamed Khalfella <redacted>
>>> Date: Thu, 29 Aug 2024 15:38:56 -0600
>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>> index 6b774e0c2766..bc6c38a68702 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c
>>>> @@ -269,6 +269,7 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
>>>>  {
>>>>  	unsigned int next_read_addr = 0;
>>>>  	unsigned int read_addr = 0;
>>>> +	unsigned int count = 0;
>>>>
>>>>  	while (read_addr < length) {
>>>>  		if (mlx5_vsc_gw_read_fast(dev, read_addr, &next_read_addr,
>>>> @@ -276,6 +277,9 @@ int mlx5_vsc_gw_read_block_fast(struct mlx5_core_dev *dev, u32 *data,
>>>>  			return read_addr;
>>>>
>>>>  		read_addr = next_read_addr;
>>>> +		/* Yield the cpu every 128 register read */
>>>> +		if ((++count & 0x7f) == 0)
>>>> +			cond_resched();
>>>
>>> Why & 0x7f, could it be written more clearly?
>>>
>>> 	if (++count == 128) {
>>> 		cond_resched();
>>> 		count = 0;
>>> 	}
>>>
>>> Also, I'd make this open-coded value a #define somewhere at the beginning
>>> of the file with a comment with a short explanation.
>
> This is still valid.

Done. See <1>.
>> What you are suggesting should work also. I copied the style from
>> mlx5_vsc_wait_on_flag() to keep the code consistent. The comment above
>> the line should make it clear.
>
> I just don't see a reason to make the code less readable.

<1> Now that I am looking at mlx5_vsc_wait_on_flag() again, I realize that code cannot reset retries to 0 because it needs to detect when retries reaches VSC_MAX_RETRIES. This is not the case here. I will update the code as suggested.
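
For reference, a minimal userspace sketch of the two loop shapes discussed above. It is not the v3 patch: READS_PER_YIELD, cond_resched_stub(), and the two simulation functions are placeholder names made up for illustration, and the stub only counts calls where the driver would call cond_resched().

```c
/* Hypothetical name: register reads to perform before yielding the CPU. */
#define READS_PER_YIELD 128

/* Userspace stand-in for the kernel's cond_resched(); it only counts calls. */
static unsigned int yield_calls;

static void cond_resched_stub(void)
{
	yield_calls++;
}

/* Suggested form: compare against the named constant, then reset. */
static unsigned int read_block_sim(unsigned int total_reads)
{
	unsigned int count = 0;
	unsigned int i;

	yield_calls = 0;
	for (i = 0; i < total_reads; i++) {
		/* A register read would happen here. */
		if (++count == READS_PER_YIELD) {
			cond_resched_stub();
			count = 0;
		}
	}
	return yield_calls;
}

/* v2 form: mask the low 7 bits of a counter that is never reset. */
static unsigned int read_block_sim_mask(unsigned int total_reads)
{
	unsigned int count = 0;
	unsigned int i;

	yield_calls = 0;
	for (i = 0; i < total_reads; i++) {
		/* A register read would happen here. */
		if ((++count & 0x7f) == 0)
			cond_resched_stub();
	}
	return yield_calls;
}
```

Both functions yield on the same iterations. The reset form does not depend on the interval being a power of two, so the constant can be changed without touching the loop.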
>>> BTW, why 128? Not 64, not 256 etc? You just picked it, I don't see any
>>> explanation in the commitmsg or here in the code why exactly 128. Have
>>> you tried different values?
>>
>> This is mostly subjective. For the numbers I saw in the lab, this will
>> release the cpu after ~4.51ms. If crdump takes ~5s, the code should
>> release the cpu after ~18.0ms. These numbers look reasonable to me.
>
> So just mention in the commit message that you tried different values and
> 128 gave you the best results.

I will update the commit message in v3.
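
The two latencies quoted above scale linearly with the total dump time, which can be sketched as below. This assumes the reads are evenly spaced over the dump. The thread states neither the total number of reads nor the lab dump time, so any value obtained from these helpers (roughly 35.5k reads and a ~1.25s lab dump) is back-computed from the quoted ~4.51ms and ~18.0ms figures and is not a measurement. The function names are made up for this example.

```c
/*
 * Time between two yields, in milliseconds, assuming the reads are
 * evenly spread over the whole dump.
 */
static double yield_interval_ms(double dump_ms, double total_reads,
				unsigned int reads_per_yield)
{
	return dump_ms * reads_per_yield / total_reads;
}

/* Number of reads implied by a dump time and a yield interval. */
static double implied_reads(double dump_ms, double interval_ms,
			    unsigned int reads_per_yield)
{
	return dump_ms / interval_ms * reads_per_yield;
}
```

With these, implied_reads(5000.0, 18.0, 128) gives the read count consistent with the 5s case, and feeding that count back into yield_interval_ms() with a shorter dump time gives the correspondingly shorter interval.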