Re: [PATCH net-next v3 2/2] net/smc: handle -ENOMEM from smc_wr_alloc_link_mem gracefully
From: Paolo Abeni <pabeni@redhat.com>
Date: 2025-09-25 15:42:26
Also in:
linux-doc, linux-rdma, linux-s390, lkml
On 9/25/25 5:05 PM, Halil Pasic wrote:
> On Thu, 25 Sep 2025 11:40:40 +0200 Paolo Abeni [off-list ref] wrote:
>
>>> +	do {
>>> +		rc = smc_ib_create_queue_pair(lnk);
>>> +		if (rc)
>>> +			goto dealloc_pd;
>>> +		rc = smc_wr_alloc_link_mem(lnk);
>>> +		if (!rc)
>>> +			break;
>>> +		else if (rc != -ENOMEM) /* give up */
>>> +			goto destroy_qp;
>>> +		/* retry with smaller ... */
>>> +		lnk->max_send_wr /= 2;
>>> +		lnk->max_recv_wr /= 2;
>>> +		/* ... unless dropping below old SMC_WR_BUF_SIZE */
>>> +		if (lnk->max_send_wr < 16 || lnk->max_recv_wr < 48)
>>> +			goto destroy_qp;
>>
>> If, e.g., smc.sysctl_smcr_max_recv_wr == 2048 and
>> smc.sysctl_smcr_max_send_wr == 16, the above loop can give up a little
>> too early, after the first failure. What about changing the termination
>> condition to:
>>
>> lnk->max_send_wr < 16 && lnk->max_recv_wr < 48
>>
>> and using 2 as a lower bound for both lnk->max_send_wr and
>> lnk->max_recv_wr?
>
> My intention was to preserve the ratio (max_recv_wr/max_send_wr),
> because I assume that the optimal ratio is workload dependent and that
> scaling both down at the same rate is easy to understand. I also wanted
> to never dip below the old values, to avoid regressions due to having
> even fewer WR buffers than before the change.
>
> I get your point, but as long as the ratio is kept, I think the problem,
> if it is considered a problem, is there to stay. For example, for
> smc.sysctl_smcr_max_recv_wr == 2048 and smc.sysctl_smcr_max_send_wr == 2
> we would still give up after the first failure, even with 2 as a lower
> bound.
>
> Let me also state that, in my opinion, giving up isn't that bad, because
> SMC-R is supposed to be an optimization and we still have the TCP
> fallback. If the back-off goes overboard and we end up much worse than
> TCP, that is probably worse than just giving up on SMC-R and going with
> TCP.
>
> On the other hand, changing the ratio would make things more complicated
> and less predictable, and could also take more iterations, for example
> with smc.sysctl_smcr_max_recv_wr == 2048 and
> smc.sysctl_smcr_max_send_wr == 2000.
>
> So I would prefer sticking to the current logic.
Ok, makes sense to me. Please capture some of the above either in the
commit message or in a code comment.

Thanks,

Paolo