Re: [PATCH v4 net-next 06/21] nvme-tcp: Add DDP offload control path
From: Boris Pismenny <hidden>
Date: 2021-02-21 11:29:58
Also in:
linux-nvme
On 17/02/2021 15:55, Or Gerlitz wrote:
On Sun, Feb 14, 2021 at 8:20 PM David Ahern [off-list ref] wrote:quoted
On 2/11/21 2:10 PM, Boris Pismenny wrote:quoted
@@ -223,6 +229,164 @@ static inline size_t nvme_tcp_pdu_last_send(struct nvme_tcp_request *req, return nvme_tcp_pdu_data_left(req) <= len; } +#ifdef CONFIG_TCP_DDP + +static bool nvme_tcp_resync_request(struct sock *sk, u32 seq, u32 flags); +static const struct tcp_ddp_ulp_ops nvme_tcp_ddp_ulp_ops = { + .resync_request = nvme_tcp_resync_request, +}; + +static int nvme_tcp_offload_socket(struct nvme_tcp_queue *queue) +{ + struct net_device *netdev = queue->ctrl->offloading_netdev; + struct nvme_tcp_ddp_config config = {}; + int ret; + + if (!(netdev->features & NETIF_F_HW_TCP_DDP))If nvme_tcp_offload_limits does not find a dst_entry on the socket then offloading_netdev may not NULL at this point.correct :( will look on that
That's only partially true. If nvme_tcp_offload_limits finds a dst_entry, but then the netdevice goes down, then the check here will catch it. This is needed because nvme_tcp_offload_limits doesn't hold a reference! We opted not to grab a reference on nvme_tcp_offload_limits because it doesn't create a context.
quoted
quoted
+ queue->ctrl->offloading_netdev = NULL; + return -ENODEV; + } + + if (netdev->features & NETIF_F_HW_TCP_DDP && + netdev->tcp_ddp_ops && + netdev->tcp_ddp_ops->tcp_ddp_limits) + ret = netdev->tcp_ddp_ops->tcp_ddp_limits(netdev, &limits); + else + ret = -EOPNOTSUPP; + + if (!ret) { + queue->ctrl->offloading_netdev = netdev;you save a reference to the netdev here, but then release the refcnt below. That device could be deleted between this point in time and the initialization of all queues.
That's true, and this is why we repeat the checks there. We avoid holding the reference here because there is no obvious complementary release point for nvme_tcp_offload_limits and there is no hardware context created here, so there is no real need to hold it at this stage.