Thread: 39 messages, 7 authors, 2025-02-20

Re: [PATCH net-next v3 0/6] Device memory TCP TX

From: Samiullah Khawaja <hidden>
Date: 2025-02-05 00:47:21
Also in: kvm, linux-doc, linux-kselftest, lkml, virtualization

On Tue, Feb 4, 2025 at 11:43 AM Stanislav Fomichev [off-list ref] wrote:
On 02/04, Mina Almasry wrote:
quoted
On Tue, Feb 4, 2025 at 10:06 AM Stanislav Fomichev [off-list ref] wrote:
quoted
On 02/04, Mina Almasry wrote:
quoted
On Tue, Feb 4, 2025 at 4:32 AM Paolo Abeni [off-list ref] wrote:
quoted
On 2/3/25 11:39 PM, Mina Almasry wrote:
quoted
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].

Sending this series as RFC as the window closure is imminent. I plan on
reposting as non-RFC once the tree re-opens, addressing any feedback
I receive in the meantime.
I guess you should drop this paragraph.
quoted
Full outline on usage of the TX path is detailed in the documentation
added in the first patch.

Test example is available via the kselftest included in the series as well.

The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
It looks like no additional device level support is required. That is
IMHO so good up to suspicious level :)
It is correct that no additional device-level support is required. I don't
have any local changes to my driver to make this work. I think Stan
on-list was able to run the TX path (he commented on fixes to the test
but didn't say it doesn't work :D) and one other person was able to
run it offlist.
For BRCM I had shared this: https://lore.kernel.org/netdev/ZxAfWHk3aRWl-F31@mini-arch/
I have a similar internal patch for mlx5 (will share after the RX part gets
in). I agree that it seems like gve_unmap_packet needs some work to be more
careful to not unmap NIOVs (if you were testing against gve).
Hmm. I think you're right. We ran into a similar issue with the RX
path. The RX path worked 'fine' on initial merge, but it was passing
dmabuf dma-addrs to the dma-mapping API which Jason later called out
to be unsafe. The dma-mapping API calls with dmabuf dma-addrs will
boil down into no-ops for a lot of setups I think which is why I'm not
running into any issues in testing, but upon closer look, I think yes,
we need to make sure the driver doesn't end up passing these niov
dma-addrs to functions like dma_unmap_*() and dma_sync_*().

Stan, do you run into issues (crashes/warnings/bugs) in your setup
when the driver tries to unmap niovs? Or did you implement these
changes purely for safety?
I don't run into any issues with those unmaps in place, but I'm running x86
with iommu bypass (and as you mention in the other thread, those
calls are no-ops in this case).
The dma_addr from a dma-buf should never enter the dma_* APIs. dma-buf
exporters have their own implementation of these ops, and they could be
no-ops for identity mappings or when the IOMMU is disabled (in a VM? With
no IOMMU enabled, GPA=IOVA). So if we really want to map/unmap/sync
these addresses, the dma-buf APIs should be used to do that. Maybe some
glue with a memory provider is required for these net_iovs? I think
the safest option with these is that the driver never unmaps the
mappings manually, and they stay mapped until dma_buf_unmap_attachment
is called during unbinding? But maybe that complicates things for io_uring?
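
To make the "never hand niov dma-addrs to dma_unmap_*()" idea concrete,
here is a rough userspace sketch of the check a driver's TX completion
path might do. Everything in it is hypothetical: struct tx_frag, the
is_net_iov flag, mock_dma_unmap() and tx_unmap_packet() are stand-ins
made up for illustration, not gve or mlx5 code. In a real driver the
test would presumably be something like skb_frag_is_net_iov() on the
frag, and the unmap would be the usual dma-mapping call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-fragment TX state kept by a driver. */
struct tx_frag {
	uint64_t dma_addr;
	size_t len;
	bool is_net_iov;	/* true: address came from a dma-buf binding */
};

/* Counts calls; stands in for dma_unmap_page()/dma_unmap_single(). */
static int unmap_calls;

static void mock_dma_unmap(uint64_t dma_addr, size_t len)
{
	(void)dma_addr;
	(void)len;
	unmap_calls++;
}

/*
 * Unmap the fragments of a completed TX packet, skipping every fragment
 * backed by a net_iov. Those mappings are assumed to be owned by the
 * dma-buf binding and to stay in place until the binding is torn down.
 * Returns the number of fragments that were skipped.
 */
static int tx_unmap_packet(const struct tx_frag *frags, size_t nr_frags)
{
	int skipped = 0;
	size_t i;

	assert(frags != NULL || nr_frags == 0);

	for (i = 0; i < nr_frags; i++) {
		if (frags[i].is_net_iov) {
			skipped++;
			continue;
		}
		mock_dma_unmap(frags[i].dma_addr, frags[i].len);
	}
	return skipped;
}
```

For a packet with two page-backed frags and one net_iov frag, this
calls the unmap helper twice and reports one skipped frag. The same
check would need to guard any dma_sync_*() calls as well.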