Thread: 37 messages, 8 authors, 2018-11-27

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Jerome Glisse <hidden>
Date: 2018-11-19 21:33:32
Also in: linux-crypto, linux-doc, linux-rdma, lkml

On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
?? How can O_DIRECT be fine but RDMA not? They use exactly the same
get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
be fine too?

AFAIK the only difference is the length of the race window. You'd have
to fork and fault during the shorter time O_DIRECT has get_user_pages
open.
Well, in the O_DIRECT case there is only one page table, the CPU
page table, and it gets updated during fork(), so there is an
ordering there and the race window is small.
Not really. In the O_DIRECT case there is another 'page table'; we just
call it a DMA scatter/gather list, and it is sent directly to the block
device's DMA HW. The sgl plays exactly the same role as the various HW
page list data structures that underlie RDMA MRs.

It is not a page table that matters here, it is whether the DMA address
of the page is active for DMA on HW.

Like you say, the only difference is that the race is hopefully small
with O_DIRECT (though that is not really small, NVMeof for instance
has windows as large as connection timeouts, if you try hard enough)

So we probably can trigger this trouble with O_DIRECT and fork(), and
I would call it a bug :(
I cannot think of any scenario that would be a bug with O_DIRECT.
Do you have one in mind? When you fork() and make other syscalls that
affect the memory of your process in another thread, you should
expect inconsistent results. The kernel is not here to provide a fully
safe environment to the user; the user can shoot itself in the foot and
that's fine as long as it only affects the process itself and no one
else. We should not be in the business of making everything baby
proof :)
Sure, I set up AIO with O_DIRECT and launch a read.

Then I fork and dirty the READ target memory using the CPU in the
child.

As you described, in this case the fork will retain the physical page
that is undergoing O_DIRECT DMA, and the parent gets a newly copied page.

The DMA completes, and the child gets the page that was DMA'd to. The
parent gets an unchanged copied page.

The parent gets the AIO completion, but can't see the data.
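
The steps above can be sketched as a toy model. This is only an illustration, not kernel code: Page, Process, break_cow and fork_during_dma are hypothetical names, a "page" is a plain Python object, and which side ends up with the private copy is passed in as a parameter rather than derived from any real copy-on-write policy.

```python
class Page:
    """Stand-in for one physical page (assumption: not real memory)."""
    def __init__(self, data):
        self.data = data


class Process:
    """Stand-in for an address space that maps a single page."""
    def __init__(self, page):
        self.page = page

    def fork(self):
        # After fork, parent and child map the same physical page.
        return Process(self.page)

    def break_cow(self):
        # This process gets a private copy; the other keeps the original.
        self.page = Page(self.page.data)


def fork_during_dma(copy_goes_to):
    """Return (parent_view, child_view) after a DMA that raced with fork."""
    parent = Process(Page("stale"))
    dma_target = parent.page      # the device holds the physical page, not the mapping
    child = parent.fork()
    (parent if copy_goes_to == "parent" else child).break_cow()
    dma_target.data = "dma data"  # DMA completes into the original physical page
    return parent.page.data, child.page.data


print(fork_during_dma("parent"))  # ('stale', 'dma data')
print(fork_during_dma("child"))   # ('dma data', 'stale')
```

In the first call the parent ends up with the copy, which is the case described here: the read completes, but the data lands in the page that only the child still maps.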

I'd call that a bug with O_DIRECT. The only correct outcome is that
the parent will always see the O_DIRECT data. Fork should not cause
the *parent* to malfunction. I agree the child cannot make any
prediction what memory it will see.

I assume the same flow is possible using threads and read().

It is really no different than the RDMA bug with fork.
Yes, and that's expected behavior :) If you fork() and have anything
still in flight at the time of fork that can change your process address
space (including data in it), then all bets are off.

At least this is my reading of fork() syscall.

Cheers,
Jérôme