Thread: 44 messages, 8 authors, 2018-11-27

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

From: Jason Gunthorpe <jgg@ziepe.ca>
Date: 2018-11-19 18:53:39
Also in: linux-crypto, linux-rdma, lkml, netdev

On Mon, Nov 19, 2018 at 01:42:16PM -0500, Jerome Glisse wrote:
On Mon, Nov 19, 2018 at 11:27:52AM -0700, Jason Gunthorpe wrote:
On Mon, Nov 19, 2018 at 11:48:54AM -0500, Jerome Glisse wrote:
Just to comment on this: any InfiniBand driver that uses umem and does
not have ODP (by ODP I mean listening to mmu notifiers, so every
InfiniBand driver except mlx5) will be affected by the same issue, AFAICT.

AFAICT nothing special happens after fork() inside any of those
drivers. So if the parent creates a umem MR before fork() and programs
the hardware with it, then after fork() the parent might start using a
new page for the umem range while the old memory is used by the child.
The reverse is also true (parent using the old memory and child the new
memory). Bottom line: you cannot predict which memory the child or the
parent will use for the range after fork().

So whichever of the child or the parent you consider, what the hw
uses for the MR is unlikely to match what the CPU uses for the
same virtual address. In other words:

Before fork:
    CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE

Case 1:
    CPU parent: virtual addr ptr1 -> physical address = 0xCAFE
    CPU child:  virtual addr ptr1 -> physical address = 0xDEAD
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE

Case 2:
    CPU parent: virtual addr ptr1 -> physical address = 0xBEEF
    CPU child:  virtual addr ptr1 -> physical address = 0xCAFE
    HARDWARE:   virtual addr ptr1 -> physical address = 0xCAFE
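
The divergence described in the two cases above can be observed from
userspace without any hardware. The following is only a minimal sketch,
assuming Linux and a private anonymous mapping; the function name is
hypothetical and not from the thread. It cannot show physical addresses
(reading PFNs needs privileges), so it compares contents instead: once the
parent writes after fork(), parent and child no longer share the page, and
a device still pointing at the pre-fork page could agree with only one of
them.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if the child still reads the pre-fork contents after the
 * parent has written to the buffer, 0 if the child sees the parent's
 * write, and -1 on error.
 */
int child_keeps_old_data_after_parent_write(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *buf;
    int go[2];
    int status;
    pid_t pid;

    if (page <= 0)
        return -1;
    len = (size_t)page;
    assert(len >= sizeof("before"));

    /* Private anonymous memory: shared copy-on-write across fork(). */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    strcpy(buf, "before");

    /* The pipe makes the child wait until the parent has written. */
    if (pipe(go) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        char token;

        close(go[1]);
        if (read(go[0], &token, 1) != 1)
            _exit(2);
        /* Exit 0 if the pre-fork contents are still visible here. */
        _exit(strcmp(buf, "before") == 0 ? 0 : 1);
    }

    close(go[0]);
    /* This write breaks the sharing: the two processes now diverge. */
    strcpy(buf, "after");
    if (write(go[1], "x", 1) != 1) {
        close(go[1]);
        waitpid(pid, &status, 0);
        munmap(buf, len);
        return -1;
    }
    close(go[1]);

    if (waitpid(pid, &status, 0) != pid) {
        munmap(buf, len);
        return -1;
    }
    munmap(buf, len);

    if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling child_keeps_old_data_after_parent_write() from a small main() is
expected to return 1 on Linux. Which process ends up on the original page
is up to the kernel, which is exactly the unpredictability described above.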
IIRC this is solved in IB by automatically calling
madvise(MADV_DONTFORK) before creating the MR.

MADV_DONTFORK
  .. This is useful to prevent copy-on-write semantics from changing the
  physical location of a page if the parent writes to it after a
  fork(2) ..
This would work around the issue, but it is not transparent, i.e. a
range marked with DONTFORK no longer behaves as expected from the
application's point of view.
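
To make the non-transparency concrete, here is a hedged sketch, assuming
Linux and glibc; the function name is hypothetical and this is not how any
IB library is claimed to do it. It marks a private anonymous page with
madvise(MADV_DONTFORK), forks, and lets the child touch the range. Because
the range is not inherited, the child is expected to die with SIGSEGV
instead of seeing a copy of the data.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if the child was killed by SIGSEGV when touching a range
 * marked MADV_DONTFORK before fork(), 0 if the child could read it,
 * and -1 on error.
 */
int child_faults_on_dontfork_range(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *buf;
    int status;
    pid_t pid;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 1;

    /* Ask the kernel not to make this range available to children. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* volatile forces a real load; the range should be unmapped here. */
        volatile char *p = buf;

        _exit(p[0] == 1 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid) {
        munmap(buf, len);
        return -1;
    }
    munmap(buf, len);

    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The parent keeps its mapping and its physical page, so a device programmed
before fork() stays consistent with the parent; the cost is that a child
which expects to read the buffer faults instead.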
Do you know what the difference is? The man page really gives no
hint..

Does it sometimes unmap the pages during fork?

I actually wonder if the kernel is a bit broken here, we have the same
problem with O_DIRECT and other stuff, right?

Really, if I have a get_user_pages FOLL_WRITE on a page and we fork,
then shouldn't the COW immediately be broken during the fork?

The kernel can't guarantee that an ongoing DMA will not write to those
pages, and it breaks the fork semantics to write to both processes.
Also it relies on userspace doing the right thing (which is not
something I usually trust :)).
Well, if they do it wrong they get to keep all the broken bits :)

Jason