Thread: 13 messages, 4 authors, 2019-04-02

Re: [PATCH] eal: add option to not store segment fd's

From: Burakov, Anatoly <hidden>
Date: 2019-03-29 14:21:06

On 29-Mar-19 1:34 PM, Thomas Monjalon wrote:
29/03/2019 14:24, Burakov, Anatoly:
quoted
On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
quoted
29/03/2019 13:05, Burakov, Anatoly:
quoted
On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
quoted
29/03/2019 11:33, Burakov, Anatoly:
quoted
On 29-Mar-19 9:50 AM, David Marchand wrote:
quoted
On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:

       Due to internal glibc limitations [1], DPDK may exhaust internal
       file descriptor limits when using smaller page sizes, which results
       in inability to use system calls such as select() by user
       applications.

       While the problem can be worked around using --single-file-segments
       option, it does not work if --legacy-mem mode is also used. Add
       (yet another) EAL flag to disable storing fd's internally. This
       will sacrifice compatibility with Virtio with vhost-backend, but
       at least select() and friends will work.

       [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
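
For illustration, the select() limitation arises because fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 with glibc on Linux), so a descriptor numbered FD_SETSIZE or higher cannot legally be passed to FD_SET(). Below is a minimal sketch of the guard an application would need once a process holds that many fds. The helper names fd_fits_in_fd_set() and wait_readable() are hypothetical and are not part of DPDK or of this patch.

```c
#include <stddef.h>
#include <sys/select.h>

/* Returns 1 if fd may be stored in an fd_set, 0 otherwise.
 * POSIX leaves FD_SET() undefined for fd < 0 or fd >= FD_SETSIZE. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait until fd is readable or timeout_ms expires.
 * Returns select()'s result, or -1 if fd cannot be used with select(). */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    if (!fd_fits_in_fd_set(fd))
        return -1;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

If the memory subsystem already holds many open fds, a socket opened later by the application can land above the limit, which is the failure mode described above.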


Sorry, I am a bit lost and I never took the time to look into the new
memory allocation system.
This gives the impression that we are accumulating workarounds, between
legacy-mem, single-file-segments, now no-seg-fds.
Yep. I don't like this any more than you do, but I think there are users
of all of these, so we can't just drop them willy-nilly. My great hope
was that by now everyone would move on to use VFIO so legacy mem
wouldn't be needed (the only reason it exists is to provide
compatibility for use cases where lots of IOVA-contiguous memory is
required, and VFIO cannot be used), but apparently that is too much to
ask :/
quoted
IIUC, everything revolves around the need for per page locks.
Can you summarize why we need them?
The short answer is multiprocess. We have to be able to map and unmap
pages individually, and for that we need to be sure that we can, in
fact, remove a page because no one else uses it. We also need to store
fd's because virtio with vhost-user backend needs them to work, because
it relies on sharing memory between processes using fd's.
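
As a rough sketch of the per-page locking idea described above: each process that uses a page keeps an fd open on its backing file and holds a shared flock() on it, and a process that wants to remove the page first tries a non-blocking exclusive lock. This is an illustration under assumptions, not the actual EAL code. page_acquire() and page_try_remove() are hypothetical names, and the file is an ordinary file rather than one on hugetlbfs.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Mark a page file as in use by taking a shared lock on it.
 * Returns the fd that must stay open while the page is used, or -1. */
static int page_acquire(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_SH | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Try to remove a page file.
 * Returns 1 if nobody else held it and it was unlinked,
 * 0 if it is still in use, -1 on any other error. */
static int page_try_remove(const char *path)
{
    int fd = open(path, O_RDWR);
    int ret;

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0)
        ret = (errno == EWOULDBLOCK) ? 0 : -1; /* a shared lock is still held */
    else
        ret = (unlink(path) == 0) ? 1 : -1;
    close(fd); /* closing the fd also drops the lock */
    return ret;
}
```

Because an flock() lock lives only as long as the open file description, this scheme needs one open fd per page per process, which is where the fd count grows with small page sizes.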
It's a pity to add an option to work around a limitation of a corner case.
It adds complexity that we will have to support forever,
and it's not even perfect because of vhost.

Might there be another solution?
If there is one, I'm all ears. I don't see any solutions aside from
adding limitations.

For example, we could drop the single/multi file segments mode and just
make single file segments a default and the only available mode, but
this has certain risks because older kernels do not support fallocate()
on hugetlbfs.

We could further draw a line in the sand, and say that, for example,
19.11 (or 20.11) will not have legacy mem mode, and everyone should use
VFIO by now and if you don't it's your own fault.

We could also cut down on the number of fd's we use in single-file
segments mode by not using locks and simply deleting pages in the
primary, but yanking out hugepages from under secondaries' feet makes me
feel uneasy, even if technically by the time that happens, they're not
supposed to be used anyway. This could mean that the patch is no longer
necessary because we don't use that many fd's any more.
This last option is interesting. Is it realistic?
I can do it in the current release cycle, but I'm not sure if it's too late
to do such changes. I guess it's OK since the validation cycle is just
starting? I'll throw something together and see if it crashes and burns.
OK let's try that.
Bear in mind, though, that this will not work for legacy mem mode, because
it cannot use single file segments mode without significant rework of
page allocation code. So, legacy mem mode will still have this issue,
unless we make it incompatible with virtio with vhost-user backend.
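
For reference, the vhost-user dependency on stored fds comes from handing memory-backing file descriptors to another process over a Unix domain socket using SCM_RIGHTS ancillary data. Below is a minimal sketch of that mechanism in isolation. send_fd() and recv_fd() are hypothetical helper names, not DPDK functions, and the sketch passes a single fd per message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected Unix domain socket. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F'; /* one byte of ordinary data carries the fd along */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The sender must still have the fd open at the time of the call, which is why a mode that does not keep segment fds around cannot serve this use case.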

-- 
Thanks,
Anatoly