Re: Huge mapping secondary process linux
From: Jonas Pfefferle1 <hidden>
Date: 2017-11-09 09:54:39
"Chao Zhu" [off-list ref] wrote on 11/09/2017 04:08:36 AM:
> From: "Chao Zhu" <redacted>
> To: "'Jonas Pfefferle1'" <redacted>
> Cc: "'Burakov, Anatoly'" <redacted>, [off-list ref], [off-list ref]
> Date: 11/09/2017 04:08 AM
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
> From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
> Sent: 7 November 2017 18:16
> To: Chao Zhu <redacted>
> Cc: 'Burakov, Anatoly' <redacted>; bruce.richardson@intel.com; dev@dpdk.org
> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>
>> "Chao Zhu" [off-list ref] wrote on 11/07/2017 09:25:26 AM:
>>
>>> From: "Chao Zhu" <redacted>
>>> To: "'Jonas Pfefferle1'" <redacted>, "'Burakov, Anatoly'" [off-list ref]
>>> Cc: <redacted>, <redacted>
>>> Date: 11/07/2017 11:00 AM
>>> Subject: RE: [dpdk-dev] Huge mapping secondary process linux
>>>
>>> From: Jonas Pfefferle1 [mailto:JPF@zurich.ibm.com]
>>> Sent: 28 October 2017 3:23
>>> To: Burakov, Anatoly <redacted>
>>> Cc: bruce.richardson@intel.com; chaozhu@linux.vnet.ibm.com; dev@dpdk.org
>>> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>>
>>>> "Burakov, Anatoly" [off-list ref] wrote on 27/10/2017 18:00:27:
>>>>
>>>>> From: "Burakov, Anatoly" <redacted>
>>>>> To: Jonas Pfefferle1 <redacted>
>>>>> Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
>>>>> Date: 27/10/2017 18:00
>>>>> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>>>>
>>>>> On 27-Oct-17 4:16 PM, Jonas Pfefferle1 wrote:
>>>>>> "dev" [off-list ref] wrote on 10/27/2017 04:58:01 PM:
>>>>>>
>>>>>>> From: "Jonas Pfefferle1" [off-list ref]
>>>>>>> To: "Burakov, Anatoly" [off-list ref]
>>>>>>> Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
>>>>>>> Date: 10/27/2017 04:58 PM
>>>>>>> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>>>>>> Sent by: "dev" [off-list ref]
>>>>>>>
>>>>>>> "Burakov, Anatoly" [off-list ref] wrote on 10/27/2017 04:44:52 PM:
>>>>>>>
>>>>>>>> From: "Burakov, Anatoly" [off-list ref]
>>>>>>>> To: Jonas Pfefferle1 [off-list ref]
>>>>>>>> Cc: bruce.richardson@intel.com, chaozhu@linux.vnet.ibm.com, dev@dpdk.org
>>>>>>>> Date: 10/27/2017 04:45 PM
>>>>>>>> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>>>>>>>
>>>>>>>> On 27-Oct-17 3:28 PM, Jonas Pfefferle1 wrote:
>>>>>>>>> "Burakov, Anatoly" [off-list ref] wrote on 10/27/2017 04:06:44 PM:
>>>>>>>>>
>>>>>>>>>> From: "Burakov, Anatoly" [off-list ref]
>>>>>>>>>> To: Jonas Pfefferle1 [off-list ref], dev@dpdk.org
>>>>>>>>>> Cc: chaozhu@linux.vnet.ibm.com, bruce.richardson@intel.com
>>>>>>>>>> Date: 10/27/2017 04:06 PM
>>>>>>>>>> Subject: Re: [dpdk-dev] Huge mapping secondary process linux
>>>>>>>>>>
>>>>>>>>>> On 27-Oct-17 1:43 PM, Jonas Pfefferle1 wrote:
>>>>>>>>>>> Hi @all,
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to make sense of the hugepage memory mappings in librte_eal/linuxapp/eal/eal_memory.c:
>>>>>>>>>>>
>>>>>>>>>>> * In rte_eal_hugepage_attach (line 1347), when we try to do a private mapping on /dev/zero (line 1393), why do we not use MAP_FIXED if we need the addresses to be identical with the primary process?
>>>>>>>>>>> * On POWER we have this weird business going on where we use MAP_HUGETLB because, according to this commit:
>>>>>>>>>>>
>>>>>>>>>>>     commit 284ae3e9ff9a92575c28c858efd2c85c8de6d440
>>>>>>>>>>>     Author: Chao Zhu [off-list ref]
>>>>>>>>>>>     Date:   Thu Apr 6 15:36:09 2017 +0530
>>>>>>>>>>>
>>>>>>>>>>>         eal/ppc: fix mmap for memory initialization
>>>>>>>>>>>
>>>>>>>>>>>         On IBM POWER platform, when mapping /dev/zero file to hugepage memory space, mmap will not respect the requested address hint. This will cause the memory initialization for the second process fails. This patch adds the required mmap flags to make it work. Beside this, users need to set the nr_overcommit_hugepages to expand the VA range. When doing the initialization, users need to set both nr_hugepages and nr_overcommit_hugepages to the same value, like 64, 128, etc.
>>>>>>>>>>>
>>>>>>>>>>>   mmap address hints are not respected. Looking at the mmap code in the kernel, this is not entirely true; however, under some circumstances the hint can be ignored (http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L103). However, I believe we can remove the extra case for PPC if we use MAP_FIXED when doing the secondary process mappings, because we need them to be identical anyway. We could also use MAP_FIXED when doing the primary process mappings (i.e. in get_virtual_area) if we want to have any guarantees when specifying a base address. Any thoughts?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jonas
>>>>>>>>>>
>>>>>>>>>> hi Jonas,
>>>>>>>>>>
>>>>>>>>>> MAP_FIXED is not used because it's dangerous: it unmaps anything that is already mapped into that space. We would rather know that we can't map something than unwittingly unmap something that was mapped before.
>>>>>>>>>
>>>>>>>>> Ok, I see. Maybe we can add a check to the primary process's memory mappings for whether the hint has been respected or not? At least warn if it hasn't.
>>>>>>>>
>>>>>>>> Hi Jonas,
>>>>>>>>
>>>>>>>> I'm unfamiliar with the POWER platform, so I'm afraid you'd have to explain a bit more what you mean by "hint has been respected" :)
>>>>>>>
>>>>>>> Hi Anatoly,
>>>>>>>
>>>>>>> What I meant was the mmap address hint:
>>>>>>>
>>>>>>> "If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary."
>>>>>>>
>>>>>>> This is actually not true on POWER. It can happen that the address hint is ignored and you get back any address that fits your mapping.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jonas
>>>>>>
>>>>>> Actually, looking through the kernel code, this is also not guaranteed on x86 (http://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165). So in any case the address hint can be ignored by the kernel, and you get any address that fits your mapping. My suggestion is to check, when we do the initial mapping in get_virtual_area, whether the hint was respected or not, i.e. whether the returned address == PAGE_ALIGN(address_hint).
>>>>>
>>>>> I'm not sure I see the issue here. So, just to make sure I understand things correctly:
>>>>>
>>>>> Whenever we don't request a specific base address through the base_address EAL parameter, none of this matters - we always ask for memory in arbitrary memory locations, correct?
>>>>>
>>>>> It's also not an issue with secondary processes, because we do check the returned mmap address to see whether it's the same as we requested, correct?
>>>>>
>>>>> It's only whenever we *do* specify a base_address that we provide an address hint to mmap, but we don't check whether the address we got from mmap is in the vicinity of our requested base address, correct?
>>>>>
>>>>> We don't check, and the kernel can ignore the address hint, so we're not guaranteed to respect the base_address flag.
>>>>>
>>>>> I'm not sure this is a serious issue, because as far as I'm concerned, this flag is advisory - we only promise to *attempt* to map things at that particular address, not that it will succeed. If the kernel simply cannot find an address to satisfy our address hint, or ignores it for other reasons - well, tough, nothing we can do about that. I'm not sure putting a check like this, where we can't even predict an "expected" address, is a good idea.
>>>>>
>>>>> Am I getting this right?
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Anatoly
>>>>
>>>> The problem is that when we specify a base address, we want it to be used. If it is not respected, we basically end up in the same situation as if we had never specified it. This very likely leads to not being able to run a secondary process, because we will not be able to map the addresses from our primary process, and that is why we introduced the base address parameter in the first place.
>>>
>>> The reason why I put the patch there is that, when mapping hugepages on POWER, the kernel will never respect the address hints when doing mmap unless we expand the address space or unmap all the hugepages. This is a big difference compared with x86, and it affects the mapping of the secondary process. I agree that the hint is advisory; I just want to see if there are better solutions.
>>
>> This is not true. I looked through the kernel code, and the address hint is treated almost the same on both platforms:
>>
>> PPC: https://elixir.free-electrons.com/linux/latest/source/arch/powerpc/mm/mmap.c#L143 (lines 169/170)
>> x86: https://elixir.free-electrons.com/linux/latest/source/arch/x86/kernel/sys_x86_64.c#L165 (lines 189/190)
>>
>> The only thing that might differ is the virtual address layout (e.g. due to different page sizes), and that might lead to the same value for base-virtaddr not working on both x86 and POWER. However, I tested with different address hints, and you can easily find addresses where the address hint is indeed respected. That is also why I sent in a patch to remove the HUGETLB flags from the mmap.
>>
>> Thanks,
>> Jonas
>
> You can take a look at this: https://bugzilla.linux.ibm.com/show_bug.cgi?id=141628
> It's quite interesting.

Interesting indeed. I misunderstood the problem: I thought the get_virtual_area mmap address hint was not being respected, when the real problem is the address hint used when mapping the hugepages. Still, I hope we can find a better solution. Aside from that, I still believe that warning when the address hint is not respected is a good idea.