Re: [RFC PATCH 00/13] Introduce first class virtual address spaces
From: Andy Lutomirski <luto@amacapital.net>
Date: 2017-03-15 16:51:31
Also in:
linux-fsdevel, linux-mips, linux-mm
On Tue, Mar 14, 2017 at 9:12 AM, Till Smejkal [off-list ref] wrote:
> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>> On Mon, Mar 13, 2017 at 7:07 PM, Till Smejkal [off-list ref] wrote:
>>> On Mon, 13 Mar 2017, Andy Lutomirski wrote:
>>>> This sounds rather complicated. Getting TLB flushing right seems
>>>> tricky. Why not just map the same thing into multiple mms?
>>>
>>> This is exactly what happens at the end. The memory region that is
>>> described by the VAS segment will be mapped in the ASes that use the
>>> segment.
>>
>> So why is this kernel feature better than just doing MAP_SHARED
>> manually in userspace?
>
> One advantage of VAS segments is that they can be globally queried by
> user programs, which means that VAS segments can be shared by
> applications that do not necessarily have to be related. If I am not
> mistaken, MAP_SHARED of pure in-memory data will only work if the tasks
> that share the memory region are related (i.e., have a common parent
> that initialized the shared mapping). Otherwise, the shared mapping has
> to be backed by a file.

What's wrong with memfd_create()?
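
For reference, a minimal sketch of what memfd_create()-based sharing of file-less memory could look like, assuming Linux with a glibc that provides the memfd_create() wrapper (2.27 or later). The helper names create_region(), map_region() and demo_memfd_share() are made up for illustration and are not part of the patch set. Two mappings in one process stand in for two tasks here; unrelated processes could receive the descriptor over a Unix domain socket with SCM_RIGHTS, which is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous, file-less memory object of `len` bytes.
 * `name` is only a debugging label. Returns a descriptor or -1. */
static int create_region(const char *name, size_t len)
{
    assert(len > 0);
    int fd = memfd_create(name, MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the region read/write. Every holder of the descriptor that does
 * this sees the same pages. Returns NULL on error. */
static void *map_region(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical self-check: two independent mappings of one memfd stand
 * in for two tasks. Returns 0 if a write through one mapping is visible
 * through the other, -1 otherwise. */
static int demo_memfd_share(void)
{
    const size_t len = 4096;
    int fd = create_region("demo-segment", len);
    if (fd < 0)
        return -1;

    char *writer = map_region(fd, len);
    char *reader = map_region(fd, len);
    int rc = -1;
    if (writer && reader) {
        strcpy(writer, "hello");
        rc = strcmp(reader, "hello") == 0 ? 0 : -1;
    }

    if (writer)
        munmap(writer, len);
    if (reader)
        munmap(reader, len);
    close(fd);
    return rc;
}
```

Calling demo_memfd_share() from a test program should return 0 on a kernel that supports memfd_create(). No file on any mounted filesystem is involved.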

> VAS segments on the other hand allow sharing of pure in-memory data by
> arbitrarily related tasks without the need of a file. This becomes
> especially interesting if one combines VAS segments with non-volatile
> memory since one can keep data structures in the NVM and still be able
> to share them between multiple tasks.

What's wrong with regular mmap?
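
As an illustration of the "regular mmap" route, here is a minimal sketch that keeps a small data structure in a file and maps it MAP_SHARED. It assumes that the NVM is exposed through a filesystem (for example one mounted with DAX); a temporary file under /tmp stands in for such a file. struct counter, map_counter() and demo_file_share() are hypothetical names used only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout of a data structure kept in the shared file. */
struct counter {
    uint64_t magic;
    uint64_t value;
};

#define COUNTER_MAGIC UINT64_C(0x56415321) /* arbitrary tag for the demo */

/* Open (or create) `path`, size it to hold one struct counter and map
 * it MAP_SHARED. Any task that can open the path gets the same pages.
 * Returns NULL on error. */
static struct counter *map_counter(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct counter)) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct counter), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical self-check: write through one mapping, drop it, map the
 * file again (as a second task would) and read the value back.
 * Returns the stored value, or -1 on error. */
static long demo_file_share(void)
{
    char path[] = "/tmp/vas-demo-XXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    close(tmp);

    long result = -1;
    struct counter *c = map_counter(path);
    if (c) {
        c->magic = COUNTER_MAGIC;
        c->value = 42;
        msync(c, sizeof(*c), MS_SYNC); /* flush to the backing file */
        munmap(c, sizeof(*c));

        c = map_counter(path);
        if (c) {
            if (c->magic == COUNTER_MAGIC)
                result = (long)c->value;
            munmap(c, sizeof(*c));
        }
    }
    unlink(path);
    return result;
}
```

In this sketch the data survives the first munmap() because it lives in the file, and the sharing tasks need no common parent, only permission to open the same path.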

>>>> Ick. Please don't do this. Can we please keep an mm as just an mm
>>>> and not make it look magically different depending on which process
>>>> maps it? If you need a trampoline (which you do, of course), just
>>>> write a trampoline in regular user code and map it manually.
>>>
>>> Did I understand you correctly that you are proposing that the
>>> switching thread should make sure by itself that its code, stack, ...
>>> memory regions are properly set up in the new AS before/after
>>> switching into it? I think this would make using first class virtual
>>> address spaces much more difficult for user applications, to the
>>> extent that I am not even sure if they can be used at all. At the
>>> moment, switching into a VAS is a very simple operation for an
>>> application because the kernel will simply do the right thing.
>>
>> Yes. I think that having the same mm_struct look different from
>> different tasks is problematic. Getting it right in the arch code is
>> going to be nasty. The heuristics of what to share are also tough --
>> why would text + data + stack or whatever you're doing be adequate?
>> What if you're in a thread? What if two tasks have their stacks in
>> the same place?
>
> The different ASes that a task can now have when it uses first class
> virtual address spaces are not realized in the kernel by using only one
> mm_struct per task that just looks different, but by using multiple
> mm_structs, one for each AS that the task can execute in. When a task
> attaches a first class virtual address space to itself to be able to
> use another AS, the kernel adds a temporary mm_struct to this task that
> contains the mappings of the first class virtual address space and the
> ones shared with the task's original AS. If a thread now wants to
> switch into this attached first class virtual address space, the kernel
> only changes the 'mm' and 'active_mm' pointers in the task_struct of
> the thread to the temporary mm_struct and performs the corresponding
> mm_switch operation. The original mm_struct of the thread will not be
> changed.
>
> Accordingly, I do not magically make mm_structs look different
> depending on the task that uses them, but create temporary mm_structs
> that only contain mappings to the same memory regions.

This sounds complicated and fragile. What happens if a heuristically
shared region coincides with a region in the "first class address
space" being selected?

I think the right solution is "you're a user program playing virtual
address games -- make sure you do it right".

--Andy