Re: [PATCH v2 8/8] memcg: accounting for ldt_struct objects
From: Michal Hocko <hidden>
Date: 2021-03-15 16:32:42
Also in:
linux-mm
On Mon 15-03-21 08:48:26, Shakeel Butt wrote:
> On Mon, Mar 15, 2021 at 6:27 AM Borislav Petkov [off-list ref] wrote:
> > On Mon, Mar 15, 2021 at 03:24:01PM +0300, Vasily Averin wrote:
> > > Unprivileged user inside memcg-limited container can create
> > > non-accounted multi-page per-thread kernel objects for LDT
> >
> > I have hard time parsing this commit message.
> >
> > And I'm CCed only on patch 8 of what looks like a patchset.
> >
> > And that patchset is not on lkml so I can't find the rest to read about
> > it, perhaps linux-mm.
> >
> > /me goes and finds it on lore
> >
> > I can see some bits and pieces, this, for example:
> >
> > https://lore.kernel.org/linux-mm/05c448c7-d992-8d80-b423-b80bf5446d7c-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org/
> >
> > ( Btw, that version has your SOB and this patch doesn't even have a
> > Signed-off-by. Next time, run your whole set through checkpatch please
> > before sending. )
> >
> > Now, this URL above talks about OOM, ok, that gets me close to the "why"
> > this patch.
> >
> > From a quick look at the ldt.c code, we allow a single LDT struct per
> > mm. Manpage says so too:
> >
> > DESCRIPTION
> >        modify_ldt() reads or writes the local descriptor table (LDT) for
> >        a process. The LDT is an array of segment descriptors that can be
> >        referenced by user code. Linux allows processes to configure a
> >        per-process (actually per-mm) LDT.
> >
> > We allow
> >
> > /* Maximum number of LDT entries supported. */
> > #define LDT_ENTRIES     8192
> >
> > so there's an upper limit per mm.
> >
> > Now, please explain what is this accounting for?
>
> Let me try to provide the reasoning at least from my perspective.
> There are legitimate workloads with hundreds of processes and there
> can be hundreds of workloads running on large machines. The
> unaccounted memory can cause isolation issues between the workloads
> particularly on highly utilized machines.

It would be better to be explicit: 8192 * 8 = 64kB * number_of_tasks,
so realistically this is in the range of lower megabytes. Is this worth
the memcg accounting overhead? Maybe yes, but what kind of workloads
really care?

--
Michal Hocko
SUSE Labs