Thread (20 messages) 20 messages, 8 authors, 2024-01-30

Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again

From: Shakeel Butt <hidden>
Date: 2024-01-22 17:38:38
Also in: linux-mm, lkml

Hi Linus,

On Sun, Jan 21, 2024 at 9:16 PM Linus Torvalds
[off-list ref] wrote:
On Wed, 17 Jan 2024 at 14:56, Shakeel Butt [off-list ref] wrote:
quoted
quoted
So I don't see how we can make it really cheap (say, less than 5% overhead)
without caching pre-accounted objects.
Maybe this is what we want. Now we are down to just SLUB, maybe such
caching of pre-accounted objects can be in SLUB layer and we can
decide to keep this caching per-kmem-cache opt-in or always on.
So it turns out that we have another case of SLAB_ACCOUNT being quite
a big expense, and it's actually the normal - but failed - open() or
execve() case.

See the thread at

    https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/ (local)

and to see the effect in profiles, you can use this EXTREMELY stupid
test program:

    #include <fcntl.h>

    int main(int argc, char **argv)
    {
        for (int i = 0; i < 10000000; i++)
                open("nonexistent", O_RDONLY);
    }

where the point of course is that the "nonexistent" pathname doesn't
actually exist (so don't create a file called that for the test).

What happens is that open() allocates a 'struct file *' early from the
filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
for it, failt the pathname open, and free it again, which uncharges
the accounting.

Now, in this case, I actually have a suggestion: could we please just
make SLAB_ACCOUNT be something that we do *after* the allocation, kind
of the same way the zeroing works?

IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
slab_post_alloc_hook() do all the "charge the memcg if required".

Obviously that means that now a failure to charge the memcg would have
to then de-allocate things, but that's an uncommon path and would be
marked unlikely and not be in the hot path at all.

Now, the reason I would prefer that is that the *second* step would be to

 (a) expose a "kmem_cache_charge()" function that takes a
*non*-accounted slab allocation, and turns it into an accounted one
(and obviously this is why you want to do everything in the post-alloc
hook: just try to share this code)

 (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file
allocations start out unaccounted.

 (c) when we have *actually* looked up the pathname and open the file
successfully, at *that* point we'd do a

        error = kmem_cache_charge(filp_cachep, file);

    in do_dentry_open() to turn the unaccounted file pointer into an
accounted one (and if that fails, we do the cleanup and free it, of
course, exactly like we do when file_get_write_access() fails)

which means that now the failure case doesn't unnecessarily charge the
allocation that never ends up being finalized.

NOTE! I think this would clean up mm/slub.c too, simply because it
would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
rid of the need to carry the "struct obj_cgroup **objcgp" pointer
along until the post-alloc hook: everything would be done post-alloc.

The actual kmem_cache_free() code already deals with "this slab hasn't
been accounted" because it obviously has to deal with allocations that
were done without __GFP_ACCOUNT anyway. So there's no change needed on
the freeing path, it already has to handle this all gracefully.

I may be missing something, but it would seem to have very little
downside, and fix a case that actually is visible right now.
Thanks for the idea. Actually I had a discussion with Vlastimil at LPC
'22 on a similar idea but for a different problem i.e. allocation in
irq context or more specifically charging sockets for incoming TCP
connections. If you see inet_csk_accept() we solved this for TCP
memory by charging at accept() syscall but the kernel memory holding
socket is still uncharged.

So, I think having the framework you suggested would help more
use-cases. I will take a stab at this in the next couple of days
(sorry stuck in some other stuff atm).

thanks,
Shakeel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help