Thread (64 messages) 64 messages, 12 authors, 2023-03-06

Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible

From: Mateusz Guzik <hidden>
Date: 2023-03-03 20:39:16
Also in: linux-fsdevel, lkml

On 3/3/23, Linus Torvalds [off-list ref] wrote:
On Fri, Mar 3, 2023 at 11:37 AM Mateusz Guzik [off-list ref] wrote:
quoted
I mentioned in the previous e-mail that memset is used a lot even
without the problematic opt and even have shown size distribution of
what's getting passed there.
Well, I *have* been pushing Intel to try to fix memcpy and memset for
over two decades by now, but with FSRM I haven't actually seen the
huge problems any more.
rep *stos* remains crap with FSRM, but I don't have sensible tests
handy nor the ice lake cpu i tested on at the moment
I actually used to have the reverse of your hack for this - I've had
various hacks over the year that made memcpy and memset be inlined
"rep movs/stos", which (along with inlined spinlocks) is a great way
to see the _culprit_ (without having to deal with the call chains -
which always get done the wrong way around imnsho).
that's all hackery which makes sense pre tooling like bpftrace, people
can do better now (see the second part of the email)

I think there is a systemic problem which comes with the kzalloc API, consider:
static struct file *__alloc_file(int flags, const struct cred *cred)
{
	struct file *f;
	int error;

	f = kmem_cache_zalloc(filp_cachep, GFP_KERNEL);
	if (unlikely(!f))
		return ERR_PTR(-ENOMEM);
        [bunch of the struct gets initialized here]

the allocation routine does not have any information about the size
available at compilation time, so has to resort to a memset call at
runtime. Instead, should this be:

f = kmem_cache_alloc(...);
memset(f, 0, sizeof(*f));

... the compiler could in principle inititalize stuff as indicated by
code and emit zerofill for the rest. Interestingly, last I checked
neither clang nor gcc knew how to do it, they instead resort to a full
sized memset anyway, which is quite a bummer.

Personally i grew up on dtrace, bpftrace I can barely use and don't
know how to specifically get the caller, but kstack(2) seems like a
good enough workaround.

as an example here is a one-liner to show crappers which do 0-sized ops:
bpftrace -e 'kprobe:memset,kprobe:memcpy /arg2 == 0/ { @[probe,
kstack(2)] = count(); }'

one can trace all kinds of degeneracy like that without recompiling
anything, provided funcs are visible to bpftrace

sample result from the above one-liner while doing 'make clean' in the
kernel dir:
@[kprobe:memcpy,
    memcpy+5
    realloc_array+78
]: 1
@[kprobe:memcpy,
    memcpy+5
    push_jmp_history+125
]: 1
@[kprobe:memset,
    memset+5
    blk_mq_dispatch_rq_list+687
]: 3
@[kprobe:memcpy,
    memcpy+5
    mix_interrupt_randomness+192
]: 4
@[kprobe:memcpy,
    memcpy+5
    d_alloc_pseudo+18
]: 59
@[kprobe:memcpy,
    memcpy+5
    add_device_randomness+111
]: 241
@[kprobe:memcpy,
    memcpy+5
    add_device_randomness+93
]: 527
@[kprobe:memset,
    memset+5
    copy_process+2904
]: 2054
@[kprobe:memset,
    memset+5
    dup_fd+283
]: 6162

-- 
Mateusz Guzik <mjguzik gmail.com>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help