Re: [RFC PATCH 1/1] mm/slub: fix endless "No data" printing for alloc/free_traces attribute
From: Vlastimil Babka <hidden>
Date: 2021-11-22 15:02:33
Also in: linux-s390, lkml
On 11/19/21 20:59, Gerald Schaefer wrote:
On Fri, 19 Nov 2021 11:41:38 +0100 Vlastimil Babka [off-list ref] wrote:
On 11/17/21 20:39, Gerald Schaefer wrote:
Reading from alloc/free_traces attribute in /sys/kernel/debug/slab/ results in an endless sequence of "No data". This is because slab_debugfs_start() does not check for a "past end of file" condition and return NULL.
I still have no idea how that endless sequence happens. To get it, we would have to call slab_debugfs_show() repeatedly with such v that *v == 0. Which should only happen with slab_debugfs_start() with *ppos == 0. Which your patch won't change because you add a '*ppos > t->count' condition, so *ppos has to be at least 1 to trigger this.
Yes, very strange. After a closer look to fs/seq_file.c, especially seq_read_iter(), it seems that op->next will only be called when m->count == 0, at least in the first while(1) loop. Printing "No data\n" sets m->count to 8, so it will continue after Fill:, then call op->next, which returns NULL and breaks the second while(1) loop, and also calls op->stop. Then it returns from seq_read_iter(), only to be called again, and again, ... Only when op->start returns NULL it will end it for good, probably because seq_read_iter() will then return 0 instead of 8.
Ah, thanks for investigating.
Not sure if there is a better way to fix this than by adding a second "return NULL" to op->start, which feels a bit awkward and makes you wonder why the "return NULL" from op->next is not enough.
I think it's fine to require op->start to return NULL, even if it didn't cause this infinite loop.
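To make the read-loop reasoning above concrete, here is a toy user-space model (not kernel code). Everything in it is hypothetical: the function names, the call budget, and above all the simplification that each read produces one 8-byte "No data\n" record whenever start returns non-NULL. Why show keeps emitting that record in the real kernel is exactly what is still unclear in this thread; the sketch only illustrates that the reader stops once a read returns 0, which in this model requires start to return NULL.

```c
#include <stddef.h>

#define RECORD_LEN 8	/* strlen("No data\n") */

/* Hypothetical stand-in for op->start: returns NULL to signal EOF. */
typedef void *(*start_fn)(long long *pos);

/* Models a start that never reports "past end of file". */
static void *start_never_null(long long *pos)
{
	return pos;
}

/* Models a start that reports EOF once the first record was consumed. */
static void *start_null_past_first(long long *pos)
{
	if (*pos > 0)
		return NULL;
	return pos;
}

/* One read() call in the model: bytes produced, 0 meaning EOF. */
static size_t model_read(start_fn start, long long *pos)
{
	if (!start(pos))
		return 0;	/* start returned NULL, so the read yields 0 */
	/* ASSUMPTION: show emits one record; next then advances and ends. */
	++*pos;
	return RECORD_LEN;
}

/* Number of read() calls until EOF, or -1 if max_calls is exhausted. */
static int reads_until_eof(start_fn start, int max_calls)
{
	long long pos = 0;
	int calls;

	for (calls = 1; calls <= max_calls; calls++) {
		if (model_read(start, &pos) == 0)
			return calls;
	}
	return -1;
}
```

In this model, reads_until_eof(start_never_null, 1000) exhausts its budget, while reads_until_eof(start_null_past_first, 1000) sees EOF on the second read.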
But yeah, AFAIK we should detect this in slab_debugfs_start() anyway. But I think the condition should be something like below, because we are past end of file already with *ppos == t->count. But if both are 0, we want to proceed for the "No data" output.
Ah ok, I wasn't sure about the "t->count > 0" case, i.e. if the check for "*ppos > t->count" would still be correct there. So apparently it wouldn't, and we need two checks, like you suggested.
	// to show the No data
	if (!*ppos && !t->count)
		return ppos;
	if (*ppos >= t->count)
		return ppos;
That should be return NULL here, right?
Doh, right.
	return ppos;
Will send a new patch, unless I find a better way after investigating the endless seq_read_iter() calls mentioned above. Is there an easy way to test the "t->count > 0" case, i.e. what would need to be done to get some other reply than "No data"?
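For reference, the two end-of-file conditions discussed above can be compared side by side in a small user-space sketch. The function names and the plain integer arguments are hypothetical stand-ins for *ppos and t->count, and a return value of 1 stands for "start would return NULL". This is not the patch itself, only an illustration of where the two checks differ.

```c
/* Check from the first patch version: EOF only when pos > count. */
static int eof_first_version(long long pos, unsigned long count)
{
	return pos > (long long)count;
}

/*
 * Check as suggested in the reply, with the second branch corrected
 * to signal EOF (return NULL in the real callback).
 */
static int eof_suggested(long long pos, unsigned long count)
{
	if (pos == 0 && count == 0)
		return 0;	/* proceed so that "No data" can be shown */
	return pos >= (long long)count;
}
```

The two differ at pos == count: with count == 3, eof_first_version(3, 3) still lets start proceed, while eof_suggested(3, 3) reports end of file. Both let the pos == 0, count == 0 case through.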
Hm, the debugfs files alloc_traces/free_traces for any cache with non-zero objects (see /proc/slabinfo for that) should have t->count > 0. If the files are created for a cache, it means the related SLAB_STORE_USER debugging was enabled both during config and boot-time. If you see only a few caches with alloc_traces/free_traces (because they are from e.g. some test module that adds SLAB_STORE_USER explicitly) and all happen to have 0 objects, boot with the slub_debug=U parameter and then all caches will have this enabled and many will have >0 objects.
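When trying this out, it may help to read the attribute with a byte budget, so that an endless "No data" stream cannot hang the test. Below is a minimal user-space sketch; read_bounded() is a hypothetical helper, and the cache name in the usage note is only an example that may not exist on a given system.

```c
#include <stdio.h>

/*
 * Read at most max_bytes from path and discard the data.
 * Returns the number of bytes read, or -1 if the file cannot be opened.
 */
static long read_bounded(const char *path, long max_bytes)
{
	char buf[256];
	long total = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (total < max_bytes) {
		size_t want = sizeof(buf);
		size_t got;

		/* Never ask for more than what is left of the budget. */
		if ((long)want > max_bytes - total)
			want = (size_t)(max_bytes - total);

		got = fread(buf, 1, want, f);
		if (got == 0)
			break;	/* EOF or read error: the stream ended */
		total += (long)got;
	}

	fclose(f);
	return total;
}
```

For example, read_bounded("/sys/kernel/debug/slab/kmalloc-64/alloc_traces", 1 << 20) run as root returns less than the budget if the file ends properly. A result equal to the budget only hints at endless output, since a cache with many traces can legitimately produce that much.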