Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
From: Yury Norov <yury.norov@gmail.com>
Date: 2023-03-04 05:51:48
Also in:
linux-fsdevel, lkml
On Fri, Mar 03, 2023 at 07:42:36PM -0800, Linus Torvalds wrote:
ing: quoted-printable Status: O Content-Length: 1593 Lines: 41 On Fri, Mar 3, 2023 at 7:25 PM Yury Norov [off-list ref] wrote:quoted
Did you enable CONFIG_FORCE_NR_CPUS? If you pick it, the kernel will bind nr_cpu_ids to NR_CPUS at compile time, and the memset() call should disappear.I do not believe CONFIG_FORCE_NR_CPUS makes any sense, and I think I told you so at the time.
At that time you was OK with CONFIG_FORCE_NR_CPUS, only suggested to hide it behind CONFIG_EXPERT: https://lore.kernel.org/all/Yzx4fSmmr8bh6gdl@yury-laptop/T/#m92d405527636154c3b2000e0105379170d988315 (local)
This all used to just work *without* some kind of config thing, First removing the automatic "do the right thing", and then adding a config option to "force" doing the right thing seems more than a bit silly to me. I think CONFIG_FORCE_NR_CPUS should go away, and - once more - become just the "is the cpumask small enough to be just allocated directly" thing.
This all was just broken. For example, as I mentioned in commit message,
cpumask_full() was broken. I know because I wrote a test. There were no
a single user for the function, and nobody complained. Now we have one
in BPF code. So if we simply revert the aa47a7c215e, it will hurt real
users.
The pre-CONFIG_FORCE_NR_CPUS cpumask machinery would work only if you
set NR_CPUS to the number that matches to the actual number of CPUs as
detected at boot time.
In your example, if you have NR_CPUS == 64, and for some reason disable
hyper threading, nr_cpumask_bits will be set to 64 at compile time, but
nr_cpu_ids will be set to 32 at boot time, assuming
CONFIG_CPUMASK_OFFSTACK is disabled.
And the following code will be broken:
cpumask_t m1, m2;
cpumask_setall(m1); // m1 is ffff ffff ffff ffff because it uses
// compile-time optimized nr_cpumask_bits
for_each_cpu(cpu, m1) // 32 iterations because it relied on nr_cpu_ids
cpumask_set_cpu(cpu, m2); // m2 is ffff ffff XXXX XXXX
BUG_ON(!cpumask_equal(m1, m2)); // Bug because it will test all 64 bits
Today with CONFIG_FORCE_NR_CPUS disabled, kernel consistently relies
on boot-time defined nr_cpu_ids in functions like cpumask_equal()
with the cost of disabled runtime optimizations.
If CONFIG_FORCE_NR_CPUS is enabled, it wires nr_cpu_ids to NR_CPUS
at compile time, which allows compile-time optimization.
If CONFIG_FORCE_NR_CPUS is enabled, but actual number of CPUs doesn't
match to NR_CPUS, the kernel throws a warning at boot time - better
than nothing.
I'm not happy bothering people with a new config parameter in such a
simple case. I just don't know how to fix it better. Is there a safe
way to teach compiler to optimize against NR_CPUS other than telling
it explicitly?
Of course, the problem for others remain that distros will do that CONFIG_CPUMASK_OFFSTACK thing, and then things will suck regardless. I was *so* happy with our clever "you can have large cpumasks, and we'll just allocate them off the stack" long long ago, because it meant that we could have one single source tree where this was all cleanly abstracted away, and we even had nice types and type safety for it all. That meant that we could support all the fancy SGI machines with several thousand cores, and it all "JustWorked(tm)", and didn't make the normal case any worse. I didn't expect distros to then go "ooh, we want that too", and enable it all by default, and make all our clever "you only see this indirection if you need it" go away, and now the normal case is the *bad* case, unless you just build your own kernel and pick sane defaults. Oh well.
From distro people's perspective, 'one size fits all' is the best approach. It's hard to blame them. Thanks, Yury